CS8791 Notes
Computing as a service has seen a phenomenal growth in recent years. The primary motivation for this
growth has been the promise of reduced capital and operating expenses, and the ease of dynamically scaling and
deploying new services without maintaining a dedicated compute infrastructure. Hence, cloud computing has
begun to rapidly transform the way organizations view their IT resources. From a scenario of a single system
consisting of single operating system and single application, organizations have been moving into cloud
computing, where resources are available in abundance and the user has a wide range to choose from. Cloud
computing is a model for enabling convenient, on-demand network access to a shared pool of configurable
computing resources that can be rapidly provisioned and released with minimal management effort or service
provider interaction.
Cloud computing consists of three distinct types of computing services delivered remotely to clients via the
internet. Clients typically pay a monthly or annual service fee to providers, to gain access to systems that deliver
software as a service, platforms as a service and infrastructure as a service to subscribers. Clients who subscribe to
cloud computing services can reap a variety of benefits, depending on their particular business needs at a given
point in time. The days of large capital investments in software and IT infrastructure are now a thing of the past for
any enterprise that chooses to adopt the cloud computing model for procurement of IT services.
1.1.2 Types of service in cloud computing
SaaS (Software as a Service)
SaaS (Software as a Service) provides clients with the ability to use software applications on a remote basis
via an internet web browser. Software as a service is also referred to as “software on demand”. Clients can access
SaaS applications from anywhere via the web because service providers host applications and their associated data
at their location. The primary benefit of SaaS is a lower cost of use, since subscriber fees require a much smaller
investment than what is typically encountered under the traditional model of software delivery. Licensing fees,
installation costs, maintenance fees and support fees that are routinely associated with the traditional model of
software delivery can be virtually eliminated by subscribing to the SaaS model of software delivery. Examples of
SaaS include: Google Applications and internet based email applications like Yahoo! Mail, Hotmail and Gmail.
PaaS (Platform as a Service)
PaaS (Platform as a Service) provides clients with the ability to develop and publish customized
applications in a hosted environment via the web. It represents a new model for software development that is
rapidly increasing in its popularity. An example of PaaS is Salesforce.com. PaaS provides a framework for agile
software development, testing, deployment and maintenance in an integrated environment. Like SaaS, the primary
benefit of PaaS is a lower cost of use, since subscriber fees require a much smaller investment than what is
typically encountered when implementing traditional tools for software development, testing and deployment. PaaS
providers handle platform maintenance and system upgrades, resulting in a more efficient and cost effective
solution for enterprise software development.
IaaS (Infrastructure as a Service)
IaaS (Infrastructure as a Service) allows clients to remotely use IT hardware and resources on a “pay-as-
you-go” basis. It is also referred to as HaaS (hardware as a service). Major IaaS players include companies like
IBM, Google and Amazon.com. IaaS employs virtualization, a method of creating and managing infrastructure
resources in the “cloud”. IaaS provides small start-up firms with a major advantage, since it allows them to
gradually expand their IT infrastructure without the need for large capital investments in hardware and peripheral
systems.
Cloud computing decreases the hardware and software demands on the user’s side. The only thing the user must
be able to run is the cloud computing system’s interface software, which can be as simple as a Web browser, and
the cloud network takes care of the rest. We have all experienced cloud computing at some point; some of the
popular cloud services we have used, or are still using, are mail services like Gmail, Hotmail or Yahoo. While
accessing an e-mail service our data is stored on a cloud server and not on our computer. The technology and
infrastructure behind the cloud is invisible. It is less important whether cloud services are based on HTTP, XML,
Ruby, PHP or other specific technologies, as long as they are user-friendly and functional. An individual user can
connect to the cloud system from his/her own devices such as a desktop, laptop or mobile. Cloud computing
effectively serves small businesses with limited resources; it gives small businesses access to technologies that
were previously out of their reach.
Pay-per-use Model: You only have to pay for the services you use, and nothing more!
24/7 Availability: It is always online! There is no such time that you cannot use your cloud service; you can use it
whenever you want.
Easily Scalable: It is very easy to scale up and down or turn it off as per customers’ needs. For instance, if your
website’s traffic increases only on Friday nights, you can opt for scaling up your servers that particular day of the
week and then scaling down for the rest of the week.
Security: Cloud computing offers strong data security. If the data is mission-critical, it can be removed from local
drives and kept only in the cloud, accessible solely to authorized users, to stop it from ending up in the wrong hands.
Easily Manageable: You only have to pay subscription fees; all maintenance, upgrades and delivery of services
are handled entirely by the cloud provider. This is backed by the Service-Level Agreement (SLA).
While modern computer networking emerged in the mid-1970s, nothing remotely resembling the idea of
"cloud computing" was discussed until roughly a decade later, in the mid-1980s, when John Gage of Sun
Microsystems coined the memorable slogan, "The network is the computer." Prophetic as Sun was, hardware
(both compute and networking) was neither powerful enough nor commoditized enough to realize this vision at
the time; cloud computing was still at least a decade away. In the meantime, Sun's Unix-based operating system
and servers became the "new iron," replacing the mainframes that had been around for generations. Sun's
machines used open networking standards, for example TCP/IP, which enabled programs running on one machine
to interact with programs running on other machines; such applications typically followed the client-server design
model. Around this period, Sir Tim Berners-Lee proposed the idea of sharing information stored on many servers
and making it available to the world through client machines. Documents would contain hypertext: text with
metadata containing location information that serves as a reference to the item described by that text.
Although the history of cloud computing is not a long one (the first commercial and consumer cloud
computing services, salesforce.com and Google, were introduced in 1999), the story is tied directly to the
development of the Internet and of business technology, since cloud computing is the answer to the question of
how the Internet can strengthen business technology. Business technology has a long and interesting history, one
that is almost as long as that of business itself, yet the specific developments that most directly influenced cloud
computing begin with the emergence of computers as providers of real business solutions.
The internet's "youth": Arpanet proved to be a major development, and some large computer companies
were founded. In the 1970s, the ideas and components that had been proposed in the 1950s and 1960s were
developed in earnest. In addition, many of the world's largest computer companies were started, and the internet
was born. In 1971, Intel, founded in the previous decade, introduced the world to the first microprocessor, and
the programmer Ray Tomlinson wrote a program that allowed people to send correspondence from one machine
to another, thereby sending the first message that most people would recognize as e-mail.
The seeds were being sown for the growth of the internet. The internet's global beginning: the internet
became a place for both business and communication in its own right. The 1990s connected the world in an
unprecedented way, beginning with CERN's release of the World Wide Web for general (that is, non-commercial)
use in 1991. In 1993, a browser known as Mosaic allowed images to be displayed on the Web, and private
companies were now permitted to operate on the internet as well. Once firms were online, they began to recognize
the business possibilities that came with being able to reach the world directly, and some of the biggest players
online were founded.
Marc Andreessen and Jim Clark founded Netscape in 1994, and none too early, since 1995 saw internet
traffic handed over to commercial enterprises like Netscape. Around the same time, stalwarts of the internet,
Amazon.com and eBay, were founded by Jeff Bezos and Pierre Omidyar, respectively. The internet's "adulthood"
and the rise of cloud computing: the dot-com bubble burst and cloud computing came to the fore. The end of the
1990s and the start of the 2000s were a great time to found or invest in an internet-based company. Cloud
computing had the right environment in which to grow, as multi-tenant architectures, widely available high-speed
bandwidth and common software interoperability standards were developed during this period. Salesforce.com
appeared in the late 1990s and was the first site to deliver business applications from an ordinary website, which
is what is now called cloud computing.
Amazon.com introduced Amazon Web Services in 2004. This gave customers the ability to store data and
to put a large pool of people to work on very small tasks (for example, through Amazon Mechanical Turk), among
other services. Facebook was founded in 2004, revolutionizing the way users communicate and the way they store
their own data (their photos and videos), inadvertently making the cloud a personal service.
Most recently, cloud computing companies have been considering how to make their offerings work better
together. In 2010, Salesforce.com launched the cloud-based database Database.com, aimed at developers, marking
the evolution of cloud computing services that can be used on almost any device, run on practically any platform
and be written in a variety of programming languages. Of course, the future of the internet and of cloud computing
has proved hard to predict in the past, but as long as organizations strive to connect the globe and serve that
connected world in new ways, there will always be a need for both the internet and cloud computing.
The real world operates dynamically: many things happen at the same time, but at different places,
concurrently. The resulting data is extremely large and hard to manage.
Real-world data needs dynamic simulation and modeling, and parallel computing is the key to achieving this.
Parallel computing provides concurrency and saves time and money.
Complex, large datasets and their management can be handled effectively only with a parallel computing
approach.
Parallel computing ensures the effective utilization of resources. The hardware is used effectively, whereas in
serial computation only part of the hardware is used and the rest is rendered idle.
Also, it is impractical to implement real-time systems using serial computing.
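As a small illustration of the time savings described above, the following Python sketch (standard library only, with an illustrative workload and task count) compares running the same independent tasks serially and in a process pool that spreads them across CPU cores.

# Minimal sketch: serial vs. parallel execution of independent tasks.
# The workload and task count are illustrative only.
import time
from multiprocessing import Pool

def simulate(task_id: int) -> int:
    """Stand-in for an expensive, independent computation."""
    total = 0
    for i in range(2_000_000):
        total += (i * task_id) % 7
    return total

if __name__ == "__main__":
    tasks = list(range(8))

    start = time.perf_counter()
    serial_results = [simulate(t) for t in tasks]      # one task at a time
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool() as pool:                               # one worker per CPU core
        parallel_results = pool.map(simulate, tasks)   # tasks run concurrently
    parallel_time = time.perf_counter() - start

    assert serial_results == parallel_results
    print(f"serial:   {serial_time:.2f} s")
    print(f"parallel: {parallel_time:.2f} s")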
Applications of Parallel Computing:
Data bases and Data mining.
Real time simulation of systems.
Science and Engineering.
Advanced graphics, augmented reality and virtual reality.
Limitations of Parallel Computing:
Communication and synchronization between the multiple sub-tasks and processes are difficult to achieve.
The algorithms must be structured in such a way that they can be handled by the parallel mechanism.
The algorithms or programs must have low coupling and high cohesion, but it is difficult to create such programs.
Parallel programs also demand more technically skilled and expert programmers to code them well.
Future of Parallel Computing: The computational landscape has undergone a great transition from serial computing
to parallel computing. Tech giants such as Intel have already taken a step towards parallel computing by employing
multicore processors. Parallel computation will change the way computers work in the future, for the better. With
the world more connected than ever before, parallel computing plays an important role in helping us stay that way.
With faster networks, distributed systems, and multi-processor computers, it becomes even more necessary.
A Distributed Operating System attempts to make this architecture seamless and transparent to the user to
facilitate the sharing of heterogeneous resources in an efficient, flexible and robust manner. Its aim is to shield the
user from the complexities of the architecture and make it appear to behave like a timeshared centralized
environment.
Communication is the central issue for distributed systems as all process interaction depends on it. Exchanging
messages between different components of the system incurs delays due to data propagation, execution of
communication protocols and scheduling. Communication delays can lead to inconsistencies between different
parts of the system at a given instant in time, making it difficult to gather global information for decision making
and to distinguish between what may be a delay and what may be a failure.
Fault tolerance is an important issue for distributed systems. Faults are more likely to occur in distributed systems
than centralized ones because of the presence of communication links and a greater number of processing elements,
any of which can fail. The system must be capable of reinitializing itself to a state where the integrity of data and
state of ongoing computation is preserved with only some possible performance degradation.
Distributed Computing
In daily life, an individual can use a computer to work with applications such as Microsoft Word or Microsoft
PowerPoint. Complex problems, however, may not be solvable using a single computer. Therefore, a single
problem can be divided into multiple tasks and distributed to many computers. These computers can communicate
with other computers through the network, and together they behave as a single entity. The process of dividing a
single task among multiple computers is known as distributed computing. Each computer in a distributed system is
known as a node, and a set of nodes is a cluster.
In distributed computing systems, multiple processors can communicate with each other using messages that are
sent over the network. Such systems are increasingly common these days because of the low price of computer
processors and of the high-bandwidth links that connect them.
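A minimal sketch of such message passing, using Python's standard socket module; the host, port and message text are arbitrary choices for illustration, and both "nodes" run in a single process only so that the example is self-contained.

# Minimal sketch of message passing between two nodes over a network.
# In a real distributed system the nodes would be separate machines.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007        # illustrative address of the receiving node

def node_b_receiver():
    """Node B: wait for a message from another node."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            data = conn.recv(1024)
            print(f"Node B received from {addr}: {data.decode()}")

def node_a_sender():
    """Node A: connect to node B and send a message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"task-42: compute partial sum")

receiver = threading.Thread(target=node_b_receiver)
receiver.start()
time.sleep(0.5)                        # give the receiver a moment to start listening
node_a_sender()
receiver.join()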
The following reasons explain why a system should be built distributed, not just parallel:
Scalability: As distributed systems do not have the problems associated with shared memory, with the increased
number of processors, they are obviously regarded as more scalable than parallel systems.
Reliability: The impact of the failure of any single subsystem or computer on the network of computers defines
the reliability of such a connected system. Distributed systems clearly fare better in this respect than parallel
systems.
Data sharing: Data sharing provided by distributed systems is similar to the data sharing provided by distributed
databases. Thus, multiple organizations can have distributed systems with the integrated applications for data
exchange.
Resources sharing: If there exists an expensive and a special purpose resource or a processor, which cannot be
dedicated to each processor in the system, such a resource can be easily shared across distributed systems.
Heterogeneity and modularity: A system should be flexible enough to accept a new heterogeneous processor to
be added into it and one of the processors to be replaced or removed from the system without affecting the overall
system processing capability. Distributed systems are observed to be more flexible in this respect.
Geographic construction: The different subsystems of an application may be inherently distributed
geographically. Low communication bandwidth, particularly within a wireless network, may force processing to
be done locally.
Economic: With the evolution of modern computers, high-bandwidth networks and workstations are available at
low cost, which also favors distributed computing for economic reasons.
MapReduce
As discussed earlier, MapReduce is a robust framework for managing large amounts of data. However, the
standard MapReduce framework involves considerable overhead when dealing with iterative MapReduce. Twister
is a framework designed to perform iterative MapReduce efficiently.
Additional functionality:
1.) Static and variable data: Any iterative algorithm requires static and variable data. Variable data are
computed together with static data (usually the larger part of the two) to generate another set of variable data, and
the process is repeated until a given condition or constraint is met. In a normal MapReduce job using Hadoop or
DryadLINQ, the static data are loaded needlessly every time the computation is performed. This is an extra
overhead for the computation: even though the static data remain fixed throughout the computation, they have to be
loaded again and again.
Twister introduces a “config” phase for both map and reduce to load any static data that is required. Loading static
data only once is also helpful for long-running Map/Reduce tasks.
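The following Python sketch is a conceptual illustration of this pattern (it is not the actual Twister API): the static data is loaded once in a configuration step, while the variable data, here a pair of centroids, is recomputed on every iteration until a convergence condition is met.

# Conceptual sketch of iterative MapReduce with static data configured once.
# This is NOT the Twister API; it only illustrates the idea of a "config"
# phase that avoids reloading static data on every iteration.
from collections import defaultdict

def configure_static_data():
    """Config phase: load the static data a single time (e.g., from disk)."""
    return [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]       # illustrative data points

def map_phase(points, centroids):
    """Map: assign each static point to the nearest current centroid."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        yield nearest, p

def reduce_phase(pairs):
    """Reduce: recompute each centroid as the mean of its assigned points."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

points = configure_static_data()                # static data: loaded only once
centroids = [0.0, 10.0]                         # variable data: updated each iteration

for iteration in range(20):
    updated = reduce_phase(map_phase(points, centroids))
    new_centroids = [updated.get(k, centroids[k]) for k in range(len(centroids))]
    if all(abs(a - b) < 1e-6 for a, b in zip(centroids, new_centroids)):
        break                                   # convergence condition met
    centroids = new_centroids

print("final centroids:", centroids)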
2.) Fat map task: To avoid repeated accesses to a lot of data, the map side is provided with a configurable map
task; the map task can access large blocks of data or files. This makes it easy to place heavy computational weight
on the map side.
3.) Combine operation: Unlike GFS-based MapReduce, where the outputs of the reducers are stored in separate
files, Twister adds a new phase after map and reduce, called combine, that collectively gathers the output coming
from all the reducers.
4.) Programming extensions: Some of the additional functions that support the iterative functionality of Twister are:
i) mapReduceBCast(Value value), for broadcasting a single value to all map tasks. For example, the “Value” can be
a set of parameters, a resource (file or executable) name, or even a block of data.
TWISTER ARCHITECTURE
Twister is designed to support iterative MapReduce computations efficiently. To achieve this, it reads data from
the local disks of the worker nodes and handles the intermediate data in the distributed memory of the worker
nodes.
The messaging infrastructure in Twister is called the broker network, and it is responsible for performing data
transfers using publish/subscribe messaging.
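The publish/subscribe pattern itself can be sketched in a few lines of Python (this is only an illustration of the pattern, not Twister's broker network): publishers and subscribers are decoupled by topics, and the broker forwards each published message to every subscriber of its topic.

# Minimal publish/subscribe sketch: senders and receivers are decoupled by topics.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)    # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
# A "map task" subscribes to the topic on which its input parameters arrive.
broker.subscribe("map/params", lambda msg: print("map task received:", msg))
# The driver publishes a block of data to all subscribers of that topic.
broker.publish("map/params", {"iteration": 1, "centroids": [0.0, 10.0]})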
On-Demand Self-Service
A consumer can unilaterally provision computing capabilities, such as server time and network storage, as
needed automatically without requiring human interaction with each service provider.
Broad network access
Capabilities are available over the network and accessed through standard mechanisms that promote use by
heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).
Resource pooling
The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model,
with different physical and virtual resources dynamically assigned and reassigned according to consumer
demand. There is a sense of location independence in that the customer generally has no control or
knowledge over the exact location of the provided resources but may be able to specify location at a higher
level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing,
memory, and network bandwidth.
Rapid elasticity
Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly
outward and inward commensurate with demand. To the consumer, the capabilities available for
provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
Measured service.
Cloud systems automatically control and optimize resource use by leveraging a metering capability at some
level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user
accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the
provider and consumer of the utilized service.
For clarity and convenience, the notation below describes the variables used in the following sections. To
elaborate the essence of cloud elasticity, we define the various states used in our discussion. Let i denote the
number of VMs in service and let j be the number of requests in the system.
(1) Just-in-Need State. A cloud platform is in a just-in-need state if i<j<3i. Tj is defined as the accumulated time in
all just-in-need states.
(2) Over-provisioning State. A cloud platform is in an over-provisioning state if 0<j<i. To is defined as the
accumulated time in all over-provisioning states.
(3) Under-provisioning State. A cloud platform is in an under-provisioning state if j>3i. Tu is defined as the
accumulated time in all under-provisioning states.
Notice that the constants 1 and 3 here are only for illustration purposes and can be any other values,
depending on how an elastic cloud platform is managed. Different cloud users and/or applications may prefer
different bounds for the hypothetical just-in-need states. The length of the interval between the upper (e.g., 3i) and
lower (e.g., i) bounds controls the re-provisioning frequency. Narrowing the interval leads to a higher re-
provisioning frequency for a fluctuating workload.
The just-in-need computing resource denotes a balanced state, in which the workload can be properly
handled and quality of service (QoS) can be satisfactorily guaranteed. Computing resource over-provisioning,
though QoS can be achieved, leads to extra but unnecessary cost to rent the cloud resources. Computing resource
under-provisioning, on the other hand, delays the processing of workload and may be at the risk of breaking QoS
commitment.
We now present our elasticity definition for a realistic cloud platform and the mathematical foundation for
elasticity evaluation. The definition of elasticity is given from a computational point of view, and we develop a
calculation formula for measuring the elasticity value in virtualized clouds. Let Tm be the measuring time, which
includes all the periods in the just-in-need, over-provisioning, and under-provisioning states; that is, Tm=Tj+To+Tu.
Definition 1: The elasticity E of a cloud platform is the percentage of time when the platform is in just-in-need
states; that is, E=Tj/Tm=1-To/Tm-Tu/Tm.
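A small sketch of Definition 1 in Python, computing E from a monitored log of (VMs, requests, duration) records; the bounds i and 3i follow the state definitions above, while the handling of the boundary cases and the sample log are illustrative.

# Sketch of the elasticity metric E = Tj / Tm from a monitoring log.
# Each record is (i = VMs in service, j = requests in system, duration in seconds).
def classify(i, j):
    """Classify the platform state; boundary cases are assigned for illustration."""
    if i < j < 3 * i:
        return "just_in_need"
    if j <= i:
        return "over_provisioning"
    return "under_provisioning"                 # j >= 3i

def elasticity(log):
    durations = {"just_in_need": 0.0, "over_provisioning": 0.0,
                 "under_provisioning": 0.0}
    for i, j, dt in log:
        durations[classify(i, j)] += dt
    t_m = sum(durations.values())               # Tm = Tj + To + Tu
    return durations["just_in_need"] / t_m      # E = Tj / Tm

# Hypothetical monitoring log.
log = [(2, 3, 60), (2, 1, 30), (2, 7, 20), (3, 5, 90)]
print(f"E = {elasticity(log):.2f}")             # Tj = 150, Tm = 200, E = 0.75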
Broadly defined, elasticity is the capability of delivering preconfigured and just-in-need virtual machines
adaptively in a cloud platform as the required computing resources fluctuate. In practice it is determined by the
time needed to move from an under-provisioning or over-provisioning state to a balanced resource provisioning
state. Definition 1 provides a mathematical definition which is easily and accurately measurable. Cloud platforms
with high elasticity exhibit high adaptivity, implying that they switch from an over-provisioning or an under-
provisioning state to a balanced state almost in real time. Other cloud platforms take a longer time to adjust and
reconfigure computing resources. Although it is recognized that high elasticity can also be achieved via physical
host standby, we argue that, with virtualization-enabled computing resource provisioning, elasticity can be
delivered in a much easier way due to the flexibility of service migration and image template generation.
Here Tm denotes the total measuring time, of which To is the over-provisioning time, accumulating each single
period of time that the cloud platform needs to switch from an over-provisioning state to a balanced state, and Tu
is the under-provisioning time, accumulating each single period of time that the cloud platform needs to switch
from an under-provisioning state to the corresponding balanced state.
Let Pj, Po, and Pu be the accumulated probabilities of just-in-need states, over-provisioning states, and under-
provisioning states, respectively. If Tm is sufficiently long, we have E=Pj=1-Po-Pu.
Equation (E1) can be used when elasticity is measured by monitoring a real system. Equation (Pj) can be
used when elasticity is calculated by using our CTMC model. If elasticity metrics are well defined, the elasticity of
cloud platforms can easily be captured, evaluated, and compared.
We would like to mention that the primary factors of elasticity, that is, the amount, frequency, and time of
resource re-provisioning, are all summarized in To and Tu (i.e., Po and Pu). Elasticity can be increased by changing
these factors. For example, one can maintain a list of standby or underutilized compute nodes. These nodes are
prepared for the upcoming surge of workload, if there is any, to minimize the time needed to start these nodes.
Such a hot standby strategy increases cloud elasticity by reducing Tu .
The hypothetical just-in-need, over-provisioning, and under-provisioning states can be illustrated with the
following examples:
Streaming Services. Netflix is dropping a new season of Mindhunter. The notification triggers a
significant number of users to get on the service and stream the episodes. Resource-wise, this is an
activity spike that requires swift resource allocation.
E-commerce. Amazon has a Prime Day event with many special offers, sell-offs, promotions, and
discounts. It attracts an immense number of customers to the service, all doing different activities.
Actions include searching for products, bidding, buying items, writing reviews and rating products. This
diverse activity requires a very flexible system that can allocate resources to one sector without dragging
down others.
Figure 1.7 Elasticity in cloud
Cloud elasticity is a popular feature associated with scale-out solutions (horizontal scaling), which allows
resources to be dynamically added or removed when needed. Elasticity is generally associated with public cloud
resources and is more commonly featured in pay-per-use or pay-as-you-grow services. This means IT managers are
not paying for more resources than they are consuming at any given time. In virtualized environments, cloud
elasticity can include the ability to dynamically deploy new virtual machines or shut down inactive virtual
machines.
Elasticity is the scaling of system resources to increase or decrease capacity, whereby the amount of provisioned
resources is adapted to the current demand; in a broader sense, the adaptation process could also contain manual
steps. Without a defined adaptation process, a scalable system cannot behave in an elastic manner, as scalability on
its own does not include temporal aspects. When evaluating elasticity, the following points need to be checked
beforehand:
• Autonomic Scaling: What adaptation process is used for autonomic scaling?
• Elasticity Dimensions: What is the set of resource types scaled as part of the adaptation process?
• Resource Scaling Units: For each resource type, in what unit is the amount of allocated resources varied?
• Scalability Bounds: For each resource type, what is the upper bound on the amount of resources that can be
allocated?
What is Cloud Scalability?
Scalability is one of the preeminent features of cloud computing. In the past, a system’s scalability relied on the
company’s hardware, and thus, was severely limited in resources. With the adoption of cloud computing,
scalability has become much more available and more effective.
Automatic scaling has opened up numerous possibilities, bringing big data, machine learning models and data
analytics into the fold.
Overall, Cloud Scalability covers expected and predictable workload demands and also handles rapid and
unpredictable changes in the scale of operation. The pay-as-you-expand pricing model makes possible the
preparation of the infrastructure and its spending budget in the long term without too much strain.
There are several types of cloud scalability:
Vertical, aka Scale-Up - the ability to handle an increasing workload by adding resources to the existing
infrastructure. It is a short-term solution to cover immediate needs.
Horizontal, aka Scale-Out - the expansion of the existing infrastructure with new elements to tackle more
significant workload requirements. It is a long-term solution aimed at covering present and future resource
demands with room for expansion.
Diagonal scalability is a more flexible solution that combines the addition and removal of resources according to
the current workload requirements. It is the most cost-effective scalability solution by far.
1.8 ON-DEMAND PROVISIONING
On-demand computing is a delivery model in which computing resources are made available to the user as
needed. The resources may be maintained within the user's enterprise, or made available by a cloud service
provider. When the services are provided by a third-party, the term cloud computing is often used as a synonym for
on-demand computing.
The on-demand model was developed to overcome the common challenge to an enterprise of being able to
meet fluctuating demands efficiently. Because an enterprise's demand on computing resources can vary drastically
from one time to another, maintaining sufficient resources to meet peak requirements can be costly. Conversely, if
an enterprise tried to cut costs by only maintaining minimal computing resources, it is likely there will not be
sufficient resources to meet peak requirements.
The on-demand model provides an enterprise with the ability to scale computing resources up or down
with the click of a button, an API call or a business rule. The model is characterized by three attributes: scalability,
pay-per-use and self-service. Whether the resource is an application program that helps team members collaborate
or additional storage for archiving images, the computing resources are elastic, metered and easy to obtain.
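As a sketch of what such a business rule might look like, the Python function below applies a simple threshold policy; the thresholds, instance limits and utilization values are illustrative and do not correspond to any particular provider's API.

# Illustrative threshold-based scaling rule; not any provider's real API.
def scaling_decision(cpu_utilization: float, current_instances: int,
                     min_instances: int = 2, max_instances: int = 20) -> int:
    """Return the desired instance count for the next provisioning cycle."""
    if cpu_utilization > 0.80 and current_instances < max_instances:
        return current_instances + 1            # scale out: demand is high
    if cpu_utilization < 0.30 and current_instances > min_instances:
        return current_instances - 1            # scale in: paying for idle capacity
    return current_instances                    # stay within the comfortable band

# Example: utilization samples from a hypothetical monitoring service.
for utilization in [0.25, 0.55, 0.91, 0.95, 0.40]:
    desired = scaling_decision(utilization, current_instances=4)
    print(f"cpu={utilization:.0%} -> desired instances: {desired}")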
Many on-demand computing services in the cloud are so user-friendly that non-technical end users can
easily acquire computing resources without any help from the organization's information technology (IT)
department. This has advantages because it can improve business agility, but it also has disadvantages because
shadow IT can pose security risks. For this reason, many IT departments carry out periodic cloud audits to identify
greynet on-demand applications and other rogue IT.
a) Local On-demand Resource Provisioning
The Engine for the Virtual Infrastructure
Benefits of virtualization for cluster and HPC systems:
OpenNebula creates a distributed virtualization layer
It extends the benefits of VM monitors from one resource to multiple resources
It decouples the VM (service) from its physical location
It transforms a distributed physical infrastructure into a flexible and elastic virtual infrastructure, which
adapts to the changing demands of the VM service workload
Resource Provisioning Strategies: To make efficient use of cloud resources, resource provisioning techniques
have to be employed.
There are many resource provisioning techniques, both static and dynamic, each having its own pros and cons.
Provisioning techniques are used to improve QoS parameters, minimize cost for the cloud user, maximize revenue
for the cloud service provider, improve response time, deliver services to the cloud user even in the presence of
failures, improve performance, reduce SLA violations, use cloud resources efficiently and reduce power
consumption.
Optimal Resource Provisioning for a Cloud Computing Environment: efficiently provisions cloud resources for
SaaS users with a limited budget and deadline, thereby optimizing QoS. It is applicable only to SaaS users and
SaaS providers.
CHAPTER-2
CLOUD ENABLING TECHNOLOGIES
2.1 SERVICE-ORIENTED ARCHITECTURE (SOA)
A service-oriented architecture (SOA) is essentially a collection of services. These services communicate
with each other. The communication can involve either simple data passing or it could involve two or more
services coordinating some activity. Some means of connecting services to each other is needed.
Service-Oriented Architecture (SOA) is a style of software design where services are provided to the other
components by application components, through a communication protocol over a network. Its principles are
independent of vendors and other technologies. In service oriented architecture, a number of services communicate
with each other, in one of two ways: through passing data or through two or more services coordinating an activity.
Services
If a service-oriented architecture is to be effective, we need a clear understanding of the term service. A service is a
function that is well-defined, self-contained, and does not depend on the context or state of other services.
Connections
The technology of Web Services is the most likely connection technology of service-oriented architectures. The
following figure illustrates a basic service-oriented architecture. It shows a service consumer at the right sending a
service request message to a service provider at the left. The service provider returns a response message to the
service consumer. The request and subsequent response connections are defined in some way that is understandable
to both the service consumer and the service provider. How those connections are defined is explained in the
discussion of web services below. A service provider can also be a service consumer.
Web services which are built as per the SOA architecture tend to make web service more independent. The web
services themselves can exchange data with each other and because of the underlying principles on which they are
created, they don't need any sort of human interaction and also don't need any code modifications. It ensures that
the web services on a network can interact with each other seamlessly.
Benefit of SOA
Language Neutral Integration: Regardless of the development language used, the system offers and invokes
services through a common mechanism. Programming language neutrality is one of the key benefits of
SOA's integration approach.
Component Reuse: Once an organization built an application component, and offered it as a service, the
rest of the organization can utilize that service.
Organizational Agility: SOA defines building blocks of capabilities provided by software and it offers
some service(s) that meet some organizational requirement; which can be recombined and integrated
rapidly.
Leveraging Existing Systems: One of the major uses of SOA is to classify elements or functions of existing
applications and make them available to the organization or enterprise.
2.1.1 SOA Architecture
SOA architecture is viewed as five horizontal layers. These are described below:
Consumer Interface Layer: These are GUI based apps for end users accessing the applications.
Business Process Layer: These are business-use cases in terms of application.
Services Layer: These are the whole-of-enterprise services defined in the service inventory.
Service Component Layer: These are the components used to build the services, such as functional and technical libraries.
Operational Systems Layer: It contains the data model.
SOA Governance
It is important to differentiate between IT governance and SOA governance. IT governance focuses on
managing IT assets and processes, whereas SOA governance focuses on managing business services.
Furthermore, in a service-oriented organization, everything should be characterized as a service. The cost
that governance puts forward becomes clear when we consider the amount of risk it eliminates: a good
understanding of services, organizational data and processes is needed in order to choose approaches and
policies for monitoring and to assess their performance impact.
SOA Architecture Protocol
SOA Security
With the vast use of cloud technology and its on-demand applications, there is a need for well - defined
security policies and access control.
As these issues are addressed, the success of the SOA architecture will increase.
Actions can be taken to ensure security and lessen the risks when dealing with SOE (Service Oriented
Environment).
We can make policies that will influence the patterns of development and the way services are used.
Moreover, the system must be set-up in order to exploit the advantages of public cloud with resilience.
Users must include safety practices and carefully evaluate the clauses in these respects.
Elements of SOA
2.1.2 SOA is based on some key principles which are mentioned below
1. Standardized Service Contract - Services adhere to a service description. A service must have some sort
of description which describes what the service is about. This makes it easier for client applications to
understand what the service does.
2. Loose Coupling – Less dependency on each other. This is one of the main characteristics of web services,
which states that there should be as little dependency as possible between the web services and the
client invoking the web service. So if the service functionality changes at any point in time, it should not
break the client application or stop it from working.
3. Service Abstraction - Services hide the logic they encapsulate from the outside world. The service should
not expose how it executes its functionality; it should just tell the client application on what it does and not
on how it does it.
4. Service Reusability - Logic is divided into services with the intent of maximizing reuse. In any
development company re-usability is a big topic because obviously one wouldn't want to spend time and
effort building the same code again and again across the multiple applications which require it. Hence,
once the code for a web service is written, it should have the ability to work with various application types.
5. Service Autonomy - Services should have control over the logic they encapsulate. The service knows
everything on what functionality it offers and hence should also have complete control over the code it
contains.
6. Service Statelessness - Ideally, services should be stateless. This means that services should not withhold
information from one state to the next; if state needs to be carried over, it should be handled by the client
application. An example is an order placed on a shopping site. You can have a web service which gives you the
price of a particular item. But if the items are added to a shopping cart and the web page navigates to the
page where you make the payment, the responsibility of transferring the price of the item to the payment
page should not fall on the web service; instead, it needs to be handled by the web application.
7. Service Discoverability - Services can be discovered (usually in a service registry). We will see this again
in the concept of UDDI, which provides a registry that can hold information about the web
service.
8. Service Composability - Services break big problems into little problems. One should never embed all
functionality of an application into one single service but instead, break the service down into modules
each with a separate business functionality.
9. Service Interoperability - Services should use standards that allow diverse subscribers to use the service.
In web services, standards as XML and communication over HTTP is used to ensure it conforms to this
principle.
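To make a few of these principles concrete, the small Python sketch below (all names are hypothetical) uses an abstract class as a standardized, abstracted service contract; the implementation is stateless because every call carries all the data it needs, and the client depends only on the contract, which keeps the coupling loose.

# Hypothetical sketch of SOA principles: standardized contract, abstraction,
# statelessness and loose coupling (names are illustrative only).
from abc import ABC, abstractmethod

class PriceServiceContract(ABC):
    """Standardized contract: clients depend only on this description."""
    @abstractmethod
    def get_price(self, item_id: str, currency: str) -> float:
        """Return the price of an item; how it is computed stays hidden."""

class CatalogPriceService(PriceServiceContract):
    """Stateless implementation: each request carries everything it needs,
    so no information is withheld between one call and the next."""
    _PRICES = {"book-101": 12.50, "pen-007": 1.20}      # illustrative catalogue

    def get_price(self, item_id: str, currency: str) -> float:
        price_usd = self._PRICES.get(item_id, 0.0)
        rate = {"USD": 1.0, "EUR": 0.9}.get(currency, 1.0)   # toy conversion
        return round(price_usd * rate, 2)

# Loose coupling: the client is written against the contract, not the class.
def client_code(service: PriceServiceContract) -> None:
    print(service.get_price("book-101", "EUR"))

client_code(CatalogPriceService())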
2.1.3 Implementing Service-Oriented Architecture
When it comes to implementing service-oriented architecture (SOA), there is a wide range of
technologies that can be used, depending on what your end goal is and what you’re trying to
accomplish.
Typically, Service-Oriented Architecture is implemented with web services, which makes the
“functional building blocks accessible over standard internet protocols.”
An example of a web service standard is SOAP, which stands for Simple Object Access Protocol. In a
nutshell, SOAP “is a messaging protocol specification for exchanging structured information in the
implementation of web services in computer networks.”
The importance of Service-Oriented Architecture
There are a variety of ways that implementing an SOA structure can benefit a business, particularly, those that
are based around web services. Here are some of the foremost:
Creates reusable code
The primary motivator for companies to switch to an SOA is the ability to reuse code for different
applications. By reusing code that already exists within a service, enterprises can significantly reduce the time
that is spent during the development process. Not only does the ability to reuse services decrease time
constraints, but it also lowers costs that are often incurred during the development of applications. Since SOA
allows varying languages to communicate through a central interface, this means that application engineers do
not need to be concerned with the type of environment in which these services will be run. Instead, they only
need to focus on the public interface that is being used.
Promotes interaction
A major advantage in using SOA is the level of interoperability that can be achieved when properly
implemented. With SOA, no longer will communication between platforms be hindered in operation by the
languages on which they are built. Once a standardized communication protocol has been put in place, the
platform systems and the varying languages can remain independent of each other, while still being able to
transmit data between clients and services. Adding to this level of interoperability is the fact that SOA can
negotiate firewalls, thus ensuring that companies can share services that are vital to operations.
Allows for scalability
When developing applications for web services, one issue that is of concern is the ability to increase the scale
of the service to meet the needs of the client. All too often, the dependencies that are required for applications
to communicate with different services inhibit the potential for scalability. However, with SOA this is not the
case. By using an SOA where there is a standard communication protocol in place, enterprises can drastically
reduce the level of interaction that is required between clients and services, and this reduction means that
applications can be scaled without putting added pressure on the application, as would be the case in a tightly-
coupled environment.
Reduced costs
In business, the ability to reduce costs while still maintaining a desired level of output is vital to success, and
this concept holds true with customized service solutions as well. By switching to an SOA-based system,
businesses can limit the level of analysis that is often required when developing customized solutions for
specific applications. This cost reduction is facilitated by the fact that loosely coupled systems are easier to
maintain and do not necessitate costly development and analysis. Furthermore, the increasing
popularity of SOA means reusable business functions are becoming commonplace for web services, which
drives costs lower.
2.2 REST AND SYSTEMS OF SYSTEMS
Representational State Transfer (REST) is an architecture principle in which the web services are viewed
as resources and can be uniquely identified by their URLs. The key characteristic of a RESTful Web service is
the explicit use of HTTP methods to denote the invocation of different operations.
Representational state transfer (REST) is a distributed system framework that uses Web protocols and
technologies. The REST architecture involves client and server interactions built around the transfer of
resources. The Web is the largest REST implementation.
REST may be used to capture website data through interpreting extensible markup language (XML) Web
page files with the desired data. In addition, online publishers use REST when providing syndicated content to
users by activating Web page content and XML statements. Users may access the Web page through the
website's URL, read the XML file with a Web browser, and interpret and use data as needed.
Basic REST constraints include:
Client and Server: The client and server are separated by a uniform interface, which improves client code
portability.
Stateless: Each client request must contain all required data for request processing without storing client
context on the server.
Cacheable: Responses (such as Web pages) can be cached on a client computer to speed up Web Browsing.
Responses are defined as cacheable or not cacheable to prevent clients from reusing stale or inappropriate data
when responding to further requests.
Layered System: Enables clients to connect to the end server through an intermediate layer for improved
scalability.
2.2.1 The basic REST design principle uses the HTTP protocol methods for typical CRUD operations:
POST - Create a resource
GET - Retrieve a resource
PUT – Update a resource
DELETE - Delete a resource
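A brief sketch of this CRUD mapping using Python's requests library; the endpoint, payload and response fields are hypothetical, chosen only to show which HTTP method corresponds to which operation.

# Sketch of the CRUD-to-HTTP mapping; the endpoint is hypothetical.
import requests

BASE = "https://api.example.com/tutorials"     # illustrative resource URL

# POST - create a resource
created = requests.post(BASE, json={"name": "REST basics"})
tutorial_id = created.json().get("id")          # assumes the server returns an id

# GET - retrieve the resource
fetched = requests.get(f"{BASE}/{tutorial_id}")

# PUT - update the resource
requests.put(f"{BASE}/{tutorial_id}", json={"name": "REST basics, revised"})

# DELETE - delete the resource
requests.delete(f"{BASE}/{tutorial_id}")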
The major advantages of REST-services are:
They are highly reusable across platforms (Java, .NET, PHP, etc) since they rely on basic HTTP protocol
They use basic XML instead of the complex SOAP XML and are easily consumable
REST-based web services are increasingly being preferred for integration with backend enterprise services.
In comparison to SOAP based web services, the programming model is simpler and the use of native XML
instead of SOAP reduces the serialization and deserialization complexity as well as the need for
additional third-party libraries for the same.
2.2.2 Overview of Architecture
In J2EE applications, the Java API or services are exposed as either Stateless Session Bean API (Session
Façade pattern) or as SOAP web services. In case of integration of these services with client applications using
non-Java technology like .NET or PHP etc, it becomes very cumbersome to work with SOAP Web Services
and also involves considerable development effort.
The approach mentioned here is typically intended for service integrations within the organization where
there are many services which can be reused but the inter-operability and development costs using SOAP
create a barrier for quick integrations. Also, in scenarios where a service is not intended to be exposed on the
enterprise ESB or EAI by the internal Governance organization, it becomes difficult to integrate 2 diverse-
technology services in a point-to-point manner.
For example – In a telecom IT environment:
Sending an SMS to the circle-specific SMSC’s which is exposed as a SOAP web service or an EJB API;
Or
Creating a Service Request in a CRM application exposed as a Database stored procedure (e.g. Oracle
CRM) exposed over ESB using MQ or JMS bindings; Or
Creating a Sales Order request for a Distributor from a mobile SMS using the SMSGateway.
If above services are to be used by a non-Java application, then the integration using SOAP web services
will be cumbersome and involve extended development.
This new approach has been implemented in the form of a framework so that it can be reused in other areas
where a Java Service can be exposed as a REST-like resource.
The architecture consists of a Front Controller which acts as the central point for receiving requests and
providing responses to the clients. The Front Controller delegates the request processing to the ActionController,
which contains the processing logic of this framework. The ActionController performs validation, maps the
request to the appropriate Action and invokes the action to generate response. Various Helper Services are
provided for request processing, logging and exception handling which can be used by the ActionController as
well as the individual Actions.
Service Client
This is a client application which needs to invoke the service. This component can be either Java-based or any
other client as long as it is able to support the HTTP methods.
Common Components
These are the utility services required by the framework like logging, exception handling and any common
functions or constants required for implementation. Apache Commons logging with Log4j implementation is
used in the sample code.
Figure 2.3 REST-Like enablement framework
RESTServiceServlet
The framework uses the Front Controller pattern for centralized request processing and uses this Java Servlet
component for processing the input requests. It supports common HTTP methods like GET, PUT, POST and
DELETE.
RESTActionController
This component is the core framework controller which manages the core functionality of loading the services
and framework configuration, validation of requests and mapping the requests with configured REST actions
and executing the actions.
RESTConfiguration
This component is responsible for loading and caching the framework configuration as well as the various
REST services configuration at run-time. This component is used by the RESTActionController to identify the
correct action to be called for a request as well as validate the input request.
RESTMapping
This component stores the REST action mappings specified in the configuration file. The mapping primarily
consists of the URI called by client and the action class which does the processing.
ActionContext
This component encapsulates all the features required for execution of the REST action. It assists developers
in providing request and response handling features so that the developer has to only code the actual business
logic implementation. It hides the protocol specific request and response objects from the Action component
and hence allows independent testing of the same like a POJO. It also provides a handle to the XML Binding
Service so that Java business objects can be easily converted to XML and vice-versa based on the configured
XML Binding API. The RESTActionController configures this component dynamically and provides it to the
Action component.
The above diagram shows a very simplistic view of how a web service would actually work. The
client would invoke a series of web service calls via requests to a server which would host the actual
web service.
These requests are made through what is known as remote procedure calls. Remote Procedure
Calls (RPC) are calls made to methods which are hosted by the relevant web service.
As an example, Amazon provides a web service that provides prices for products sold online via
amazon.com. The front end or presentation layer can be in .Net or Java but either programming
language would have the ability to communicate with the web service.
The main component of a web service is the data which is transferred between the client and the
server, and that is XML. XML (Extensible Markup Language) is a counterpart to HTML: an easy-to-understand
intermediate language that is understood by many programming languages.
So when applications talk to each other, they actually talk in XML. This provides a common
platform for application developed in various programming languages to talk to each other.
Web services use something known as SOAP (Simple Object Access Protocol) for sending the XML
data between applications. The data is sent over normal HTTP.
The data which is sent from the web service to the application is called a SOAP message. The SOAP
message is nothing but an XML document. Since the document is written in XML, the client
application calling the web service can be written in any programming language.
2.3.1 Type of Web Service
There are mainly two types of web services.
SOAP web services.
RESTful web services.
In order for a web service to be fully functional, there are certain components that need to be in place.
These components need to be present irrespective of whatever development language is used for
programming the web service.
SOAP (Simple Object Access Protocol)
SOAP is known as a transport-independent messaging protocol. SOAP is based on transferring XML data
as SOAP Messages. Each message has something which is known as an XML document.
Only the structure of the XML document follows a specific pattern, not the content. The best part of
web services and SOAP is that it is all sent via HTTP, which is the standard web protocol.
Each SOAP document needs to have a root element known as the <Envelope> element. The root element is
the first element in an XML document.
The "envelope" is in turn divided into 2 parts. The first is the header, and the next is the body.
The header contains the routing data, which is basically the information that tells the XML document to
which client it needs to be sent.
The body will contain the actual message.
The diagram below shows a simple example of the communication via SOAP.
WSDL (Web services description language)
The client invoking the web service should know where the web service actually resides.
Secondly, the client application needs to know what the web service actually does, so that it can invoke the
right web service.
This is done with the help of the WSDL, known as the Web services description language.
The WSDL file is again an XML-based file which basically tells the client application what the web
service does. By using the WSDL document, the client application would be able to understand where the
web service is located and how it can be utilized.
Web Service Example
An example of a WSDL file is given below.
<definitions>
<message name="TutorialRequest">
<part name="TutorialID" type="xsd:string"/>
</message>
<message name="TutorialResponse">
<part name="TutorialName" type="xsd:string"/>
</message>
<portType name="Tutorial_PortType">
<operation name="Tutorial">
<input message="tns:TutorialRequest"/>
<output message="tns:TutorialResponse"/>
</operation>
</portType>
<binding name="Tutorial_Binding" type="tns:Tutorial_PortType">
<soap:binding style="rpc"
transport="http://schemas.xmlsoap.org/soap/http"/>
<operation name="Tutorial">
<soap:operation soapAction="Tutorial"/>
<input>
<soap:body
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
namespace="urn:examples:Tutorialservice"
use="encoded"/>
</input>
<output>
<soap:body
encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
namespace="urn:examples:Tutorialservice"
use="encoded"/>
</output>
</operation>
</binding>
</definitions>
The important aspects to note about the above WSDL declaration are as follows:
<message> - The <message> element in the WSDL definition is used to define the different data elements for each
operation performed by the web service. So in the example above, we have 2 messages which can be exchanged
between the web service and the client application: one is "TutorialRequest", and the other is
"TutorialResponse". The TutorialRequest message contains an element called "TutorialID" which is of type
string. Similarly, the TutorialResponse message contains an element called "TutorialName" which is also of type
string.
<portType> - This actually describes the operation which can be performed by the web service, which in our case
is called Tutorial. This operation takes 2 messages: one is an input message, and the other is the output message.
<binding> - This element contains the protocol which is used. So in our case, we are defining it to use HTTP
(http://schemas.xmlsoap.org/soap/http). We also specify other details for the body of the operation, like the
namespace and whether the message should be encoded.
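If the WSDL above were published by a running service, a client could let a SOAP toolkit generate the call from it. The sketch below assumes the third-party zeep library and a hypothetical WSDL URL; any SOAP client library would work similarly.

# Hedged sketch: invoking the "Tutorial" operation described by a WSDL,
# using the third-party zeep library (pip install zeep). The WSDL URL is
# a hypothetical placeholder.
from zeep import Client

client = Client("http://example.com/Tutorialservice?wsdl")  # hypothetical URL
# zeep reads the <portType>/<binding> sections and exposes the operation
# as an ordinary Python method; the result maps to TutorialResponse.
tutorial_name = client.service.Tutorial("1")   # TutorialID passed as a string
print(tutorial_name)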
Universal Description, Discovery, and Integration (UDDI)
UDDI is a standard for describing, publishing, and discovering the web services that are provided by a
particular service provider. It provides a specification which helps in hosting the information on web
services.
In the previous topic we discussed WSDL and how it contains information on what the web
service actually does.
But how can a client application locate a WSDL file to understand the various operations offered by a web
service? UDDI is the answer to this: it provides a repository on which WSDL files can be hosted.
So the client application will have complete access to the UDDI, which acts as a database containing all the
WSDL files.
Just as a telephone directory has the name, address and telephone number of a particular person, the same
way the UDDI registry will have the relevant information for the web service.
2.3.2 Web Services Advantages
We already understand why web services came about in the first place, which was to provide a platform which
could allow different applications to talk to each other.
Exposing Business Functionality on the network - A web service is a unit of managed code that provides some
sort of functionality to client applications or end users. This functionality can be invoked over the HTTP protocol
which means that it can also be invoked over the internet. Nowadays all applications are on the internet which
makes the purpose of Web services more useful. That means the web service can be anywhere on the internet and
provide the necessary functionality as required.
Interoperability amongst applications - Web services allow various applications to talk to each other and share
data and services among themselves. All types of applications can talk to each other. So instead of writing specific
code which can only be understood by specific applications, you can now write generic code that can be understood
by all applications.
A Standardized Protocol which everybody understands - Web services use standardized industry protocols for
communication. All four layers (Service Transport, XML Messaging, Service Description, and Service
Discovery) use well-defined protocols in the web services protocol stack.
Reduction in cost of communication - Web services use SOAP over HTTP protocol, so you can use your existing
low-cost internet for implementing web services.
Web service Architecture
Every framework needs some sort of architecture to make sure the entire framework works as desired. Similarly, in
web services, there is an architecture which consists of three distinct roles as given below
Provider - The provider creates the web service and makes it available to client applications that want to use it.
Requestor - A requestor is nothing but the client application that needs to contact a web service. The client
application can be a .Net, Java, or any other language based application which looks for some sort of functionality
via a web service.
Broker - The broker is nothing but the application which provides access to the UDDI. The UDDI, as discussed in
the earlier topic, enables the client application to locate the web service.
The diagram below showcases how the Service provider, the Service requestor and Service registry interact
with each other.
Publish - A provider informs the broker (service registry) about the existence of the web service by using the
broker's publish interface to make the service accessible to clients.
Find - The requestor consults the broker to locate a published web service.
Bind - With the information it gained from the broker (service registry) about the web service, the requestor is able
to bind to, or invoke, the web service.
2.4 PUBLISH-SUBSCRIBE MODEL
Pub/Sub brings the flexibility and reliability of enterprise message-oriented middleware to the cloud. At the
same time, Pub/Sub is a scalable, durable event ingestion and delivery system that serves as a foundation for
modern stream analytics pipelines. By providing many-to-many, asynchronous messaging that decouples senders
and receivers, it allows for secure and highly available communication among independently written applications.
Pub/Sub delivers low-latency, durable messaging that helps developers quickly integrate systems hosted on the
Google Cloud Platform and externally.
Publish-subscribe (pub/sub) is a messaging pattern where publishers push messages to subscribers. In
software architecture, pub/sub messaging provides instant event notifications for distributed applications, especially
those that are decoupled into smaller, independent building blocks. In layman's terms, pub/sub describes how two
different parts of a messaging pattern connect and communicate with each other.
In a topic-based system, messages are published to named channels (topics). The publisher is the one who creates
these channels. Subscribers subscribe to those topics and will receive messages from them whenever they appear.
In a content-based system, messages are only delivered if they match the constraints and criteria that are defined
by the subscriber.
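The topic-based variant can be pictured with a tiny in-process broker. The sketch below is a simplified illustration only (real systems such as Cloud Pub/Sub add durability, acknowledgements and network transport); the topic and message names are made up.

# Minimal in-process, topic-based publish/subscribe sketch (illustrative only).
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Publishers and subscribers never reference each other directly;
        # the broker decouples them.
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
broker.subscribe("orders", lambda msg: print("billing got:", msg))
broker.subscribe("orders", lambda msg: print("shipping got:", msg))
broker.publish("orders", {"order_id": 42, "item": "book"})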
2.5 BASICS OF VIRTUALIZATION
The term 'virtualization' can be used in many respects in computing. It is the process of creating a
virtual environment of something, which may include hardware platforms, storage devices, OS, network
resources, etc. Virtualization in the cloud mainly deals with server virtualization.
Virtualization is the ability to share the physical instance of a single application or
resource among multiple organizations or users. This is done by assigning a logical name to
those physical resources and providing a pointer to those physical resources on demand.
Over an existing operating system and hardware, we generally create a virtual machine, and
above it we run other operating systems or applications. This is called hardware virtualization. The
virtual machine provides a separate environment that is logically distinct from its underlying hardware.
Here, the system or the machine is the host and the virtual machine is the guest machine.
Figure - The Cloud's Virtualization
There are several approaches or ways to virtualize cloud servers.
These are:
Grid Approach: where the processing workloads are distributed among different physical servers, and
their results are then collected as one.
OS-Level Virtualization: where multiple instances of an application can run in an isolated form on a
single OS.
Hypervisor-based Virtualization: currently the most widely used technique. With hypervisor-based
virtualization, there are various sub-approaches to fulfill the goal of running multiple applications and other
loads on a single physical host. One technique allows virtual machines to move from one host to
another without any requirement of shutting down; this technique is termed "Live Migration". Another
technique actively load balances among multiple hosts to efficiently utilize the resources available
to virtual machines; this concept is termed Distributed Resource Scheduling or Dynamic
Resource Scheduling.
2.5.1 VIRTUALIZATION
Virtualization is the process of creating a virtual environment on an existing server to run your
desired program, without interfering with any of the other services provided by the server or host
platform to other users. The Virtual environment can be a single instance or a combination of many such
as operating systems, Network or Application servers, computing environments, storage devices and
other such environments.
Virtualization in cloud computing means making a virtual platform of server operating systems and
storage devices. This helps the user by providing multiple machines at the same time; it also allows
sharing a single physical instance of a resource or an application with multiple users. Cloud virtualization
also manages the workload by transforming traditional computing to make it more scalable, economical
and efficient.
TYPES OF VIRTUALIZATION
i. Operating System Virtualization
ii. Hardware Virtualization
iii. Server Virtualization
iv. Storage Virtualization
Virtualization Architecture
Benefits for Companies
Removal of special hardware and utility requirements
Effective management of resources
Increased employee productivity as a result of better accessibility
Reduced risk of data loss, as data is backed up across multiple storage locations
Benefits for Data Centers
Maximization of server capabilities, thereby reducing maintenance and operation costs
Smaller footprint as a result of lower hardware, energy and manpower requirements
Access to the virtual machine and the host machine or server is facilitated by a software layer known
as the hypervisor. The hypervisor acts as a link between the hardware and the virtual environment and
distributes the hardware resources, such as CPU usage and memory allotment, between the different
virtual environments.
Hardware Virtualization
Hardware virtualization, also known as hardware-assisted virtualization or server virtualization,
runs on the concept that an individual independent segment of hardware, or a physical server, may
be made up of multiple smaller hardware segments or servers, essentially consolidating multiple
physical servers into virtual servers that run on a single primary physical server. Each small
server can host a virtual machine, but the entire cluster of servers is treated as a single device by
any process requesting the hardware. The hardware resource allotment is done by the hypervisor.
The main advantages include increased processing power as a result of maximized hardware
utilization and application uptime.
Subtypes:
Full Virtualization – Guest software does not require any modifications since the underlying
hardware is fully simulated.
Emulation Virtualization – The virtual machine simulates the hardware and becomes
independent of it. The guest operating system does not require any modifications.
Paravirtualization – the hardware is not simulated; the guest software runs in its own
isolated domain.
Software Virtualization
Software virtualization involves the creation and operation of multiple virtual environments on
the host machine. It creates a complete computer system, with virtual hardware, that lets a guest
operating system run. For example, it lets you run Android OS on a host machine that natively
runs a Microsoft Windows OS, utilizing the same hardware as the host machine does.
Subtypes:
Operating System Virtualization – hosting multiple OS on the native OS
In operating system virtualization in cloud computing, the virtual machine software installs in
the operating system of the host rather than directly on the hardware system. The most important
use of operating system virtualization is for testing applications on different platforms or
operating systems. Here, the virtualization software sits above the host operating system and
allows different applications to run.
Application Virtualization – hosting individual applications in a virtual environment separate
from the native OS.
Service Virtualization – hosting specific processes and services related to a particular
application.
Server Virtualization
In server virtualization in cloud computing, the virtualization software installs directly on the server
system, and a single physical server can be divided into many virtual servers on demand to
balance the load. Server virtualization can also be described as the masking of server
resources, including their number and identity. With the help of this software, the server
administrator divides one physical server into multiple isolated servers.
Memory Virtualization
Physical memory across different servers is aggregated into a single virtualized memory pool. It
provides the benefit of an enlarged contiguous working memory. You may already be familiar
with this idea, as some operating systems such as Microsoft Windows allow a portion of your storage disk to
serve as an extension of your RAM.
Subtypes:
Application-level control – Applications access the memory pool directly
Operating system level control – Access to the memory pool is provided through an operating
system
Storage Virtualization
Multiple physical storage devices are grouped together, which then appear as a single storage
device. This provides various advantages such as homogenization of storage across storage
devices of varying capacities and speeds, reduced downtime, load balancing and better
optimization of performance and speed. Partitioning your hard drive into multiple partitions is an
example of this virtualization.
Subtypes:
Block Virtualization – Multiple storage devices are consolidated into one
File Virtualization – Storage system grants access to files that are stored over multiple hosts
Data Virtualization
It lets you easily manipulate data, as the data is presented as an abstract layer completely
independent of data structure and database systems. Decreases data input and formatting errors.
Network Virtualization
In network virtualization, multiple sub-networks can be created on the same physical network,
which may or may not be authorized to communicate with each other. This enables restriction of
file movement across networks and enhances security, and allows better monitoring and
identification of data usage, which lets network administrators scale up the network
appropriately. It also increases reliability, as a disruption in one network doesn't affect other
networks, and diagnosis is easier.
Hardware Virtualization
Hardware virtualization in cloud computing is used on server platforms, as it is more flexible to use
virtual machines rather than physical machines. In hardware virtualization, the virtual machine
software installs directly on the hardware system, which is why it is known as hardware virtualization. It
relies on a hypervisor, which is used to control and monitor the processor, memory, and other
hardware resources. After the hardware virtualization process is complete, the user can install
different operating systems on it, and different applications can run on this platform.
Storage Virtualization
In storage virtualization in cloud computing, physical storage from multiple network storage
devices is grouped together so that it looks like a single storage device. It can be
implemented with the help of software applications, and storage virtualization is commonly done for the
backup and recovery process. In essence, it is the sharing of physical storage from multiple storage
devices.
2.5.2 Subtypes (Network Virtualization):
Internal network: Enables a single system to function like a network
External network: Consolidation of multiple networks into a single one, or segregation of a
single network into multiple ones.
Desktop Virtualization
This is perhaps the most common form of virtualization for any regular IT employee. The user’s
desktop is stored on a remote server, allowing the user to access his desktop from any device or
location. Employees can work conveniently from the comfort of their home. Since the data
transfer takes place over secure protocols, any risk of data theft is minimized.
Benefits of Virtualization
Virtualization in cloud computing has numerous benefits; let's discuss them one by one:
i. Security
During the process of virtualization, security is one of the important concerns. Security can be
provided with the help of firewalls, which help prevent unauthorized access and keep
the data confidential. Moreover, with the help of firewalls and security measures, the data can be protected
from harmful viruses, malware and other cyber threats. Encryption with secure protocols also takes place,
which protects the data from other threats. So, the customer can virtualize all the
data stores and can create a backup on a server in which the data is stored.
ii. Flexible operations
With the help of a virtual network, the work of IT professionals is becoming more efficient and
agile. The network switches implemented today are very easy to use, flexible and save time. With the
help of virtualization in cloud computing, technical problems in physical systems can be solved. It
eliminates the problem of recovering data from crashed or corrupted devices and hence saves
time.
iii. Economical
Virtualization in cloud computing saves the cost of physical systems such as hardware and
servers. It stores all the data in virtual servers, which are quite economical. It reduces
wastage and decreases electricity bills along with the maintenance cost. Due to this, the business
can run multiple operating systems and apps on a particular server.
iv. Eliminates the risk of system failure
While performing some task, there are chances that the system might crash down at the wrong
time. Such a failure can cause damage to the company, but virtualization helps you perform
the same task on multiple devices at the same time. The data can be stored in the cloud and retrieved
anytime with the help of any device. Moreover, there are two servers working side by side,
which makes the data accessible every time. Even if one server crashes, the customer can access
the data with the help of the second server.
v. Flexible transfer of data
The data can be transferred to the virtual server and retrieved anytime. The customers or cloud provider
don't have to waste time searching through hard drives to find data. With the help of virtualization, it
is very easy to locate the required data and transfer it to the allotted authorities. This
transfer of data has no limit and can be done over a long distance with the minimum charge
possible. Additional storage can also be provided, and the cost will be as low as possible.
Which Technology to use?
Virtualization is possible through a wide range of technologies, many of which are available as
open source. We prefer using XEN or KVM since they provide the best virtualization
experience and performance.
XEN
KVM
OpenVZ
Conclusion
With the help of virtualization, companies can implement cloud
computing. Virtualization is an important aspect of
cloud computing and helps maintain and secure the data. Virtualization lets you easily outsource
your hardware and eliminate the energy costs associated with its operation. Although it may not
work for everyone, the efficiency, security and cost advantages are considerable enough for you
to consider employing it as part of your operations. Whatever type of virtualization you may
need, always look for service providers that provide straightforward tools to manage your
resources and monitor usage.
2.6 LEVELS OF VIRTUALIZATION IMPLEMENTATION.
a) Instruction Set Architecture Level.
b) Hardware Abstraction Level.
c) Operating System Level.
d) Library Support Level.
e) User-Application Level.
Virtualization at ISA (instruction set architecture) level
Virtualization is implemented at the ISA (Instruction Set Architecture) level by emulating the
physical system's instruction set completely in software. The host machine is a
physical platform containing various components, such as the processor, memory, input/output (I/O)
devices, and buses. The VMM installs the guest systems on this machine. The emulator gets the
instructions from the guest systems to process and execute. The emulator transforms those
instructions into the native instruction set, which is run on the host machine's hardware. The
instructions include both the I/O-specific ones and the processor-oriented instructions. For an
emulator to be efficacious, it has to imitate all tasks that a real computer could perform.
Advantages:
It is a simple and robust method of conversion into a virtual architecture. This approach makes it
simple to implement multiple guest systems on a single physical structure. The instructions issued by the
guest system are translated into instructions of the host system, which allows the host system to adjust
to changes in the architecture of the guest system. The binding between the guest system and the host
is not rigid, making it very flexible. Infrastructure of this kind could be used for creating virtual
machines of one platform, for example X86, on any platform such as SPARC, X86, Alpha, etc.
Disadvantage: The instructions must be interpreted before being executed, and therefore a
system with ISA-level virtualization shows poor performance.
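As a rough illustration of ISA-level emulation, and of why interpreting every instruction is slow, the sketch below interprets a tiny, made-up guest instruction set in software. The instruction names and program are invented for this example; real emulators such as QEMU or Bochs are vastly more complex.

# Toy interpreter for a made-up guest ISA, illustrating the per-instruction
# interpretation cost paid by ISA-level virtualization.
def emulate(program):
    registers = {"r0": 0, "r1": 0}
    pc = 0                                  # guest program counter
    while pc < len(program):
        op, *args = program[pc]             # fetch + decode
        if op == "load":                    # execute by translating each guest
            registers[args[0]] = args[1]    # instruction into host operations
        elif op == "add":
            registers[args[0]] += registers[args[1]]
        elif op == "halt":
            break
        pc += 1
    return registers

guest_program = [("load", "r0", 2), ("load", "r1", 3), ("add", "r0", "r1"), ("halt",)]
print(emulate(guest_program))               # {'r0': 5, 'r1': 3}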
Virtualization is commonly hypervisor-based. The hypervisor isolates operating systems and applications
from the underlying computer hardware so the host machine can run multiple virtual machines (VM) as
guests that share the system's physical compute resources, such as processor cycles, memory space,
network bandwidth and so on.
Type 1 hypervisors, sometimes called bare-metal hypervisors, run directly on top of the host system
hardware. Bare-metal hypervisors offer high availability and resource management. Their direct access to
system hardware enables better performance, scalability and stability. Examples of type 1 hypervisors
include Microsoft Hyper-V, Citrix XenServer and VMware ESXi.
A type 2 hypervisor, also known as a hosted hypervisor, is installed on top of the host operating system,
rather than sitting directly on top of the hardware as the type 1 hypervisor does. Each guest OS or VM
runs above the hypervisor. The convenience of a known host OS can ease system configuration and
management tasks. However, the addition of a host OS layer can potentially limit performance and
expose possible OS security flaws. Examples of type 2 hypervisors include VMware Workstation, Virtual
PC and Oracle VM VirtualBox.
The hypervisor supports hardware-level virtualization like CPU, memory, disk and network interfaces.
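As a hedged example of talking to a hypervisor programmatically, the sketch below uses the libvirt Python bindings, assuming libvirt-python is installed and a local QEMU/KVM hypervisor is running, to list the guest VMs the hypervisor manages.

# Hedged sketch: querying a local KVM/QEMU hypervisor through libvirt
# (assumes the libvirt daemon and the libvirt-python bindings are installed).
import libvirt

conn = libvirt.open("qemu:///system")       # connect to the local hypervisor
for domain in conn.listAllDomains():        # every guest VM it manages
    state = "running" if domain.isActive() else "stopped"
    print(domain.name(), state)
conn.close()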
A modern technology that helps teams simulate dependent services that are out of your control for testing,
service virtualization is a key enabler for any test automation project.
By creating stable and predictable test environments with service virtualization, your test automation will
be reliable and accurate, but there are several different approaches and tools available on the market.
What should you look for in a service virtualization solution to make sure that you’re maximizing your
return on investment?
Lightweight Service Virtualization Tools
Free or open-source tools are great tools to start with because they help you get started in a very ad hoc
way, so you can quickly learn the benefits of service virtualization. Some examples of lightweight tools
include Traffic Parrot, Mockito, or the free version of Parasoft Virtualize. These solutions are usually
sought out by individual development teams to “try out” service virtualization, brought in for a very
specific project or reason.
While these tools are great for understanding what service virtualization is all about and helping individual
users make the case for broader adoption across teams, the downside of these lightweight tools is that it's
often challenging for those users to garner full organizational traction because the tools lack the breadth
of capability and ease-of-use required for less technical users to be successful. Additionally, while these
tools are free in the short term, they become more expensive as you start to look into maintenance and
customization.
As opposed to trying to focus on generic pros and cons of different solutions, I always try and stress to
clients the importance of identifying what you uniquely need for your team and your projects. It's also
important to identify future areas of capabilities that you may not be ready for now, but will just be sitting
there in your service virtualization solution for when your test maturity and user adoption grows. So what
are those key capabilities?
Automation Capabilities:
CI integration
Build system plugins
Command-line execution
Open APIs for DevOps integration
Cloud support (EC2, Azure)
Management and Maintenance Support:
Governance
Environment management
Monitoring
A process for managing change
On-premise and browser-based access
Supported Technologies:
REST API virtualization
SOAP API virtualization
Asynchronous API messaging
MQ/JMS virtualization
IoT and microservice virtualization
Database virtualization
Webpage virtualization
File transfer virtualization
Mainframe and fixed-length
EDI virtualization
FIX, SWIFT, etc.
We now look at some of the best service virtualization tools. Some of the popular service virtualization tools
are as follows:
IBM RATIONAL TEST VIRTUALIZATION SERVER
IBM Rational Test Virtualization Server software enables early and frequent testing in the development
lifecycle. It removes dependencies by virtualizing part or all of an application or database so software
testing teams don’t have to wait for the availability of those resources to begin. Combined with
Integration Tester, you can achieve continuous software testing.
Features:
Virtualize services, software and applications.
Update, reuse and share virtualized environments
Get support for middleware technologies
Benefit from integration with other tools
Flexible pricing and deployment
MICRO FOCUS DATA SIMULATION SOFTWARE
Application simulation software to keep you on schedule and focused on service quality—not service
constraints.
Features:
Easily create simulations of application behavior.
Model the functional network and performance behavior of your virtual services by using step-by-
step wizards.
Modify data, network, and performance models easily.
Manage from anywhere with support for user roles, profiles, and access control lists.
Virtualize what matters: create simulations incorporating a wide array of message formats,
transport types, and even ERP application protocols to test everything from the latest web service
to a legacy system.
Easily configure and use virtual services in your daily testing practices. Service Virtualization
features fully integrate into LoadRunner, Performance Center, ALM, and Unified Functional
Testing.
2.8.2 BROADCOM SERVICE VIRTUALIZATION (FORMERLY CA SERVICE
VIRTUALIZATION)
Service Virtualization (formerly CA Service Virtualization) simulates unavailable systems across
the software development lifecycle (SDLC), allowing developers, testers, integration, and performance
teams to work in parallel for faster delivery and higher application quality and reliability. You’ll be able
to accelerate software release cycle times, increase quality and reduce software testing environment
infrastructure costs.
Features:
Accelerate time-to-market by enabling parallel software development, testing and validation.
Test earlier in the SDLC where it is less expensive and disruptive to solve application defects.
Reduce demand for development environments or pay-per-use service charges.
Smartbear ServiceVPro
Smartbear ServiceVPro is a Service API Mocking and Service Virtualization Tool. API virtualization
in ServiceV Pro helps you deliver great APIs on time and under budget, and does so for a fraction of the
cost typically associated with traditional enterprise service virtualization suites. Virtualize REST &
SOAP APIs, TCP, JDBC, and more to accelerate development and testing cycles.
Features:
Create virtual services from an API definition, record and use an existing service, or start from
scratch to generate a virtual service.
Create, configure, and deploy your mock on local machines, or deploy inside a public or private
cloud to share. Analyze traffic & performance of each virtual service from a web UI.
Generate dynamic mock data instantly
Simulate Network Performance & Server-Side Behavior
Real-time Service Recording & Switching
TRICENTIS TOSCA TEST-DRIVEN SERVICE VIRTUALIZATION
Tricentis Test-Driven Service Virtualization simulates the behavior of dependent systems that are difficult
to access or configure so you can continuously test without delays.
Features:
Reuse Tests as Service Virtualization Scenarios
More Risk Coverage With Test-Driven Service Virtualization
Effortless Message Verification and Analysis
Create and Maintain Virtual Services with Ease
WireMock:
WireMock is a simulator for HTTP-based APIs. Some might consider it a service virtualization tool or
a mock server. It enables you to stay productive when an API you depend on doesn’t exist or isn’t
complete. It supports testing of edge cases and failure modes that the real API won’t reliably produce.
And because it’s fast it can reduce your build time from hours down to minutes.
Features:
Flexible Deployment: Run WireMock from within your Java application, JUnit test, Servlet
container or as a standalone process.
Powerful Request Matching: Match request URLs, methods, headers, cookies and bodies using a
wide variety of strategies. First-class support for JSON and XML.
Record and Playback: Get up and running quickly by capturing traffic to and from an existing
API.
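WireMock itself is Java-based, but the underlying idea of service virtualization, standing in a stub that answers like the real dependency, can be sketched in a few lines of Python using only the standard library. This is a conceptual illustration, not WireMock's implementation; the endpoint path and canned response below are made up.

# Conceptual sketch of an HTTP service stub (the idea behind tools like
# WireMock), using only Python's standard library. Path and payload are
# hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/tutorial/1":          # canned request mapping
            body = json.dumps({"TutorialName": "Intro to SOAP"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)                  # canned response body
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), StubHandler).serve_forever()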
Conclusion:
We have included most of the widely used service virtualization tools here; many other commercial and
open-source service virtualization and API testing tools are also available.
2.9 WHAT IS CPU VIRTUALIZATION
CPU virtualization involves a single CPU acting as if it were multiple separate CPUs. The most
common reason for doing this is to run multiple different operating systems on one machine. CPU
virtualization emphasizes performance and runs directly on the available CPUs whenever possible. The
underlying physical resources are used whenever possible and the virtualization layer runs instructions
only as needed to make virtual machines operate as if they were running directly on a physical machine.
When many virtual machines are running on an ESXi host, those virtual machines might compete for
CPU resources. When CPU contention occurs, the ESXi host time-slices the physical processors across
all virtual machines so each virtual machine runs as if it has its specified number of virtual processors.
To support virtualization, processors such as the x86 employ a special running mode and instructions,
known as hardware-assisted virtualization. In this way, the VMM and guest OS run in different modes
and all sensitive instructions of the guest OS and its applications are trapped in the VMM. To save
processor states, mode switching is completed by hardware. For the x86 architecture, Intel and AMD
have proprietary technologies for hardware-assisted virtualization.
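On Linux, whether a processor advertises these hardware-assisted virtualization extensions can be checked from the CPU flags. The short sketch below is Linux-specific by assumption and simply looks for Intel's vmx and AMD's svm flags in /proc/cpuinfo.

# Linux-only sketch: detect hardware-assisted virtualization support by
# looking for the Intel VT-x (vmx) or AMD-V (svm) CPU flags.
def hardware_virtualization_flag():
    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                flags = line.split(":")[1].split()
                if "vmx" in flags:
                    return "Intel VT-x (vmx)"
                if "svm" in flags:
                    return "AMD-V (svm)"
    return None

print(hardware_virtualization_flag() or "no hardware-assisted virtualization flag found")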
Modern operating systems and processors permit multiple processes to run simultaneously. If there is no
protection mechanism in a processor, all instructions from different processes will access the hardware
directly and cause a system crash. Therefore, all processors have at least two modes, user mode and
supervisor mode, to ensure controlled access of critical hardware. Instructions running in supervisor mode
are called privileged instructions. Other instructions are unprivileged instructions. In a virtualized
environment, it is more difficult to make OSes and applications run correctly because there are more
layers in the machine stack. Example 3.4 discusses Intel’s hardware support approach.
At the time of this writing, many hardware virtualization products were available. The VMware
Workstation is a VM software suite for x86 and x86-64 computers. This software suite allows users to set
up multiple x86 and x86-64 virtual computers and to use one or more of these VMs simultaneously with
the host operating system. The VMware Workstation assumes the host-based virtualization. Xen is a
hypervisor for use in IA-32, x86-64, Itanium, and PowerPC 970 hosts. Actually, Xen modifies Linux as
the lowest and most privileged layer, or a hypervisor.
One or more guest OS can run on top of the hypervisor. KVM (Kernel-based Virtual Machine) is a Linux
kernel virtualization infrastructure. KVM can support hardware-assisted virtualization and
paravirtualization by using the Intel VT-x or AMD-v and VirtIO framework, respectively. The VirtIO
framework includes a paravirtual Ethernet card, a disk I/O controller, a balloon device for adjusting guest
memory usage, and a VGA graphics interface using VMware drivers.
Example 3.4 Hardware Support for Virtualization in the Intel x86 Processor
Since software-based virtualization techniques are complicated and incur performance overhead, Intel
provides a hardware-assist technique to make virtualization easy and improve performance. Figure 3.10
provides an overview of Intel’s full virtualization techniques. For processor virtualization, Intel offers the
VT-x or VT-i technique. VT-x adds a privileged mode (VMX Root Mode) and some instructions to
processors. This enhancement traps all sensitive instructions in the VMM automatically. For memory
virtualization, Intel offers the EPT, which translates the virtual address to the machine’s physical
addresses to improve performance. For I/O virtualization, Intel implements VT-d and VT-c to support
this.
A CPU architecture is virtualizable if it supports the ability to run the VM’s privileged and unprivileged
instructions in the CPU’s user mode while the VMM runs in supervisor mode. When the privileged
instructions including control- and behavior-sensitive instructions of a VM are executed, they are trapped
in the VMM. In this case, the VMM acts as a unified mediator for hardware access from different VMs to
guarantee the correctness and stability of the whole system. However, not all CPU architectures are
virtualizable. RISC CPU architectures can be naturally virtualized because all control- and behavior-
sensitive instructions are privileged instructions. On the contrary, x86 CPU architectures are not primarily
designed to support virtualization. This is because about 10 sensitive instructions, such as SGDT and
SMSW, are not privileged instructions. When these instructions execute in virtualization, they cannot be
trapped in the VMM.
On a native UNIX-like system, a system call triggers the 80h interrupt and passes control to the OS
kernel. The interrupt handler in the kernel is then invoked to process the system call. On a para-
virtualization system such as Xen, a system call in the guest OS first triggers the 80h interrupt normally.
Almost at the same time, the 82h interrupt in the hypervisor is triggered. Incidentally, control is passed on
to the hypervisor as well. When the hypervisor completes its task for the guest OS system call, it passes
control back to the guest OS kernel. Certainly, the guest OS kernel may also invoke the hypercall while
it’s running. Although paravirtualization of a CPU lets unmodified applications run in the VM, it causes a
small performance penalty.
Although x86 processors are not primarily designed to be virtualizable, great effort has been taken to virtualize
them. They are so widely used, compared with RISC processors, that the bulk of x86-based legacy systems cannot
be discarded easily. Virtualization of x86 processors is detailed in the following sections. Intel's VT-x technology is
an example of hardware-assisted virtualization, as shown in Figure 3.11. Intel calls the privilege level of
x86 processors the VMX Root Mode. In order to control the start and stop of a VM and allocate a
memory page to maintain the CPU state for VMs, a set of additional instructions is added. At the time of
this writing, Xen, VMware, and the Microsoft Virtual PC all implement their hypervisors by using the
VT-x technology.
Generally, hardware-assisted virtualization should have high efficiency. However, since the transition
from the hypervisor to the guest OS incurs high overhead switches between processor modes, it
sometimes cannot outperform binary translation. Hence, virtualization systems such as VMware now use
a hybrid approach, in which a few tasks are offloaded to the hardware but the rest is still done in software.
In addition, para-virtualization and hardware-assisted virtualization can be combined to improve the
performance further.
Virtual memory virtualization is similar to the virtual memory support provided by modern operating
systems. In a traditional execution environment, the operating system maintains mappings of virtual
memory to machine memory using page tables, which is a one-stage mapping from virtual memory to
machine memory. All modern x86 CPUs include a memory management unit (MMU) and a translation
lookaside buffer (TLB) to optimize virtual memory performance. However, in a virtual execution
environment, virtual memory virtualization involves sharing the physical system memory in RAM and
dynamically allocating it to the physical memory of the VMs.
That means a two-stage mapping process should be maintained by the guest OS and the VMM,
respectively: virtual memory to physical memory and physical memory to machine memory.
Furthermore, MMU virtualization should be supported, which is transparent to the guest OS. The guest
OS continues to control the mapping of virtual addresses to the physical memory addresses of VMs. But
the guest OS cannot directly access the actual machine memory. The VMM is responsible for mapping
the guest physical memory to the actual machine memory. The figure shows the two-level memory mapping
procedure. Since each page table of the guest OSes has a separate page table in the VMM corresponding
to it, the VMM page table is called the shadow page table. Nested page tables add another layer of
indirection to virtual memory. The MMU already handles virtual-to-physical translations as defined by
the OS. Then the physical memory addresses are translated to machine addresses using another set of
page tables defined by the hypervisor. Since modern operating systems maintain a set of page tables for
every process, the shadow page tables will get flooded. Consequently, the performance overhead and
cost of memory will be very high.
Since the efficiency of the software shadow page table technique was too low, Intel developed a
hardware-based EPT technique to improve it, as illustrated in Figure 3.13. In addition, Intel offers a
Virtual Processor ID (VPID) to improve use of the TLB. Therefore, the performance of memory
virtualization is greatly improved. In Figure 3.13, the page tables of the guest OS and EPT are all four-
level.
When a virtual address needs to be translated, the CPU will first look for the L4 page table pointed to by
Guest CR3. Since the address in Guest CR3 is a physical address in the guest OS, the CPU needs to
convert the Guest CR3 GPA to the host physical address (HPA) using EPT. In this procedure, the CPU
will check the EPT TLB to see if the translation is there. If there is no required translation in the EPT
TLB, the CPU will look for it in the EPT. If the CPU cannot find the translation in the EPT, an EPT
violation exception will be raised. When the GPA of the L4 page table is obtained, the CPU will calculate
the GPA of the L3 page table by using the GVA and the content of the L4 page table. If the entry
corresponding to the GVA in the L4
page table is a page fault, the CPU will generate a page fault interrupt and will let the guest OS kernel
handle the interrupt. When the GPA of the L3 page table is obtained, the CPU will look in the EPT to get
the HPA of the L3 page table, as described earlier. To get the HPA corresponding to a GVA, the CPU
needs to look in the EPT five times, and each time, the memory needs to be accessed four times. Therefore,
there are 20 memory accesses in the worst case, which is still very slow. To overcome this
shortcoming, Intel increased the size of the EPT TLB to decrease the number of memory accesses.
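The two-stage mapping (guest virtual to guest physical via the guest page table, then guest physical to host physical via the EPT) can be pictured with the simplified sketch below. Both "page tables" are flat dictionaries with made-up page numbers, so multi-level walks, TLBs and page faults are deliberately ignored.

# Simplified illustration of two-stage memory translation: GVA -> GPA via the
# guest page table, then GPA -> HPA via the EPT. Real hardware walks four-level
# page tables and caches results in TLBs; here both stages are flat dictionaries
# with made-up page numbers.
guest_page_table = {0x1: 0x10, 0x2: 0x11}    # guest virtual page -> guest physical page
ept              = {0x10: 0x7A, 0x11: 0x7B}  # guest physical page -> host physical page

def translate(guest_virtual_page):
    guest_physical = guest_page_table[guest_virtual_page]   # stage 1 (guest OS controlled)
    host_physical = ept[guest_physical]                     # stage 2 (VMM/EPT controlled)
    return host_physical

print(hex(translate(0x1)))   # 0x7a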
4. I/O VIRTUALIZATION
I/O virtualization involves managing the routing of I/O requests between virtual devices and the shared
physical hardware. At the time of this writing, there are three ways to implement I/O virtualization: full
device emulation, para-virtualization, and direct I/O. Full device emulation is the first approach for I/O
virtualization. Generally, this approach emulates well-known, real-world devices.
All the functions of a device or bus infrastructure, such as device enumeration, identification, interrupts,
and DMA, are replicated in software. This software is located in the VMM and acts as a virtual device.
The I/O access requests of the guest OS are trapped in the VMM which interacts with the I/O devices.
The full device emulation approach is shown in Figure.
A single hardware device can be shared by multiple VMs that run concurrently. However, software
emulation runs much slower than the hardware it emulates [10,15]. The para-virtualization method of I/O
virtualization is typically used in Xen. It is also known as the split driver model consisting of a frontend
driver and a backend driver. The frontend driver is running in Domain U and the backend driver is
running in Domain 0. They interact with each other via a block of shared memory. The frontend driver
manages the I/O requests of the guest OSes and the backend driver is responsible for managing the real
I/O devices and multiplexing the I/O data of different VMs. Although para-I/O-virtualization achieves
better device performance than full device emulation, it comes with a higher CPU overhead.
Direct I/O virtualization lets the VM access devices directly. It can achieve close-to-native performance
without high CPU costs. However, current direct I/O virtualization implementations focus on networking
for mainframes. There are a lot of challenges for commodity hardware devices. For example, when a
physical device is reclaimed (required by workload migration) for later reassignment, it may have been
set to an arbitrary state (e.g., DMA to some arbitrary memory locations) that can function incorrectly or
even crash the whole system. Since software-based I/O virtualization requires a very high overhead of
device emulation, hardware-assisted I/O virtualization is critical. Intel VT-d supports the remapping of
I/O DMA transfers and device-generated interrupts. The architecture of VT-d provides the flexibility to
support multiple usage models that may run unmodified, special-purpose, or “virtualization-aware” guest
OSes.
Another way to help I/O virtualization is via self-virtualized I/O (SV-IO) [47]. The key idea of SV-IO is
to harness the rich resources of a multicore processor. All tasks associated with virtualizing an I/O device
are encapsulated in SV-IO. It provides virtual devices and an associated access API to VMs and a
management API to the VMM. SV-IO defines one virtual interface (VIF) for every kind of virtualized
I/O device, such as virtual network interfaces, virtual block devices (disk), virtual camera devices, and
others. The guest OS interacts with the VIFs via VIF device drivers. Each VIF consists of two message
queues. One is for outgoing messages to the devices and the other is for incoming messages from the
devices. In addition, each VIF has a unique ID for identifying it in SV-IO.
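The VIF idea, a unique ID plus one queue for outgoing I/O messages and one for incoming ones, can be sketched as below. This is an in-memory illustration with made-up message contents, not the SV-IO implementation itself.

# Illustrative sketch of an SV-IO style virtual interface (VIF): a unique ID
# plus two message queues, one for outgoing and one for incoming messages.
import queue

class VirtualInterface:
    def __init__(self, vif_id):
        self.vif_id = vif_id
        self.outgoing = queue.Queue()   # guest -> device messages
        self.incoming = queue.Queue()   # device -> guest messages

vif = VirtualInterface(vif_id=1)
vif.outgoing.put({"op": "write", "block": 7, "data": b"hello"})   # guest driver side
request = vif.outgoing.get()                                      # SV-IO core side
vif.incoming.put({"op": "write-ack", "block": request["block"]})
print(vif.incoming.get())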
The VMware Workstation runs as an application. It leverages the I/O device support in guest OSes, host
OSes, and VMM to implement I/O virtualization. The application portion (VMApp) uses a driver loaded
into the host operating system (VMDriver) to establish the privileged VMM, which runs directly on the
hardware. A given physical processor is executed in either the host world or the VMM world, with the
VMDriver facilitating the transfer of control between the two worlds. The VMware Workstation employs
full device emulation to implement I/O virtualization. Figure 3.15 shows the functional blocks used in
sending and receiving packets via the emulated virtual NIC.
Virtualizing a multi-core processor is relatively more complicated than virtualizing a uni-core processor.
Though multicore processors are claimed to have higher performance by integrating multiple processor
cores in a single chip, multi-core virtualization has raised some new challenges to computer architects,
compiler constructors, system designers, and application programmers. There are mainly two difficulties:
Application programs must be parallelized to use all cores fully, and software must explicitly assign tasks
to the cores, which is a very complex problem.
Concerning the first challenge, new programming models, languages, and libraries are needed to make
parallel programming easier. The second challenge has spawned research involving scheduling
algorithms and resource management policies. Yet these efforts cannot balance well among performance,
complexity, and other issues. What is worse, as technology scales, a new challenge called dynamic
heterogeneity is emerging to mix the fat CPU core and thin GPU cores on the same chip, which further
complicates the multi-core or many-core resource management. The dynamic heterogeneity of hardware
infrastructure mainly comes from less reliable transistors and increased complexity in using the
transistors.
Wells et al. proposed a multicore virtualization method to allow hardware designers to get an abstraction of the
low-level details of the processor cores. This technique alleviates the burden and inefficiency of
managing hardware resources by software. It is located under the ISA and remains unmodified by the
operating system or VMM (hypervisor). Figure 3.16 illustrates the technique of a software-visible VCPU
moving from one core to another and temporarily suspending execution of a VCPU when there are no
appropriate cores on which it can run.
The emerging many-core chip multiprocessors (CMPs) provide a new computing landscape. Instead of
supporting time-sharing jobs on one or a few cores, we can use the abundant cores in a space-sharing manner,
where single-threaded or multithreaded jobs are simultaneously assigned to separate groups of cores for
long time intervals. This idea was originally suggested by Marty and Hill [39]. To optimize for space-
shared workloads, they propose using virtual hierarchies to overlay a coherence and caching hierarchy
onto a physical processor. Unlike a fixed physical hierarchy, a virtual hierarchy can adapt to fit how the
work is space shared for improved performance and performance isolation.
Today’s many-core CMPs use a physical hierarchy of two or more cache levels that statically determine
the cache allocation and mapping. A virtual hierarchy is a cache hierarchy that can adapt to fit the
workload or mix of workloads. The hierarchy's first level locates data blocks close to the cores needing
them for faster access, establishes a shared-cache domain, and establishes a point of coherence for faster
communication. When a miss leaves a tile, it first attempts to locate the block (or sharers) within the first
level. The first level can also provide isolation between independent workloads. A miss at the L1 cache
can invoke the L2 access.
The idea is illustrated in the figure below. Space sharing is applied to assign three workloads to three clusters of
virtual cores: namely VM0 and VM3 for database workload, VM1 and VM2 for web server workload,
and VM4–VM7 for middleware workload. The basic assumption is that each workload runs in its own
VM. However, space sharing applies equally within a single operating system. Statically distributing the
directory among tiles can do much better, provided operating systems or hypervisors carefully map
virtual pages to physical frames. Marty and Hill suggested a two-level virtual coherence and caching
hierarchy that harmonizes with the assignment of tiles to the virtual clusters of VMs.
Figure illustrates a logical view of such a virtual cluster hierarchy in two levels. Each VM operates in an
isolated fashion at the first level. This will minimize both miss access time and performance interference
with other workloads or VMs. Moreover, the shared resources of cache capacity, interconnect links, and
miss handling are mostly isolated between VMs. The second level maintains a globally shared memory.
This facilitates dynamically repartitioning resources without costly cache flushes. Furthermore,
maintaining globally shared memory minimizes changes to existing system software and allows
virtualization features such as content-based page sharing. A virtual hierarchy adapts to space-shared
workloads like multiprogramming and server consolidation. Figure 3.17 shows a case study focused on
consolidated server workloads in a tiled architecture. This many-core mapping scheme can also optimize
for space-shared multiprogrammed workloads in a single-OS environment.
Virtualization provides flexibility in disaster recovery. When servers are virtualized, they are
containerized into VMs, independent from the underlying hardware. Therefore, an organization does not
need the same physical servers at the primary site as at its secondary disaster recovery site.
Other benefits of virtual disaster recovery include ease, efficiency and speed. Virtualized platforms
typically provide high availability in the event of a failure. Virtualization helps meet recovery time
objectives (RTOs) and recovery point objectives (RPOs), as replication is done as frequently as needed,
especially for critical systems. DR planning and failover testing is also simpler with virtualized workloads
than with a physical setup, making disaster recovery a more attainable process for organizations that may
not have the funds or resources for physical DR.
In addition, consolidating physical servers with virtualization saves money because the virtualized
workloads require less power, floor space and maintenance. However, replication can get expensive,
depending on how frequently it's done.
Adding VMs is an easy task, so organizations need to watch out for VM sprawl. VMs operating without
the knowledge of DR staff may fall through the cracks when it comes time for recovery. Sprawl is
particularly dangerous at larger companies where communication may not be as strong as at a smaller
organization with fewer employees. All organizations should have strict protocols for deploying virtual
machines.
Virtual infrastructures can be complex. In a recovery situation, that complexity can be an issue, so it's
important to have a comprehensive DR plan.
A virtual disaster recovery plan has many similarities to a traditional DR plan. An organization should:
Decide which systems and data are the most critical for recovery, and document them.
Get management support for the DR plan
Complete a risk assessment and business impact analysis to outline possible risks and their
potential impacts.
Document steps needed for recovery.
Define RTOs and RPOs.
Test the plan.
As with a traditional DR setup, you should clearly define who is involved in planning and testing, and the
role of each staff member. That extends to an actual recovery event, as staff should be ready for their
tasks during an unplanned incident.
The organization should review and test its virtual disaster recovery plan on a regular basis, especially
after any changes have been made to the production environment. Any physical systems should also be
tested. While it may be complicated to test virtual and physical systems at the same time, it's important
for the sake of business continuity.
Virtual disaster recovery, though simpler than traditional DR, should retain the same standard goals of
meeting RTOs and RPOs, and ensuring a business can continue to function in the event of an unplanned
incident.
The traditional disaster recovery process of duplicating a data center in another location is often
expensive, complicated and time-consuming. While a physical disaster recovery process typically
involves multiple steps, virtual disaster recovery can be as simple as a click of a button for failover.
Rebuilding systems in the virtual world is not necessary because they already exist in another location,
thanks to replication. However, it's important to monitor backup systems. It's easy to "set it and forget it"
in the virtual world, which is not advised and is not as much of a problem with physical systems.
Like with physical disaster recovery, the virtual disaster recovery plan should be tested. Virtual disaster
recovery, however, provides testing capabilities not available in a physical setup. It is easier to do a DR
test in the virtual world without affecting production systems, as virtualization enables an organization to
bring up servers in an isolated network for testing. In addition, deleting and recreating DR servers is
much simpler than in the physical world.
Virtual disaster recovery is possible with physical servers through physical-to-virtual backup. This
process creates virtual backups of physical systems for recovery purposes.
For the most comprehensive data protection, experts advise having an offline copy of data. While virtual
disaster recovery vendors provide capabilities to protect against cyberattacks such as ransomware,
physical tape storage is the one true offline option that guarantees data is safe during an attack.
With ransomware now a constant threat to business, virtual disaster recovery vendors are including
capabilities specific to recovering from an attack. Through point-in-time copies, an organization can roll
back its data recovery to just before the attack hit.
The convergence of backup and DR is a major trend in data protection. One example is instant
recovery, also called recovery in place, which allows a backup snapshot of a VM to run temporarily on
secondary storage following a disaster. This process significantly reduces RTOs.
Hyper-convergence, which combines storage, compute and virtualization, is another major trend. As a
result, hyper-converged backup and recovery has taken off, with newer vendors such as Cohesity and
Rubrik leading the charge. Their cloud-based hyper-converged backup and recovery systems are
accessible to smaller organizations, thanks to lower cost and complexity.
These newer vendors are pushing the more established players to do more with their storage and recovery
capabilities.
Major vendors
There are several data protection vendors that offer comprehensive virtual backup and disaster recovery.
Some key players include:
By using a virtualized environment you don’t have to worry about having completely redundant
hardware. Instead, you can use almost any x86 platform as a backup solution; this allows you to save
money by repurposing existing hardware and also gives your company more agility when it comes to
hardware failure, as almost any virtual server can be restarted on different hardware.
By having your system completely virtualized, each of your servers is encapsulated in a single
image file. An image is basically a single file that contains all of a server's files, including system files,
programs, and data, all in one location. Having these images makes managing your systems easy;
backups become as simple as duplicating the image file, and restores are simplified to simply mounting
the image on a new server.
A key benefit to virtualization is reducing the hardware needed by utilizing your existing hardware more
efficiently. This frees up systems that can now be used to run other tasks or be used as a hardware
redundancy. This can be combined with features like VMware's High Availability, which restarts a virtual machine
on a different server when the original hardware fails; for a more robust disaster recovery plan you can
use Fault Tolerance, which keeps both servers in sync with each other, leading to zero downtime if a
server should fail.
Easily copy system data to recovery site
Having an offsite backup is a huge advantage if something were to happen to your specific location,
whether it be a natural disaster, a power outage, or a water pipe bursting; it is nice to have all your
information at an offsite location. Virtualization makes this easy by copying each virtual machine's
image to the offsite location, and with an easily customizable automation process, it doesn't add any more
strain or man hours to the IT department.
Cloud computing consists of four different layers, each with its own functionality; the services provided
at each layer are also shown in the figure below. Let us look at all four layers with the help of the
diagram.
Hardware Layer
Physical resources of the cloud are managed in this layer, including physical servers, routers,
switches, and power and cooling systems. In practice, the hardware layer is implemented in data centers.
A data center usually contains thousands of servers that are organized in racks and interconnected
through switches, routers or other fabrics. Typical issues at the hardware layer include hardware
configuration, fault tolerance, traffic management, and power and cooling resource management.
Infrastructure Layer
The basic purpose of the infrastructure layer is to deliver basic storage and compute capabilities as
standardized services over the internet. It is also known as the virtualization layer. The infrastructure
layer creates pools of storage and computing resources by partitioning the physical resources using
virtualization technologies such as Xen, KVM and VMware. This layer is an essential component of
cloud computing, since many key features, such as dynamic resource assignment, are only made available
through virtualization technologies.
The services provided by this layer to the consumer are storage, networks, and other fundamental
computing resources on which the consumer can run or deploy arbitrary software, including operating
systems and applications. The consumer does not manage the underlying cloud infrastructure but has
control over operating systems, storage, and deployed applications, and possibly limited control of select
networking components (e.g., host firewalls). IaaS refers to on-demand provisioning of infrastructural
resources, usually in terms of VMs. The cloud owner who offers IaaS is called an IaaS provider.
Examples of IaaS providers include Amazon EC2, GoGrid and Flexiscale.
Platform layer
It is built on top of the infrastructure layer. It consists of operating systems and application frameworks.
The main purpose of the platform layer is to minimize the burden of deploying applications directly into
VM containers. For example, Google App Engine operates at the platform layer to provide API support
for implementing storage, database and business logic of typical web applications.
Application layer
At the highest level of the hierarchy, the application layer consists of the actual cloud applications.
Different from traditional applications, cloud applications can leverage the automatic-scaling feature to
achieve better performance, availability and lower operating cost.
The advantages are:
We only need to understand the layers beneath the one we are working on;
Each layer is replaceable by an equivalent implementation, with no impact on the other layers;
Layers are optimal candidates for standardisation;
A layer can be used by several different higher-level layers.
The disadvantages are:
Layers cannot encapsulate everything (a field that is added to the UI most likely also needs to be
added to the DB);
Extra layers can harm performance, especially if in different tiers.
The 60s and 70s
Although software development started during the 50s, it was during the 60s and 70s that it was
effectively born as we know it today, as the activity of building applications that can be delivered,
deployed and used by others that are not the developers themselves.
At this point, however, applications were very different from today’s. There was no GUI (which only
came into existence in the early 90s, maybe late 80s); all applications were usable only through a CLI,
displayed on a dumb terminal that would simply transmit whatever the user typed to the application,
which was, most likely, running on the same computer.
Applications were quite simple, so they weren’t built with layering in mind, and they were deployed and
used on one computer, making them effectively one-tier applications, although at some point the dumb
client might even have been remote. While these applications were very simple, they were not scalable:
for example, if we needed to update the software to a new version, we would have to do it on every
computer that had the application installed.
User Interface (Presentation): The user interface, be it a web page, a CLI or a native desktop
application;
A native Windows application as the client (rich client), which the common user would use on
his desktop computer, that would communicate with the server in order to actually make things
happen. The client would be in charge of the application flow and user input validation;
Business logic (Domain): The logic that is the reason why the application exists;
An application server, which would contain the business logic and would receive requests from
the native client, act on them and persist the data to the data storage;
Data source: The data persistence mechanism (DB), or communication with other applications.
A database server, which would be used by the application server for the persistence of data.
With this shift in usability context, layering started to be a practice, although it only became a
common, widespread practice during the 1990s (Fowler 2002) with the rise of client/server systems. This
was effectively a two-tier application, where the client would be a rich client application used as the
application interface, and the server would have the business logic and the data source.
This architecture pattern solves the scalability problem, as several users could use the application
independently; we would just need another desktop computer, install the client application on it, and that
was it. However, if we had a few hundred, or even just a few tens of clients, and we wanted to update the
application, it would be a highly complex operation, as we would have to update the clients one by one.
Roughly between 1995 and 2005, with the generalised shift to a cloud context, the increase in application
users, application complexity and infrastructure complexity we end up seeing an evolution of the layering
scheme, where a typical implementation of this layering could be:
A native browser application, rendering and running the user interface, sending requests to the
server application;
An application server, containing the presentation layer, the application layer, the domain layer,
and the persistence layer;
A database server, which would be used by the application server for the persistence of data.
This is a three-tier architecture pattern, also known as n-tier. It is a scalable solution and solves the
problem of updating the clients, as the user interface lives and is compiled on the server, although it is
rendered and run in the client browser.
Layering after the early 2000s
In 2003, Eric Evans published his emblematic book Domain-Driven Design: Tackling Complexity in the
Heart of Software. Among the many key concepts published in that book, there was also a vision for the
layering of a software system:
User Interface
Responsible for drawing the screens the users use to interact with the application and translating the
user’s inputs into application commands. It is important to note that the “users” can be human but can
also be other applications, which corresponds entirely to the Boundary objects in the EBI Architecture by
Ivar Jacobson (more on this in a later post);
Application Layer
Orchestrates Domain objects to perform tasks required by the users. It does not contain business logic.
This relates to the Interactors in the EBI Architecture by Ivar Jacobson, except that Jacobson’s interactors
were any object that was not related to the UI nor an Entity;
Domain Layer
This is the layer that contains all business logic, the Entities, Events and any other object type that
contains Business Logic. It obviously relates to the Entity object type of EBI. This is the heart of the
system;
Infrastructure
The technical capabilities that support the layers above, i.e., persistence or messaging.
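To make the four layers concrete, here is a minimal, hypothetical Python sketch of how a single request might flow through them; all class and function names are illustrative and not taken from any particular framework or from the book.

```python
# Minimal illustrative sketch of DDD-style layering (all names are hypothetical).

class OrderRepository:                      # Infrastructure: persistence detail
    def __init__(self):
        self._rows = {}                     # stands in for a real database
    def save(self, order_id, total):
        self._rows[order_id] = total

class Order:                                # Domain: business logic lives here
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items
    def total(self):
        return sum(price for _, price in self.items)

class PlaceOrderService:                    # Application: orchestrates domain objects
    def __init__(self, repository):
        self.repository = repository
    def place_order(self, order_id, items):
        order = Order(order_id, items)
        self.repository.save(order.order_id, order.total())
        return order.total()

if __name__ == "__main__":                  # User interface: translates input into a command
    service = PlaceOrderService(OrderRepository())
    print("Order total:", service.place_order("A-1", [("book", 12.5), ("pen", 1.5)]))
```

Note how each layer only talks to the layer below it, which is what makes every layer replaceable by an equivalent implementation.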
The NIST cloud computing reference architecture defines five major actors: cloud consumer, cloud
provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an
organization) that participates in a transaction or process and/or performs tasks in cloud computing. Table
1 briefly lists the actors defined in the NIST cloud computing reference architecture. The general
activities of the actors are discussed in the remainder of this section, while the details of the architectural
elements are discussed.
Figure 2 illustrates the interactions among the actors. A cloud consumer may request cloud services from
a cloud provider directly or via a cloud broker. A cloud auditor conducts independent audits and may
contact the others to collect necessary information. The details will be discussed in the following sections
and presented in increasing levels of detail in successive diagrams.
Fig. Actors in cloud computing
Example Usage Scenario 3: For a cloud service, a cloud auditor conducts independent assessments of the
operation and security of the cloud service implementation. The audit may involve interactions with both
the Cloud Consumer and the Cloud Provider.
SaaS applications are deployed in the cloud and made accessible via a network to the SaaS consumers. The
consumers of SaaS can be organizations that provide their members with access to software applications,
end users who directly use software applications, or software application administrators who configure
applications for end users. SaaS consumers can be billed based on the number of end users, the time of
use, the network bandwidth consumed, the amount of data stored or the duration of stored data.
Cloud consumers of PaaS can employ the tools and execution resources provided by cloud
providers to develop, test, deploy and manage the applications hosted in a cloud environment. PaaS
consumers can be application developers who design and implement application software, application
testers who run and test applications in cloud-based environments, application deployers who publish
applications into the cloud, and application administrators who configure and monitor application
performance on a platform.
PaaS consumers can be billed according to the processing, database storage and network resources
consumed by the PaaS application, and the duration of the platform usage. Consumers of IaaS have
access to virtual computers, network-accessible storage, network infrastructure components, and other
fundamental computing resources on which they can deploy and run arbitrary software. The consumers of
IaaS can be system developers, system administrators and IT managers who are interested in creating,
installing, managing and monitoring services for IT infrastructure operations. IaaS consumers are
provisioned with the capabilities to access these computing resources, and are billed according to the
amount or duration of the resources consumed, such as CPU hours used by virtual computers, volume and
duration of data stored, network bandwidth consumed, number of IP addresses used for certain intervals.
Cloud Provider
A cloud provider is a person or an organization; it is the entity responsible for
making a service available to interested parties. A Cloud Provider acquires and manages the computing
infrastructure required for providing the services, runs the cloud software that provides the services, and
makes arrangement to deliver the cloud services to the Cloud Consumers through network access. For
Software as a Service, the cloud provider deploys, configures, maintains and updates the operation of the
software applications on a cloud infrastructure so that the services are provisioned at the expected service
levels to cloud consumers. The provider of SaaS assumes most of the responsibilities in managing and
controlling the applications and the infrastructure, while the cloud consumers have limited administrative
control of the applications. For PaaS, the Cloud Provider manages the computing infrastructure for the
platform and runs the cloud software that provides the components of the platform, such as runtime
software execution stack, databases, and other middleware components. The PaaS Cloud Provider
typically also supports the development, deployment and management process of the PaaS Cloud
Consumer by providing tools such as integrated development environments (IDEs), development version
of cloud software, software development kits (SDKs), deployment and management tools. The PaaS
Cloud Consumer has control over the applications and possibly some of the hosting environment settings,
but has no or limited access to the infrastructure underlying the platform such as network, servers,
operating systems (OS), or storage.
For IaaS, the Cloud Provider acquires the physical computing resources underlying the service,
including the servers, networks, storage and hosting infrastructure. The Cloud Provider runs the cloud
software necessary to make computing resources available to the IaaS Cloud Consumer through a set of
service interfaces and computing resource abstractions, such as virtual machines and virtual network
interfaces. The IaaS Cloud Consumer in turn uses these computing resources, such as a virtual computer,
for their fundamental computing needs. Compared to SaaS and PaaS Cloud Consumers, an IaaS Cloud
Consumer has access to more fundamental forms of computing resources and thus has more control over
more of the software components in an application stack, including the OS and network. The IaaS Cloud
Provider, on the other hand, has control over the physical hardware and cloud software that makes the
provisioning of these infrastructure services possible, for example, the physical servers, network
equipment, storage devices, host OS and hypervisors for virtualization.
Cloud computing spans a range of classifications, types and architecture models. The transformative
networked computing model can be categorized into three major types:
Public cloud
Private cloud
Hybrid cloud
Public Cloud: the cloud services are exposed to the public and can be used by anyone. Virtualization is
typically used to build the cloud services that are offered to the public. An example of a public cloud is
Amazon Web Services (AWS).
The public cloud refers to the cloud computing model with which the IT services are delivered
across the Internet. The service may be free, freemium, or subscription-based, charged based on the
computing resources consumed. The computing functionality may range from common services such as
email, apps and storage to the enterprise-grade OS platform or infrastructure environments used for
software development and testing.
The cloud vendor is responsible for developing, managing and maintaining the pool of computing
resources shared between multiple tenants from across the network. The defining features of a public
cloud solution include high elasticity and scalability for IT-enabled services delivered at a low cost
subscription-based pricing tier. As the most popular model of cloud computing services, the public cloud
offers vast choices in terms of solutions and computing resources to address the growing needs of
organizations of all sizes and verticals.
When to use the public cloud?
The public cloud is most suitable for situations with these needs:
Predictable computing needs, such as communication services for a specific number of users.
Apps and services necessary to perform IT and business operations.
Additional resource requirements to address varying peak demands.
Software development and test environments.
Advantages of public cloud
No investments required to deploy and maintain the IT infrastructure.
High scalability and flexibility to meet unpredictable workload demands.
Reduced complexity and requirements on IT expertise as the cloud vendor is responsible to
manage the infrastructure.
Flexible pricing options based on different SLA offerings.
The cost agility allows organizations to follow lean growth strategies and focus their investments
on innovation projects.
Limitations of public cloud
The total cost of ownership (TCO) can rise exponentially for large-scale usage, especially for
midsize to large enterprises.
Not the most viable solution for security- and availability-sensitive mission-critical IT workloads.
Low visibility and control into the infrastructure, which may not suffice to meet regulatory
compliance requirements.
Private Cloud: the cloud services used by a single organization, which are not exposed to the public. A
private cloud resides inside the organization and must be behind a firewall, so only the organization has
access to it and can manage it.
The private cloud refers to the cloud solution dedicated for use by a single organization. The data
center resources may be located on-premise or operated by a third-party vendor off-site. The computing
resources are isolated and delivered via a secure private network, and not shared with other customers.
Private cloud is customizable to meet the unique business and security needs of the organization.
With greater visibility and control into the infrastructure, organizations can operate compliance-sensitive
IT workloads without compromising on the security and performance previously only achieved with
dedicated on-premise data centers.
When to use the private cloud?
The private cloud is often suitable for:
Highly-regulated industries and government agencies.
Technology companies that require strong control and security over their IT workloads and the
underlying infrastructure.
Large enterprises that require advanced data center technologies to operate efficiently and cost-
effectively.
Organizations that can afford to invest in high performance and availability technologies.
Advantages of private cloud
Dedicated and secure environments that cannot be accessed by other organizations.
Compliance to stringent regulations as organizations can run protocols, configurations and
measures to customize security based on unique workload requirements.
High scalability and efficiency to meet unpredictable demands without compromising on security
and performance.
High SLA performance and efficiency.
Flexibility to transform the infrastructure based on ever-changing business and IT needs of the
organization.
Limitations of private cloud
Expensive solution with a relatively high total cost of ownership as compared to public cloud
alternatives for short-term use cases.
Mobile users may have limited access to the private cloud considering the high security measures
in place.
The infrastructure may not offer high scalability to meet unpredictable demands if the cloud data
center is limited to on-premise computing resources.
Hybrid Cloud: the cloud services can be distributed among public and private clouds, where sensitive
applications are kept inside the organization’s network (by using a private cloud), whereas other services
can be hosted outside the organization’s network (by using a public cloud). Users can then
interchangeably use private as well as public cloud services in everyday operations.
The hybrid cloud
The hybrid cloud refers to the cloud infrastructure environment that is a mix of public and private cloud
solutions. The resources are typically orchestrated as an integrated infrastructure environment. Apps and
data workloads can share the resources between public and private cloud deployment based on
organizational business and technical policies around security, performance, scalability, cost and
efficiency, among other aspects.
For instance, organizations can use private cloud environments for their IT workloads and
complement the infrastructure with public cloud resources to accommodate occasional spikes in network
traffic. As a result, access to additional computing capacity does not require the high CapEx of a private
cloud environment but is delivered as a short-term IT service via a public cloud solution. The
environment itself is seamlessly integrated to ensure optimum performance and scalability to changing
business needs.
When to use the hybrid cloud
Here’s who the hybrid cloud might suit best:
Organizations serving multiple verticals facing different IT security, regulatory and performance
requirements.
Optimizing cloud investments without compromising on the value proposition of either public or
private cloud technologies.
Improving security on existing cloud solutions such as SaaS offerings that must be delivered via
secure private networks.
Strategically approaching cloud investments to continuously switch and trade off between the best
cloud service delivery models available in the market.
Advantages of hybrid cloud
Flexible policy-driven deployment to distribute workloads across public and private infrastructure
environments based on security, performance and cost requirements.
Scalability of public cloud environments is achieved without exposing sensitive IT workloads to
the inherent security risks.
High reliability as the services are distributed across multiple data centers across public and
private data centers.
Improved security posture as sensitive IT workloads run on dedicated resources in private clouds
while regular workloads are spread across inexpensive public cloud infrastructure to tradeoff for
cost investments
Limitations of hybrid cloud
It can get expensive.
Strong compatibility and integration is required between cloud infrastructure spanning different
locations and categories. This is a limitation with public cloud deployments, for which
organizations lack direct control over the infrastructure.
Additional infrastructure complexity is introduced as organizations operate and manage an
evolving mix of private and public cloud architecture.
IaaS Examples
AWS (Amazon Web Services)
Google Compute
Microsoft Azure
What is Platform as a Service (PaaS)
With platform-as-a-service or PaaS, the vendor gives its clients or customers the same server space and
flexibility, but with some additional tools to help build or customize applications more rapidly.
Furthermore, a PaaS vendor handles things like runtime, middleware, operating system, virtualization and
storage — although the client or customer manages their own applications and data.
“PaaS describes [an offering made up of] both the infrastructure and software for building digital
applications. PaaS providers generally specialize in creating certain types of applications, like
eCommerce applications for example,” Vogelpohl told CMSWire. He went on to explain how some PaaS
providers offer dedicated or virtualized hardware, and some hide the infrastructure layer from the
customer for ease of use. “PaaS is generally a good fit for organizations building a particular type of
application which would benefit from the additional features and management offered by the PaaS for
that type of application. PaaS can require a high degree of technical proficiency; however, PaaS providers
often include products and features that make it easier for non-technical customers to create digital
applications.”
PaaS Examples
Google App Engine
Heroku
OutSystems
What is Software as a Service (SaaS)
Software-as-a-service basically handles all the technical stuff while at the same time providing an
application (or a suite of applications) that the client or customer can use to launch projects immediately
or at least, faster than they would do with an IaaS or PaaS solution, both of which require more technical
input from the client or customer. Coincidentally, most, if not all, SaaS vendors use IaaS or PaaS
solutions to support their suite of applications, handling the technical elements so their customers don’t
have to. Whiteside told CMSWire that SaaS is the least hands-on of the three cloud computing solutions
and is good if you don't have developer resources but need to provide capabilities to end users. "You
won't have visibility or control of your infrastructure and are restricted by the capabilities and
configuration of the software tools. This can be restrictive when you want to integrate with other systems
you may own and run, but does allow you to get up and running quickly."
SaaS Examples
Google G Suite
Microsoft Office 365
Mailchimp
Private cloud storage service is provided by in-house storage resources deployed as a dedicated
environment protected behind an organization's firewall. Internally hosted private cloud storage
implementations emulate some of the features of commercially available public cloud services, providing
easy access and allocation of storage resources for business users, as well as object storage protocols.
Private clouds are appropriate for users who need customization and more control over their data, or who
have stringent data security or regulatory requirements.
Hybrid cloud storage is a mix of private cloud storage and third-party public cloud storage services with
a layer of orchestration management to integrate operationally the two platforms. The model offers
businesses flexibility and more data deployment options. An organization might, for example, store
actively used and structured data in an on-premises cloud, and unstructured and archival data in a public
cloud. A hybrid environment also makes it easier to handle seasonal or unanticipated spikes in data
creation or access by "cloud bursting" to the external storage service and avoiding having to add in-house
storage resources. In recent years, there has been increased adoption of the hybrid cloud model. Despite
its benefits, a hybrid cloud presents technical, business and management challenges. For example, private
workloads must access and interact with public cloud storage providers, so compatibility and reliable,
ample network connectivity are very important factors. An enterprise-level cloud storage system should
be scalable to suit current needs, accessible from anywhere and application-agnostic.
Cloud storage characteristics
Cloud storage is based on a virtualized infrastructure with accessible interfaces, near-instant
elasticity and scalability, multi-tenancy and metered resources. Cloud-based data is stored in logical pools
across disparate, commodity servers located on premises or in a data center managed by a third-party
cloud provider. Using the RESTful API, an object storage protocol stores a file and its associated
metadata as a single object and assigns it an ID number. When content needs to be retrieved, the user
presents the ID to the system and the content is assembled with all its metadata, authentication and
security.
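To illustrate the object storage idea described above, where a file and its metadata are stored together as a single object addressed by an assigned ID, here is a minimal in-memory sketch in Python. It is an illustration of the concept only, not the API of any real cloud storage service.

```python
# Toy in-memory object store: an object and its metadata are kept together
# and retrieved by the ID the store assigns. Purely illustrative.
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())          # the store assigns the ID
        self._objects[object_id] = {"data": data, "metadata": metadata}
        return object_id

    def get(self, object_id: str):
        obj = self._objects[object_id]         # present the ID, get data + metadata back
        return obj["data"], obj["metadata"]

store = ObjectStore()
oid = store.put(b"report contents", {"owner": "alice", "content-type": "text/plain"})
data, meta = store.get(oid)
print(oid, meta["owner"], len(data))
```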
In recent years, object storage vendors have added file system functions and capabilities to
their object storage software and hardware largely because object storage was not being adopted fast
enough. For example, a cloud storage gateway can provide a file system emulation front end to their
object storage; that arrangement often allows applications to access the data without actually supporting
an object storage protocol. Most backup applications use the object storage protocol, which is one of the
reasons why online backup to a cloud service was the initial successful application for cloud storage.
Most commercial cloud storage services use vast numbers of hard drive storage systems mounted in
servers that are linked by a mesh-like network architecture. Service providers have also added high-
performance layers to their virtual storage offerings, typically comprising some type of solid state drives
(SSDs). High-performance cloud storage is generally most effective if the servers and applications
accessing the storage are also resident in the cloud environment. Companies that use public cloud storage
need to have the appropriate network access to the hosting service.
Benefits of cloud storage
Cloud storage provides many benefits that result in cost-savings and greater convenience for its users.
These benefits include:
Pay for what is used. With a cloud storage service, customers only pay for the storage they actually use
so there's no need for big capital expenses. While cloud storage costs are recurring rather than a one-time
purchase, they are so low that even as an ongoing expense they may still be less than the cost of
maintaining an in-house system.
Utility billing. Since customers only pay for the capacity they're using, cloud storage costs can decrease
as usage drops. This is in stark contrast to using an in-house storage system, which will likely be
overconfigured to handle anticipated growth; so, a company will pay for more than it needs initially, and
the cost of the storage will never decrease.
Global availability. Cloud storage is typically available from any system anywhere at any time; one does
not have to worry about operating system capability or complex allocation processes.
Ease of use. Cloud storage is easier to access and use, so developers, software testers and business users
can get up and running quickly without having to wait for IT to allocate and configure storage resources.
Offsite security. By its very nature, public cloud storage offers a way to move copies of data to a remote
site for backup and security purposes. Again, this represents a significant cost-savings when compared to
a company maintaining its own remote facility.
An in-house cloud storage system can offer some of the above ease-of-use features of a public cloud
service, but it will lack much of the storage capacity flexibility of a public service. Some hardware
vendors are trying to address this issue by allowing their customers to turn on and off capacity that has
already been installed in their arrays.
Drawbacks of cloud storage
There are some shortcomings to cloud storage -- particularly the public services -- that may deter
companies from using these services or limit how they use them.
Security is the single most cited factor that may make a company reluctant -- or at least cautious -- about
using public cloud storage. The concern is that once data leaves a company's premises, the company no
longer has control over how it's handled and stored. There are also concerns about storing data that is
regulated by specific compliance laws. Cloud providers address these concerns by making public the
steps they take to protect their customers' data, such as encryption for data in flight and at rest, physical
security and storing data at multiple locations.
Access to data stored in the cloud may also be an issue and could significantly increase the cost of using
cloud storage. A company may need to upgrade its connection to the cloud storage service to handle the
volume of data it expects to transmit; the monthly cost of an optical link can run into the thousands of
dollars.
A company may run into performance issues if its in-house applications need to access the data it has
stored in the cloud. In those cases, it will likely require either moving the servers and applications into the
same cloud or bringing the necessary data back in-house.
If a company requires a lot of cloud storage capacity and frequently moves its data back and forth, the
monthly costs can be quite high. Compared to deploying the storage in-house, the ongoing costs could
eventually surpass the cost of implementing and maintaining the on-premises system.
Cloud storage pros/cons
Advantages of private cloud storage include high reliability and security. But this approach to cloud
storage provides limited scalability and requires on-site resources and maintenance. Public cloud storage
offers high scalability and a pay-as-you-go model with no need for an on-premises storage infrastructure.
However, performance and security measures can vary by service provider. In addition, reliability
depends on service provider availability and internet connectivity.
Cloud storage and data migration
Migrating data from one cloud storage service to another is an often-overlooked area. Cloud
migrations have become more common due to market consolidation and price competition. Businesses
tend to switch cloud storage providers either because of price -- which must be substantially cheaper to
justify the cost and work of switching -- or when a cloud provider goes out of business or stops providing
storage services. With public cloud providers, it is usually just as easy to copy data out of the cloud as it
was to upload data to it. Available bandwidth can become a major issue, however. In addition, many
providers charge extra to download data.
To mitigate concerns about a provider going out of business, you could copy data to more than
one cloud storage service. While this increases cloud storage costs, it is often still cheaper than
maintaining data locally.
Should that not be the case, or if bandwidth becomes a major sticking point, find out if the original and
the new cloud storage service have a direct-connect relationship. This approach also removes the need of
cloud storage customers to use their data centers as a bridge or go-between such as using an on-premises
cache to facilitate the transfer of data between the two cloud storage providers.
3.6 Storage-as-a-Service
Storage as a service (STaaS) is a cloud business model in which a company leases or rents its
storage infrastructure to another company or to individuals to store data. Small companies and individuals
often find this to be a convenient methodology for managing backups, providing cost savings in
personnel, hardware and physical space. As an alternative to storing magnetic tapes offsite in a vault, IT
administrators are meeting their storage and backup needs through service level agreements (SLAs) with a
STaaS provider, usually on a cost-per-gigabyte-stored and cost-per-data-transferred basis. The client
transfers the data meant for storage to the service provider on a set schedule over the STaaS provider’s
wide area network or over the Internet.
The storage provider supplies the client with the software required to access their stored data.
Clients use the software to perform standard tasks associated with storage, including data transfers and
data backups. Corrupted or lost company data can easily be restored. Storage as a service is prevalent
among small to mid-sized businesses, as no initial budget is required to set up hard drives, servers and IT
staff. STaaS is also marketed as an excellent technique to mitigate risks in disaster recovery by providing
long-term data storage and enhancing business stability. Storage as a service is fast becoming the method
of choice for small and medium scale businesses, because storing files remotely rather than locally boasts
an array of advantages for professional users.
Who uses storage as a service and why?
Storage as a Service is usually used by small or mid-sized companies that lack the budget to
implement and maintain their own storage infrastructure.
Organizations use storage as a service to mitigate risks in disaster recovery, provide long-term retention
and enhance business continuity and availability.
How storage as a service works
The company signs a service level agreement (SLA) whereby the STaaS provider agrees to
rent storage space on a cost-per-gigabyte-stored and cost-per-data-transferred basis, and the company's
data is automatically transferred at the specified time over the storage provider's proprietary WAN or
the Internet. If the company ever loses its data, the network administrator can contact the STaaS
provider and request a copy of the data.
Advantages of Storage as a Service
Cost – Factually speaking, backing up data isn’t always cheap, especially when you take the cost of
equipment into account. Additionally, there is the cost of the time it takes to manually complete routine
backups. Storage as a service reduces much of the cost associated with traditional backup methods,
providing ample storage space in the cloud for a low monthly fee.
Invisibility – Storage as a service is invisible, as no physical presence of it is seen in its deployment and
so it doesn’t take up valuable office space.
Security – In this service type, data is encrypted both during transmission and while at rest, ensuring no
unauthorized user access to files.
Automation – Storage as a service makes the tedious process of backing up easy to accomplish through
automation. Users can simply select what and when they want to backup, and the service does all the rest.
Accessibility – By going for storage as a service, users can access data from smartphones, netbooks,
desktops and so on.
Syncing – Syncing ensures files are automatically updated across all of a user’s devices. This way, the
latest version of a file saved on the desktop is also available on the user’s smartphone.
Sharing – Online storage services allow the users to easily share data with just a few clicks
Collaboration – Cloud storage services are also ideal for collaboration purposes. They allow multiple
people to edit and collaborate on a single file or document. Thus, with this feature users need not worry
about tracking the latest version or who has made what changes.
Data Protection – By storing data on cloud storage services, data is well protected from all kinds of
catastrophes, such as floods, earthquakes and human errors.
Disaster Recovery – As said earlier, data stored in the cloud is not only protected from catastrophes by
having copies at several places, but can also support disaster recovery to ensure business continuity.
Cloud Storage is a service where data is remotely maintained, managed, and backed up. The
service allows the users to store files online, so that they can access them from any location via the
Internet. According to a recent survey conducted with more than 800 business decision makers and users
worldwide, the number of organizations gaining competitive advantage through high cloud adoption has
almost doubled in the last few years and by 2017, the public cloud services market is predicted to exceed
$244 billion. Now, let’s look into some of the advantages and disadvantages of Cloud Storage.
Usability: All cloud storage services reviewed in this topic have desktop folders for Macs and PCs.
This allows users to drag and drop files between the cloud storage and their local storage.
Bandwidth: You can avoid emailing files to individuals and instead send a web link to recipients through
your email.
Accessibility: Stored files can be accessed from anywhere via Internet connection.
Disaster Recovery: It is highly recommended that businesses have an emergency backup plan ready in
the case of an emergency. Cloud storage can be used as a back‐up plan by businesses by providing a
second copy of important files. These files are stored at a remote location and can be accessed through an
internet connection.
Cost Savings: Businesses and organizations can often reduce annual operating costs by using cloud
storage; cloud storage costs about 3 cents per gigabyte to store data internally. Users can see additional
cost savings because it does not require internal power to store information remotely.
Usability: Be careful when using drag/drop to move a document into the cloud storage folder. This will
permanently move your document from its original folder to the cloud storage location. Do a copy and
paste instead of drag/drop if you want to retain the document’s original location in addition to moving a
copy onto the cloud storage folder.
Bandwidth: Several cloud storage services have a specific bandwidth allowance. If an organization
surpasses the given allowance, the additional charges could be significant. However, some providers
allow unlimited bandwidth. This is a factor that companies should consider when looking at a cloud
storage provider.
Accessibility: If you have no internet connection, you have no access to your data.
Data Security: There are concerns with the safety and privacy of important data stored remotely. The
possibility of private data commingling with other organizations makes some businesses uneasy. If you
want to know more about those issues that govern data security and privacy, here is an interesting article
on the recent privacy debates.
Software: If you want to be able to manipulate your files locally through multiple devices, you’ll need to
download the service on all devices.
Many cloud storage providers offer a free plan for those who require the minimum out of their
service. Cloud storage providers offer much data security for business users.
If we compare cloud storage providers, they all look similar at first glance. Hence, most users
compare providers based on price and decide which one to select. The features you should look for in a
cloud storage provider include collaboration features, usability, and the security provided by the
company.
The support provided by these providers must also be considered. While selecting a cloud storage
provider, you must consider the platforms you use, such as Windows, Mac, iPhones, Androids,
BlackBerry phones, or a mix. Big tech players have their own platforms for cloud storage: Windows has
OneDrive and Mac has iCloud.
Cloud Storage Providers (comparison)

Dropbox
  Best for: light data users
  Suitable for business size: freelancers, solo workers, teams, and businesses of any size
  Storage space plans: 2GB, 1TB, 2TB, 3TB, up to unlimited
  Platforms: Windows, Mac OS, Linux, Android, iOS, Windows Phone
  File upload limit: unlimited
  Price: plans for individuals start at $8.25/month; plans for teams start at $12.50/user/month

Google Drive
  Best for: teams and collaboration
  Suitable for business size: individuals and teams
  Storage space plans: 15GB, 100GB, 200GB, ... up to unlimited
  Platforms: Windows, Mac OS, Android, iOS
  File upload limit: 5TB
  Price: free for 15GB; 200GB: $2.99/month; 2TB: $9.99/month; 30TB: $299.99/month
Charging Model
1. Storage
2. Number of requests
3. Storage Management Pricing
4. Add metadata to see usage metrics.
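To show how the charging model above translates into a bill, here is a rough, illustrative estimate in Python. The rates are made-up placeholders, not actual provider pricing, and the three cost terms correspond to storage, requests and storage management.

```python
# Rough, illustrative bill estimate for the charging model above.
# The rates below are hypothetical placeholders, not real provider pricing.
STORAGE_RATE_PER_GB = 0.023        # $/GB-month (hypothetical)
REQUEST_RATE_PER_1000 = 0.005      # $/1,000 requests (hypothetical)
MGMT_RATE_PER_GB = 0.0025          # $/GB-month for storage management/analytics (hypothetical)

def estimate_monthly_bill(stored_gb, requests, managed_gb):
    storage_cost = stored_gb * STORAGE_RATE_PER_GB
    request_cost = (requests / 1000) * REQUEST_RATE_PER_1000
    management_cost = managed_gb * MGMT_RATE_PER_GB
    return round(storage_cost + request_cost + management_cost, 2)

# Example: 500 GB stored, 2 million requests, 100 GB under management analytics.
print(estimate_monthly_bill(stored_gb=500, requests=2_000_000, managed_gb=100))
```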
Transfer Acceleration - Enables fast, easy and secure transfers of your files over long distances between
your end users and an S3 bucket.
Transfer acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As
the data arrives at an edge location, it is routed to Amazon S3 over an optimized network path.
Think of transfer acceleration as a combination of S3 + CDN natively supported by this service.
Basically, every user ends up going through the closest possible edge location, which in turn talks to the
actual S3 bucket.
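As a sketch of how this is enabled in practice, the following uses the boto3 SDK; the bucket name and file are placeholders, and the code assumes valid AWS credentials are configured.

```python
# Sketch: enabling S3 Transfer Acceleration on a bucket with boto3.
# Bucket and file names are hypothetical; credentials must be configured to run this.
import boto3
from botocore.config import Config

s3 = boto3.client("s3")
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Uploads then travel via the nearest edge location when the client is told
# to use the accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big_video.mp4", "my-example-bucket", "videos/big_video.mp4")
```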
Recap - S3
S3 Storage Classes
1. S3 (durable, immediately available, frequently accessed)
2. S3 – IA (durable, immediately available, infrequently accessed)
3. S3 Reduced Redundancy Storage (used for data that is easily reproducible, such as thumbnails)
Core fundamentals of S3 objects
1. Key: the name of the object (keys are stored in lexicographical/alphabetical order)
2. Value: the data itself
3. Version ID: the version of the object
4. Metadata: the various attributes describing the data
Sub resources
1. ACL: Access control lists
2. Torrent: BitTorrent protocol
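The object fundamentals listed above (key, value, metadata, version) map directly onto the S3 API. Below is a small boto3 sketch; the bucket and key names are placeholders, and the version ID only appears if versioning is enabled on the bucket.

```python
# Sketch of the S3 object model with boto3: a key, a value (the bytes),
# and user-defined metadata stored with the object. Names are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-example-bucket",
    Key="reports/2024/q1.csv",                 # object key (name)
    Body=b"region,revenue\nus-east,100\n",     # object value (the data itself)
    Metadata={"department": "finance"},        # user-defined metadata attributes
)

obj = s3.get_object(Bucket="my-example-bucket", Key="reports/2024/q1.csv")
print(obj["Metadata"])                                     # user metadata
print(obj.get("VersionId", "versioning not enabled"))      # version ID, if any
print(obj["Body"].read())                                  # the value
```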
Cross region Replication
This basically means that if you turn this on for a bucket, AWS will automatically replicate the bucket's
objects to a bucket in one or more other regions.
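A hedged boto3 sketch of configuring replication follows. The bucket names and IAM role ARN are placeholders; versioning must be enabled on both buckets before replication can be configured, and real deployments may need additional rule fields depending on the replication configuration schema in use.

```python
# Sketch: configuring cross-region replication with boto3.
# Bucket names and the IAM role ARN are placeholders.
import boto3

s3 = boto3.client("s3")

# Versioning is required on both the source and the destination bucket.
for bucket in ("source-bucket-us-east-1", "replica-bucket-eu-west-1"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

s3.put_bucket_replication(
    Bucket="source-bucket-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",   # placeholder role
        "Rules": [{
            "Status": "Enabled",
            "Prefix": "",                                             # replicate all objects
            "Destination": {"Bucket": "arn:aws:s3:::replica-bucket-eu-west-1"},
        }],
    },
)
```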
Example of an Amazon S3 Hosted Website Architecture
Securing your S3 Buckets
1. In transit: SSL/TLS (using HTTPS)
2. At rest:
   a. Server-side encryption with S3-managed keys (SSE-S3)
   b. Server-side encryption with Key Management Service managed keys (SSE-KMS)
3. Client-side encryption
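The server-side options above can be requested per upload. A boto3 sketch is shown below; bucket, key and KMS key alias are placeholders, and in-transit protection comes from the fact that boto3 talks to S3 over HTTPS by default.

```python
# Sketch: requesting server-side encryption when uploading objects with boto3.
# Bucket, keys and the KMS key alias are placeholders.
import boto3

s3 = boto3.client("s3")

# SSE-S3: Amazon S3 manages the encryption keys.
with open("report.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="secrets/report.pdf",
        Body=f,
        ServerSideEncryption="AES256",
    )

# SSE-KMS: keys are managed through the Key Management Service.
with open("report.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-example-bucket",
        Key="secrets/report-kms.pdf",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-example-key",   # placeholder key alias
    )
```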
Advantages of AWS S3 Service
Scalability on Demand
If you want your application’s scalability to vary according to changes in traffic, then AWS S3
is a very good option.
Scaling up or down is just a few mouse-clicks away when you use other attractive features of AWS.
Content Storage and Distribution
Amazon S3 can be used as the foundation for a Content Delivery Network, because it is designed
for content storage and distribution.
Big Data and Analytics on Amazon S3
Amazon QuickSight UI can be connected with Amazon S3, and then large amounts of data can be
analyzed with it.
Backup and Archive
Whether you need timely backups of your website, one-time storage of static files, or storage of versions
of files you are currently working on, Amazon S3 has you covered.
Disaster Recovery
Storing data in multiple availability zones in a region gives the user the flexibility to recover lost files
as soon as possible. Also, cross-region replication technology can be used to store copies in any number
of Amazon’s worldwide regions.
UNIT IV RESOURCE MANAGEMENT AND SECURITY IN CLOUD
4.5 Inter-cloud Resource Management
This section characterizes the various cloud service models and their extensions. The cloud service trends
are outlined. Cloud resource management and intercloud resource exchange schemes are reviewed. We
will also discuss the defense of cloud resources against network threats.
4.5.1 Extended Cloud Computing Services
There are six layers of cloud services, ranging from hardware, network, and collocation to infrastructure,
platform, and software applications. We already introduced the top three service layers as SaaS, PaaS,
and IaaS, respectively. The cloud platform provides PaaS, which sits on top of the IaaS infrastructure.
The top layer offers SaaS. These must be implemented on the cloud platforms provided. Although the
three basic models are dissimilar in usage, they are built one on top of another. The implication is that
one cannot launch SaaS applications without a cloud platform. The cloud platform cannot be built if
compute and storage infrastructures are not there.
Fig A stack of six layers of cloud services
The bottom three layers are more related to physical requirements. The bottommost layer
provides Hardware as a Service (HaaS). The next layer is for interconnecting all the hardware
components, and is simply called Network as a Service (NaaS). Virtual LANs fall within the scope of
NaaS. The next layer up offers Location as a Service (LaaS), which provides a collocation service to
house, power, and secure all the physical hardware and network resources. Some authors say this layer
provides Security as a Service (“SaaS”). The cloud infrastructure layer can be further subdivided as Data
as a Service (DaaS) and Communication as a Service (CaaS) in addition to compute and storage in IaaS.
We will examine commercial trends in cloud services in subsequent sections. Here we will mainly cover
the top three layers with some success stories of cloud computing. Cloud players are divided into three
classes: (1) cloud service providers and IT administrators, (2) software developers or vendors, and (3) end
users or business users. These cloud players vary in their roles under the IaaS, PaaS, and SaaS models.
The table entries distinguish the three cloud models as viewed by different players. From the software
vendors’ perspective, application performance on a given cloud platform is most important. From the
providers’ perspective, cloud infrastructure performance is the primary concern. From the end users’
perspective, the quality of services, including security, is the most important.
Table Cloud Differences in Perspectives of Providers, Vendors, and Users
Fig Three cases of cloud resource provisioning without elasticity: (a) heavy waste due to overprovisioning, (b)
underprovisioning, and (c) under- and then overprovisioning
Three resource-provisioning methods are presented in the following sections. The demand-driven method
provides static resources and has been used in grid computing for many years. The event-driven method
is based on predicted workload by time. The popularity-driven method is based on monitored Internet
traffic. We characterize these resource provisioning methods as follows.
4.5.2.3 Demand-Driven Resource Provisioning
This method adds or removes computing instances based on the current utilization level of the
allocated resources. The demand-driven method automatically allocates two Xeon processors for the user
application, when the user was using one Xeon processor more than 60 percent of the time for an
extended period. In general, when a resource has surpassed a threshold for a certain amount of time, the
scheme increases that resource based on demand. When a resource is below a threshold for a certain
amount of time, that resource could be decreased accordingly. Amazon implements such an auto-scale
feature in its EC2 platform. This method is easy to implement. The scheme does not work out right if the
workload changes abruptly. The x-axis is the time scale in milliseconds. In the beginning, heavy
fluctuations of CPU load are encountered. All three methods have demanded a few VM instances
initially. Gradually, the utilization rate becomes more stabilized with a maximum of 20 VMs (100 percent
utilization) provided for demand-driven provisioning in Figure 4.25(a). However, the event-driven
method reaches a stable peak of 17 VMs toward the end of the event and drops quickly in Figure 4.25(b).
The popularity provisioning shown in Figure 4.25(c) leads to a similar fluctuation with peak VM
utilization in the middle of the plot.
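The demand-driven scheme described above boils down to a threshold rule with a hold period. Here is a minimal Python sketch under that reading; get_average_utilization(), add_instance() and remove_instance() are hypothetical hooks into a monitoring system and an IaaS API, and the thresholds mirror the 60 percent example in the text.

```python
# Illustrative sketch of demand-driven provisioning: grow or shrink the number
# of VM instances when average utilization stays past a threshold for a while.
# The three callback functions are hypothetical hooks, not a real cloud API.
import time

UPPER, LOWER = 0.60, 0.20      # utilization thresholds (60% triggers growth)
HOLD_PERIODS = 3               # threshold must hold for this many check intervals

def autoscale(get_average_utilization, add_instance, remove_instance,
              interval_seconds=60):
    above = below = 0
    while True:
        util = get_average_utilization()
        above = above + 1 if util > UPPER else 0
        below = below + 1 if util < LOWER else 0
        if above >= HOLD_PERIODS:      # sustained overload: provision one more VM
            add_instance()
            above = 0
        elif below >= HOLD_PERIODS:    # sustained idleness: release one VM
            remove_instance()
            below = 0
        time.sleep(interval_seconds)
```

As the text notes, such a rule reacts only after the hold period has elapsed, which is why it does not work well when the workload changes abruptly.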
Fig. EC2 performance results on the AWS EC2 platform
4.5.2.4 Event-Driven Resource Provisioning
This scheme adds or removes machine instances based on a specific time event. The scheme
works better for seasonal or predicted events such as Christmastime in the West and the Lunar New Year
in the East. During these events, the number of users grows before the event period and then decreases
during the event period. This scheme anticipates peak traffic before it happens. The method results in a
minimal loss of QoS, if the event is predicted correctly. Otherwise, wasted resources are even greater due
to events that do not follow a fixed pattern.
4.5.2.5 Popularity-Driven Resource Provisioning
In this method, the provider monitors the Internet search popularity of certain applications and creates
instances according to popularity demand. The scheme anticipates increased traffic with popularity. Again, the scheme has a
minimal loss of QoS, if the predicted popularity is correct. Resources may be wasted if traffic does not
occur as expected. In Figure 4.25(c), EC2 performance by CPU utilization rate (the dark curve with the
percentage scale shown on the left) is plotted against the number of VMs provisioned (the light curves
with scale shown on the right, with a maximum of 20 VMs provisioned).
4.5.2.6 Dynamic Resource Deployment
The cloud uses VMs as building blocks to create an execution environment across multiple resource sites.
The InterGrid-managed infrastructure was developed by a Melbourne University group. Dynamic
resource deployment can be implemented to achieve scalability in performance. The InterGrid is a Java-
implemented software system that lets users create execution cloud environments on top of all
participating grid resources. Peering arrangements established between gateways enable the allocation of
resources from multiple grids to establish the execution environment. In Figure 4.26, a scenario is
illustrated by which an intergrid gateway (IGG) allocates resources from a local cluster to deploy
applications in three steps: (1) requesting the VMs, (2) enacting the leases, and (3) deploying the VMs as
requested. Under peak demand, this IGG interacts with another IGG that can allocate resources from a
cloud computing provider.
Fig Cloud resource deployment using an IGG (intergrid gateway) to allocate the VMs from a Local cluster to interact
with the IGG of a public cloud provider
A grid has predefined peering arrangements with other grids, which the IGG manages. Through
multiple IGGs, the system coordinates the use of InterGrid resources. An IGG is aware of the peering
terms with other grids, selects suitable grids that can provide the required resources, and replies to
requests from other IGGs. Request redirection policies determine which peering grid InterGrid selects to
process a request and a price for which that grid will perform the task. An IGG can also allocate resources
from a cloud provider. The cloud system creates a virtual environment to help users deploy their
applications. These applications use the distributed grid resources.
The InterGrid allocates and provides a distributed virtual environment (DVE). This is a virtual
cluster of VMs that runs isolated from other virtual clusters. A component called the DVE manager
performs resource allocation and management on behalf of specific user applications. The core
component of the IGG is a scheduler for implementing provisioning policies and peering with other
gateways. The communication component provides an asynchronous message-passing mechanism.
Received messages are handled in parallel by a thread pool.
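The communication component just described (asynchronous message passing with a thread pool handling received messages in parallel) can be sketched in a few lines of Python; this is an illustration of the mechanism, not the actual InterGrid implementation.

```python
# Illustrative sketch of an asynchronous message-passing component whose
# received messages are handled in parallel by a thread pool.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
import threading

inbox = Queue()                            # asynchronous message queue
pool = ThreadPoolExecutor(max_workers=4)   # workers that handle messages in parallel

def handle(message):
    print("handling", message)             # e.g., a resource request from a peer gateway

def dispatcher():
    while True:
        message = inbox.get()
        if message is None:                # sentinel to stop the dispatcher
            break
        pool.submit(handle, message)       # hand the message to the pool

threading.Thread(target=dispatcher, daemon=True).start()
inbox.put({"type": "request", "vms": 3})   # senders post messages without waiting
inbox.put({"type": "release", "lease": 42})
inbox.put(None)
pool.shutdown(wait=True)                   # wait for in-flight handlers to finish
```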
4.5.2.7 Provisioning of Storage Resources
The data storage layer is built on top of the physical or virtual servers. As the cloud computing
applications often provide service to users, it is unavoidable that the data is stored in the clusters of the
cloud provider. The service can be accessed anywhere in the world. One example is e-mail systems. A
typical large e-mail system might have millions of users and each user can have thousands of e-mails and
consume multiple gigabytes of disk space. Another example is a web searching application. In storage
technologies, hard disk drives may be augmented with solid-state drives in the future. This will provide
reliable and high-performance data storage. The biggest barriers to adopting flash memory in data centers
have been price, capacity, and, to some extent, a lack of sophisticated query-processing techniques.
However, this is about to change as the I/O bandwidth of solid-state drives becomes too impressive to
ignore. A distributed file system is very important for storing large-scale data. However, other forms of
data storage also exist. Some data does not need the namespace of a tree structure file system, and
instead, databases are built with stored data files. In cloud computing, another form of data storage is
(Key, Value) pairs. Amazon S3 service uses SOAP to access the objects stored in the cloud. Table 4.8
outlines three cloud storage services provided by Google, Hadoop, and Amazon.
Table 4.8 Storage Services in Three Cloud Computing Systems
Many cloud computing companies have developed large-scale data storage systems to keep huge
amount of data collected every day. For example, Google’s GFS stores web data and some other data,
such as geographic data for Google Earth. A similar system from the open source community is the
Hadoop Distributed File System (HDFS) for Apache. Hadoop is the open source implementation of
Google’s cloud computing infrastructure. Similar systems include Microsoft’s Cosmos file system for the
cloud. Despite the fact that the storage service or distributed file system can be accessed directly, similar
to traditional databases, cloud computing does provide some forms of structure or semistructure database
processing capability. For example, applications might want to process the information contained in a
web page. Web pages are an example of semistructural data in HTML format. If some forms of database
capability can be used, application developers will construct their application logic more easily. Another
reason to build a databaselike service in cloud computing is that it will be quite convenient for traditional
application developers to code for the cloud platform.
Databases are quite common as the underlying storage device for many applications. Thus, such
developers can think in the same way they do for traditional software development. Hence, in cloud
computing, it is necessary to build databases like large-scale systems based on data storage or distributed
file systems. The scale of such a database might be quite large for processing huge amounts of data. The
main purpose is to store the data in structural or semi-structural ways so that application developers can
use it easily and build their applications rapidly. Traditional databases will meet the performance
bottleneck while the system is expanded to a larger scale. However, some real applications do not need
such strong consistency. The scale of such databases can be quite large. Typical cloud databases include
BigTable from Google, SimpleDB from Amazon, and the SQL service from Microsoft Azure.
4.5.3 Virtual Machine Creation and Management
In this section, we consider several issues in cloud infrastructure management. First, we consider
the resource management of independent service jobs, and then how third-party cloud applications are
executed. Cloud-loading experiments conducted by a Melbourne research group on the French Grid'5000
system are used to illustrate VM creation and management. This case study reveals the major VM
management issues and suggests some plausible solutions for workload-balanced execution. Figure 4.27
shows the interactions among VM managers for cloud creation and management. The managers provide a
public API for users to submit and control the VMs.
FIGURE 4.27 Interactions among VM managers for cloud creation and management; the manager provides a public
API for users to submit and control the VMs
4.5.3.1 Independent Service Management
Independent services request facilities to execute many unrelated tasks. Commonly, the APIs
provided are web services that developers can use conveniently. In the Amazon cloud computing
infrastructure, SQS (Simple Queue Service) provides a reliable communication service between different
providers: a message posted to SQS is retained even when the receiving endpoint is not running. By using
independent service providers, cloud applications can run different services at the same time. Some
other services are used for providing data, rather than compute or storage, services.
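As a hedged illustration of this decoupled messaging style, the sketch below uses the boto3 Python SDK to post a message to an SQS queue and consume it later; the queue name and message body are placeholders.

```python
# Illustrative sketch (assumed setup, not from the notes): reliable messaging
# between decoupled cloud components with Amazon SQS via boto3.
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="example-task-queue")["QueueUrl"]

# A producer posts a message; SQS retains it even if no consumer is running yet.
sqs.send_message(QueueUrl=queue_url, MessageBody="process uploaded dataset")

# A consumer, possibly started much later, polls for messages.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1,
                               WaitTimeSeconds=5).get("Messages", [])
for msg in messages:
    print("received:", msg["Body"])
    # Delete the message once it has been processed successfully.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```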
4.5.3.2 Running Third-Party Applications
Cloud platforms have to provide support for building applications that are constructed by
third-party application providers or programmers. As current web applications are often provided by
using Web 2.0 forms (interactive applications with Ajax), the programming interfaces are different from
the traditional programming interfaces such as functions in runtime libraries. The APIs are often in the
form of services. Web service application engines are often used by programmers for building
applications. The web browsers are the user interface for end users. In addition to gateway applications,
the cloud computing platform provides the extra capabilities of accessing backend services or underlying
data. As examples, GAE and Microsoft Azure apply their own cloud APIs to get special cloud services.
The WebSphere application engine is deployed by IBM for Blue Cloud. It can be used to develop any
kind of web application written in Java. In EC2, users can use any kind of application engine that can run
in VM instances.
4.5.3.3 Virtual Machine Manager
The VM manager is the link between the gateway and the resources. The gateway does not share physical
resources directly, but relies on virtualization technology to abstract them; hence, the actual
resources it uses are VMs. The manager manages the VMs deployed on a set of physical resources. The VM
manager implementation is generic so that it can connect with different virtual infrastructure engines
(VIEs). Typically, VIEs can create and stop VMs on a physical cluster. The Melbourne group has developed
managers for OpenNebula, Amazon EC2, and the French Grid'5000. The manager uses OpenNebula
(www.opennebula.org) to deploy VMs on local clusters. OpenNebula runs as a daemon service on a master
node, so the VM manager works as a remote user. VMs are deployed on physical machines using different
kinds of hypervisors, such as Xen (www.xen.org), which enables several operating systems to run
concurrently on the same host.
The VMM also manages VM deployment on grids and IaaS providers. The InterGrid supports Amazon
EC2. The connector is a wrapper for the command-line tool Amazon provides. The VM manager for
Grid’5000 is also a wrapper for its command-line tools. To deploy a VM, the manager needs to use its
template.
4.5.3.4 Virtual Machine Templates
A VM template is analogous to a computer’s configuration and contains a description for a VM with the
following static information:
• The number of cores or processors to be assigned to the VM
• The amount of memory the VM requires
• The kernel used to boot the VM’s operating system
• The disk image containing the VM’s file system
• The price per hour of using a VM
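The template fields listed above can be represented as a simple data structure, from which a per-instance descriptor is generated at deployment time. The sketch below is a hedged illustration of that idea; the field names and the extra instance fields are invented, not the actual InterGrid or OpenNebula template format.

```python
# Minimal sketch of the template/descriptor idea described above. Field names
# and the instance-specific fields are illustrative only.
import uuid

vm_template = {
    "cores": 2,                     # number of cores/processors for the VM
    "memory_mb": 4096,              # amount of memory the VM requires
    "kernel": "vmlinuz-5.15",       # kernel used to boot the guest OS
    "disk_image": "ubuntu22.img",   # disk image containing the VM file system
    "price_per_hour": 0.10,         # price per hour of using a VM
}

def make_descriptor(template, host, network):
    """Copy the static template and add instance-specific information."""
    descriptor = dict(template)
    descriptor.update({
        "instance_id": str(uuid.uuid4()),  # unique id for this VM instance
        "host": host,                      # physical machine chosen by the VMM
        "network": network,                # network configuration for the VM
    })
    return descriptor

print(make_descriptor(vm_template, host="node-07", network="vlan-12"))
```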
The gateway administrator provides the VM template information when the infrastructure is set
up. The administrator can update, add, and delete templates at any time. In addition, each gateway in the
InterGrid network must agree on the templates to provide the same configuration on each site. To deploy
an instance of a given VM, the VMM generates a descriptor from the template. This descriptor contains
the same fields as the template, along with additional information related to that specific VM instance.
FIGURE 4.29 Cloud loading results at four gateways at resource sites in the Grid’5000 system
The increasing demands in the cloud computing arena have resulted in more heterogeneous
infrastructure, making interoperability an area of concern. This makes it challenging for cloud
customers to select an appropriate cloud service provider (CSP), and it tends to lock them in to a
particular CSP. This is where intercloud computing comes into play. Although intercloud computing is
still in its infancy, its purpose is to allow smooth interoperability between clouds, regardless of their
underlying infrastructure, so that users can migrate their workloads across clouds easily. Cloud brokerage
is a promising aspect of intercloud computing.
Most data-intensive applications are now deployed on clouds. These applications, their storage,
and their data resources are so widely distributed that they often have to span cross-continental
networks. Consequently, performance degradation in the network affects the performance of cloud systems
and of user requests. To ensure service quality, especially for bulk-data transfer, resource reservation
and utilization become a critical issue.
Previous work has mainly focused on integrated and collaborative use of resources to meet application
requirements; it has not focused on the consistency and efficiency of bulk-data transfer, and it assumes
that all resources are connected by high-speed, stable networks. The continuously growing cloud market
now faces new challenges. Even when users have well-coordinated end systems and resources are allocated
according to their needs, bulk-data transfer for cross-continental users in remote places can still create
a performance bottleneck. For instance, multimedia services such as IP Television (IPTV) rely on the
availability of sufficient network resources and hence have to operate within tight time constraints.
Fig. Resource management
Resource provisioning is required to allocate the limited resources of the (partner cloud) resource
provider cloud efficiently. Provisioning is also needed to give the end user a sense of ownership, while
for the cloud provider (partner/host) provisioning is required to address the metering aspect.
The Alliance service is responsible for provisioning remote resources. The resource provisioning is done
on a local Keystone project at the hosting cloud, and the Alliance service also maintains the provisioning
information in a local database. This provisioning information is used for token generation and validation.
In the picture above, the user maintains his identity in one place (the host cloud) and owns resources from
remote cloud(s) on a local project. This is another benefit of resource federation: the user can use a
single project in the host cloud to scope all the remote resources across the cloud(s).
Resource Access Across Clouds
The resource access process starts by obtaining an "X-Auth-Token" scoped to a local Keystone project of the
host cloud (HC). The Keystone service at the HC talks to the local Alliance service to get information
about the remote resources associated with the project.
As part of the token response, the client gets a service catalog containing endpoints to the remote
(federated) resources. The client uses a remote resource endpoint to access the resource, providing the
host cloud identifier (X-Host-Cloud-Id) in the request header along with the X-Auth-Token{hc} obtained
from the host cloud.
The auth middleware protecting the resource at the partner cloud (PC) intercepts the request and calls
Keystone for token validation. Because the token was not issued by this Keystone (it is a foreign token)
and the request carries an X-Host-Cloud-Id header, Keystone delegates the token validation request to the
Alliance service. Alliance uses the cloud identifier (X-Host-Cloud-Id) from the header to look up the
paired host cloud and its peer Alliance endpoint. Using the X-Auth-Token{hc}, it forms an InterCloud
Federation Ticket and uses the paired Alliance endpoint to validate the user token. The Alliance at the HC
coordinates with its local Keystone to validate the token. After successful inter-cloud token validation,
the Alliance service provides the validation response to the Keystone service running at the PC. Keystone
caches the token in the local system, responds to the middleware, and uses the cached token for future
token validations.
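The flow above can be sketched from the client's side with plain HTTP calls. The first step uses OpenStack Keystone's standard v3 token API; the X-Host-Cloud-Id header and the federated endpoint follow the Alliance design described in these notes, and all URLs, names, and passwords below are placeholders.

```python
# Hedged client-side sketch of the inter-cloud access flow described above.
import requests

HOST_KEYSTONE = "https://keystone.host-cloud.example:5000/v3"

auth_body = {
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {"user": {"name": "alice",
                                  "domain": {"id": "default"},
                                  "password": "secret"}},
        },
        # Scope the token to the local project that federates remote resources.
        "scope": {"project": {"name": "federated-project",
                              "domain": {"id": "default"}}},
    }
}

resp = requests.post(f"{HOST_KEYSTONE}/auth/tokens", json=auth_body)
token_hc = resp.headers["X-Subject-Token"]          # X-Auth-Token{hc}
catalog = resp.json()["token"].get("catalog", [])   # may list federated endpoints

# Call a remote (partner-cloud) resource endpoint taken from the catalog,
# passing the host-cloud token and the host cloud identifier.
remote_endpoint = "https://compute.partner-cloud.example:8774/v2.1/servers"
r = requests.get(remote_endpoint,
                 headers={"X-Auth-Token": token_hc,
                          "X-Host-Cloud-Id": "host-cloud-01"})
print(r.status_code)
```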
X-Auth-Token processing
Clients do not want to deal with multiple X-Auth-Tokens to access their resources across clouds (regions).
The following are options to solve this issue.
PKI tokens
PKI tokens can be used by clients to access resources across clouds, and no inter-cloud token validation
is required to validate them. However, PKI tokens have proven to be heavyweight, so a federated token can
be a better solution.
Federated Tokens
Instead of generating a new X-Auth-Token{pc}, the partner cloud may choose to use the same
X-Auth-Token{hc} issued by the host cloud.
After successful inter-cloud token validation (as explained above), Alliance caches the token locally
and uses the same X-Auth-Token for future communication. This option can be set as part of cloud
pairing, depending on the level of trust between the two cloud providers.
Note: Inter-cloud token validation can be a one-time process or can be performed multiple times over the
period of communication by clients.
Federated Tokens by Eager Propagation
To support federated tokens, the partner cloud has to perform inter-cloud token validation and cache the
validated token response to make future token validations more efficient. Another approach to improving
the performance of inter-cloud token validation is to propagate tokens to the partner cloud in push mode:
the host cloud propagates tokens to the relevant partner over a notification route.
SSO Across Cloud
In this mode, the client chooses to use the PC's identity (Keystone) endpoint to make the auth token
request. The client provides credentials, project_id and cloud_id to the PC's identity service, and
Keystone coordinates with the Alliance service to obtain the token from the remote cloud.
SSOut Across Cloud (or InterCloud Token Revocation)
Token revocation is an important aspect of maintaining security and system integrity. In the resource
federation use case, token revocation becomes even more important, because a stale token can cause greater
harm, especially to the resource provider clouds.
Inter-cloud token revocation allows token revocation across clouds; for example, the host cloud can
initiate revocation of a token issued by itself, or partner clouds can request/initiate revocation of
a federated token.
The Alliance service will be the interface between clouds that makes token revocation happen.
4.2 Resource Provisioning and Resource Provisioning Methods
NEED OF RESOURCE PROVISIONING
To increase user satisfaction and the likelihood of users adopting the cloud, the number of requests that
the cloud can satisfy must be increased; as a consequence, the profit of the cloud provider also becomes
higher. One way to attract customers to a cloud computing application is to keep the response time short.
To attract users, the cloud therefore needs to adopt a resource provisioning technique that yields the
highest possible transaction success rate. Achieving a high transaction success rate should not come at
the expense of response time; instead, the trade-off between transaction success and turnaround time must
be balanced, keeping the turnaround time as short as possible while maintaining a high rate of successful
transactions.
The purpose of resource provisioning is to discover suitable resources for the appropriate workloads in
time, so that applications can be used properly; the best results are obtained by using resources more
effectively. Discovering a reasonable and appropriate workload-to-resource mapping is one of the main
goals when scheduling different workloads. To improve the quality of service, parameters such as utility,
availability, reliability, response time, security, price, and CPU utilization must be satisfied. Resource
provisioning thus determines the runtime performance of the various workloads, and the results depend on
the type of workload. There are two generic ways of resource provisioning; a number of specific
provisioning techniques have been proposed in the literature, and their merits and challenges are
summarized below.
Resource Provisioning Techniques: Merits and Challenges
1. Deadline-driven provisioning of resources for scientific applications in hybrid clouds with Aneka [5]
   Merits: Able to efficiently allocate resources from different sources in order to reduce application execution times.
   Challenges: Not suitable for HPC data-intensive applications.
2. Dynamic provisioning in multi-tenant service clouds [15]
   Merits: Matches tenant functionalities with client requirements.
   Challenges: Does not work for testing on real-life cloud-based systems and across several domains.
3. Elastic Application Container: A Lightweight Approach for Cloud Resource Provisioning [19]
   Merits: Outperforms in terms of flexibility and resource efficiency.
   Challenges: Not suitable for web applications and supports only one programming language, Java.
4. Hybrid Cloud Resource Provisioning Policy in the Presence of Resource Failures [31]
   Merits: Able to adapt to the user workload model to provide flexibility in the choice of strategy based on the desired level of QoS, the needed performance, and the available budget.
   Challenges: Not suitable to run real experiments.
5. Provisioning of Requests for Virtual Machine Sets with Placement Constraints in IaaS Clouds [38]
   Merits: Runtime efficient; can provide an effective means of online VM-to-PM mapping and also maximizes revenue.
   Challenges: Not practical for medium to large problems.
6. Failure-aware resource provisioning for hybrid Cloud infrastructure [11]
   Merits: Able to improve users' QoS by about 32% in terms of deadline violation rate and 57% in terms of slowdown, with a limited cost on a public cloud.
   Challenges: Not able to run real experiments and not able to move VMs between public and private clouds to deal with resource failures in the local infrastructure.
7. VM Provisioning Method to Improve the Profit and SLA Violation of Cloud Service Providers [27]
   Merits: Reduces SLA violations and improves profit.
   Challenges: Increases the problem of resource allocation and load balancing among the datacenters.
8. Risk Aware Provisioning and Resource Aggregation based Consolidation of Virtual Machines [21]
   Merits: Significant reduction in the numbers required to host 1000 VMs; enables turning off unnecessary servers.
   Challenges: Takes into account only the CPU requirements of VMs.
9. Semantic based Resource Provisioning and Scheduling in Inter-cloud Environment [20]
   Merits: Enables the fulfillment of customer requirements to the maximum by providing additional resources to the cloud system participating in a federated cloud environment, thereby solving the interoperability problem.
   Challenges: QoS parameters such as response time and throughput still have to be achieved for interactive applications.
10. Design and implementation of adaptive power-aware virtual machine provisioner (APA-VMP) using swarm intelligence [7]
   Merits: Efficient VM placement and significant reduction in power.
   Challenges: Not suitable for conserving power in modern data centers.
11. Adaptive resource provisioning for read intensive multi-tier applications in the cloud [2]
   Merits: Automatic identification and resolution of bottlenecks in multi-tier web applications hosted on a cloud.
   Challenges: Not suitable for n-tier clustered applications hosted on a cloud.
12. Optimal Resource Provisioning for Cloud Computing Environment [13]
   Merits: Efficiently provisions cloud resources for SaaS users with a limited budget and deadline, thereby optimizing QoS.
   Challenges: Applicable only for SaaS users and SaaS providers.
The procedure of cloud resource provisioning is as follows. The cloud consumer interacts through the cloud
portal and, after completing the authentication procedure, presents the quality-of-service (QoS)
requirements of the workload. The Resource Information Centre (RIC) delivers information based on the
customer requirements, and the Resource Provisioning Agent (RPA) checks the available resources. The RPA
provisions the desired resources to the cloud application workload for execution in the cloud environment,
provided that the demanded resources are present in the resource pool.
If the desired resources are unavailable for the requested quality of service, the RPA asks the application
to resubmit the workload with different QoS requirements, such as a revised service-level agreement
document. Once the available resources have been provisioned appropriately, the resource scheduler submits
the workloads to the provisioned resources for execution. The RPA then receives the results and sends these
provisioning outputs back to the cloud consumer.
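The provisioning workflow just described can be summarized in a short sketch. The function and resource model below are hypothetical; they simply mirror the RIC/RPA/scheduler roles in the text.

```python
# Hypothetical sketch of the RIC/RPA workflow described above; the names and
# the resource model are invented for illustration only.
resource_pool = {"cpu": 64, "memory_gb": 256}   # what the RIC reports as available

def rpa_provision(workload_qos):
    """Check availability (RIC info) and provision, or ask for resubmission."""
    if all(resource_pool.get(k, 0) >= v for k, v in workload_qos.items()):
        for k, v in workload_qos.items():       # reserve the demanded resources
            resource_pool[k] -= v
        return "provisioned: workload handed to the resource scheduler"
    return "unavailable: resubmit workload with revised QoS/SLA requirements"

print(rpa_provision({"cpu": 16, "memory_gb": 64}))   # fits the pool
print(rpa_provision({"cpu": 128, "memory_gb": 64}))  # exceeds the pool
```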
A holistic cloud security program should account for ownership and accountability (internal/external) of
cloud security risks and for gaps in protection/compliance, and should identify the controls needed to
mature security and reach the desired end state.
Network Segmentation
In multi-tenant environments, assess what segmentation is in place between your resources and those of
other customers, as well as between your own instances. Leverage a zone approach to isolate instances,
containers, applications, and full systems from each other when possible.
Leverage robust identity management and authentication processes to ensure that only authorized users
have access to the cloud environment, applications, and data. Enforce least privilege to restrict privileged
access and to harden cloud resources (for instance, expose resources to the Internet only as necessary,
and deactivate unneeded capabilities/features/access). Ensure privileges are role-based, and that
privileged access is audited and recorded via session monitoring.
Once cloud instances, services, and assets are discovered and grouped, bring them under management
(i.e. managing and cycling passwords, etc.). Discovery and onboarding should be automated as much as
possible to eliminate shadow IT.
Never allow the use of shared passwords. Combine passwords with other authentication systems for
sensitive areas. Ensure password management best practices are followed.
Vulnerability Management
Regularly perform vulnerability scans and security audits, and patch known vulnerabilities.
Encryption
Disaster Recovery
Be aware of the data backup, retention, and recovery policies and processes for your cloud vendor(s). Do
they meet your internal standards? Do you have break-glass strategies and solutions in place?
Increased infrastructure layers to manage and protect: Depending on the type of cloud model in use, there
are many additional infrastructure layers, such as gateways, firewalls, and access routers, that need to
be managed and protected while still allowing authorized users access to perform their tasks.
Multiple operating systems and applications per server: On virtualized commodity hardware, multiple
workloads run concurrently on a physical server, with multiple operating systems, or even the same
operating system at different patch levels.
Elimination of physical boundaries between systems: As virtualization adoption increases, workloads are
co-located and share the same physical infrastructure.
Access management: Access management refers to the processes and technologies used to control and
monitor network access. Access management features, such as authentication, authorization, trust and
security auditing, are part and parcel of the top ID management systems for both on-premises and cloud-
based systems.
Active Directory (AD): Microsoft developed AD as a user-identity directory service for Windows
domain networks. Though proprietary, AD is included in the Windows Server operating system and is
thus widely deployed.
Biometric authentication: A security process for authenticating users that relies upon the user’s unique
characteristics. Biometric authentication technologies include fingerprint sensors, iris and retina scanning,
and facial recognition.
Context-aware network access control: Context-aware network access control is a policy-based method
of granting access to network resources according to the current context of the user seeking access. For
example, a user attempting to authenticate from an IP address that hasn’t been whitelisted would be
blocked.
Credential: An identifier employed by the user to gain access to a network such as the user’s password,
public key infrastructure (PKI) certificate, or biometric information (fingerprint, iris scan).
De-provisioning: The process of removing an identity from an ID repository and terminating access
privileges.
Digital identity: The ID itself, including the description of the user and his/her/its access privileges.
(“Its” because an endpoint, such as a laptop or smartphone, can have its own digital identity.)
Entitlement: The set of attributes that specify the access rights and privileges of an authenticated security
principal.
Identity as a Service (IDaaS): Cloud-based IDaaS offers identity and access management functionality
to an organization’s systems that reside on-premises and/or in the cloud.
Identity lifecycle management: Similar to access lifecycle management, the term refers to the entire set
of processes and technologies for maintaining and updating digital identities. Identity lifecycle
management includes identity synchronization, provisioning, de-provisioning, and the ongoing
management of user attributes, credentials and entitlements.
Identity synchronization: The process of ensuring that multiple identity stores—say, the result of an
acquisition—contain consistent data for a given digital ID.
Lightweight Directory Access Protocol (LDAP): LDAP is an open, standards-based protocol for managing
and accessing a distributed directory service, such as Microsoft's AD.
Multi-factor authentication (MFA): MFA is when more than just a single factor, such as a user name
and password, is required for authentication to a network or system. At least one additional step is also
required, such as receiving a code sent via SMS to a smartphone, inserting a smart card or USB stick, or
satisfying a biometric authentication requirement, such as a fingerprint scan.
Password reset: In this context, it’s a feature of an ID management system that allows users to re-
establish their own passwords, relieving the administrators of the job and cutting support calls. The reset
application is often accessed by the user through a browser. The application asks for a secret word or a set
of questions to verify the user’s identity.
Privileged account management: This term refers to managing and auditing accounts and data access
based on the privileges of the user. In general terms, because of his or her job or function, a privileged
user has been granted administrative access to systems; a privileged user, for example, would be able to
set up and delete user accounts and roles.
Provisioning: The process of creating identities, defining their access privileges and adding them to an
ID repository.
Risk-based authentication (RBA): Risk-based authentication dynamically adjusts authentication
requirements based on the user’s situation at the moment authentication is attempted. For example, when
users attempt to authenticate from a geographic location or IP address not previously associated with
them, those users may face additional authentication requirements.
Security principal: A digital identity with one or more credentials that can be authenticated and
authorized to interact with the network.
Single sign-on (SSO): A type of access control for multiple related but separate systems. With a single
username and password, a user can access a system or systems without using different credentials.
User behavior analytics (UBA): UBA technologies examine patterns of user behavior and automatically
apply algorithms and analysis to detect important anomalies that may indicate potential security threats.
UBA differs from other security technologies, which focus on tracking devices or security events. UBA is
also sometimes grouped with entity behavior analytics and known as UEBA.
IAM vendors
The identity and access management vendor landscape is a crowded one, consisting of both pure-play
providers such as Okta and OneLogin and large vendors such as IBM, Microsoft, and Oracle. Below is a
list of leading players based on Gartner's Magic Quadrant for Access Management, Worldwide, which
was published in June 2017.
Atos (Evidan)
CA Technologies
Centrify
Covisint
ForgeRock
IBM Security Identity and Access Assurance
I-Spring Innovations
Micro Focus
Microsoft Azure Active Directory
Okta
OneLogin
Optimal idM
Oracle Identity Cloud Service
Ping
SecureAuth
Mapping standards to use cases
Table 1 summarizes this section and shows which standards are relevant for customers in the different
use cases.
Table 1 maps each standard against its application domain (IaaS, PaaS, SaaS, Facilities, Organization) and
other characteristics (adoption, usage, auditing, certification, availability, openness); in the rows
below, more x marks indicate a stronger rating for that standard.
HTML/XML x x xxx x xxx
WSDL/SOAP x x xxx x xxx
OAuth/OpenID x xxx x xxx
SAML x xxx x xxx
OData x x x x xxx
OVF x xxx x xxx
OpenStack x x xx x xx
CAMP x x x xx
CIMI x x x xxx
ODCA SUoM x x x xx
SCAP x x x x xxx x xx
ISO 27001 x x xxx xxx xx
ITIL x xx xxx xx
SOC x x xx xxx xx
Tier Certification x xx xxx x
A.1 HTML/XML
Full title HyperText Markup Language (HTML) / eXtensible Markup Language (XML)
Description HTML is the markup language for web pages – it is used for displaying text, links,
images for human readers. HTML requires no further introduction.
XML is a mark-up language and structure for encoding data in a format that is
machine-readable. It is used for exchanging data between systems, for example in
web services, but it is also used in application programming interfaces (APIs), or to
store configuration files or other internal system data.
Hundreds of XML-based languages have been developed, including RSS, Atom,
SOAP, and XHTML. XML-based formats have become the default for many office-
productivity tools. XML has also been employed as the base language for
communication protocols, such as WSDL, and XMPP.
WSDL/SOAP
Full title Web Services Description Language (WSDL)
Description WSDL is an XML-based interface description language that is used for describing a
web service. A WSDL description provides a machine-readable description of how
the service can be called, what parameters it expects, and what data structures it
returns. SOAP, a companion standard, is used as the message envelope for exchanging the messages of a
WSDL-described service (for example, over HTTP).
Link http://www.w3.org/2000/xp/Group/
Organisation World Wide Web Consortium :
XML Protocol Working Group; and
Web Services Description Working Group
Certification and compliance: None - x
Companies often implement WSDL on a voluntary basis, without any formal process to check compliance
(such as certification). There are tools to validate interoperability and vendors sometimes participate
in multi-vendor interoperability workshops.
Adoption: Globally - xxx
Thousands of companies use WSDL.
SAML/XACML
Full title Security Assertion Markup Language (SAML), Extensible Access Control Markup
Language (XACML)
Description SAML/XACML are XML-based languages and protocols for authentication and
authorisation (on the web and inside local networks) of users for accessing websites.
SAML/XACML supports the integration of websites and intranet servers with
authentication/authorisation services and products, providing SSO for users (aka
federation). SAML/XACML is used widely in enterprise software and e-
government, for example.
OAuth/OpenID are alternatives more widely used in social media.
Link http://saml.xml.org/wiki/saml-wiki-knowledgebase
Organisation Organization for the Advancement of Structured Information Standards (OASIS)
Application domain SaaS
As a framework that allows access to an HTTP service, it works on the API/GUI component of the cloud
service model.
Openness: Open - xxx
Development: The standard is discussed by OASIS Security Services Technical Committee experts.
Availability: The document is freely available to download from the OASIS website.
Certification and compliance: None - x
Compliance with SAML and XACML is usually not formally audited or certified; there are multi-vendor
interoperability workshops.
OData
Full title Open Data Protocol
Link http://www.odata.org/
Organisation Microsoft developed the standard. It has been proposed for adoption by
Organization for the Advancement of Structured Information Standards (OASIS)
Description OData is a web protocol for querying and updating data. OData applies and builds
upon web technologies such as HTTP, the Atom Publishing Protocol and JSON to provide access to
information from a variety of applications, services, and stores. OData can be used to expose and access
information from a variety of sources including, but not limited to, relational databases, file systems,
content management systems and traditional web sites.
Application domain IaaS and SaaS - OData provides a RESTful API for managing data.
Tier Certification
Full title Data Center Site Infrastructure Tier Standard
Link http://uptimeinstitute.com/publications
Organisation The Uptime Institute
Description The standard is an objective basis for comparing the functionality, capacities, and
relative cost of a particular site infrastructure design topology against others, or to
compare groups of sites.
Adoption/usage Widely adopted – There are 269 data centers certified from Tier II to Tier IV
(according to Uptime Institute website).
5.1 Hadoop
Apache Hadoop is an open source software framework used to develop data processing
applications which are executed in a distributed computing environment. Applications built using
Hadoop run on large data sets distributed across clusters of commodity computers. Commodity
computers are cheap and widely available, and are mainly useful for achieving greater computational
power at low cost.
Similar to data residing in the local file system of a personal computer, in Hadoop, data
resides in a distributed file system called the Hadoop Distributed File System (HDFS). The processing
model is based on the 'Data Locality' concept, wherein computational logic is sent to the cluster nodes
(servers) containing the data. This computational logic is nothing but a compiled version of a program
written in a high-level language such as Java, and it processes data stored in HDFS.
Apache Hadoop consists of two sub-projects –
Hadoop MapReduce: MapReduce is a computational model and software framework for writing
applications which are run on Hadoop. These MapReduce programs are capable of processing enormous
data in parallel on large clusters of computation nodes.
HDFS (Hadoop Distributed File System): HDFS takes care of the storage part of Hadoop applications.
MapReduce applications consume data from HDFS. HDFS creates multiple replicas of data blocks and
distributes them on compute nodes in a cluster. This distribution enables reliable and extremely rapid
computations.
Although Hadoop is best known for MapReduce and its distributed file system (HDFS), the term is also
used for a family of related projects that fall under the umbrella of distributed computing and large-scale
data processing.
NameNode and DataNodes
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master
server that manages the file system namespace and regulates access to files by clients. In addition, there
are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the
nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The
NameNode executes file system namespace operations like opening, closing, and renaming files and
directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for
serving read and write requests from the file system’s clients. The DataNodes also perform block
creation, deletion, and replication upon instruction from the NameNode.
The NameNode and DataNode are pieces of software designed to run on commodity machines.
These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java
language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the
highly portable Java language means that HDFS can be deployed on a wide range of machines. A typical
deployment has a dedicated machine that runs only the NameNode software. Each of the other machines
in the cluster runs one instance of the DataNode software. The architecture does not preclude running
multiple DataNodes on the same machine but in a real deployment that is rarely the case.
The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The
NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way
that user data never flows through the NameNode.
The File System Namespace
HDFS supports a traditional hierarchical file organization. A user or an application can create
directories and store files inside these directories. The file system namespace hierarchy is similar to most
other existing file systems; one can create and remove files, move a file from one directory to another, or
rename a file. HDFS supports user quotas and access permissions. HDFS does not support hard links or
soft links. However, the HDFS architecture does not preclude implementing these features.
While HDFS follows the naming convention of the FileSystem, some paths and names (e.g. /.reserved
and .snapshot) are reserved. Features such as transparent encryption and snapshots use reserved paths.
The NameNode maintains the file system namespace. Any change to the file system namespace or its
properties is recorded by the NameNode. An application can specify the number of replicas of a file that
should be maintained by HDFS. The number of copies of a file is called the replication factor of that file.
This information is stored by the NameNode.
Data Replication
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as
a sequence of blocks. The blocks of a file are replicated for fault tolerance. The block size and replication
factor are configurable per file. All blocks in a file except the last block are the same size, while users can
start a new block without filling out the last block to the configured block size after the support for
variable length block was added to append and hsync. An application can specify the number of replicas
of a file.
The replication factor can be specified at file creation time and can be changed later. Files in
HDFS are write-once (except for appends and truncates) and have strictly one writer at any time.
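As a hedged illustration of per-file replication factors, the sketch below drives the standard `hdfs dfs` command-line tool from Python; the file paths are examples only, and a configured Hadoop client is assumed to be on the PATH.

```python
# Hedged sketch: setting and inspecting a file's replication factor with the
# standard `hdfs dfs` shell commands, driven from Python.
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its output."""
    return subprocess.run(["hdfs", "dfs", *args],
                          capture_output=True, text=True, check=True).stdout

hdfs("-put", "notes.txt", "/user/student/notes.txt")   # upload a local file
hdfs("-setrep", "-w", "2", "/user/student/notes.txt")  # set replication to 2 and wait
print(hdfs("-ls", "/user/student"))                    # listing shows the replication factor
```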
The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat
and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the
DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.
Replication
The placement of replicas is critical to HDFS reliability and performance. Optimizing replica
placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots
of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data
reliability, availability, and network bandwidth utilization. The current implementation for the replica
placement policy is a first effort in this direction. The short-term goals of implementing this policy are to
validate it on production systems, learn more about its behavior, and build a foundation to test and
research more sophisticated policies.
Large HDFS instances run on a cluster of computers that commonly spread across many racks.
Communication between two nodes in different racks has to go through switches. In most cases, network
bandwidth between machines in the same rack is greater than network bandwidth between machines in
different racks. The NameNode determines the rack id each DataNode belongs to via the process outlined
in Hadoop Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This
prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when
reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on
component failure. However, this policy increases the cost of writes because a write needs to transfer
blocks to multiple racks.
For the common case, when the replication factor is three, HDFS’s placement policy is to put one
replica on the local machine if the writer is on a datanode, otherwise on a random datanode in the same
rack as that of the writer, another replica on a node in a different (remote) rack, and the last on a different
node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write
performance. The chance of rack failure is far less than that of node failure; this policy does not impact
data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth
used when reading data since a block is placed in only two unique racks rather than three. With this
policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one
node, two thirds of replicas are on one rack, and the other third are evenly distributed across the
remaining racks. This policy improves write performance without compromising data reliability or read
performance.
If the replication factor is greater than 3, the placement of the 4th and following replicas are
determined randomly while keeping the number of replicas per rack below the upper limit (which is
basically (replicas - 1) / racks + 2). Because the NameNode does not allow DataNodes to have multiple
replicas of the same block, the maximum number of replicas created is the total number of DataNodes at
that time.
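A tiny worked example of the per-rack upper limit quoted above (a sketch that mirrors the prose, not the actual NameNode code):

```python
# Worked example of the per-rack upper limit quoted above:
# max replicas per rack = (replicas - 1) / racks + 2 (integer division).
def max_replicas_per_rack(replicas, racks):
    return (replicas - 1) // racks + 2

# With the default replication factor 3 on a 2-rack cluster the limit is 3 per
# rack; with 10 replicas over 4 racks it drops to 4 per rack.
print(max_replicas_per_rack(3, 2))    # -> 3
print(max_replicas_per_rack(10, 4))   # -> 4
```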
After the support for Storage Types and Storage Policies was added to HDFS, the NameNode
takes the policy into account for replica placement in addition to the rack awareness described above. The
NameNode chooses nodes based on rack awareness at first, then checks that the candidate node has the
storage required by the policy associated with the file. If the candidate node does not have the storage
type, the NameNode looks for another node. If enough nodes to place replicas cannot be found in the first
path, the NameNode looks for nodes having fallback storage types in the second path.
Replica Selection
To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request
from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node,
then that replica is preferred to satisfy the read request. If HDFS cluster spans multiple data centers, then
a replica that is resident in the local data center is preferred over any remote replica.
5.2 MapReduce
What is MapReduce?
Hadoop MapReduce (Hadoop Map/Reduce) is a software framework for distributed processing of large
data sets on computing clusters. It is a sub-project of the Apache Hadoop project. Apache Hadoop is an
open-source framework that allows to store and process big data in a distributed environment across
clusters of computers using simple programming models. MapReduce is the core component for data
processing in Hadoop framework. In layman’s term Mapreduce helps to split the input data set into a
number of parts and run a program on all data parts parallel at once. The term MapReduce refers to two
separate and distinct tasks. The first is the map operation, takes a set of data and converts it into another
set of data, where individual elements are broken down into tuples (key/value pairs). The reduce
operation combines those data tuples based on the key and accordingly modifies the value of the key.
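A classic way to see map and reduce in action is word count. The hedged sketch below expresses the two tasks in Python in the Hadoop Streaming style (key/value tuples, with a sort standing in for the framework's shuffle); it is illustrative rather than the framework's own Java API.

```python
# Hedged word-count sketch: the mapper emits (word, 1) pairs and the reducer
# sums the counts per key, mirroring the map and reduce operations above.
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1          # map: element -> (key, value) tuple

def reducer(pairs):
    # pairs must be sorted by key, which is what the shuffle/sort step guarantees
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)   # reduce: combine by key

if __name__ == "__main__":
    data = ["Hello World", "Hello cloud computing world"]
    mapped = sorted(mapper(data))          # stands in for the shuffle/sort step
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```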
Map Task
The Map task runs in the following phases:
a. RecordReader
The recordreader transforms the input split into records. It parses the data into records but does not
parse the records themselves. It provides the data to the mapper function in key-value pairs. Usually, the
key is the positional information and the value is the data that comprises the record.
b. Map
In this phase, the mapper, which is the user-defined function, processes the key-value pair from the
recordreader. It produces zero or multiple intermediate key-value pairs. The decision of what will be the
key-value pair lies with the mapper function. The key is usually the data on which the reducer function
does the grouping operation, and the value is the data which gets aggregated to obtain the final result in
the reducer function.
c. Combiner
The combiner is actually a localized reducer which groups the data in the map phase. It is optional. The
combiner takes the intermediate data from the mapper and aggregates it, within the small scope of one
mapper. In many situations, this decreases the amount of data that needs to move over the network. For
example, moving (Hello World, 1) three times consumes more network bandwidth than moving (Hello World, 3).
The combiner can provide a significant performance gain with no drawbacks; however, it is not guaranteed
to execute, so it cannot be relied on as part of the overall algorithm.
d. Partitioner
The partitioner pulls the intermediate key-value pairs from the mapper and splits them into shards, one
shard per reducer. By default, the partitioner takes the hashcode of the key and performs a modulus
operation by the number of reducers: key.hashcode() % (number of reducers). This distributes the keyspace
evenly over the reducers and ensures that keys with the same value, but from different mappers, end up in
the same reducer. The partitioned data gets written to the local file system of each map task, where it
waits for the reducer to pull it.
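The default hash-partitioning rule described above can be sketched in a few lines (Python here for illustration, with Python's hash() standing in for Java's key.hashCode(); Hadoop's own HashPartitioner is Java):

```python
# Sketch of the default partitioning rule: shard = hash(key) mod num_reducers.
def partition(key, num_reducers):
    return hash(key) % num_reducers

num_reducers = 3
for key in ["Hello", "World", "Hello", "cloud"]:
    # Equal keys always map to the same reducer, regardless of which mapper
    # produced them, so all values for a key are grouped in one place.
    print(key, "-> reducer", partition(key, num_reducers))
```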
Reduce Task
The various phases in the reduce task are as follows:
i. Shuffle and Sort
The reducer starts with shuffle and sort step. This step downloads the data written by partitioner to the
machine where reducer is running. This step sorts the individual data pieces into a large data list. The
purpose of this sort is to collect the equivalent keys together. The framework does this so that we could
iterate over it easily in the reduce task. This phase is not customizable. The framework handles everything
automatically. However, the developer has control over how the keys get sorted and grouped through a
comparator object.
ii. Reduce
The reducer performs the reduce function once per key grouping. The framework passes the function key
and an iterator object containing all the values pertaining to the key. We can write reducer to filter,
aggregate and combine data in a number of different ways. Once the reduce function finishes, it gives
zero or more key-value pairs to the output format. Like the map function, the reduce function changes from
job to job, as it is the core logic of the solution.
iii. Output Format
This is the final step. It takes the key-value pair from the reducer and writes it to the file via the
record writer. By default, it separates the key and value with a tab and each record with a newline
character. We can customize it to provide a richer output format, but nonetheless the final data gets
written to HDFS.
YARN
YARN or Yet Another Resource Negotiator is the resource management layer of Hadoop. The basic
principle behind YARN is to separate resource management and job scheduling/monitoring function into
separate daemons. In YARN there is one global ResourceManager and a per-application
ApplicationMaster. An application can be a single job or a DAG of jobs. Inside the YARN framework,
we have two daemons: the ResourceManager and the NodeManager. The ResourceManager arbitrates resources
among all the competing applications in the system. The job of the NodeManager is to monitor resource
usage by the containers and report it to the ResourceManager. The resources are CPU, memory,
disk, network and so on.
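To make the two-daemon picture concrete, a cluster's state can be inspected with YARN's standard command-line client; the hedged sketch below just shells out to `yarn application -list` and `yarn node -list`, which report what the ResourceManager and the NodeManagers are doing. A configured Hadoop/YARN client on the PATH is assumed.

```python
# Hedged sketch: inspecting a running YARN cluster through its standard CLI.
import subprocess

def yarn_cli(*args):
    return subprocess.run(["yarn", *args],
                          capture_output=True, text=True, check=True).stdout

print(yarn_cli("application", "-list"))  # applications known to the ResourceManager
print(yarn_cli("node", "-list"))         # NodeManagers and their resource usage
```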
i. Scheduler
Scheduler is responsible for allocating resources to various applications. This is a pure scheduler as it
does not perform tracking of status for the application. It also does not reschedule the tasks which fail due
to software or hardware errors. The scheduler allocates the resources based on the requirements of the
applications.
ii. Application Manager
Following are the functions of ApplicationManager
Accepts job submission.
Negotiates the first container for executing ApplicationMaster. A container incorporates elements
such as CPU, memory, disk, and network.
Restarts the ApplicationMaster container on failure.
Functions of ApplicationMaster:-
Negotiates resource container from Scheduler.
Tracks the resource container status.
Monitors progress of the application.
We can scale YARN beyond a few thousand nodes through the YARN Federation feature. This feature
enables us to tie multiple YARN clusters into a single massive cluster. This allows for using independent
clusters, clubbed together for a very large job.
iii. Features of Yarn
YARN has the following features:-
a. Multi-tenancy
YARN allows a variety of access engines (open source or proprietary) on the same Hadoop data set. These
access engines can perform batch processing, real-time processing, iterative processing and so on.
b. Cluster Utilization
With the dynamic allocation of resources, YARN allows for good use of the cluster, compared to the
static MapReduce rules in earlier versions of Hadoop, which provided poorer utilization of the cluster.
c. Scalability
Data center processing power keeps expanding. YARN's ResourceManager focuses on scheduling
and copes with the ever-expanding cluster, processing petabytes of data.
d. Compatibility
MapReduce programs developed for Hadoop 1.x can still run on YARN, without any
disruption to the processes that already work.
5.3 Virtual Box
VirtualBox is open source software for virtualizing the x86 computing architecture. It acts as a
hypervisor, creating a VM (virtual machine) in which the user can run another OS (operating system).
The operating system in which VirtualBox runs is called the "host" OS, and the operating system running in
the VM is called the "guest" OS. VirtualBox supports Windows, Linux, or macOS as its host OS. When
configuring a virtual machine, the user can specify how many CPU cores, and how much RAM and disk
space, should be devoted to the VM. When the VM is running, it can be "paused": system execution is
frozen at that moment in time, and the user can resume using it later.
Why Is VirtualBox Useful?
One:
VirtualBox allows you to run more than one operating system at a time. This way, you can run
software written for one operating system on another (for example, Windows software on Linux or a
Mac) without having to reboot to use it (as would be needed if you used partitioning and dual-booting).
You can also configure what kinds of “virtual” hardware should be presented to each such operating
system, and you can install an old operating system such as DOS or OS/2 even if your real computer’s
hardware is no longer supported by that operating system.
Two:
Sometimes, you may want to try out some new software, but would rather not chance it mucking
up the pretty decent system you’ve got right now. Once installed, a virtual machine and its virtual hard
disks can be considered a “container” that can be arbitrarily frozen, woken up, copied, backed up, and
transported between hosts.
By using a VirtualBox feature called “snapshots”, you can save a particular state of a virtual machine and
revert back to that state, if necessary. This way, you can freely experiment with a computing
environment. If something goes wrong (e.g. after installing misbehaving software or infecting the guest
with a virus), you can easily switch back to a previous snapshot and avoid the need of frequent backups
and restores.
Three:
Software vendors can use virtual machines to ship entire software configurations. For example,
installing a complete mail server solution on a real machine can be a tedious task (think of rocket
science!). With VirtualBox, such a complex setup (then often called an “appliance”) can be packed into a
virtual machine. Installing and running a mail server becomes as easy as importing such an appliance into
VirtualBox.
Along these same lines, the "clone" feature of VirtualBox is very useful. By cloning virtual
machines, you can move them from one machine to another along with all saved snapshots. If you try
to imagine what it would involve to do something similar with physical machines, you will immediately
see the power of this feature.
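Snapshots and clones can also be scripted with VirtualBox's VBoxManage command-line tool. The sketch below wraps a few of its common subcommands from Python; the VM and snapshot names are placeholders, and VirtualBox must be installed so that VBoxManage is on the PATH.

```python
# Hedged sketch: driving VirtualBox's VBoxManage CLI from Python to snapshot,
# restore, and clone a VM. Names are placeholders.
import subprocess

def vboxmanage(*args):
    subprocess.run(["VBoxManage", *args], check=True)

VM = "DevServer"

# Save the current state so we can experiment freely and roll back later.
vboxmanage("snapshot", VM, "take", "clean-install")

# ... install and test risky software inside the guest ...

# Revert the (powered-off) VM to the saved state if something went wrong.
vboxmanage("snapshot", VM, "restore", "clean-install")

# Clone the VM (including its saved state) so it can be moved to another host.
vboxmanage("clonevm", VM, "--name", "DevServer-copy", "--register", "--mode", "all")
```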
Four:
On an enterprise level, virtualization can significantly reduce hardware and electricity costs. Most
of the time, computers today only use a fraction of their potential power and run with low average system
loads. A lot of hardware resources as well as electricity is thereby wasted. So, instead of running many
such physical computers that are only partially used, one can pack many virtual machines onto a few
powerful hosts and balance the loads between them.
VirtualBox Terminology
When dealing with virtualization, it helps to acquaint oneself with a bit of crucial terminology,
especially the following terms:
Host Operating System (Host OS):
The operating system of the physical computer on which VirtualBox was installed. There are
versions of VirtualBox for Windows, Mac OS X, Linux and Solaris hosts.
Guest Operating System (Guest OS):
The operating system that is running inside the virtual machine.
Virtual Machine (VM):
We’ve used this term often already. It is the special environment that VirtualBox creates for your guest
operating system while it is running. In other words, you run your guest operating system “in” a VM.
Normally, a VM will be shown as a window on your computer's desktop, but depending on which of the
various frontends of VirtualBox you use, it can be displayed in full screen mode or remotely on another
computer.
The App Engine requires that apps be written in Java or Python, store data in Google BigTable
and use the Google query language. Non-compliant applications require modification to use App
Engine.
Google App Engine provides more infrastructure than other scalable hosting services such as
Amazon Elastic Compute Cloud (EC2). The App Engine also eliminates some system
administration and developmental tasks to make it easier to write scalable applications.
Google App Engine is free up to a certain amount of resource usage. Users exceeding the per-day
or per-minute usage rates for CPU resources, storage, number of API calls or requests and
concurrent requests can pay for more of these resources.
Modern web applications
Quickly reach customers and end users by deploying web apps on App Engine. With zero-config
deployments and zero server management, App Engine allows you to focus on writing code. Plus, App
Engine automatically scales to support sudden traffic spikes without provisioning, patching, or
monitoring.
Below is a sample reference architecture for building a simple web app using App Engine and Google
Cloud.
Scalable mobile back ends
Whether you’re building your first mobile app or looking to reach existing users via a mobile experience,
App Engine automatically scales the hosting environment for you. Plus, seamless integration with
Firebase provides an easy-to-use frontend mobile platform along with the scalable and reliable backend.
Below is sample reference architecture for a typical mobile app built using both Firebase and App Engine
along with other services in Google Cloud.
Features
Popular languages
Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—or bring your own
language runtime.
Open and flexible
Custom runtimes allow you to bring any library and framework to App Engine by supplying a
Docker container.
Fully managed
A fully managed environment lets you focus on code while App Engine manages infrastructure
concerns.
Powerful application diagnostics
Use Cloud Monitoring and Cloud Logging to monitor the health and performance of your app and
Cloud Debugger and Error Reporting to diagnose and fix bugs quickly.
Application versioning
Easily host different versions of your app, easily create development, test, staging, and production
environments.
Traffic splitting
Route incoming requests to different app versions, A/B test, and do incremental feature rollouts.
Application security
Help safeguard your application by defining access rules with App Engine firewall and leverage
managed SSL/TLS certificates* by default on your custom domain at no additional cost.
Services ecosystem
Tap a growing ecosystem of Google Cloud services from your app including an excellent suite of
cloud developer tools.
You can be sure that your app will be available to users worldwide at all times, since Google has several
hundred servers globally. Google's security and privacy policies are applicable to the apps developed
using Google's infrastructure.
Quick to Start
With no product or hardware to purchase and maintain, you can prototype and deploy the app to your
users without taking much time.
Easy to Use
Google App Engine (GAE) incorporates the tools that you need to develop, test, launch, and update the
applications.
Scalability
For any app’s success, this is among the deciding factors. Google creates its own apps using GFS,
Big Table and other such technologies, which are available to you when you utilize the Google
app engine to create apps.
You only have to write the code for the app, and Google looks after the testing on account of the
automatic scaling feature that the App Engine has. Regardless of the amount of data or number of
users that your app serves, the App Engine can meet your needs by scaling up or down as required.
The good thing about Google App Engine as a managed platform is that it has made it feasible
for our engineers to effortlessly scale up their applications with no operations skill. It,
additionally, sets us up with best practices as far as logging, security and release
management are concerned.
Performance and Reliability
Google is among the leaders worldwide among global brands. So, when you discuss performance and
reliability you have to keep that in mind. In the past 15 years, the company has created new benchmarks
based on its services’ and products’ performance. The app engine provides the same reliability and
performance as any other Google product.
Cost Savings
You don’t have to hire engineers to manage your servers or to do that yourself. You can invest the money
saved into other parts of your business.
Platform Independence
You can move all your data to another environment without any difficulty as there are not many
dependencies on the app engine platform.
5.5 Programming Environment for Google
Creating a Google Cloud Platform project
To use Google's tools for your own site or app, you need to create a new project on Google Cloud
Platform. This requires having a Google account.
Go to the App Engine dashboard on the Google Cloud Platform Console and press the Create
button.
If you've not created a project before, you'll need to select whether you want to receive email
updates or not, agree to the Terms of Service, and then you should be able to continue.
Enter a name for the project, edit your project ID and note it down. For this tutorial, the following
values are used:
Project Name: GAE Sample Site
Project ID: gaesamplesite
Click the Create button to create your project.
Creating an application
Each Cloud Platform project can contain one App Engine application. Let's prepare an app for our
project.
We'll need a sample application to publish. If you've not got one to use, download and unzip this
sample app.
Have a look at the sample application's structure — the website folder contains your website
content and app.yaml is your application configuration file.
1) Your website content must go inside the website folder, and its landing page must be called
index.html, but apart from that it can take whatever form you like.
2) The app.yaml file is a configuration file that tells App Engine how to map URLs to your static files.
You don't need to edit it; a minimal example is shown below for reference.
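For context, a minimal app.yaml for a purely static site might look like the following sketch (the runtime
value and handler patterns are illustrative assumptions; the file bundled with the sample app may differ):

    runtime: python39
    handlers:
    - url: /
      static_files: website/index.html
      upload: website/index.html
    - url: /(.*)
      static_files: website/\1
      upload: website/(.*)

Each incoming URL is mapped to a file under the website folder, matching the structure described above.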
5.6 App Engine
An App Engine app is made up of a single application resource that consists of one or more services.
Each service can be configured to use different runtimes and to operate with different performance
settings. Within each service, you deploy versions of that service. Each version then runs within one or
more instances, depending on how much traffic you configured it to handle.
Components of an application
Your App Engine app is created under your Google Cloud project when you create an application
resource. The App Engine application is a top-level container that includes the service, version, and
instance resources that make up your app. When you create your App Engine app, all your resources are
created in the region that you choose, including your app code along with a collection of settings,
credentials, and your app's metadata.
Each App Engine application includes at least one service, the default service, which can hold as many
versions of that service as you like.
The following diagram illustrates the hierarchy of an App Engine app running with multiple services. In
this diagram, the app has two services that contain multiple versions, and two of those versions are
actively running on multiple instances.
Services
Use services in App Engine to factor your large apps into logical components that can securely share App
Engine features and communicate with one another. Generally, your App Engine services behave like
microservices. Therefore, you can run your whole app in a single service or you can design and deploy
multiple services to run as a set of microservices.
For example, an app that handles your customer requests might include separate services that each
handle different tasks, such as:
API requests from mobile devices
Internal, administration-type requests
Backend processing such as billing pipelines and data analysis
Each service in App Engine consists of the source code from your app and the corresponding App Engine
configuration files. The set of files that you deploy to a service represents a single version of that service,
and each time you deploy to that service you create an additional version within that same service.
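As an illustration (not part of the original notes), a second service is created simply by naming it in that
code base's configuration before deploying it; any service name other than default creates or updates a
separate service:

    runtime: python39   # assumed runtime
    service: api        # deploys this code base as the "api" service instead of "default"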
Versions
Having multiple versions of your app within each service allows you to quickly switch between different
versions of that app for rollbacks, testing, or other temporary events. You can route traffic to one or more
specific versions of your app by migrating or splitting traffic.
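For example (a hedged illustration using the gcloud CLI, which these notes do not cover), deploying with
gcloud app deploy --version=v2 --no-promote uploads a new version without routing any traffic to it, and
gcloud app services set-traffic can later migrate traffic to v2 or split it between v1 and v2 for an A/B test.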
Instances
The versions within your services run on one or more instances. By default, App Engine scales your app
to match the load. Your apps will scale up the number of instances that are running to provide consistent
performance, or scale down to minimize idle instances and reduce costs. For more information about
instances, see How Instances are Managed.
Application requests
Each of your app's services and each of the versions within those services must have a unique name. You
can then use those unique names to target and route traffic to specific resources using URLs, for example:
https://VERSION_ID-dot-SERVICE_ID-dot-PROJECT_ID.REGION_ID.r.appspot.com
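For instance, using the sample project created earlier, a hypothetical version named v2 of the default
service might be reachable at https://v2-dot-default-dot-gaesamplesite.uc.r.appspot.com (the version name
and the uc region ID are illustrative values, not taken from these notes).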
Incoming user requests are routed to the services or versions that are configured to handle traffic. You can
also target and route requests to specific services and versions. For more information, see Handling
Requests.
Logging application requests
When your application handles a request, it can also write its own logging messages to stdout and stderr.
For details about your app's logs, see Writing Application Logs.
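As a minimal sketch (assuming a Python runtime, which these notes do not specify), a request handler
could emit log entries like this:

    import logging
    import sys

    # Anything the handler writes to stdout or stderr is collected by App Engine
    # and appears in Cloud Logging alongside the corresponding request log entry.
    print("handling request")                                    # stdout
    print("something unexpected happened", file=sys.stderr)      # stderr
    logging.warning("slow backend call took %.1f seconds", 2.3)  # standard logging works too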
Limits
The maximum number of services and versions that you can deploy depends on your app's pricing:
Limit                      | Free app | Paid app
Maximum services per app   | 5        | 105
Maximum versions per app   | 15       | 210
5.7 OpenStack
OpenStack is a set of software tools for building and managing cloud computing platforms for public and
private clouds. Backed by some of the biggest companies in software development and hosting, as well as
thousands of individual community members, many think that OpenStack is the future of cloud
computing. OpenStack is managed by the OpenStack Foundation, a non-profit that oversees both
development and community-building around the project.
Introduction to OpenStack
OpenStack lets users deploy virtual machines and other instances that handle different tasks for
managing a cloud environment on the fly. It makes horizontal scaling easy, which means that tasks that
benefit from running concurrently can easily serve more or fewer users on the fly by just spinning up
more instances. For example, a mobile application that needs to communicate with a remote server might
be able to divide the work of communicating with each user across many different instances, all
communicating with one another but scaling quickly and easily as the application gains more users.
And most importantly, OpenStack is open source software, which means that anyone who chooses
to can access the source code, make any changes or modifications they need, and freely share these
changes back out to the community at large. It also means that OpenStack has the benefit of thousands of
developers all over the world working in tandem to develop the strongest, most robust, and most secure
product that they can.
Nova is the primary computing engine behind OpenStack. It is used for deploying and managing large
numbers of virtual machines and other instances to handle computing tasks.
Swift is a storage system for objects and files. Rather than the traditional idea of referring to files by
their location on a disk drive, developers can instead refer to a unique identifier for a file or piece of
information and let OpenStack decide where to store it. This makes scaling easy, as developers don’t have
to worry about the capacity of any single system behind the software. It also allows the system, rather
than the developer, to worry about how best to make sure that data is backed up in case of the failure of a
machine or network connection.
Cinder is a block storage component, which is more analogous to the traditional notion of a computer
being able to access specific locations on a disk drive. This more traditional way of accessing files might
be important in scenarios in which data access speed is the most important consideration.
Neutron provides the networking capability for OpenStack. It helps to ensure that each of the
components of an OpenStack deployment can communicate with one another quickly and efficiently.
Horizon is the dashboard behind OpenStack. It is the only graphical interface to OpenStack, so for users
wanting to give OpenStack a try, this may be the first component they actually “see.” Developers can
access all of the components of OpenStack individually through an application programming interface
(API), but the dashboard gives system administrators a look at what is going on in the cloud and lets
them manage it as needed.
Keystone provides identity services for OpenStack. It is essentially a central list of all of the users of the
OpenStack cloud, mapped against all of the services provided by the cloud, which they have permission
to use. It provides multiple means of access, meaning developers can easily map their existing user access
methods against Keystone.
Glance provides image services to OpenStack. In this case, "images" refers to images (or virtual copies)
of hard disks. Glance allows these images to be used as templates when deploying new virtual machine
instances.
Ceilometer provides telemetry services, which allow the cloud to provide billing services to individual
users of the cloud. It also keeps a verifiable count of each user’s system usage of each of the various
components of an OpenStack cloud. Think metering and usage reporting.
Heat is the orchestration component of OpenStack, which allows developers to store the requirements of
a cloud application in a file that defines what resources are necessary for that application. In this way, it
helps to manage the infrastructure needed for a cloud service to run.
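To make the division of labour among these components concrete, here is a minimal sketch using the
openstacksdk Python client (an illustrative choice, not part of these notes); the cloud name "mycloud" is
assumed to exist in a local clouds.yaml configuration:

    import openstack

    # Keystone authenticates this connection using the credentials stored
    # under the "mycloud" entry in clouds.yaml
    conn = openstack.connect(cloud="mycloud")

    # Nova: list the virtual machine instances currently deployed
    for server in conn.compute.servers():
        print(server.name, server.status)

    # Glance: list the disk images available as templates for new instances
    for image in conn.image.images():
        print(image.name)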
Operating system (OS): OpenStack supports the following operating systems: CentOS, Debian, Fedora,
Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise Server (SLES) and Ubuntu. Support
for other systems is provided by different vendors, or can be added by porting the Nova modules to the
target platform.
5.8 Federation in the Cloud
Cloud federation introduces additional issues that have to be addressed in order to provide a secure
environment in which to move applications and services among a collection of federated providers.
Baseline security needs to be guaranteed across all cloud vendors that are part of the federation.
An interesting aspect is represented by the management of the digital identity across diverse
organizations, security domains, and application platforms. In particular, the term federated identity
management refers to standards-based approaches for handling authentication, single sign-on (SSO), role-
based access control, and session management in a federated environment. This enables users to utilize
services more effectively in a federated context by providing their authentication details only once to log
into a network composed of several entities involved in a transaction. This capability is realized by either
relying on open industry standards or openly published specifications (Liberty Alliance Identity
Federation, OASIS Security Assertion Markup Language, and WS-Federation) such that interoperation
can be achieved. No matter the specific protocol and framework, two main approaches can be considered:
The first is the model in use today; the second constitutes a future vision for identity management in the
cloud.
OpenNebula can be used in conjunction with a reverse proxy to form a cloud bursting hybrid cloud
architecture with load balancing and virtualization support provided by OpenNebula. The OpenNebula
VM controls server allocation in both the EC2 cloud as well as the OpenNebula cloud, while the Nginx
proxy to which the clients are connected distributes load over the web servers both in EC2 as well as the
OpenNebula cloud. In addition to web servers, the EC2 cloud also has its own Nginx load balancer.
Much research work has developed around OpenNebula. For example, the University of Chicago has come
up with an advance reservation system called the Haizea Lease Manager; IBM Haifa has developed a
policy-driven probabilistic admission control and dynamic placement optimization engine for site-level
management policies, called the RESERVOIR Policy Engine; Nephele, an SLA-driven automatic service
management tool, was developed by Telefonica; and a Virtual Cluster Tool for atomic cluster management
with versioning over multiple transport protocols comes from the CRS4 Distributed Computing Group.
Cloud Federations and Server Coalitions
In large-scale systems, coalition formation supports more effective use of resources, as well as convenient
means to access these resources. It is therefore not surprising that coalition formation for computational
grids has been investigated in the past. There is also little surprise that the interest in coalition formation
has migrated in recent years from computational grids to cloud resource management (CRM). The interest
in grid computing is fading
away, while cloud computing is widely accepted today and its adoption by more and more institutions
and individuals seems to be guaranteed at least for the foreseeable future.
The vast majority of ongoing research in this area is focused on game-theoretic aspects of coalition
formation for cloud federations, while coalitions among the servers of a single cloud have received little
attention in the past. This is likely to change due to the emerging interest in Big Data cloud applications
which require more resources than a single server can provide. To address this problem, sets of identically
configured servers able to communicate effectively among themselves form coalitions with sufficient
resources for data- and computationally intensive problems.
Cloud coalition formation raises a number of technical, as well as nontechnical problems. Cloud
federations require a set of standards. The cloud computing landscape is still evolving, and an early
standardization may slow down and negatively affect the adoption of new ideas and technologies. At the
same time, CSPs want to maintain their competitive advantage by closely guarding the details of their
internal algorithms and protocols.
Reaching agreements on a set of standards is particularly difficult when the infrastructure of the members
of the group is designed to support different cloud delivery models. For example, it is hard to see how the
IaaS could be supported by either SaaS or PaaS clouds. Thus, in spite of the efforts coordinated by the
National Institute of Standards and Technology (NIST), the adoption of interoperability standards
supporting cloud federations seems a rather distant possibility. Resource management within a single
cloud is already extremely challenging; therefore, dynamic resource sharing among multiple cloud
infrastructures seems infeasible at this time. Communication between the members of a cloud federation
would also require dedicated networks with low latency and high bandwidth.
5.9 Four Levels of Federation
Creating a cloud federation involves research and development at different levels: conceptual, logical and
operational, and infrastructural.
The figure provides a comprehensive view of the challenges faced in designing and implementing an
organizational structure that coordinates cloud services belonging to different administrative domains and
makes them operate within the context of a single unified service middleware.
Each cloud federation level presents different challenges and operates at a different layer of the IT stack.
It therefore requires different approaches and technologies. Taken together, the solutions to the challenges
faced at each of these levels constitute a reference model for a cloud federation.
CONCEPTUAL LEVEL
The conceptual level addresses the challenges in presenting a cloud federation as a favourable solution
with respect to the use of services leased by single cloud providers. At this level it is important to clearly
identify the advantages for either service providers or service consumers in joining a federation and to
delineate the new opportunities that a federated environment creates with respect to the single-provider
solution.
Elements of concern at this level are:
Motivations for cloud providers to join a federation.
Motivations for service consumers to leverage a federation.
Advantages for providers in leasing their services to other providers.
Obligations of providers once they have joined the federation.
Trust agreements between providers.
Transparency toward consumers.
Among these aspects, the most relevant are the motivations of both service providers and consumers in
joining a federation.
LOGICAL & OPERATIONAL LEVEL
The logical and operational level of a federated cloud identifies and addresses the challenges in
devising a framework that enables the aggregation of providers that belong to different
administrative domains within a context of a single overlay infrastructure, which is the cloud
federation.
At this level, policies and rules for interoperation are defined. Moreover, this is the layer at which
decisions are made as to how and when to lease a service to—or to leverage a service from—
another provider.
The logical component defines a context in which agreements among providers are settled and
services are negotiated, whereas the operational component characterizes and shapes the dynamic
behaviour of the federation as a result of the single providers’ choices.
This is the level at which market-oriented cloud computing (MOCC) is implemented and realized. It is important at this level to address the
following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud provider, or an agreement?
• How should we define the rules and policies that allow providers to join a federation?
• What are the mechanisms in place for settling agreements among providers?
• What are providers’ responsibilities with respect to one another?
• When should providers and consumers take advantage of the federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which fraction of resources should we lease?
The logical and operational level provides opportunities for both academia and industry.
INFRASTRUCTURE LEVEL
The infrastructural level addresses the technical challenges involved in enabling heterogeneous
cloud computing systems to interoperate seamlessly.
It deals with the technology barriers that keep cloud computing systems belonging to different
administrative domains separate. These barriers can be overcome through standardized protocols and
interfaces.
The federated cloud model is a force for real democratization in the cloud market. It’s how
businesses will be able to use local cloud providers to connect with customers, partners and employees
anywhere in the world. It’s how end users will finally get to realize the promise of the cloud. And, it’s
how data center operators and other service providers will finally be able to compete with, and beat,
today’s so-called global cloud providers.
Some see the future of cloud computing as one big public cloud. Others believe that enterprises will
ultimately build a single large cloud to host all their corporate services. This is, of course, because the
benefit of cloud computing depends on very large scale infrastructure, which provides administrators and
service consumers with ease of deployment, self-service, elasticity, resource pooling and economies of
scale. However, as the cloud continues to evolve, so do the services being offered.
Cloud Services & Hybrid Clouds
Services are now able to reach a wider range of consumers, partners, competitors and public
audiences. It is also clear that storage, compute power, streaming, analytics and other advanced services
are best served when they run in an environment tailored to the needs of that service.
One method of addressing the need of these service environments is through the advent of hybrid clouds.
Hybrid clouds, by definition, are composed of multiple distinct cloud infrastructures connected in a
manner that enables services and data access across the combined infrastructure. The intent is to leverage
the additional benefits that hybrid cloud offers without disrupting the traditional cloud benefits. While
hybrid cloud benefits come from the ability to distribute the workload, the goal is to retain the ability to
manage peaks in demand, to quickly make services available, and to capitalize on new business
opportunities.
The Solution: Federation
Federation creates a hybrid cloud environment with an increased focus on maintaining the
integrity of corporate policies and data. Think of federation as a pool of clouds connected
through a channel of gateways; gateways which can be used to optimize a cloud for a service or set of
specific services. Such gateways can be used to segment service audiences or to limit access to specific
data sets. In essence, federation gives enterprises the ability to serve their audiences with economies of
scale without exposing critical applications or vital data through weak policies or vulnerabilities.
Many would raise the question: if federation creates multiple clouds, doesn’t that mean the cloud benefits
are diminished? I believe the answer is no, because a fundamental change has already transformed
enterprises through the original adoption of cloud computing, namely the creation of a flexible
environment able to adapt rapidly to changing needs based on policy and automation.
Cloud end users are often tied to a single cloud provider because of the different APIs, image formats, and
access methods exposed by different providers, which make it very difficult for an average user to move
applications from one cloud to another, leading to a vendor lock-in problem.
Many SMEs have their own on-premise private cloud infrastructures to support their internal computing
necessities and workloads. These infrastructures are often over-sized in order to satisfy peak demand
periods and avoid performance slow-downs. The hybrid cloud (or cloud bursting) model is a solution that
reduces the on-premise infrastructure size, so that it can be dimensioned for an average load and
complemented with external resources from a public cloud provider to satisfy peak demands.
Many big companies (e.g. banks, hosting companies) and many large institutions maintain several
distributed data centers or server farms, for example to serve multiple geographically distributed offices,
to implement high availability (HA), or to guarantee server proximity to the end user. Resources and
networks in these distributed data centers are usually configured as non-cooperative, separate elements, so
that every single service or workload is usually deployed in a single site or replicated across multiple
sites.
Many educational and research centers deploy their own computing infrastructures that usually do not
cooperate with other institutions, except in some specific situations (e.g. joint projects or initiatives).
Often, even different departments within the same institution maintain their own non-cooperative
infrastructures. This study group will evaluate the main challenges of enabling the provision of federated
cloud infrastructures, with special emphasis on inter-cloud networking and security issues:
It is important to bring perspectives from Europe and USA in order to define the basis for an open cloud
market, addressing barriers to adoption and meeting regulatory, legal, geographic, trust and performance
constraints.
This group will directly contribute to the first two key actions of the European Cloud Strategy
”Unleashing the Potential of Cloud Computing in Europe”.
The first key action aims at “Cutting through the Jungle of Standards” to help the adoption of cloud
computing by encouraging compliance of cloud services with respect to standards and thus providing
evidence of compliance to legal and audit obligations. These standards aim to avoid customer lock in by
promoting interoperability, data portability and reversibility.
The second key action “Safe and Fair Contract Terms and Conditions” aims to protect the cloud
consumer from insufficiently specific and balanced contracts with cloud providers that do not “provide
for liability for data integrity, confidentiality or service continuity”. The cloud consumer is often
presented with "take-it-or-leave-it" standard contracts that might be cost-saving for the provider but are
often undesirable for the user. The Commission aims to develop, together with stakeholders, model terms
for cloud computing service-level agreements for contracts.
Interface: Various cloud service providers have different APIs, pricing models and cloud infrastructures.
An open cloud computing interface needs to be established to provide a common application programming
interface across multiple cloud environments. The simplest solution is to use a software component that
allows the federated system to connect with a given cloud environment, as sketched below. Another
solution is to perform the federation at the infrastructure level rather than at the application or service
level.
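One hedged sketch of such a software component is the Apache Libcloud library (an illustrative choice,
not named in these notes), which exposes a single Python API over many providers; all credentials below
are placeholders:

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    # Instantiate drivers for two different providers using placeholder credentials
    ec2 = get_driver(Provider.EC2)("ACCESS_KEY_ID", "SECRET_KEY", region="us-east-1")
    gce = get_driver(Provider.GCE)("sa@my-project.iam.gserviceaccount.com",
                                   "key.json", project="my-project")

    # The same list_nodes() call hides each provider's native API
    for driver in (ec2, gce):
        for node in driver.list_nodes():
            print(node.name, node.state)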
Networking: Virtual machines in the cloud may be located in different network architectures using
different addressing schemes. To interconnect these VMs, a virtual network with a uniform IP addressing
scheme can be formed on top of the underlying physical networks. When services run on remote clouds,
the main concern is the security of the sensitive strategic information they process.
Heterogeneity of resources: Each cloud service provider offers different VMs with varying processing,
memory and storage capacity, resulting in unbalanced processing load and system instability. A cloud
owner is likely to purchase the latest hardware models available at the time of purchase while being
unlikely to retire older nodes until their useful life is over; this creates heterogeneity.
Trusted Servers
In order to make it easier to find people on other servers, we introduced the concept of “trusted servers” as
one of our last steps. This allows administrators to define other servers they trust. If two servers trust each
other, they will sync their user lists. This way the share dialogue can auto-complete not only local users
but also users on other trusted servers. The administrator can decide to define the list of trusted servers
manually, or allow the server to automatically add every other server to which at least one federated share
was successfully created. This way it is possible to let your cloud server learn about more and more other
servers over time, connect with them and increase the network of trusted servers.