
M.Sc (Computers), Sem-IV
Sri Harshini Degree & P.G College
Subject: Cloud Computing
UNIT-I
SYLLABUS

Introduction Cloud computing at a glance, Historical Developments, Building


Cloud Computing Environments, Computing Platforms and Technologies.
Principles of Parallel and Distributed Computing Eras of Computing, Parallel Vs
Distributed computing, Elements of Parallel Computing, Elements of Distributed
Computing, Technologies for Distributed Computing.
Virtualization Introduction, Characteristics of Virtualized Environments,
Taxonomy of Virtualization Techniques, Virtualization and Cloud Computing,
Pros and Cons of Virtualization, Technology Examples.

Learning Outcomes
Students upon completion of this unit will be able to
• Understand the concepts of Parallel and Distributed Computing.
• Describe the importance of virtualization along with its technologies

UNIT-II

Cloud Computing Architecture Introduction, Cloud reference model, Types of


clouds, Economics of the cloud, open challenges.
Aneka Cloud Application Platform Framework Overview, Anatomy of the
Aneka Container, Building Aneka Clouds, Cloud programming and Management.
Concurrent Computing Thread Programming Introducing Parallelism for Single
machine Computation, Programming Application with Threads, Multithreading
with Aneka, Programming Applications with Aneka Threads.

Learning Outcomes
Students upon completion of this unit will be able to
• identify the architecture and infrastructure of cloud computing, including
SaaS, PaaS, IaaS, public cloud, private cloud, hybrid cloud, etc.
• understand concurrent programming in cloud computing.

UNIT-III

High- Throughput Computing Task Programming Task Computing, Task based


Application Models, Aneka Task-Based Programming.
Data Intensive Computing Map-Reduce Programming What is Data Intensive
Computing, Technologies for Data-Intensive Computing, Aneka MapReduce
Programming.

Learning Outcomes
Students upon completion of this unit will be able to
• Understand high-throughput computing
• Understand data-intensive computing

UNIT-IV

Cloud Platforms in Industry Amazon Web Services, Google App Engine,


Microsoft Azure, Observations.
Cloud Applications Scientific Applications, Business and Consumer Applications.
Advanced Topics in Cloud Computing Energy Efficiency in Clouds, Market Based
Management of Clouds, Federated Clouds/ Inter Cloud, Third Party Cloud
Services.

Learning Outcomes
Students upon completion of this unit will be able to
• Understand the key dimensions of the challenges of cloud computing

Prescribed Book
Rajkumar Buyya, Christian Vecchiola, S. Thamarai Selvi, "Mastering Cloud Computing", McGraw Hill Education.

REFERENCES
1. Michael Miller, “Cloud Computing”, Pearson Education, New

MODEL PAPER
MCA 402.2 Cloud Computing
Time 3 Hrs Max. Marks 70

Answer Question No.1 Compulsory 7 x 2 = 14 M


Answer ONE Question from each unit 4 x 14 = 56 M
1. What is Service-Oriented Computing?
a) Define a Distributed System?
b) Give an example for full virtualization and brief about it.
c) What is a hybrid cloud?
d) Scalability
e) Give two examples of cloud applications in CRM and ERP.
f) What is a MOCC?
UNIT – I
2. Discuss about the historic developments from early computing to the
contemporary cloud computing.
OR
3. a) What are characteristics of Virtualization?
b) Discuss about Machine Reference Model.

UNIT – II

4. a) Discuss about the cloud architecture.


b) What are different types of clouds? Explain.
OR

5. a) Explain about Aneka Framework overview.


b) Discuss about Aneka SDK.

UNIT - III

6. a) What is Task computing and what are its frameworks?


b) Discuss about Task based application models.
OR
7. a) What is Data Intensive Computing? Explain about its characteristics.
b) What are the technologies required for Data Intensive computing? Explain
about them.

UNIT – IV
8. Discuss about Amazon Web Services.
OR
9. Give a reference model for MOCC. What are the technologies for MOCC?

Unit- I
Q1) Introduction to Cloud Computing

Cloud Computing is the delivery of computing services such as servers,


storage, databases, networking, software, analytics, intelligence, and more,
over the Cloud (Internet).

Cloud Computing provides an alternative to the on-premises datacentre. With


an on-premises datacentre, we have to manage everything, such as purchasing
and installing hardware, virtualization, installing the operating system, and any
other required applications, setting up the network, configuring the firewall, and
setting up storage for data. After doing all the set-up, we become responsible
for maintaining it through its entire lifecycle.

But if we choose Cloud Computing, a cloud vendor is responsible for the


hardware purchase and maintenance. They also provide a wide variety of
software and platform as a service. We can take any required services on rent.
The cloud computing services will be charged based on usage.


The cloud environment provides an easily accessible online portal that makes it handy for the user to manage compute, storage, network, and application resources. Popular cloud service providers include Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.

Advantages of cloud computing
o Cost: It reduces the huge capital costs of buying hardware and software.
o Speed: Resources can be accessed in minutes, typically within a few clicks.
o Scalability: We can increase or decrease the requirement of resources
according to the business requirements.
o Productivity: While using cloud computing, we spend less operational effort. We do not need to apply patches or maintain hardware and software ourselves. In this way, the IT team can be more productive and focus on achieving business goals.
o Reliability: Backup and recovery of data are less expensive and very fast
for business continuity.
o Security: Many cloud vendors offer a broad set of policies, technologies,
and controls that strengthen our data security.

Types of Cloud Computing

o Public Cloud: The cloud resources that are owned and operated by a
third-party cloud service provider are termed as public clouds. It delivers
computing resources such as servers, software, and storage over the
internet
o Private Cloud: The cloud computing resources that are exclusively used inside a single business or organization are termed as a private cloud. A private cloud may physically be located in the company's on-site datacentre or hosted by a third-party service provider.
o Hybrid Cloud: It is the combination of public and private clouds, bound together by technology that allows data and applications to be shared between them. Hybrid cloud provides flexibility and more deployment options to the business.

Types of Cloud Services

1. Infrastructure as a Service (IaaS): In IaaS, we can rent IT infrastructure such as servers and virtual machines (VMs), storage, networks, and operating systems from a cloud service vendor. We can create a VM running Windows or Linux and install anything we want on it. Using IaaS, we don't need to care about the hardware or virtualization software, but we do have to manage everything else. Using IaaS, we get maximum flexibility, but we also need to put more effort into maintenance.
2. Platform as a Service (PaaS): This service provides an on-demand environment for developing, testing, delivering, and managing software applications. The developer is responsible for the application, and the PaaS vendor provides the ability to deploy and run it. Using PaaS, flexibility gets reduced, but the management of the environment is taken care of by the cloud vendors.
3. Software as a Service (SaaS): It provides centrally hosted and managed software services to end users. It delivers software over the Internet, on demand, and typically on a subscription basis, e.g., Microsoft OneDrive, Dropbox, WordPress, Office 365, and Amazon Kindle. SaaS is used to minimize the operational cost to the maximum extent.

Q2) History of Cloud Computing

Before cloud computing emerged, there was client/server computing, which is basically centralized storage in which all the software applications, all the data, and all the controls reside on the server side.

If a single user wants to access specific data or run a program, he/she needs to connect to the server, gain appropriate access, and then he/she can do his/her business.

After that, distributed computing came into the picture, where all the computers are networked together and share their resources when needed.

On the basis of these computing models, the concept of cloud computing emerged and was later implemented.

Around 1961, John McCarthy suggested in a speech at MIT that computing could be sold like a utility, just like water or electricity. It was a brilliant idea, but

like all brilliant ideas, it was ahead of its time: for the next few decades, despite interest in the model, the technology simply was not ready for it.

But of course time passed, the technology caught up with that idea, and after a few years:

In 1999, Salesforce.com started delivering applications to users through a simple website. The applications were delivered to enterprises over the Internet, and in this way the dream of computing sold as a utility came true.

In 2002, Amazon started Amazon Web Services, providing services like storage, computation, and even human intelligence. However, only with the launch of the Elastic Compute Cloud (EC2) in 2006 did a truly commercial service open to everybody exist.

In 2009, Google Apps also started to provide cloud computing enterprise


applications.

Of course, all the big players are present in the cloud computing evolution,
some were earlier, some were later. In 2009, Microsoft launched Windows
Azure, and companies like Oracle and HP have all joined the game. This proves
that today, cloud computing has become mainstream.

Cloud computing is all about renting computing services. This idea first
came in the 1950s. In making cloud computing what it is today, five
technologies played a vital role. These are distributed systems and its
peripherals, virtualization, web 2.0, service orientation, and utility computing.


• Distributed Systems:
It is a composition of multiple independent systems but all of them are
depicted as a single entity to the users. The purpose of distributed
systems is to share resources and also use them effectively and
efficiently. Distributed systems possess characteristics such as scalability,
concurrency, continuous availability, heterogeneity, and independence in
failures. But the main problem with this system was that all the systems
were required to be present at the same geographical location. Thus to
solve this problem, distributed computing led to three more types of
computing and they were-Mainframe computing, cluster computing, and
grid computing.

• Mainframe computing:
Mainframes, which first came into existence in 1951, are highly powerful and reliable computing machines. These are responsible for handling
large data such as massive input-output operations. Even today these are
used for bulk processing tasks such as online transactions etc. These
systems have almost no downtime with high fault tolerance. After
distributed computing, these increased the processing capabilities of the
system. But these were very expensive. To reduce this cost, cluster
computing came as an alternative to mainframe technology.

• Cluster computing:
In the 1980s, cluster computing came as an alternative to mainframe computing. Each machine in the cluster was connected to the others by a high-bandwidth network. Clusters were far cheaper than mainframe systems and were equally capable of high computation.
Also, new nodes could easily be added to the cluster if it was required.
Thus, the problem of the cost was solved to some extent but the problem
related to geographical restrictions still pertained. To solve this, the
concept of grid computing was introduced.

• Grid computing:
In the 1990s, the concept of grid computing was introduced. It means that different systems were placed at entirely different geographical locations and these were all connected via the Internet. These systems belonged to different organizations, and thus the grid consisted of heterogeneous nodes. Although it solved some problems, new problems emerged as the distance between the nodes increased. The main problem encountered was the low availability of high-bandwidth connectivity, along with other network-associated issues. Thus, cloud computing is often referred to as the "successor of grid computing".

• Virtualization:
It was introduced nearly 40 years back. It refers to the process of creating
a virtual layer over the hardware which allows the user to run multiple
instances simultaneously on the hardware. It is a key technology used in
cloud computing. It is the base on which major cloud computing services
such as Amazon EC2, VMware vCloud, etc work on. Hardware
virtualization is still one of the most common types of virtualization.

• Web 2.0:
It is the interface through which the cloud computing services interact
with the clients. It is because of Web 2.0 that we have interactive and
dynamic web pages. It also increases flexibility among web pages. Popular
examples of web 2.0 include Google Maps, Facebook, Twitter, etc.
Needless to say, social media is possible because of this technology only.
It gained major popularity in 2004.

• Service orientation:
It acts as a reference model for cloud computing. It supports low-cost,
flexible, and evolvable applications. Two important concepts were
introduced in this computing model. These were Quality of Service (QoS)

which also includes the SLA (Service Level Agreement) and Software as a
Service (SaaS).

• Utility computing:
It is a computing model that defines service provisioning techniques for
services such as compute services along with other major services such as
storage, infrastructure, etc which are provisioned on a pay-per-use basis.
Thus, the above technologies contributed to the making of cloud computing.
Road map of Cloud computing
✓ In the 1950s the mainframe and time sharing were born, introducing the concept of shared computer resources.
✓ During this time the word "cloud" was not in use.
✓ Cloud computing is believed to have been invented by Joseph Carl Robnett
Licklider in the 1960s with his work on ARPANET to connect people and data
from anywhere at any time.
✓ In 1969 the first working prototype of ARPANET was launched.
✓ In 1970 the term "client-server" came into use.
✓ Client-server defines the computing model where the client accesses data and applications from a central server.
✓ In 1995, pictures of a cloud started showing up in diagrams, to help non-technical people understand.
✓ At that time AT & T had already begun to develop an architecture and system
where data would be located centrally.
✓ In 1999 the salesforce.com was launched, the first company to make enterprise
applications available from a website.
✓ In 1999, the search engine Google launched.
✓ In 1999, Netflix was launched, introducing a new revenue model.
✓ In 2003, Web 2.0 was born, characterized by rich multimedia; now users could generate content.
✓ In 2004 Facebook launched, giving users the facility to share about themselves.
✓ In 2006, Amazon launched Amazon Web Services (AWS), giving users a new way to access computing resources.
✓ In 2006, Google CEO Eric Schmidt used the word "cloud" at an industry event.
✓ In 2007, Apple launched the iPhone, which could be used on any wireless network.
✓ In 2007, Netflix launched its streaming service, and online video watching was born.
✓ In 2008, the private cloud came into existence.

✓ In 2009, browser-based applications like Google Apps were introduced.
✓ In 2010, the hybrid cloud (private + public cloud) came into existence.
✓ In 2012, Google launched Google Drive with free cloud storage.
✓ Cloud adoption has continued since, making cloud computing ever stronger.
✓ IT services progressed over the decades with the adoption of technologies such as Internet Service Providers (ISPs) and Application Service Providers (ASPs).

Q3) Building Cloud Computing Environments


To build a dedicated cloud infrastructure, several key requirements must be met. It is also important to choose the best hosting providers, since we have to invest a good amount of resources in it.

Steps for Building a Cloud Computing Infrastructure –

#1: First, decide which technology will be the basis for your on-demand application infrastructure
The decision about which virtualization technology will be the organizational standard may already have been made; if it has not, make it before you start. There are pros and cons to both homogeneous and heterogeneous virtualization infrastructures, and the decision will impact the ability to monitor and manage the infrastructure later. So make this decision first.

#2: Determine what delivery infrastructure you will use to abstract the application infrastructure
A cloud infrastructure's on-demand capabilities are designed to do two things: make efficient use of resources and ensure scalability. Some method of load balancing/application delivery will be necessary to accomplish the former.

This layer of the architecture will help abstract the applications and will provide a consistent means of access to users, shielding them from the high rate of change which occurs in the infrastructure.

The delivery infrastructure or load balancer will need to be included in the


provisioning process and to provide visibility into application capacity,
performance as well as resource management.

Many solutions are available; select the one you will integrate into the architecture, and verify whether it is capable of providing the visibility you will need into performance metrics. Then decide what metrics and thresholds you'll use to trigger provisioning processes and ensure that the infrastructure can support them.



#3: Prepare the network infrastructure
Prepare the network to deal with an on-demand application infrastructure. Application delivery must be configured correctly for the application being deployed, and the hardware (network, storage) must be prepared as well. The network must be configured to deal with such change without requiring human intervention, and it must be able to handle applications which migrate from hardware to hardware. For this, the network will require constant optimization to adapt to changing traffic patterns.

#4: Provide visibility and automation of management tasks
Remember, visibility is key to an on-demand infrastructure. The associated management systems and the infrastructure must know what is running, when, and where, in order to evaluate available resources. Determine how you will collect the data and from where. Decide which system is authoritative for each metric and verify that that information is fed in real time to the automation system.

#5: Integrate all the moving parts so that the infrastructure realizes the benefits of automation, abstraction, and resource sharing
The last step is the most difficult and requires the previous steps to be completed. The integration and automation of all the pieces of the infrastructure, such as storage, network, and applications, enable the infrastructure to act on demand. The realization of cost-reduction benefits will be marginalized without automation. The integration step automates workflow. Automation requires constant monitoring across the application infrastructure, from the network layer to the applications executing in the environment.

In many cases, this integration may require a custom solution. If you are an early adopter, it may be necessary to build the management system and an automation framework yourself.

Q4) Computing Platforms and Technologies

Cloud computing applications are developed by leveraging platforms and frameworks. Various types of services are provided, from bare-metal infrastructure to customizable applications serving specific purposes.

Amazon Web Services (AWS) –

AWS provides a wide range of cloud IaaS services, ranging from virtual compute, storage, and networking to complete computing stacks. AWS is well known for its on-demand compute and storage services, named Elastic Compute Cloud (EC2) and Simple Storage Service (S3). EC2 offers customizable virtual hardware to the end user, which can be utilized as the base infrastructure for deploying computing systems on the cloud. It is possible to choose from a large variety of virtual hardware configurations, including GPU and cluster instances. EC2 instances are deployed either through the AWS console, which is a wide-ranging Web portal for accessing AWS services, or through the Web services API available for several programming languages. EC2 also offers the capability of saving a specific running instance as an image, thus allowing users to create their own templates for deploying systems. These templates are stored in S3, which delivers persistent storage on demand. S3 is organized into buckets, which contain objects that are stored in binary form and can be enriched with attributes. End users can store objects of any size, from basic files to full disk images, and retrieve them from anywhere. In addition to EC2 and S3, a wide range of services can be leveraged to build virtual computing systems, including networking support, caching systems, DNS, database support, and others.
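As a rough illustration only (not from the prescribed book), the sketch below uses the boto3 Python SDK to launch one EC2 instance and store an object in S3. It assumes boto3 is installed and AWS credentials are configured; the AMI ID, bucket name, and object key are hypothetical placeholders.

# A minimal sketch of calling the EC2 and S3 services programmatically.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

# Launch one virtual machine (EC2 instance) from a machine image.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", instances[0].id)

# Store an object persistently in an S3 bucket.
s3.put_object(
    Bucket="example-course-bucket",     # hypothetical bucket name
    Key="notes/cloud-unit1.txt",
    Body=b"Templates and files of any size can be stored as S3 objects.",
)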

Google AppEngine –

Google AppEngine is a scalable runtime environment mostly dedicated to executing web applications. These applications take advantage of Google's large computing infrastructure to dynamically scale as demand varies. AppEngine offers both a secure execution environment and a collection of services which simplify the development of scalable and high-performance Web applications. These services include in-memory caching, a scalable data store, job queues, messaging, and cron tasks. Developers and engineers can build and test applications on their own systems by using the AppEngine SDK, which replicates the production runtime environment and helps test and profile applications. Once development is complete, developers can easily move their applications to AppEngine, set quotas to contain the costs generated, and make the applications available to the world. Currently, the supported programming languages are Python, Java, and Go.
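As a minimal sketch, assuming the Python runtime and the Flask microframework (neither of which is named in the notes above), an App Engine web application can be as small as one request handler plus an app.yaml descriptor; everything below is illustrative, not a definitive deployment recipe.

# main.py -- a tiny web handler that could be deployed to Google App Engine.
# Deployment would also need an app.yaml file containing a runtime line,
# e.g.  runtime: python39  (assumed here, check the current docs).
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Engine scales instances of this handler up and down with demand.
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local testing only; in production App Engine runs the app behind its own server.
    app.run(host="127.0.0.1", port=8080)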

Microsoft Azure –

Microsoft Azure is a cloud operating system and a platform on which users can develop applications in the cloud. Generally, it provides a scalable runtime environment for web applications and distributed applications. Applications in Azure are organized around the concept of roles, which identify a distribution unit for applications and embody the application's logic. Azure provides a set of additional services that complement application execution, such as support for storage, networking, caching, content delivery, and others.

Hadoop –

Apache Hadoop is an open-source framework that is appropriate for processing large data sets on commodity hardware. Hadoop is an implementation of MapReduce, an application programming model developed by Google. This model provides two fundamental operations for data processing: map and reduce. Yahoo! is the sponsor of the Apache Hadoop project, and has put considerable effort into transforming the project into an enterprise-ready cloud computing platform for data processing. Hadoop is an integral part of the Yahoo! cloud infrastructure and supports many of its business processes. Yahoo! has managed the world's largest Hadoop cluster, which is also made available to academic institutions.
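To make the map and reduce operations concrete, here is a small word-count sketch in plain Python, not actual Hadoop code: the two functions mirror the roles that mapper and reducer tasks play in a MapReduce job, which Hadoop would distribute across cluster nodes.

# A plain-Python sketch of the MapReduce word-count idea.
from collections import defaultdict

def map_phase(line):
    # map: emit (key, value) pairs -- here, (word, 1) for every word in a line.
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # reduce: combine all values that share the same key.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["MapReduce has two operations", "map and reduce operations"]
intermediate = [pair for line in lines for pair in map_phase(line)]
print(reduce_phase(intermediate))   # e.g. {'operations': 2, 'map': 1, ...}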

Force.com and Salesforce.com –

Force.com is a cloud computing platform on which users can develop social enterprise applications. The platform is the basis of Salesforce.com, a Software-as-a-Service solution for customer relationship management. Force.com allows creating applications by composing ready-to-use blocks: a complete set of components supporting all the activities of an enterprise is available. Force.com provides support for everything from the design of the data layout to the definition of business rules and the user interface. This platform is completely hosted in the cloud, and it provides full access to its functionality, and to that implemented in the hosted applications, through Web services technologies.

Q5) Principles of Parallel and Distributed Computing

What is Parallel Computing?


Parallel computing is a model that divides a task into multiple sub-tasks and executes them simultaneously to increase speed and efficiency.
Here, a problem is broken down into multiple parts. Each part is then broken down into a number of instructions.
These parts are allocated to different processors, which execute them simultaneously. This increases the speed of execution of the program as a whole.
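For illustration only (this example is not part of the syllabus text), the sketch below divides one problem, summing a list of numbers, into parts that separate worker processes execute simultaneously using Python's standard multiprocessing module, and then combines the partial results.

# A minimal sketch: split a problem into parts, run the parts in parallel,
# then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker process executes this on its own part of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]      # split the problem into 4 parts
    with Pool(processes=4) as pool:
        partial = pool.map(partial_sum, chunks)  # parts run simultaneously
    print(sum(partial))                          # combined result: 499999500000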
What is Distributed Computing?
Distributed computing is different from parallel computing, even though the principle is the same.
Distributed computing is a field that studies distributed systems. Distributed
systems are systems that have multiple computers located in different locations.
These computers in a distributed system work on the same program. The
program is divided into different tasks and allocated to different computers.
The computers communicate with the help of message passing. Upon
completion of computing, the result is collated and presented to the user.
Distributed Computing vs. Parallel Computing: A Quick Comparison
Having covered the concepts, let’s dive into the differences between
them:
Number of Computer Systems Involved
Parallel computing generally requires one computer with multiple
processors. Multiple processors within the same computer system execute
instructions simultaneously.
All the processors work towards completing the same task. Thus they have to
share resources and data.
In distributed computing, several computer systems are involved. Here multiple
autonomous computer systems work on the divided tasks.
These computer systems can be located at different geographical locations as
well.
Dependency Between Processes
In parallel computing, the tasks to be solved are divided into multiple smaller
parts. These smaller tasks are assigned to multiple processors.
Here the outcome of one task might be the input of another. This increases
dependency between the processors. We can also say, parallel computing
environments are tightly coupled.
Some distributed systems might be loosely coupled, while others might be
tightly coupled.
Which is More Scalable?
In parallel computing environments, the number of processors you can add is restricted, because the bus connecting the processors and the memory can handle only a limited number of connections. This limitation makes parallel systems less scalable.
Distributed computing environments are more scalable. This is because the
computers are connected over the network and communicate by passing
messages.
Resource Sharing
In systems implementing parallel computing, all the processors share the
same memory.
They also share the same communication medium and network. The processors
communicate with each other with the help of shared memory.
Distributed systems, on the other hand, have their own memory and processors.
Synchronization
In parallel systems, all the processes share the same master clock for
synchronization. Since all the processors are hosted on the same physical
system, they do not need any synchronization algorithms.
In distributed systems, the individual processing systems do not have access to
any central clock. Hence, they need to implement synchronization algorithms.

Where Are They Used?
Parallel computing is often used in places requiring higher and faster
processing power. For example, supercomputers.
Since there are no lags in the passing of messages, these systems have high
speed and efficiency.
Distributed computing is used when computers are located at different
geographical locations.
In these scenarios, speed is generally not a crucial matter. They are the preferred
choice when scalability is required.
Distributed Computing vs. Parallel Computing’s Tabular Comparison

All in all, we can say that both computing methodologies are needed. Both
serve different purposes and are handy based on different circumstances.

Q6) Parallel versus distributed computing


While both distributed computing and parallel systems are widely available these days, the main difference between the two is that a parallel computing system consists of multiple processors that communicate with each other using shared memory, whereas a distributed computing system contains multiple processors connected by a communication network.

In parallel computing systems, as the number of processors increases, with


enough parallelism available in applications, such systems easily beat sequential
systems in performance through the shared memory. In such systems, the
processors can also contain their own locally allocated memory, which is not
available to any other processors.

In distributed computing systems, multiple system processors can communicate


with each other using messages that are sent over the network. Such systems are increasingly available these days because of the low price of computer processors and the availability of high-bandwidth links to connect them.

The following reasons explain why a system should be built distributed, not just
parallel:

• Scalability: As distributed systems do not have the problems associated


with shared memory, with the increased number of processors, they are
obviously regarded as more scalable than parallel systems.
• Reliability: The impact of the failure of any single subsystem or a
computer on the network of computers defines the reliability of such a
connected system. Definitely, distributed systems demonstrate a better
aspect in this area compared to the parallel systems.
• Data sharing: Data sharing provided by distributed systems is similar to
the data sharing provided by distributed databases. Thus, multiple
organizations can have distributed systems with the integrated
applications for data exchange.

• Resources sharing: If there exists an expensive and a special purpose
resource or a processor, which cannot be dedicated to each processor in
the system, such a resource can be easily shared across distributed
systems.
• Heterogeneity and modularity: A system should be flexible enough to
accept a new heterogeneous processor to be added into it and one of the
processors to be replaced or removed from the system without affecting
the overall system processing capability. Distributed systems are
observed to be more flexible in this respect.
• Geographic construction: The geographic placement of different
subsystems of an application may be inherently placed as distributed.
Local processing may be forced by the low communication bandwidth
more specifically within a wireless network.
• Economic: With the evolution of modern computers, high-bandwidth
networks and workstations are available at low cost, which also favors
distributed computing for economic reasons.

Q7) Elements of Parallel Computing?

What is Parallel Computing?

Parallel computing refers to the process of executing an application or computation simultaneously on several processors. Generally, it is a kind of computing architecture where large problems are broken into independent, smaller, usually similar parts that can be processed in one go. It is done by multiple CPUs communicating via shared memory, and the results are combined upon completion. It helps in performing large computations, as it divides the large problem between more than one processor.

Parallel computing also helps in faster application processing and task resolution by increasing the available computation power of systems. Most supercomputers employ parallel computing principles to operate. Parallel processing is generally used in operational scenarios that need massive processing power or computation.

Typically, this infrastructure is housed where various processors are installed in a server rack; the application server distributes the computational requests into small chunks, and the requests are then processed simultaneously on each server. The earliest computer software was written for serial computation and could execute only a single instruction at a time, but parallel computing is different: it executes an application or computation on several processors at the same time.

There are many reasons to use parallel computing, such as saving time and money, providing concurrency, and solving larger problems. Furthermore, parallel computing reduces complexity. As a real-life example of parallel computing, suppose there are two queues to get a ticket; if two cashiers are giving tickets to two persons simultaneously, it saves time as well as reduces complexity.

Types of parallel computing

From open-source and proprietary parallel computing vendors, there are generally three types of parallel computing available, which are discussed below:

1. Bit-level parallelism: The form of parallel computing in which every task


is dependent on processor word size. When performing a task on large-sized data, it reduces the number of instructions the processor must execute; otherwise, the operation has to be split into a series of instructions.
For example, there is an 8-bit processor, and you want to do an operation
on 16-bit numbers. First, it must operate the 8 lower-order bits and then
the 8 higher-order bits. Therefore, two instructions are needed to execute
the operation. The operation can be performed with one instruction by a
16-bit processor.
2. Instruction-level parallelism: In instruction-level parallelism, the processor decides how many instructions are executed in the same CPU clock cycle; the hardware can issue and execute more than one instruction during each clock cycle phase. The software approach to instruction-level parallelism relies on static parallelism, where the compiler decides which instructions to execute simultaneously.
3. Task Parallelism: Task parallelism is the form of parallelism in which the tasks are decomposed into subtasks. Each subtask is then allocated for execution, and the execution of the subtasks is performed concurrently by the processors, as sketched below.
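The following sketch, which assumes Python's standard concurrent.futures module is an acceptable stand-in here, shows task parallelism: two different subtasks are submitted and executed concurrently by a pool of worker threads, and their results are gathered on completion.

# A small sketch of task parallelism: different subtasks run concurrently.
from concurrent.futures import ThreadPoolExecutor

def count_words(text):
    # Subtask 1: count the words in a piece of text.
    return len(text.split())

def count_characters(text):
    # Subtask 2: count the characters in the same text.
    return len(text)

text = "task parallelism decomposes work into subtasks"
with ThreadPoolExecutor(max_workers=2) as pool:
    words = pool.submit(count_words, text)        # subtask 1
    chars = pool.submit(count_characters, text)   # subtask 2
    print(words.result(), chars.result())         # results gathered on completion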

Applications of Parallel Computing

There are various applications of Parallel Computing, which are as follows:

o One of the primary applications of parallel computing is Databases and


Data mining.
o The real-time simulation of systems is another use of parallel computing.
o Technologies such as networked video and multimedia.
o Science and Engineering.
o Collaborative work environments.
o The concept of parallel computing is used by augmented reality, advanced
graphics, and virtual reality.

Advantages of Parallel computing

Parallel computing advantages are discussed below:

o In parallel computing, more resources are used to complete the task, which decreases the time taken and cuts possible costs. Also, cheap components can be used to construct parallel clusters.
o Compared with serial computing, parallel computing can solve larger problems in a shorter time.
o For simulating, modeling, and understanding complex, real-world phenomena, parallel computing is much more appropriate than serial computing.
o When local resources are finite, it can let you benefit from non-local resources.
o There are multiple problems that are very large and may be impractical or impossible to solve on a single computer; the concept of parallel computing helps to remove these kinds of issues.
o One of the best advantages of parallel computing is that it allows you to do several things at a time by using multiple computing resources.
o Furthermore, parallel computing makes better use of the hardware, whereas serial computing wastes potential computing power.


Disadvantages of Parallel Computing

There are many limitations of parallel computing, which are as follows:

o Parallel architecture can be difficult to achieve.
o In the case of clusters, better cooling technologies are needed for parallel computing.
o It requires managed algorithms that can be handled by the parallel mechanism.
o Multi-core architectures have high power consumption.
o The parallel computing system needs low coupling and high cohesion, which is difficult to create.
o The code for a parallelism-based program can be written only by the most technically skilled and expert programmers.

o Although parallel computing helps you resolve computationally and data-intensive issues with the help of multiple processors, sometimes it affects the behaviour of the system, and some of our control algorithms do not provide good outcomes when run in parallel.
o Due to synchronization, thread creation, data transfers, and more, the extra cost can sometimes be quite large; it may even exceed the gains from parallelization.
o Moreover, for improving performance, the parallel computing system needs different code tweaking for different target architectures.

Fundamentals of Parallel Computer Architecture

Parallel computer architecture is classified on the basis of the level at


which the hardware supports parallelism. There are different classes of parallel
computer architectures, which are as follows:

Multi-core computing

A multi-core processor is a computer processor integrated circuit containing two or more distinct processing cores, which has the capability of executing program instructions simultaneously. Cores may implement architectures such as VLIW, superscalar, multithreading, or vector processing, and are integrated on a single integrated circuit die or onto multiple dies in a single chip package. Multi-core architectures are classified as heterogeneous if they consist of cores that are not identical, or as homogeneous if they consist of only identical cores.

Symmetric multiprocessing

In symmetric multiprocessing, a single operating system handles a multiprocessor computer architecture having two or more homogeneous, independent processors, and treats all processors equally. Each processor can work on any task without worrying about whether the data for that task is available in memory, and the processors may be connected using on-chip mesh networks. Also, every processor contains a private cache memory.

Distributed computing

The components of a distributed system are located on different networked computers. These networked computers coordinate their actions by communicating through HTTP, RPC-like connectors, and message queues. The concurrency of components and the independent failure of components are characteristics of distributed systems. Typically, distributed programming is classified in the form of peer-to-peer, client-server, n-tier, or three-tier architectures. Sometimes the terms parallel computing and distributed computing are used interchangeably, as there is much overlap between the two.

Massively parallel computing

In massively parallel computing, several computers are used simultaneously to execute a set of instructions in parallel. Grid computing is another approach, in which numerous distributed computer systems execute simultaneously and communicate with the help of the Internet to solve a specific problem.

Why parallel computing?

There are various reasons why we need parallel computing, such are discussed
below:

o Parallel computing deals with larger problems. In the real world, there are
multiple things that run at a certain time but at numerous places
simultaneously, which is difficult to manage. In this case, parallel
computing helps to manage this kind of extensively huge data.

o Parallel computing is key to data modeling and dynamic simulation, and to achieving them at scale. Therefore, parallel computing is needed for the real world too.
o Unlike serial computing, parallel computing is well suited to implementing real-time systems; it also offers concurrency and saves time and money.
o Only the concept of parallel computing can organize and manage large, complex datasets.
o The parallel computing approach ensures the effective use of resources and guarantees the effective use of hardware, whereas in serial computation only some parts of the hardware are used and the rest are rendered idle.

Future of Parallel Computing

From serial computing to parallel computing, the computational landscape has completely changed. Tech giants like Intel have already started to include multicore processors in their systems, which is a great step towards parallel computing. For a better future, parallel computation will bring a revolution in the way computers work. Parallel computing plays an important role in connecting the world more closely than ever before. Moreover, parallel computing's approach becomes ever more necessary with multi-processor computers, faster networks, and distributed systems.

Difference Between serial computation and Parallel Computing

Serial computing, also known as sequential computing, refers to the use of a single processor to execute a program; the program is divided into a sequence of instructions, and each instruction is processed one by one. Traditionally, software offers a simpler approach when programmed sequentially, but the processor's speed significantly limits its ability to execute each series of instructions. Also, uni-processor machines use sequential data structures, whereas the data structures used in parallel computing environments are concurrent.

In sequential programming, measuring performance is far less important and complex than in parallel computing, because it mostly involves identifying bottlenecks in the system. Benchmarks in parallel computing can be achieved with the help of benchmarking and performance regression testing frameworks. These testing frameworks include a number of measurement methodologies, such as multiple repetitions and statistical treatment.

The ability to avoid the bottleneck of moving data through the memory hierarchy is especially evident in parallel computing. Parallel computing comes at a greater cost and may be more complex; however, it deals with larger problems and helps to solve problems faster.

History of Parallel Computing

Interest in parallel computing dates back to the late 1950s, with advancements in supercomputers appearing throughout the '60s and '70s. In April 1958, Stanley Gill (Ferranti) discussed parallel programming and the need for branching and waiting. In the same year, IBM researchers Daniel Slotnick and John Cocke discussed the use of parallelism in numerical calculations for the first time.

In 1962, a four-processor computer, the D825, was released by Burroughs Corporation. At the American Federation of Information Processing Societies conference in 1967, Amdahl and Slotnick published a debate about the feasibility of parallel processing. In 1969, Honeywell introduced its first Multics system, a symmetric multiprocessor system able to run up to eight processors in parallel.

In the 1970s, the multi-processor project C.mmp at Carnegie Mellon University was among the first multiprocessors with more than a few processors. Later, a supercomputer for scientific applications was built from 64 Intel 8086/8087 processors, and a new type of parallel computing started. In 1984, the Synapse N+1, with snooping caches, was the first bus-connected multiprocessor. Slotnick had proposed building a large-scale parallel computer for the Lawrence Livermore National Laboratory as early as 1964; his design, backed by the US Air Force, became the earliest SIMD parallel-computing effort, the ILLIAC IV.

Q8) Distributed Computing

• Distributed computing is a method of computer processing in which


different parts of a computer program are run on two or more
computers that are communicating with each other over a network.
• According to the narrowest definition, distributed computing is limited
to programs whose components are shared by computers in a limited
geographic area. The broader definitions include shared tasks as well as
program components.

• In the broadest sense of the term, distributed computing simply means
that something is shared between multiple systems that may also be in
different places.
• In the enterprise, distributed computing has often involved various
stages in business processes at the most efficient places in the computer
network. For example, in the typical distribution using the 3-tier model,
the processing of the user interface is done on the PC, at the user’s
location, the commercial processing is done on a remote computer and
access to the database and the processing are performed on another
computer providing centralized access for many business processes.
• Generally, this type of distributed computing uses the client / server
communication model.
• The distributed computing environment (DCE) is a widely used industry
standard that supports this type of distributed computing. On the
Internet, third-party service providers are now offering generalized
services that fit into this model.
• Grid Computing is a computer model involving a distributed architecture
of many connected computers to solve a complex problem. In the grid
computing model, servers or personal computers perform independent
tasks and are loosely linked via the Internet or low-speed networks.
• Individual participants may allow a portion of their computer’s
processing time to be used for a big problem. SETI @ home is the largest
grid computing project in which PC owners are dedicating some of their
multitasking processing cycles (while using their computers at the same
time) to the Search for Extraterrestrial Intelligence (SETI) project.
• There is a lot of disagreement about the difference between distributed
computing and grid computing. According to some, grid computing is
just one type of distributed computing. The SETI project, for example,
characterizes the model on which it is based as distributed computing.
• Similarly, cloud computing, which simply involves hosted services made
available to users from a remote location, can be considered a type of
distributed computing, depending on your demand.
• One of the first uses of grid computing was breaking a cryptographic code by a group now called distributed.net. This group also describes its model as distributed computing.

What are Distributed Systems?

• A group of independent nodes connected with one another in a


coordinated manner in order to achieve a common result and they are
structured in such a way that the group appears to be a single system to
the end user.
• The nodes are programmable, asynchronous, autonomous and failure-
prone.
• Every node has its own memory and processor. They have shared states
and can operate concurrently.
• The nodes are connected with one another in order to offer a service,
share data or simply store data ( e.g. blockchain ).
• All the nodes communicate with each other using Messages.
• All the nodes in the distributed system are capable of sending or receiving
messages to each other.

Goals to be met to build an efficient distributed system are -

1. Connecting users with the resources
2. Transparency in hiding facts about processes and resources
3. Openness in offering services as per standard rules
4. Scalability

In what way can we organize our computers?


So we have two types of architecture in distributed computing:

1. Client-Server Architecture
2. Peer-Peer Architecture

Below, both types of architecture are briefly described along with their respective diagrams.

Client-Server Architecture
There are two main entities of this architecture,

1. Server
2. Clients

• This architecture forms the basis of all the services provided by the distributed architecture; it is easy to design, and it can accomplish the task from a single point (i.e., the server).

• Here in this architecture, the server acts as the main centralized entity
which is responsible to fulfill all the requests coming from all the clients
connected to the server in the same network.

• Server and Clients communicate with each other using the Request-
Response approach.

• Modern day examples of this architecture are Cloud services, IoT devices,
Browsers, etc.

Server
An entity that is responsible for offering services to the clients; the server provides services like data processing, storage, deploying applications, etc.
Client
An entity which communicates with the server in order to complete its tasks. Clients are usually connected to the server over the Internet.

Fig. Client – Server Architecture Diagram


This architecture is a good service-oriented system.
The biggest disadvantage of this architecture is that the whole system depends on a central point (i.e., the server). If the server goes down, the whole system stops working.
In a client-server architecture, there is a tiered/layered architecture in which we can add several layers on both the client and the server side in order to meet the system's requirements for functionality, complexity, and security.

The most commonly used layered/tiered architecture is 2-tier and 3-tier
architecture.
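A minimal request-response sketch, assuming Python's standard socket and threading modules (none of this code appears in the notes above): the server waits for a request from a client and returns a response, which is the essence of the client-server model described here. In a real deployment the server runs permanently and many clients connect to it over the network.

# A minimal client-server request-response sketch using TCP sockets.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9000   # hypothetical local address and port

def server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()             # wait for a client to connect
        with conn:
            request = conn.recv(1024)      # read the client's request
            conn.sendall(b"response to: " + request)

def client():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"get data")           # send a request to the server
        print(cli.recv(1024).decode())     # print the server's response

if __name__ == "__main__":
    t = threading.Thread(target=server, daemon=True)
    t.start()
    time.sleep(0.5)                        # give the server a moment to start listening
    client()
    t.join()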
Peer-to-Peer Architecture
This architecture is like a network of interconnected computer systems which are able to share information and resources.
Here, every system in the network is referred to as a node or ‘peer’.

This architecture can be used in Blockchain technology, Transportation services,


E-Commerce, Education, Banking & Finance, etc.
Advantages of this architecture are,

1. It can be easily configured and installed.


2. All the nodes in the network are capable of sharing resources and information with the other nodes present in the network.
3. Even if any one node goes down it does not affect the whole system.
4. Maintaining and building such architecture is comparatively cost
effective.

Fig. Peer-Peer Architecture diagram

Blockchain technology works on the principle of P2P architecture which helps


the technology to be more secure and efficient. Blockchain technology can be
used in many industries but the main highlight where it is mostly used is
‘Cryptocurrencies’.

A P2P network is central when it comes to doing a transaction within a
blockchain. All the nodes can transact with each other in the blockchain. Now,
all the P2P networks are decentralized and that is why blockchain is also known
as decentralized applications. This characteristic makes blockchain more secure
and hard to hack or break into.
In P2P networks, limitations come into the picture when the size of the network grows, which results in performance, security, and accessibility issues within the network.
Disadvantages of this network are,

1. No centralized entity to manage all the network operations.


2. Backup should be done on each computer within the network.
3. As any node in the network can be accessed at any time, security has to be applied to each node individually.

Q11) Major Technologies for Distributed Computing

Distributed computing is a model in which components of a software


system are shared among multiple computers to improve efficiency and
performance. It is a field of computer science that studies distributed systems.
In a distributed system, components are located on different networked computers.

Three Major Distributed Computing Technologies:


1. Mainframes
2. Clusters
3. Grid

Mainframes:
Mainframes were the first example of large computing facilities leveraging multiple processing units. They are powerful, highly reliable computers specialized for large data movement and large I/O operations. Mainframes are mostly used by large organizations for bulk data processing such as online transactions, enterprise resource planning, and other big-data operations. They are not considered a distributed system; however, they can perform big-data processing and operations thanks to the high computational power of their multiple processors. One of the most attractive features of mainframes was their high reliability: they were always on and capable of tolerating failures transparently, and a system shutdown was not required to change components. Batch processing is the main application of mainframes. Their popularity has reduced nowadays.

Clusters:
Clusters started as a low-cost alternative to mainframes and
supercomputers. With the advancement of technology, commodity hardware and
machines became cheap; they are connected by high-bandwidth networks and
controlled by specific software tools that manage the messaging system. Since
the 1980s clusters have become a standard technology for parallel and
high-performance computing. Because of their low investment cost, research
institutions, companies and universities commonly use clusters today. This
technology contributed to the evolution of tools and frameworks for distributed
computing such as Condor, PVM and MPI. One of the attractive features of
clusters is that cheap machines provide high computational power to solve
problems, and clusters are scalable. An example of a cluster is an Amazon EC2
cluster that processes data using Hadoop: it has multiple nodes (machines),
with master nodes and data nodes, and it can be scaled up when there is a large
volume of data.

Grids:
Grids appeared in the early 1990s as an evolution of cluster computing.
Grid computing has an analogy with the electric power grid: it is an
approach to delivering high computational power, storage services and a variety
of other services on demand. Users can consume resources in the same way as they
use utilities such as power, gas and water. Grids were initially developed as
aggregations of geographically dispersed clusters connected by means of Internet
connections; the clusters belonged to different organizations, and arrangements
were made to share computational power between those organizations. A grid is a
dynamic aggregation of heterogeneous computing nodes, which can be both
nationwide and worldwide in scale. Several developments in technology made the
diffusion of computing grids possible:

• clusters became common resources;
• clusters were frequently underutilized;
• some problems have computational requirements so high that they cannot
be handled by a single cluster;
• high-bandwidth, long-distance network connectivity became widely available.

These distributed computing technologies have led to the development of cloud
computing.

Virtualization
Q12) Explain about virtualization? Advantages and Disadvantages.

Virtualization:-
Virtualization is the "creation of a virtual (rather than actual) version of
something, such as hardware, a server, a desktop, a storage device, an operating
system or network resources". In virtualization, dedicated resources are created
in the infrastructure based on service requests.

In other words, virtualization is a technique that allows sharing a single
physical instance of a resource or an application among multiple customers and
organizations. It does this by assigning a logical name to a physical resource
and providing a pointer to that physical resource when it is demanded.

• Virtualization refers to the ability to run multiple operating systems
on a single computer system and share the underlying hardware
resources.
• To put it simply, it is a process by which one computer provides the
appearance of many computers.
• It is a process that improves the optimization of physical resources and
costs, and uses a resource pool to allocate resources.
• The main goal of virtualization is to manage workloads by radically
transforming traditional computing to make it more scalable.
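
The idea mentioned above of assigning a logical name to a physical resource and handing out a pointer to it on demand can be illustrated with a tiny, purely hypothetical sketch. The class and resource names below are invented for illustration and do not correspond to any real hypervisor API.

```python
# Illustrative sketch only: a logical name is mapped onto a shared physical
# resource and resolved ("pointed to") when demanded.
class VirtualResourcePool:
    def __init__(self, physical_disks):
        self.physical_disks = list(physical_disks)  # shared physical instances
        self.logical_map = {}                       # logical name -> physical disk

    def provision(self, logical_name):
        # Map the logical name onto the least-used physical disk.
        disk = min(self.physical_disks, key=lambda d: d["used_by"])
        disk["used_by"] += 1
        self.logical_map[logical_name] = disk
        return logical_name

    def resolve(self, logical_name):
        # Return the "pointer" to the physical resource when demanded.
        return self.logical_map[logical_name]

pool = VirtualResourcePool([{"id": "disk-A", "used_by": 0},
                            {"id": "disk-B", "used_by": 0}])
pool.provision("customer1-volume")
pool.provision("customer2-volume")
print(pool.resolve("customer1-volume")["id"])  # customers share the physical disks
```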

Advantages of virtualization:-


Virtualization can help companies maximize the value of IT investments,
decreasing the server hardware footprint, energy consumption, and cost and

complexity of managing IT systems while increasing the flexibility of the overall
environment.

Cost
Depending on your solution, you can have a cost-free data center. You do
have to shell out the money for the physical server itself, but there are options
for free virtualization software and free operating systems. Microsoft’s Virtual
Server and VMware Server are free to download and install. If you use a licensed
operating system, of course that will cost money. For instance, if you wanted
five instances of Windows Server on that physical server, then you’re going to
have to pay for the licenses. That said, if you were to use a free version of Linux
for the host and operating system, then all you’ve had to pay for is the physical
server.

Administration:-
Having all your servers in one place reduces your administrative burden.
According to VMware, you can reduce your administrative burden from 1:10 to
1:30. What this means is that you can save time in your daily server
administration or add more servers by having a virtualized environment.
The following factors ease your administrative burdens:
• A centralized console allows quicker access to servers.
• CDs and DVDs can be quickly mounted using ISO files.
• New servers can be quickly deployed.
• New virtual servers can be deployed more inexpensively than physical
servers.
• RAM and disk space can be quickly allocated to virtual servers.

• Virtual servers can be moved from one server to another.

Fast Deployment:-
Because every virtual guest server is just a file on a disk, it’s easy to copy
(or clone) a system to create a new one. To copy an existing server, just copy the
entire directory of the current virtual server.

Need for virtualization:-
Virtualization provides various benefits including saving time and energy,
decreasing costs and minimizing overall risk.
• Provides ability to manage resources effectively.
• Increases productivity, as it provides secure remote access.
• Provides for data loss prevention.

Reduced Infrastructure Costs


We already talked about how you can cut costs by using free servers and
clients, like Linux, as well as free distributions of Windows Virtual Server, Hyper-
V, or VMware. But there are also reduced costs across your organization. If you
reduce the number of physical servers you use, then you save money on
hardware, cooling, and electricity. You also reduce the number of network ports,
console video ports, mouse ports, and rack space.
Some of the savings you realize include
• Increased hardware utilization by as much as 70 percent
• Decreased hardware and software capital costs by as much as 40
percent
• Decreased operating costs by as much as 70 percent

Disadvantages of virtualization: -
Extra Costs:-
You may have to invest in virtualization software, and additional hardware
might be required to make virtualization possible. This depends on your existing
infrastructure. Many businesses have sufficient capacity to accommodate
virtualization without requiring much cash. If your infrastructure is more than
five years old, you have to consider an initial renewal budget.

Software Licensing:-
This is becoming less of a problem as more software vendors adapt to the
increased adoption of virtualization. However, it is important to check with your
vendors to understand how they view software use in a virtualized environment.
Learn the new Infrastructure:-
Implementing and managing a virtualized environment will
require IT staff with expertise in virtualization. On the user side, a typical virtual
environment will operate similarly to the non-virtual environment. There are
some applications that do not adapt well to the virtualized environment.

Q13) Characteristics of Virtualization


1. Increased Security –

The ability to control the execution of a guest program in a completely


transparent manner opens new possibilities for delivering a secure, controlled
execution environment. All the operations of the guest programs are generally
performed against the virtual machine, which then translates and applies them
to the host programs.

A virtual machine manager can control and filter the activity of the guest
programs, thus preventing some harmful operations from being performed.
Resources exposed by the host can then be hidden or simply protected from the
guest. Increased security is a requirement when dealing with untrusted code.

Example-1: Untrusted code can be analyzed in a Cuckoo sandbox environment.

The term sandbox identifies an isolated execution environment where
instructions can be filtered and blocked before being translated and executed
in the real execution environment.

Example-2: The expression sandboxed version of the Java Virtual


Machine (JVM) refers to a particular configuration of the JVM where, by means
of security policy, instructions that are considered potentially harmful can be
blocked.

2. Managed Execution –
Virtualization of the execution environment enables more than increased
security; in particular, sharing, aggregation, emulation, and isolation are the
most relevant features.

Fig. Functions enabled by managed execution


3. Sharing –
Virtualization allows the creation of a separate computing environment
within the same host. This basic feature is used to reduce the number of active
servers and limit power consumption.
4. Aggregation –
It is possible to share physical resources among several guests, but
virtualization also allows aggregation, which is the opposite process. A group of
separate hosts can be tied together and represented to guests as a single virtual
host. This functionality is implemented with cluster management software,
which harnesses the physical resources of a homogeneous group of machines
and represents them as a single resource.

5. Emulation –
Guest programs are executed within an environment that is controlled by
the virtualization layer, which ultimately is a program. Also, a completely
different environment with respect to the host can be emulated, thus allowing
the execution of guest programs requiring specific characteristics that are not
present in the physical host.
6. Isolation –
Virtualization allows providing guests—whether they are operating
systems, applications, or other entities—with a completely separate
environment, in which they are executed. The guest program performs its
activity by interacting with an abstraction layer, which provides access to the
underlying resources. The virtual machine can filter the activity of the guest and
prevent harmful operations against the host.
Besides these characteristics, another important capability enabled by
virtualization is performance tuning. This feature is a reality at present, given
the considerable advances in hardware and software supporting virtualization.
It becomes easier to control the performance of the guest by finely tuning the
properties of the resources exposed through the virtual environment. This
capability provides a means to effectively implement a quality-of-service (QoS)
infrastructure.
7. Portability –

The concept of portability applies in different ways according to the


specific type of virtualization considered. In the case of a hardware
virtualization solution, the guest is packaged into a virtual image that, in most
cases, can be safely moved and executed on top of different virtual machines.
In the case of programming-level virtualization, as implemented by the JVM or
the .NET runtime, the binary code representing application components (jars
or assemblies) can run without any recompilation on any implementation of
the corresponding virtual machine.

Q14)Explain about hardware virtualization?

Virtualization is the "creation of a virtual (rather than actual) version of


something, such as a server, a desktop, a storage device, an operating system or
network resources".


In other words, virtualization is a technique that allows sharing a single
physical instance of a resource or an application among multiple customers and
organizations. It does this by assigning a logical name to a physical resource
and providing a pointer to that physical resource when it is demanded.

Hardware/Server Virtualization
It is the most common type of virtualization as it provides advantages of
hardware utilization and application uptime. The basic idea of the technology is
to combine many small physical servers into one large physical server, so that
the processor can be used more effectively and efficiently. The operating system
that is running on a physical server gets converted into a well-defined OS that
runs on the virtual machine.
The hypervisor controls the processor, memory, and other components by
allowing different OS to run on the same machine without the need for a source
code.
Full Virtualization
Full virtualization is a technique in which a complete installation of one
machine is run on another. The result is a system in which all software running
on the server is within a virtual machine.


Virtualization is relevant to cloud computing because it is one of the ways


in which you will access services on the cloud. That is, the remote data centre
may be delivering your services in a fully virtualized format. In order for full
virtualization to be possible, it was necessary for specific hardware
combinations to be used. It wasn’t until 2005 that the introduction of the AMD-
Virtualization (AMD-V) and Intel Virtualization Technology (IVT) extensions
made it easier to go fully virtualized.
Full virtualization has been successful for several purposes:
• Sharing a computer system among multiple users
• Isolating users from each other and from the control program
• Emulating hardware on another machine
Paravirtualization
Paravirtualization allows multiple operating systems to run on a single
hardware device at the same time by more efficiently using system resources,
like processors and memory. In full virtualization, the entire system is emulated
(BIOS, drive, and so on), but in Paravirtualization, its management module
operates with an operating system that has been adjusted to work in a virtual
machine. Paravirtualization typically runs better than the full virtualization
model, simply because in a fully virtualized deployment, all elements must be
emulated.


Paravirtualization works best in these sorts of deployments:


• Disaster recovery: In the event of a catastrophe, guest instances can be moved
to other hardware until the equipment can be repaired.
• Migration: Moving to a new system is easier and faster because guest instances
can be removed from the underlying hardware.
• Capacity management: Because of easier migrations, capacity management is
simpler to implement. It is easier to add more processing power or hard drive
capacity in a virtualized environment.

Partial virtualization:-
When entire operating systems cannot run in the virtual machine,
but some or many applications can, it is known as Partial Virtualization.
Basically, it partially simulates the physical hardware of a system.
This type of virtualization is far easier to execute than full
virtualization. This is very successful when computer resources are shared
amongst multiple users. Situations that need backward compatibility or
portability require full virtualization. Here partial virtualization falls woefully
short. This is because it is difficult to anticipate which features have been
used by a particular application.

Partial virtualization was used in the first-generation, time sharing


system, the CTSS. This was a historical milestone since it paved the way to full
virtualization.

Q15) Explain different types of virtualizations.


There is software that makes virtualization possible. This software is
known as a Hypervisor, also known as a virtualization manager. It sits between
the hardware and the operating system, and assigns the amount of access that
the applications and operating systems have with the processor and other
hardware resources.

Types of virtualization:–

Hardware/Server Virtualization
It is the most common type of virtualization as it provides advantages of
hardware utilization and application uptime. The basic idea of the technology is
to combine many small physical servers into one large physical server, so that
the processor can be used more effectively and efficiently. The operating system
that is running on a physical server gets converted into a well-defined OS that
runs on the virtual machine.
The hypervisor controls the processor, memory, and other components by
allowing different OS to run on the same machine without the need for a source
code.
Hardware virtualization is further subdivided into the following types:
Full Virtualization – The actual hardware is completely simulated, allowing
software to run with an unmodified guest OS.
Para Virtualization – The guest OS is modified so that it is aware of the
virtualization layer, and guest software runs on this modified OS as a separate
system.
Partial Virtualization – Only part of the hardware is simulated, so the software
may need modification to run.
Desktop virtualization
It provides convenience and security. Because the desktop can be accessed
remotely, you are able to work from any location and on any PC. It provides a
lot of flexibility for employees to work from home or on the go. It also protects
confidential data from being lost or stolen by keeping it safe on central servers.


Network Virtualization
It refers to the management and monitoring of a computer network as a
single managerial entity from a single software-based administrator’s console.
It is intended to allow network optimization of data transfer rates, scalability,
reliability, flexibility, and security. It also automates many network
administrative tasks. Network virtualization is specifically useful for networks
experiencing a huge, rapid, and unpredictable increase of usage.
The intended result of network virtualization provides improved network
productivity and efficiency.
Two categories:
Internal: Provides network-like functionality to a single system.
External: Combines many networks, or parts of networks, into a virtual unit.
Storage Virtualization
In this type of virtualization, multiple network storage resources are present
as a single storage device for easier and more efficient management of these
resources. It provides various advantages as follows:
• Improved storage management in a heterogeneous IT environment
• Easy updates, better availability
• Reduced downtime
• Better storage utilization
• Automated management
In general, there are two types of storage virtualization:
• Block – It works before the file system exists. It replaces controllers and
takes over at the disk level.
• File – The server that uses the storage must have software installed on it in
order to enable file-level usage.

Memory Virtualization
It introduces a way to decouple memory from the server to provide a
shared, distributed or networked function. It enhances performance by
providing greater memory capacity without any addition to the main memory.
That’s why a portion of the disk drive serves as an extension of the main
memory.
Implementations –
Application-level integration – Applications running on connected
computers directly connect to the memory pool through an API or the file
system.

Operating System Level Integration – The operating system first connects to the
memory pool, and makes that pooled memory available to applications.

Software Virtualization
It provides the ability to the main computer to run and create one or more
virtual environments. It is used to enable a complete computer system in order
to allow a guest OS to run. For instance, it lets Linux run as a guest on a
machine that is natively running Microsoft Windows (or vice versa, running
Windows as a guest on Linux).

Types:
• Operating system
• Application virtualization
• Service virtualization

Data Virtualization
You can easily retrieve and manipulate data without having to know the technical
details of how it is formatted or where it is physically located. It decreases
data errors and workload.
Q16) What is a hypervisor? Explain different types of hypervisors.
Hypervisor
There is software that makes virtualization possible. This software is
known as a Hypervisor, also known as a virtualization manager. It sits between
the hardware and the operating system, and assigns the amount of access that
the applications and operating systems have with the processor and other
hardware resources.
A Hypervisor also known as Virtual Machine Monitor (VMM) can be a
piece of software, firmware or hardware that gives an impression to the guest
machines (virtual machines) as if they were operating on a physical hardware. It
allows multiple operating systems to share a single host and its hardware. The
hypervisor manages requests by virtual machines to access the hardware
resources (RAM, CPU, NIC, etc.), with each virtual machine acting as an
independent machine.

Hypervisor is mainly divided into two types namely
• Type 1/Native/Bare Metal Hypervisor
• Type 2/Hosted Hypervisor
Type 1 Hypervisor :-
• This is also known as Bare Metal or Embedded or Native Hypervisor.
• It works directly on the hardware of the host and can monitor
operating systems that run above the hypervisor.
• It is completely independent from the Operating System.

• The hypervisor is small, as its main task is sharing and managing
hardware resources between different operating systems.
• A major advantage is that any problems in one virtual machine
or guest operating system do not affect the other guest operating
systems running on the hypervisor.

Examples:
• VMware ESXi Server
• Microsoft Hyper-V
• Citrix/Xen Server

Type 2 Hypervisor :-
• This is also known as Hosted Hypervisor.
• In this case, the hypervisor is installed on an operating system and then
supports other operating systems above it.
• It is completely dependent on host Operating System for its operations
• While having a base operating system allows better specification of policies,
any problems in the base operating system affects the entire system as well
even if the hypervisor running above the base OS is secure.


Examples:
• VMware Workstation.
• Microsoft Virtual PC
• Oracle Virtual Box

Q17) Introduction to Examples of Cloud Computing

Cloud computing is a technology that uses the Internet to store and
manage data on remote servers, with the data then accessed via the Internet.
This type of system allows users to work remotely. Cloud computing customers
do not own the physical infrastructure; they rent usage from a third-party
provider. The essential characteristics of cloud services are
on-demand self-service, broad network access, resource pooling and rapid
elasticity. Cloud computing is successful largely because of its simplicity of
use, and it is a cost-effective solution for enterprises. Its main features are
optimal server utilization, on-demand cloud services (satisfying the client),
dynamic scalability and virtualization techniques. One such example is Google
Cloud – a suite of public cloud services offered by Google, where application
development runs on Google hardware. Its services include Google Compute Engine,
App Engine, Google Cloud Storage and Google Container Engine.
Types of Services:

1. SAAS (Software-as-a-Service)- Examples Microsoft Office Live, Dropbox.


2. PAAS (Platform-as-a-Service)- Examples Google App Engine
3. IAAS (Infrastructure-as-a-Service) – Examples IBM cloudburst.
Why now cloud computing?

1. Economies of scale: The rapid growth of e-commerce and social media


has increased the demand for computational resources. In larger data

centers, it is easier to maximize the amount of work and reduce idle server
time.
2. Expertise: As companies built data centers for their internal clouds, they
developed the expertise and technology needed to build public data centers.
3. Open-Source Software: The Linux operating system has become a major
cloud computing enabler.
Deployment Models of Cloud Computing

1. Private Cloud: It functions for single organizations on a private network


and it is secure. Ex: Corporate IT department.
2. Public Cloud: It is owned by the cloud service provider. Ex: Gmail.
3. Hybrid cloud: It is the combination of both private and public versions of
the cloud. Ex: Proprietary technology.
Top Cloud Computing Providers

• Amazon EC2 & S3: Is a key web service that creates and manage virtual
machine with the operating system running inside them.EC2 is much
more complex than S3.
• Google App Engine: Is a pure PAAS service. It is represented by the web
or application server.
• Windows Azure
• Google App
• Panda Cloud
Examples of Cloud Computing
Now we are going to discuss the examples of cloud computing, which are
mentioned below:
1. Dropbox, Facebook, Gmail
The cloud can be used for the storage of files, with the advantage of easy
backup; files are automatically synchronized from the desktop. Dropbox allows
users to access files anywhere and offers storage of up to 1 terabyte. A social
networking platform requires powerful hosting to manage and store data in
real time. Cloud-based communication provides click-to-call capabilities from
social networking sites and access to instant messaging systems.

2. Banking, Financial Services


Consumers store financial information with cloud computing service
providers, and they store tax records using online backup services.

3. Health Care
Using cloud computing, medical professionals can host information and analytics
and perform diagnostics remotely. Healthcare is another example of cloud
computing: it allows doctors around the world to immediately access medical
information for faster prescriptions and updates. Applications of cloud
computing in health care include telemedicine, public and personal health care,
e-health services and bioinformatics.

4. Education
Cloud computing is useful in institutions of higher learning, providing benefits
to universities and colleges, which is why education is among the examples of
cloud computing. Google and Microsoft provide various services free of charge
to staff and students in different learning institutions. Several educational
institutions in the United States use them to improve efficiency and cut costs.
An example is Google Apps for Education (GAE). These services give users their
own personal workspace, and teaching becomes more interactive.

5. Government
Governments deliver e-governance services to citizens using cloud-based IT
services. Because the cloud has the capacity to handle large transaction
volumes, citizens experience fewer congestion bottlenecks.

6. Big data Analytics


Big data analytics is another example of cloud computing, as the cloud enables
data scientists to analyse their data for patterns, insights, correlations and
predictions, which helps in good decision making. There are many open-source
big data tools such as Hadoop and Cassandra.

7. Communication
The cloud allows network-based access to communication tools like email
and calendars. WhatsApp also runs on cloud-based infrastructure and is another
example of cloud computing in communication. All the messages and information
are stored on the service provider's hardware.
8. Business Process
Business email is cloud-based. ERP, document management and CRM are
hosted by cloud service providers. SaaS has become an important delivery method
for the enterprise; examples include Salesforce and HubSpot. They make many
business processes more reliable because data can be copied to multiple
redundant sites on the cloud provider's network.


Unit-II
Q1) Explain about Cloud Computing Architecture.

(or)
Explain about SPI frame work.
Cloud computing is nothing but using and accessing applications
through the Internet. In addition to configuring and manipulating applications,
we can also store data online. Usually, in cloud computing you do not need to
install any software for an application to run or work on your PC; this is what
makes the difference and avoids platform-dependency issues. In this way cloud
computing makes applications mobile and collaborative.
Cloud Computing Architecture

The basic cloud computing architecture is divided into two main parts.


1. Front End:
The front end refers to the client/customer part of cloud computing system.
It consists of client's computer system, interfaces and applications that are
required to access the cloud system.
Individual users connect to the cloud from their own personal computers or
portable devices, over the Internet. To these individual users, the cloud is seen
as a single application, device, or document. The hardware in the cloud (and the
operating system that manages the hardware connections) is invisible. This
architecture looks simple, but it does require some intelligent management to
connect all those computers together and assign task processing to a multitude
(a large number) of users.
It all starts with the front-end interface seen by individual users. This is
how users select a task or service (either starting an application or opening a
document). The user’s request then gets passed to the system management,
which finds the correct resources and then calls the system’s appropriate
provisioning services. These services carve out the necessary resources in the
cloud, launch the appropriate web application, and either create or open the
requested document. After the web application is launched, the system’s
monitoring and metering functions track the usage of the cloud so that
resources are apportioned and attributed to the proper user(s).
For example - To access gmail, we use browsers like Chrome, Firefox etc.
2. Back End:
The back end refers to the cloud itself. It consists of all the resources
required to provide cloud computing services, such as data storage systems,
servers, virtual machines, security mechanisms, services and deployment models.
It is the responsibility of the back end to provide security mechanisms
and traffic control.

Cloud Infrastructure
The cloud infrastructure contains set of management software,
deployment software, network, servers, storage and hypervisor.

Management software : It maintains and configures the infrastructure.


Hypervisor : It is a low level program that allows physical resources to share
among customers. This acts as a Virtual Machine Manager.
Network : It helps the cloud services in connecting through internet and also the
customers can change network route and protocol as per their requirement.
Server : It helps in computation and also provides services like resource
allocation and deallocation, monitoring and many more.
Storage : Cloud computing makes storage very reliable as it uses a distributed
file system. In this system, if the data cannot be fetched from one copy, it is
fetched from another.
Deployment Software : It helps in deploying and integrating applications on the
cloud.

Cloud Service Models:-


There are 3 basic service models
• SaaS
• PaaS
• IaaS
Software as a Service (SaaS) Model:
In this model, software is deployed as a hosted service and is accessible
through the Internet, allowing software applications to be provided to users.
Billing and invoicing systems, customer relationship management (CRM)
applications and help desk applications are some examples of SaaS applications.
The software licence is available on a usage or subscription basis. SaaS
applications are cost effective and require less maintenance. Multiple users can
share a single instance, and functionality does not have to be coded separately
for each user. Scalability, efficiency and performance are the benefits of SaaS.
The issues with this model are the lack of portability between SaaS clouds and
browser-based risks.

Platform as a Service (PaaS) Model:


This model acts as a runtime environment and provides the development and
deployment tools required for applications. It has a special feature that helps
non-developers create web applications. It also offers the APIs and development
tools required to develop an application. The benefits of this model are a low
cost of ownership and scalable solutions. The disadvantage is that, in PaaS, the
consumer's browser has to maintain reliable and secure connections to the
provider's systems. There is also a lack of portability between PaaS clouds.

Infrastructure as a Service (IaaS) Model:-


This model offers access to fundamental resources such as physical and virtual
machines and data storage. Virtual machines come pre-installed with an operating
system; data can be stored in different locations and computing resources can be
easily scaled up and down. IaaS offers disk storage, local area networks, load
balancers, IP addresses, software bundles, etc. This model also allows the cloud
provider to deliver infrastructure over the Internet. IaaS helps in controlling
the computing resources through administrative access to virtual machines. Along
with the benefits there are some issues, such as compatibility with legacy
security vulnerabilities and virtual machine sprawl.
Cloud Storage:-
Cloud storage helps in saving data off-site, where it is maintained by a third
party. The storage devices are classified into two types:
Block Storage Devices : Raw storage is offered to clients, who can create
volumes from this raw storage.
File Storage Devices : Storage is offered in the form of NAS (Network Attached
Storage); clients are offered the storage in the form of files.

Q2) Explain about cloud deployment models(or) different types of


clouds.

Public Cloud:-
The cloud infrastructure is provisioned for open use by the general public.
It may be owned, managed, and operated by a business, academic, or
government organization, or some combination of them. It exists on the
premises of the cloud provider.
Simply the Public Cloud allows systems and services to be easily accessible
to general public, e.g., Google, Amazon, Microsoft offers cloud services via
Internet.


Benefits :-
There are many benefits of deploying cloud as public cloud model. The
following diagram shows some of those benefits:

Cost effective
Since a public cloud shares the same resources with a large number of consumers,
it has low cost.
Reliability
Since a public cloud employs a large number of resources from different
locations, if any resource fails, the public cloud can employ another one.
Flexibility
It is also very easy to integrate public cloud with private cloud, hence gives
consumers a flexible approach.
Location independence
Since, public cloud services are delivered through internet, therefore ensures
location independence.
Utility style costing
Public cloud is also based on pay-per-use model and resources are accessible
whenever consumer needs it.
High scalability
Cloud resources are made available on demand from a pool of resources, i.e.,
they can be scaled up or down according to the requirement.

Disadvantages:-
Here are the disadvantages of public cloud model:
Low security
In the public cloud model, data is hosted off-site and resources are shared
publicly, so a higher level of security is not ensured.
Less customizable
It is comparatively less customizable than private cloud.

Private cloud:-

The Private Cloud allows systems and services to be accessible within an


organization. The Private Cloud is operated only within a single organization.
However, It may be managed internally or by third-party.


Benefits :-
There are many benefits of deploying cloud as private cloud model. The
following diagram shows some of those benefits:

Higher security and privacy


Private cloud operations are not available to the general public and
resources are shared from a distinct pool of resources; therefore, this model
ensures high security and privacy.
More control
Private clouds have more control on its resources and hardware than
public cloud because it is accessed only within an organization.


Cost and energy efficiency


Private cloud resources are not as cost effective as public clouds but
they offer more efficiency than public cloud.
Improved reliability:-
Private cloud ensures the reliable services to end users because of the
high security and privacy.

Disadvantages:-
Here are the disadvantages of using private cloud model:
Restricted area
Private cloud is only accessible locally and is very difficult to deploy globally.
Inflexible pricing
In order to fulfil demand, purchasing new hardware is very costly.
Limited scalability
Private cloud can be scaled only within capacity of internal hosted resources.
Additional skills
In order to maintain the cloud deployment, the organization requires more
skilled staff and expertise.

Community cloud:-

The Community Cloud allows system and services to be accessible by


group of organizations. It shares the infrastructure between several
organizations from a specific community. It may be managed internally or by the
third-party.


Benefits :-
There are many benefits of deploying cloud as community cloud model.
The following diagram shows some of those benefits:
Cost effective
Community cloud offers the same advantages as a private cloud at lower cost.
Sharing among organizations
Community cloud provides an infrastructure to share cloud resources and
capabilities among several organizations.
Security
Community cloud is comparatively more secure than the public cloud.
Issues :-
• Since all data is housed at one location, one must be careful in storing
data in community cloud because it might be accessible by others.
• It is also challenging to allocate responsibilities of governance, security
and cost.


Hybrid cloud:-
The Hybrid Cloud is a mixture of public and private cloud. Non-critical
activities are performed using public cloud while the critical activities are
performed using private cloud.

Benefits :-
There are many benefits of deploying cloud as hybrid cloud model. The following
diagram shows some of those benefits:

Scalability
It offers both features of public cloud scalability and private cloud
scalability.
Flexibility
It offers both secure resources and scalable public resources.
Cost efficiencies
Public clouds are more cost effective than private clouds, so a hybrid cloud
can capture this saving.
Security
Private cloud in hybrid cloud ensures higher degree of security.
Disadvantages :-
Networking issues
Networking becomes complex due to presence of private and public
cloud.
Security compliance
It is necessary to ensure that cloud services are compliant with
organization's security policies.
Infrastructural dependency
The hybrid cloud model is dependent on the internal IT infrastructure;
therefore it is necessary to ensure redundancy across data centres.

Q3) Economics of Cloud Computing

Economics of cloud computing is based on the PAY AS YOU GO method.
Users/customers pay only for their actual usage of cloud services, which is
clearly beneficial for them and makes the cloud economically very convenient
for all. Another aspect is the elimination of some indirect costs generated by
assets such as software licences and their support. In the cloud, users can use
software applications on a subscription basis without any upfront licence cost,
because ownership of the software providing the service remains with the cloud
provider.
The economic model of the cloud is particularly useful for developers in the
following ways:
• Pay as you go model offered by cloud providers.
• Scalable and Simple.
Cloud Computing Allows:

• Reduces the capital costs of infrastructure.


• Removes the maintenance cost.
• Removes the administrative cost.
What is Capital Cost ?
It is the cost incurred in purchasing infrastructure or assets that are
important in the production of goods, and it takes a long time to generate
profit.
In the case of start-ups, there is no extra budget for infrastructure and its
maintenance, so the cloud can minimize the expenses of a small organization.
Developers can then focus only on the development logic and not on maintaining
the infrastructure.
Cloud computing introduces three different pricing strategies: tiered pricing,
per-unit pricing, and subscription-based pricing. These are explained below.
1. Tiered Pricing:
Cloud services are offered in various tiers. Each tier offers a fixed
service agreement at a specific cost. Amazon EC2 uses this kind of
pricing.
2. Per-unit Pricing:
This model is based on charging for specific service units; for example,
data transfer and memory allocation are billed per unit. GoGrid uses this
kind of pricing in terms of RAM/hour.
3. Subscription-based Pricing:
In this model users pay a periodic subscription fee for the usage
of the software.
These models give more flexible solutions for the cloud economy.
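
A small sketch can make the three strategies concrete. All rates, tiers and usage figures below are made-up illustrative numbers, not actual provider prices.

```python
# Hypothetical numbers only; real providers publish their own rates.
def tiered_price(hours):
    # Fixed tiers, each with its own hourly rate (e.g., small/medium/large instances).
    tiers = {"small": 0.05, "medium": 0.10, "large": 0.20}
    return {name: rate * hours for name, rate in tiers.items()}

def per_unit_price(gb_ram_hours, rate_per_gb_hour=0.02):
    # Pay per unit actually consumed, e.g., RAM/hour as in the GoGrid example.
    return gb_ram_hours * rate_per_gb_hour

def subscription_price(months, fee_per_month=29.0):
    # Flat periodic fee, independent of actual usage.
    return months * fee_per_month

print(tiered_price(720))           # one month of uptime priced per tier
print(per_unit_price(4 * 720))     # 4 GB of RAM kept allocated for a month
print(subscription_price(1))       # one month of subscription
```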

Q4) Cloud Computing Challenges

Cloud computing is the provisioning of resources like data and storage on


demand, that is in real-time. It has been proven to be revolutionary in the IT
industry with the market valuation growing at a rapid rate. Cloud development
has proved to be beneficial not only for huge public and private enterprises but
small-scale businesses as well as it helps to cut costs. It is estimated that more
than 94% of businesses will increase their spending on the cloud by more than
45%. This also has resulted in more and high-paying jobs if you are a cloud
developer.

Cloud technology was flourishing before the pandemic, but there has
been a sudden spike in cloud deployment and usage during the lockdown. The
tremendous growth can be linked to the fact that classes have been shifted
online, virtual office meetings are happening on video calling platforms,
conferences are taking place virtually as well as on-demand streaming apps have
a huge audience. All this is made possible by the use of cloud computing. We are
safe to conclude that the cloud is an important part of our life today, even if we
are an enterprise, student, developer, or anyone else and are heavily dependent
on it. But with this dependence, it is also important for us to look at the issues
and challenges that arise with cloud computing. Therefore, today we bring you

the most common challenges that are faced when dealing with cloud
computing, let’s have a look at them one by one:

1. Data Security and Privacy


Data security is a major concern when switching to cloud computing. User
or organizational data stored in the cloud is critical and private. Even if the cloud
service provider assures data integrity, it is your responsibility to carry out user
authentication and authorization, identity management, data encryption, and
access control. Security issues on the cloud include identity theft, data breaches,
malware infections, and a lot more which eventually decrease the trust amongst
the users of your applications. This can in turn lead to potential loss in revenue
alongside reputation and stature. Also, dealing with cloud computing requires
sending and receiving huge amounts of data at high speed, and therefore is
susceptible to data leaks.

2. Cost Management
Even as almost all cloud service providers have a “Pay As You Go” model,
which reduces the overall cost of the resources being used, there are times
when there are huge costs incurred to the enterprise using cloud computing.
When resources are under-optimized, say the servers are not being used to their
full potential, hidden costs add up. If there is a
degraded application performance or sudden spikes or overages in the usage, it
adds up to the overall cost. Unused resources are one of the other main reasons
why the costs go up. If you turn on the services or an instance of cloud and forget
to turn it off during the weekend or when there is no current use of it, it will
increase the cost without even using the resources.

3. Multi-Cloud Environments
Due to an increase in the options available to the companies, enterprises
not only use a single cloud but depend on multiple cloud service providers. Most
of these companies use hybrid cloud tactics and close to 84% are dependent on
multiple clouds. This often ends up being hard to track and difficult to manage for
the infrastructure team. The process most of the time ends up being highly
complex for the IT team due to the differences between multiple cloud
providers.

4. Performance Challenges
Performance is an important factor while considering cloud-based
solutions. If the performance of the cloud is not satisfactory, it can drive away
users and decrease profits. Even a little latency while loading an app or a web
page can result in a huge drop in the percentage of users. This latency can be a
product of inefficient load balancing, which means that the server cannot
efficiently split the incoming traffic so as to provide the best user experience.
Challenges also arise in the case of fault tolerance, which means the operations
continue as required even when one or more of the components fail.

5. Interoperability and Flexibility


When an organization uses a specific cloud service provider and wants to
switch to another cloud-based solution, it often turns up to be a tedious
procedure since applications written for one cloud with the application stack are
required to be re-written for the other cloud. There is a lack of flexibility from
switching from one cloud to another due to the complexities involved. Handling
data movement, setting up the security from scratch and network also add up
to the issues encountered when changing cloud solutions, thereby reducing
flexibility.

6. High Dependence on Network


Since cloud computing deals with provisioning resources in real-time, it
deals with enormous amounts of data transfer to and from the servers. This is
only made possible by the availability of a high-speed network. Although these
data and resources are exchanged over the network, the system can prove highly
vulnerable in cases of limited bandwidth or a sudden outage. Even when
enterprises can cut their hardware costs, they need to ensure that the Internet
bandwidth is high and that there are zero network outages, or else it can result
in a potential business loss. It is therefore a major challenge for smaller
enterprises, which have to maintain network bandwidth that comes at a high cost.

7. Lack of Knowledge and Expertise


Due to its complex nature and the high demand for research, working
with the cloud often ends up being a highly tedious task. It requires immense
knowledge and wide expertise on the subject. Although there are a lot of
professionals in the field they need to constantly update themselves. Cloud
computing is a highly paid job due to the extensive gap between demand and
supply. There are a lot of vacancies but very few talented cloud engineers,
developers, and professionals. Therefore, there is a need for upskilling so these
professionals can actively understand, manage and develop cloud-based
applications with minimum issues and maximum reliability.

Q6) Aneka Cloud Application Platform Framework.


• Aneka is a product of Manjrasoft.
• Aneka is used for developing, deploying and managing cloud
applications.
• Aneka can be integrated with existing cloud technologies.
• Aneka includes extensible set of APIs associated with programming
models like MapReduce.
• These APIs supports different types of cloud models like private, public,
hybrid cloud.
Aneka framework:
✓ Aneka is a software platform for developing cloud computing
applications.
✓ In Aneka cloud applications are executed.
✓ Aneka is a pure PaaS solution for cloud computing.
✓ Aneka is a cloud middleware product.
✓ Aneka can be deployed on a network of computers, a multicore server,
data centres, virtual cloud infrastructures, or a mixture of these.

Aneka container can be classified into three major categories:
1. Fabric Services
2. Foundation Services
3. Application Services
1. Fabric services:
Fabric Services define the lowest level of the software stack representing
the Aneka Container. They provide access to the resource-provisioning
subsystem and to the monitoring facilities implemented in Aneka.
2. Foundation services:
Foundation Services are related to the logical management of the distributed
system built on top of the infrastructure and provide supporting services for
the execution of distributed applications.
3. Application services:
Application Services manage the execution of applications and
constitute a layer that differentiates according to the specific programming
model used for developing distributed applications on top of Aneka.

Q3) Explain about Anatomy of Aneka.

Q4) Building an Aneka public cloud. A company wants to set up a public
cloud using Aneka. Discuss the public cloud deployment using Aneka
with a neat diagram.

Q5) Aneka Cloud Organization

Q6) What is concurrent programming?
Concurrency generally refers to events or circumstances that are happening or
existing at the same time.

In programming terms, concurrent programming is a technique in which two or


more processes start, run in an interleaved fashion through context
switching and complete in an overlapping time period by managing access
to shared resources e.g. on a single core of CPU.

This doesn’t necessarily mean that multiple processes will be running at


the same instant – even if the results might make it seem like it.

Difference between Concurrent & Parallel programming

In parallel programming, parallel processing is achieved through


hardware parallelism e.g. executing two processes on two separate CPU cores
simultaneously.

Concurrent: Two Queues & a Single Espresso machine.

Parallel: Two Queues & Two Espresso machines.
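
A minimal sketch of this idea is given below: two workers run in an interleaved fashion on the same machine, and a lock manages access to a shared counter. On CPython the two threads are concurrent rather than truly parallel, which matches the distinction above; the step counts and sleep time are illustrative values.

```python
# Concurrency sketch: interleaved workers sharing one resource.
import threading
import time

counter = 0
lock = threading.Lock()          # manages access to the shared resource

def worker(steps):
    global counter
    for _ in range(steps):
        time.sleep(0.01)         # the scheduler interleaves the workers here
        with lock:               # only one thread updates the counter at a time
            counter += 1

# Two "queues" served concurrently through context switching,
# not necessarily at the same instant.
threads = [threading.Thread(target=worker, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 200: both workers completed in an overlapping time period
```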


Q7) Explain about thread computing?


Multithreading is the phenomenon of executing more than one thread in the
system, where the execution of these threads can be of two different types:
concurrent and parallel multithreaded execution. A thread can be defined as
a chunk or unit of a process and can be identified as either a user-level thread
or a kernel-level thread. Multithreading is used for its essential
characteristics: it uses system resources efficiently, gives high performance,
is highly responsive, and supports parallel execution.
Understanding Multithreading
There are two terms that need to be understood :

1. Thread: Thread is the independent or basic unit of a process.


2. Process: A program that is being executed is called a process; multiple
threads exist in a process.
Thread execution can be both concurrent and parallel.

• Concurrent Execution: If the processor can switch execution resources


between threads in a multithreaded process on a single processor, it is a
concurrent execution.
• Parallel Execution: When each thread in the process can run on a
separate processor at the same time in the same multithreaded process,
then it is said to be a parallel execution.
Types of Thread

• User-level thread: They are created and managed by users. They are used
at the application level. There is no involvement of the OS. A good
example is when we use threading in programming like in Java, C#,
Python, etc., we use user threads.
There are some unique data incorporated in each thread that helps to identify
them, such as:

1. Program counter: A program counter is responsible for keeping track of


instructions and to tell which instruction to execute next.
2. Register: System registers are there to keep track of the current working
variable of a thread.
3. Stack: It contains the history of thread execution.

• Kernel-level thread: They are implemented and supported by the
operating system. They generally take more time to execute than user
threads; examples include Windows and Solaris kernel threads.
Multithreading Models
These models are of three types

• Many to many
• Many to one
• One to one
Many to many: Any number of user threads can interact with an equal or lesser
number of kernel threads.

Many to one: It maps many user-level threads to one Kernel-level thread.

One to one: Relationship between the user-level thread and the kernel-level
thread is one to one.

Uses of Multithreading
It is a way to introduce parallelism in the system or program. So, you can
use it anywhere you see parallel paths (where two threads are not dependent
on the result of one another) to make it fast and easy.
For example:

• Processing of large data where it can be divided into parts and get it done
using multiple threads.
• Applications which involve mechanism like validate and save, produce and
consume, read and validate are done in multiple threads. Few examples
of such applications are online banking, recharges, etc.
• It can be used to make games where different elements are running on
different threads.
• In Android, it is used to call APIs on a background thread so that the application does not freeze or stop.
• In web applications, it is used when you want your app to make asynchronous calls and perform work asynchronously.
Advantages of Multithreading
Below are mentioned some of the advantages:

• Economical: It is economical because threads share the same process resources, and it takes less time to create threads.

• Resource sharing: It allows the threads to share resources like data,
memory, files, etc. Therefore, an application can have multiple threads
within the same address space.
• Responsiveness: It increases the responsiveness to the user as it allows
the program to continue running even if a part of it is performing a lengthy
operation or is blocked.
• Scalability: It increases parallelism on multiple CPU machines. It enhances
the performance of multi-processor machines.
• It makes the usage of CPU resources better.
Why should we use Multithreading?
We should use this because of the following reasons:

• To increase parallelism
• To make the most of the available CPU resources.
• To improve application responsiveness and give better interaction with
the user.

Q8) Introducing Parallelism for Single machine Computation

Before diving into parallel computing, let us first take a look at how computer software traditionally performed its computations and why that approach fails for the modern era.
Computer software was written conventionally for serial computing. This
meant that to solve a problem, an algorithm divides the problem into smaller
instructions. These discrete instructions are then executed on the Central
Processing Unit of a computer one by one. Only after one instruction is finished,
next one starts.
A real-life example of this would be people standing in a queue waiting
for a movie ticket and there is only a cashier. The cashier is giving tickets one by
one to the persons. The complexity of this situation increases when there are 2
queues and only one cashier.
So, in short, Serial Computing is following:
1. In this, a problem statement is broken into discrete instructions.
2. Then the instructions are executed one by one.
3. Only one instruction is executed at any moment of time.

Look at point 3. This was causing a huge problem in the computing industry, as only one instruction was executed at any moment of time. It was also a huge waste of hardware resources, since only one part of the hardware was active for a particular instruction at any given time. As problem statements
were getting heavier and bulkier, so did the amount of time needed to execute them. Examples of such serial processors are the Pentium 3 and Pentium 4.
Now let’s come back to our real-life problem. We could definitely say that
complexity will decrease when there are 2 queues and 2 cashiers giving tickets
to 2 persons simultaneously. This is an example of Parallel Computing.

Parallel Computing :

It is the use of multiple processing elements simultaneously for solving


any problem. Problems are broken down into instructions and are solved
concurrently as each resource that has been applied to work is working at the
same time.
Advantages of Parallel Computing over Serial Computing are as follows:

1. It saves time and money as many resources working together will reduce
the time and cut potential costs.
2. It can be impractical to solve larger problems on Serial Computing.
3. It can take advantage of non-local resources when the local resources are
finite.
4. Serial Computing ‘wastes’ the potential computing power; thus Parallel Computing makes better use of the hardware.

Types of Parallelism:
1. Bit-level parallelism –
It is the form of parallel computing which is based on the increasing
processor’s size. It reduces the number of instructions that the system
must execute in order to perform a task on large-sized data.
Example: Consider a scenario where an 8-bit processor must compute the
sum of two 16-bit integers. It must first sum up the 8 lower-order bits,
then add the 8 higher-order bits, thus requiring two instructions to
perform the operation. A 16-bit processor can perform the operation with
just one instruction.
2. Instruction-level parallelism –
Without instruction-level parallelism, a processor can issue only one instruction per clock cycle. Instructions can be re-ordered and grouped so that several of them are executed concurrently without affecting the result of the program; this is called instruction-level parallelism.
3. Task Parallelism –
Task parallelism employs the decomposition of a task into subtasks and
then allocating each of the subtasks for execution. The processors
perform the execution of sub-tasks concurrently.
4. Data-level parallelism (DLP) –
Instructions from a single stream operate concurrently on several data elements. DLP is limited by non-regular data-manipulation patterns and by memory bandwidth (a small sketch combining data- and task-level parallelism follows this list).
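As referenced in the data-level parallelism item above, the following small C# sketch illustrates data-level and task-level parallelism using the .NET Task Parallel Library; the array contents and the two sub-task messages are purely illustrative.

using System;
using System.Threading.Tasks;

class ParallelismSketch
{
    static void Main()
    {
        // Data-level parallelism: the same operation (squaring) is applied
        // to many data elements concurrently.
        long[] data = new long[1000000];
        Parallel.For(0, data.Length, i => data[i] = (long)i * i);
        Console.WriteLine("Last element: " + data[data.Length - 1]);

        // Task parallelism: two independent sub-tasks of a larger job are
        // decomposed and executed concurrently.
        Parallel.Invoke(
            () => Console.WriteLine("Sub-task 1: processing the first half of the data"),
            () => Console.WriteLine("Sub-task 2: processing the second half of the data"));
    }
}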
Why parallel computing?

• The whole real-world runs in dynamic nature i.e. many things happen at a
certain time but at different places concurrently. This data is extensively
huge to manage.
• Real-world data needs more dynamic simulation and modeling, and for
achieving the same, parallel computing is the key.
• Parallel computing provides concurrency and saves time and money.
• Complex, large datasets and their management can be organized only by using parallel computing’s approach.
• Ensures the effective utilization of the resources. The hardware is
guaranteed to be used effectively whereas in serial computation only
some part of the hardware was used and the rest rendered idle.
• Also, it is impractical to implement real-time systems using serial
computing.
Applications of Parallel Computing:
• Databases and Data mining.
• Real-time simulation of systems.
• Science and Engineering.
• Advanced graphics, augmented reality, and virtual reality.
Limitations of Parallel Computing:
• It introduces overheads such as communication and synchronization between multiple sub-tasks and processes, which are difficult to achieve.
• The algorithms must be managed in such a way that they can be handled in a parallel mechanism.
• The algorithms or programs must have low coupling and high cohesion, but it is difficult to create such programs.
• Only more technically skilled and expert programmers can code a parallelism-based program well.
Future of Parallel Computing:

The computational landscape has undergone a great transition from serial computing to parallel computing. Tech giants such as Intel have already taken a step towards parallel computing by employing multicore processors. Parallel computation will revolutionize the way computers work in the future, for the better. With all the world connecting to each other even more than before, Parallel Computing plays a bigger role in helping us stay connected. With faster
networks, distributed systems, and multi-processor computers, it becomes even
more necessary.

Q9) Thread Model


Local vs Remote Threads
The modern operating systems provide the abstractions of Process and
Thread for defining the runtime profile of a software application. A Process is a
software infrastructure that is used by the operating system to control the
execution of an application. A Process generally contains one or more threads.
A Thread is a sequence of instructions that can be executed in parallel with other instructions. When an application is running, the operating system takes care of alternating the execution of its threads on the local machine. It is the responsibility of the developer to create a consistent computation as a result of thread execution.

The Thread Model uses the same abstraction for defining a sequence of instructions that can be remotely executed in parallel with other instructions. Hence, within the Thread Model an application is a collection of remotely executable threads. The Thread Model allows developers to virtualize the execution of a local multi-threaded application (developed with the .NET threading APIs) in an almost completely transparent manner. This model represents the right solution when developers want to port the execution of a .NET multi-threaded application to Aneka and still control the application flow in the same way, based on synchronization between threads.
Developers that are familiar with multi-threaded applications will find the
Thread Model the most natural path to program distributed applications with
Aneka. The transition between a .NET thread and an Aneka thread is almost
transparent. In the following a sample application will be used to discuss how to
use Aneka threads.

Working with Threads


Within the .NET threading model a thread is represented by the sealed Thread class, which is configured with the method to execute through the ThreadStart delegate. The user activates a thread by calling the Start method on it, and by using the APIs exposed by the Thread class it is possible to:
• Check the status of the thread by using the Thread.State and the Thread.IsAlive
properties.
• Control its execution by stopping it (Thread.Abort).

• Suspending and resuming their execution (Thread.Suspend and
Thread.Resume).
• Wait for its termination by calling Thread.Join.

Thread.Join, Thread.Suspend, and Thread.Resume are the operations that allow developers to create basic synchronization patterns between threads. The Thread class provides additional APIs that cover:
• Thread affinity.
• Volatile read and write.
• Critical region management.
• Asynchronous operations.
• Stack management.
More complex and advanced synchronization patterns can be obtained by using other classes of the .NET threading APIs that do not have any reference to the Thread class. In order to remotely execute a thread, the Thread Model provides a counterpart of the Thread class: AnekaThread. The AnekaThread class represents the work unit in the Thread Model and exposes a subset of the APIs of System.Threading.Thread. It is possible to perform almost all the basic operations described before, and the mapping between the two worlds can be sketched as follows.
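This is a minimal sketch only: it assumes the Aneka.Threading and Aneka.Entity namespaces from the Aneka SDK and an AnekaThread constructor that accepts the delegate to execute together with the application instance, as in the Aneka tutorials; the WorkLoad class and DoWork method are illustrative.

using System.Threading;
using Aneka.Entity;      // assumed namespace for AnekaApplication
using Aneka.Threading;   // assumed namespace for AnekaThread and ThreadManager

public class WorkLoad
{
    public void DoWork()
    {
        // The computation to run, either locally or on a remote Aneka node.
    }
}

public class ThreadMappingSketch
{
    public static void RunLocally(WorkLoad work)
    {
        // Local .NET thread: created, started, and joined by the developer.
        Thread thread = new Thread(new ThreadStart(work.DoWork));
        thread.Start();
        thread.Join();
    }

    public static void RunOnAneka(WorkLoad work,
        AnekaApplication<AnekaThread, ThreadManager> app)
    {
        // Aneka thread: the same pattern, but the execution happens on a
        // remote node scheduled by the Aneka middleware.
        AnekaThread thread = new AnekaThread(new ThreadStart(work.DoWork), app);
        thread.Start();
        thread.Join();
    }
}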

The AnekaThread class implements the basic Thread operations but does not give any support for advanced operations such as critical region, stack, apartment, culture, and execution context management. Moreover, some basic operations are not supported; these are:
• Thread Priority Management.
• Suspend, Resume, Sleep, and Interrupt.

The reason why these operations are not supported is that AnekaThread instances are remotely executed on a computation node that generally executes work units coming from different distributed applications. It is not possible to keep the resources of a computation node occupied with an AnekaThread instance that is sleeping or suspended. As for thread priority, Aneka does not provide any facility.

The AnekaThread class provides all the required facilities to control its life cycle. Figure 2 depicts the life cycle of an AnekaThread instance. As soon as the instance is created, it is in the Unstarted state. A call to AnekaThread.Start() makes it move into the Started state and causes the submission of the instance to Aneka. If the AnekaThread has some dependent files to be transferred, it moves to the Staging In state until all the dependent files are transferred. The AnekaThread can then move directly to the Running state if any computing node is available, or it is queued, thus moving into the Queued state. As soon as execution completes, if there are any dependent output files to be downloaded to the client, the state is changed to Staging Out; otherwise it is directly set to Completed. At any stage an exception or an error can occur that causes the AnekaThread instance to move into the Failed state. The user can also actively terminate the execution by calling AnekaThread.Abort(), which causes the AnekaThread instance to be stopped and its state to be set to Aborted.


Additional Considerations
Serialization
Since AnekaThread instances are moved between different application domains, they need to be serialized. The AnekaThread class is declared serializable, but this does not guarantee that all AnekaThread instances created by users will be serializable. In particular, since the AnekaThread is configured with a ThreadStart object referencing the instance that is the target of the method invocation, the type containing the definition of that method needs to be serializable too. The reason for this is that the infrastructure will serialize the local instance on which the method will be invoked and
send it to the remote node.
In case the user provides a method that is not defined in a serializable type, the AnekaThread constructor throws an ArgumentException alerting the user that the selected method cannot be used to run an AnekaThread instance. This prevents the user from creating a work unit that will not run.
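A minimal sketch of the serialization requirement just described: the type hosting the method passed to the ThreadStart delegate must be marked serializable so that the local instance can be shipped to the remote node (the GaussianWorker class and its members are illustrative).

using System;

[Serializable]   // without this attribute the AnekaThread constructor would reject the method
public class GaussianWorker
{
    public double X;        // input captured together with the instance
    public double Result;   // output produced on the remote node

    public void Compute()
    {
        // Standard normal probability density function evaluated at X.
        Result = Math.Exp(-X * X / 2) / Math.Sqrt(2 * Math.PI);
    }
}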

Thread Programming Model vs Common APIs


As pointed out in section 3.1, the Thread Model allows developers to completely control the execution of the application by using the operations exposed by the AnekaThread class. Once the AnekaApplication instance has been properly set up, there is no need to maintain a reference to it. The rationale behind this choice is that developers familiar with the .NET threading APIs do not have the explicit concept of an application but simply coordinate the execution of threads. Since the Thread Model relies on the common APIs of the infrastructure, it takes advantage of the services these APIs offer, and these services can be used by developers too. In this case the AnekaApplication class plays an important role in controlling the execution flow, since it allows developers to:

• Monitor the state of AnekaThread instances by using events:


o AnekaApplication<W,M>.WorkUnitFailed
o AnekaApplication<W,M>.WorkUnitFinished
o AnekaApplication<W,M>.WorkUnitAborted

• Programmatically control the execution of AnekaThread instances:


o AnekaApplication<W,M>.ExecuteWorkUnit(W)
o AnekaApplication<W,M>.StopWorkUnit(W)

• Terminate the execution of the application:


o AnekaApplication<W,M>.ApplicationFinished
o AnekaApplication<W,M>.StopExecution

These APIs are available to all the models and allow developers to perform the basic operations required to manage the distributed application in a model-independent fashion. As far as the Thread Model is concerned, this style seems unnatural, even though it can be useful at times. This tutorial will not explore this option further, and the reader is directed to the Task Model, which naturally uses these APIs. It is important to notice that the result of using the AnekaThread operations or the AnekaApplication operations is the same, because both classes rely on the ThreadManager class for performing the requested operations.

Q10) Aneka thread Vs Local thread.

Aneka thread
• Aneka offers the capability of implementing multi-threaded applications over the Cloud by means of the Thread Programming Model.
• This model introduces the abstraction of distributed thread, also
called Aneka thread, which mimics the behavior of local threads but
executes over a distributed infrastructure.
Local thread.
• In computer science, a thread of execution is the smallest sequence of
programmed instructions that can be managed independently by
a scheduler, which is typically a part of the operating system.
• The implementation of threads and processes differs between
operating systems, but in most cases a thread is a component of a
process.
• Multiple threads can exist within one process,
executing concurrently and sharing resources such as memory, while
different processes do not share these resources.
• In particular, the threads of a process share its executable code and the
values of its dynamically allocated variables and non-thread-
local global variables at any given time.
Differences between local threads and Aneka threads
• Local threads are created and controlled by the application developer, while Aneka is in charge of scheduling the execution of Aneka threads once they have been started.
• Aneka threads are transparently moved and remotely executed, while developers control them from local objects that act like proxies of the remote threads.

UNIT-III
High- Throughput Computing Task Programming
Q1) Task computing
Task computing is a wide area of distributed system programming encompassing several different models of architecting distributed applications. A task represents a program, which requires input files and produces output files as a result of its execution. Applications are then constituted by a collection of tasks. These are submitted for execution, and their output data are collected at the end of their execution. This chapter characterizes the abstraction of a task and provides a brief overview of the distributed application models that are based on the task abstraction.
The Aneka Task Programming Model is taken as a reference implementation to
illustrate the execution of bag-of-tasks (BoT) applications on a distributed
infrastructure.
Task computing
A task identifies one or more operations that produce a distinct output
and that can be isolated as a single logical unit.
In practice, a task is represented as a distinct unit of code, or a program,
that can be separated and executed in a remote run time environment.
Multithreaded programming is mainly concerned with providing support for parallelism within a single machine. Task computing provides distribution by harnessing the compute power of several computing nodes; hence, the presence of a distributed infrastructure is explicit in this model. Clouds have now emerged as an attractive solution to obtain huge computing power on demand for the execution of distributed applications. To achieve this, suitable middleware is needed. A reference scenario for task computing is depicted in the figure below.


The middleware is a software layer that enables the coordinated use of


multiple resources, which are drawn from a datacentre or geographically
distributed networked computers.
A user submits the collection of tasks to the access point(s) of the middleware,
which will take care of scheduling and monitoring the execution of tasks. Each
computing resource provides an appropriate runtime environment. Task
submission is done using the APIs provided by the middleware, whether a Web
or programming language interface.
Appropriate APIs are also provided to monitor task status and collect their
results upon completion. It is possible to identify a set of common operations
that the middleware needs to support the creation and execution of task-based
applications.
These operations are:
• Coordinating and scheduling tasks for execution on a set of remote
nodes
• Moving programs to remote nodes and managing their dependencies
• Creating an environment for execution of tasks on the remote nodes
• Monitoring each task’s execution and informing the user about its
status
• Access to the output produced by the task.

Characterizing a task
A task represents a component of an application that can be logically isolated
and executed separately.
A task can be represented by different elements:
• A shell script composing together the execution of several applications

• A single program
• A unit of code (a Java/C++/.NET class) that executes within the context
of a specific runtime environment.
A task is characterized by input files, executable code (programs, shell scripts,
etc.), and output files. The runtime environment in which tasks execute is the
operating system or an equivalent sandboxed environment. A task may also
need specific software appliances on the remote execution nodes.
Computing categories
These categories provide an overall view of the characteristics of the problems.
They implicitly impose
requirements on the infrastructure and the middleware.
Applications falling into this category are:
1. High-performance computing
2. High-throughput computing
3. Many-task computing

1) High-performance computing
High-performance computing (HPC) is the use of distributed computing
facilities for solving problems that need large computing power. The general
profile of HPC applications is constituted by a large collection of compute-
intensive tasks that need to be processed in a short period of time. The metric used to evaluate HPC systems is floating-point operations per second (FLOPS), now in the range of tera-FLOPS or even peta-FLOPS, which identifies the number of floating-point operations a system can perform per second.
Ex: supercomputers and clusters are specifically designed to support HPC
applications that are developed to solve “Grand Challenge” problems in science
and engineering.

2) High-throughput computing
High-throughput computing (HTC) is the use of distributed computing
facilities for applications requiring large computing power over a long period of
time. HTC systems need to be robust and to reliably operate over a long time
scale. The general profile of HTC applications is that they are made up of a large
number of tasks of which the execution can last for a considerable amount of
time.
Ex: scientific simulations or statistical analyses.
It is quite common to have independent tasks that can be scheduled in
distributed resources because they do not need to communicate.
HTC systems measure their performance in terms of jobs completed per month.

3) Many-task computing

MTC denotes high-performance computations comprising multiple
distinct activities coupled via file system operations. MTC is the heterogeneity
of tasks that might be of different nature: Tasks may be small or large, single
processor or multiprocessor, compute-intensive or data-intensive, static or
dynamic, homogeneous or heterogeneous.
MTC applications include loosely coupled applications that are communication-intensive but not naturally expressed using the message-passing interface. MTC aims to bridge the gap between HPC and HTC. It is similar to HTC, but it concentrates on the use of many computing resources over a short period of time to accomplish many computational tasks.

Q2) Task based Application Models


There are several models based on the concept of the task as the
fundamental unit for composing distributed applications. What makes these
models different from one another is the way in which tasks are generated, the
relationships they have with each other, and the presence of dependencies or
other conditions.
In this section, we quickly review the most common and popular models based
on the concept of the task.

Embarrassingly parallel applications


Embarrassingly parallel applications constitute the most simple and
intuitive category of distributed applications. The tasks might be of the same
type or of different types, and they do not need to communicate among
themselves. This category of applications is supported by the majority of the
frameworks for distributed computing. Since tasks do not need to
communicate, there is a lot of freedom regarding the way they are scheduled.
Tasks can be executed in any order, and there is no specific requirement for
tasks to be executed at the same time. Scheduling these applications is
simplified and concerned with the optimal mapping of tasks to available
resources. Frameworks and tools supporting embarrassingly parallel
applications are the Globus Toolkit, BOINC, and Aneka.
Several problems fall into this category: image and video rendering, evolutionary optimization, and model forecasting. In image and video rendering, the task is represented by the rendering of a pixel or a frame, respectively. For evolutionary optimization metaheuristics, a task is identified by a single run of the algorithm with a given parameter set. The same applies to model forecasting applications.
In general, scientific applications constitute a considerable source of
embarrassingly parallel applications.


Parameter sweep applications


Parameter sweep applications are a specific class of embarrassingly
parallel applications for which the tasks are identical in their nature and differ
only by the specific parameters used to execute.
Parameter sweep applications are identified by a template task and a set of
parameters. The template task defines the operations that will be performed on
the remote node for the execution of tasks. The parameter set identifies the
combination of variables whose assignments specialize the template task into a
specific instance. Any distributed computing framework that provides support
for embarrassingly parallel applications can also support the execution of
parameter sweep applications. The only difference is that the tasks that will be
executed are generated by iterating over all the possible and admissible
combinations of parameters. Nimrod/G is natively designed to support the
execution of parameter sweep applications, and Aneka provides client-based
tools for visually composing a template task, defining parameters, and iterating
over all the possible combinations. A plethora of applications fall into this
category. Scientific computing domain: evolutionary optimization
algorithms, weather-forecasting models, computational fluid dynamics
applications, Monte Carlo methods.
For example, in the case of evolutionary algorithms it is possible to identify the
domain of the applications as a combination of the relevant parameters.
For genetic algorithms these might be the number of individuals of the
population used by the optimizer and the number of generations for which to
run the optimizer. The following example in pseudo-code demonstrates how to
use parameter sweeping for the execution of a
generic evolutionary algorithm.
individuals = {100, 200, 300, 500, 1000}
generations = {50, 100, 200, 400}
foreach indiv in individuals do
    foreach generation in generations do
        task = generate_task(indiv, generation)
        submit_task(task)

In this case 20 tasks are generated. The function generate_task is specific to the application and creates the task instance by substituting the values of indiv and generation into the corresponding variables in the template definition. The function submit_task is specific to the middleware used and performs the actual task submission.

A template task is in general a composition of operations concerning the execution of legacy applications with the appropriate parameters and a set of file system operations.
Frameworks that natively support the execution of parameter sweep
applications provide a set of useful commands for manipulating or operating on
files.
The commonly available commands are:
• Execute. Executes a program on the remote node.
• Copy. Copies a file to/from the remote node.
• Substitute. Substitutes the parameter values with their placeholders
inside a file.
• Delete. Deletes a file.

Figures 7.2 and 7.3 provide examples of two possible task templates, the
former as defined according to the notation used by Nimrod/G, and the latter
as required by Aneka.

The template file has two sections: a header for the definition of the parameters,
and a task definition section that includes shell commands mixed with Nimrod/G
commands.
The prefix node: identifies the remote location where the task is executed. Parameters are identified with the ${...} notation.


The file is an XML document containing several sections, the most important of which are sharedFiles, parameters, and task. The parameters section contains the definition of the parameters that will customize the template task. Two different types of parameters are defined: a single-value parameter and a range parameter. The sharedFiles section contains the files that are required to execute the task; the task section has a collection of input and output files for which local and remote paths are defined, as well as a collection of commands.

MPI applications
Message Passing Interface (MPI) is a specification for developing parallel
programs that communicate by exchanging messages. MPI originated as an attempt to create common ground from the several distributed shared memory and message-passing infrastructures available for distributed computing. Nowadays, MPI has become a de
facto standard for developing portable and efficient message-passing HPC
applications.

MPI provides developers with a set of routines that:


• Manage the distributed environment where MPI programs are executed
• Provide facilities for point-to-point communication
• Provide facilities for group communication
• Provide support for data structure definition and memory allocation
• Provide basic support for synchronization with blocking calls

The general reference architecture is depicted in Figure 7.4. A distributed


application in MPI is composed of a collection of MPI processes that are
executed in parallel in a distributed infrastructure that supports MPI.

MPI applications that share the same MPI runtime are by default part of a global group called MPI_COMM_WORLD. Within this group, all the
distributed processes have a unique identifier that allows the MPI runtime to
localize and address them. Each MPI process is assigned a rank within the group.
The rank is a unique identifier that allows processes to communicate with each
other within a group. To create an MPI application it is necessary to define the
code for the MPI process that will be executed in parallel. This program has, in
general, the structure described in Figure 7.5.
The section of code that is executed in parallel is clearly identified by two
operations that set up the MPI environment and shut it down, respectively. In
the code section, it is possible to use all the MPI functions to send or receive
messages in either asynchronous or synchronous mode. The diagram in Figure 7.5 might suggest that MPI allows only the definition of completely
symmetrical applications, since the portion of code executed in each node is the
same.
A common model used in MPI is the master-worker model, whereby one MPI process coordinates the execution of others that perform the same task. Once
the program has been defined in one of the available MPI implementations, it is
compiled with a modified version of the compiler for the language. The output
of the compilation process can be run as a distributed application by using a
specific tool provided with the MPI implementation.
One of the most popular MPI software environments is developed by the
Argonne National Laboratory in the United States.

Workflow applications with task dependencies


Workflow applications are characterized by a collection of tasks that
exhibit dependencies among them. Such dependencies, which are mostly data
dependencies determine the way in which the applications are scheduled as
well as where they are scheduled.

1)What is a workflow?
A workflow is the automation of a business process, in whole or part,
during which documents, information, or tasks are passed from one participant
(a resource; human or machine) to another for action, according to a set of
procedural rules.
The concept of workflow as a structured execution of tasks that have
dependencies on each other has
demonstrated itself to be useful for expressing many scientific experiments and
gave birth to the idea of scientific workflow.

In the case of scientific workflows, the process is identified by an
application to run, the elements that are passed among participants are mostly
tasks and data, and the participants are mostly computing or storage nodes.
The set of procedural rules is defined by a workflow definition scheme
that guides the scheduling of the application.
A scientific workflow generally involves data management, analysis,
simulation, and middleware supporting the execution of the workflow.
A scientific workflow is generally expressed by a directed acyclic graph (DAG),
which defines the dependencies among tasks or operations.
The nodes on the DAG represent the tasks to be executed in a workflow
application; the arcs connecting the nodes identify the dependencies among
tasks and the data paths that connect the tasks.
The most common dependency that is realized through a DAG is data
dependency, which means that the output files of a task constitute the input
files of another task.

The DAG in Figure 7.6 describes a sample Montage workflow. Montage is a


toolkit for assembling images into mosaics; it has been specially designed to
support astronomers in composing the images taken from different telescopes
or points of view into a coherent image.
The workflow depicted in Figure 7.6 describes the general process for
composing a mosaic; the labels on the right describe the different tasks that
have to be performed to compose a mosaic. In the case presented in the
diagram, a mosaic is composed of seven images.
For each of the image files, the following process has to be performed: image
file transfer, reprojection, calculation of the difference, and common plane
placement.
Therefore, each of the images can be processed in parallel for these tasks.


Workflow technologies
Business-oriented computing workflows are defined as compositions of
services.
There are specific languages and standards for the definition of workflows,
such as Business Process Execution Language (BPEL).
An abstract reference model for a workflow management system is depicted in Figure 7.7. Design tools allow users to visually compose a workflow application.
This specification is stored in the form of an XML document based on a specific workflow language and constitutes the input of the workflow engine, which controls the execution of the workflow by leveraging a distributed infrastructure. The workflow engine is a client-side component that might interact directly with resources or with one or several middleware components for executing the workflow.

Some of the most relevant technologies for designing and executing workflow-
based applications are:
1. Kepler,
2. DAGMan,
3. Cloudbus Workflow Management System, and
4. Offspring.

1. Kepler
Kepler is an open-source scientific workflow engine.
The system is based on the Ptolemy II system, which provides a solid platform for developing dataflow-oriented workflows. Kepler provides a design environment based on the concept of actors, which are reusable and independent blocks of computation such as Web services and database calls. The connection between actors is made with ports. An actor consumes data from the input ports and writes data/results to the output ports. Kepler supports different models, such as synchronous and asynchronous models.
The workflow specification is expressed using a proprietary XML language.

2. DAGMan
DAGMan (Directed Acyclic Graph Manager) constitutes an extension to
the Condor scheduler to handle job interdependencies. DAGMan acts as a
metascheduler for Condor by submitting the jobs to the scheduler in the
appropriate order.
The input of DAGMan is a simple text file that contains the information about
the jobs, pointers to their job submission files, and the dependencies among
jobs.


3. Cloudbus Workflow Management System


Cloudbus Workflow Management System (WfMS) is a middleware platform built for managing large application workflows on distributed
computing platforms such as grids and clouds. It comprises software tools that
help end users compose, schedule, execute, and monitor workflow applications
through a Web-based portal. The portal provides the capability of uploading
workflows or defining new ones with a graphical editor. To execute workflows,
WfMS relies on the Gridbus Broker, a grid/cloud resource broker that supports the execution of applications with quality-of-service (QoS) attributes.

4. Offspring
It offers a programming-based approach to developing workflows. Users
can develop strategies and plug them into the environment, which will execute
them by leveraging a specific distribution engine. The advantage provided by
Offspring is the ability to define dynamic workflows. Such a strategy represents a semi-structured workflow that can change its behaviour at runtime according to
the execution of specific tasks. This allows developers to dynamically control the
dependencies of tasks at runtime. Offspring supports integration with any
distributed computing middleware that can manage a simple BagOfTasks(BOT)
application. Offspring allows the definition of workflows in the form of plug-ins.

Q3) Aneka Task-Based Programming.


Aneka provides support for all the flavors of task-based programming by means of the Task Programming Model, which constitutes the basic support given by the framework for the execution of bag-of-tasks (BoT) applications.

Task programming is realized through the abstraction of the Aneka.Tasks.ITask interface. Using this abstraction as a basis, support for the execution of legacy applications, parameter sweep applications, and workflows has been integrated into the framework.

Task programming model

The Task Programming Model provides a very intuitive abstraction for


quickly developing distributed applications on top of Aneka. It provides a
minimum set of APIs that are mostly centered on the Aneka.Tasks.ITask
interface. Figure 7.8 provides an overall view of the components of the Task
Programming Model and their roles during application execution.


Developers create distributed applications in terms of ITask instances, the


collective execution of which describes a running application. These tasks,
together with all the required dependencies (data files and libraries), are
grouped and managed through the Aneka Application class, which is specialized
to support the execution of tasks. Two other components, AnekaTask and
TaskManager, constitute the client-side view of a task-based
application. The former constitutes the runtime wrapper Aneka uses to
represent a task within the middleware; the latter is the underlying component
that interacts with Aneka, submits the tasks, monitors their execution, and
collects the results. In the middleware, four services coordinate their activities in order to execute task-based applications. These are MembershipCatalogue, TaskScheduler, ExecutionService, and StorageService. MembershipCatalogue constitutes the main access point of the cloud and acts as a service directory to locate the TaskScheduler service, which is in charge of managing the execution of task-based applications. Its main responsibility is to allocate task instances to resources featuring the ExecutionService for task execution and to monitor task state.

Developing applications with the task model


Execution of task-based applications involves several components.
The development of such applications is limited to the following operations:
• Defining classes implementing the ITask interface
• Creating a properly configured AnekaApplication instance
• Creating ITask instances and wrapping them into AnekaTask instances
• Executing the application and waiting for its completion.
1. ITask and AnekaTask
2. Controlling task execution
3. File management
4. Task libraries
5. Web services integration

1. ITask and AnekaTask


All the client-side features for developing task-based applications with
Aneka are contained in the Aneka.Tasks namespace (Aneka.Tasks.dll). The
most important component for designing tasks is the ITask interface, which is defined in Listing 7.1.
This interface exposes only one method: Execute. The method is invoked in
order to execute the task on the remote node.

The ITask interface provides a programming approach for developing


native tasks, which means tasks implemented in any of the supported
programming languages of the .NET framework. The restrictions on
implementing task classes are minimal; they need to be serializable, since task
instances are created and moved over the network. ITask places minimum restrictions on how to implement a task class and decouples the specific operation of the task from the runtime wrapper classes, which are required for managing tasks within Aneka. This role is performed by the AnekaTask class, which represents the task instance in accordance with the Aneka application model APIs. This class
extends the Aneka.Entity.WorkUnit class and provides the feature for
embedding ITask instances. AnekaTask is mostly used internally, and for end
users it provides facilities for specifying input and output files for the task.

Listing 7.2 describes a simple implementation of a task class that computes the
Gaussian distribution for a given point x.
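The listing itself is not reproduced here, but a minimal sketch of such a task class, assuming only the single Execute() method exposed by ITask as described above (field names are illustrative), could look like this:

using System;
using Aneka.Tasks;   // assumed namespace providing the ITask interface

[Serializable]        // task instances are created locally and moved over the network
public class GaussTask : ITask
{
    public double X;        // the point at which the distribution is evaluated
    public double Result;   // the value computed on the remote node

    public void Execute()
    {
        // Standard normal (Gaussian) probability density function at X.
        Result = Math.Exp(-X * X / 2) / Math.Sqrt(2 * Math.PI);
    }
}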


Listing 7.3 describes how to wrap an ITask instance into an AnekaTask. It also
shows how to add input and output files specific to a given task. The Task
Programming Model leverages the basic capabilities for file management that
belong to the WorkUnit class, from which the AnekaTask class inherits.
WorkUnit has two collections of files, InputFiles and OutputFiles; developers
can add files to these collections and the runtime environment will
automatically move these files where it is necessary. Input files will be staged
into the Aneka Cloud and moved to the remote node where the task is executed.
Output files will be collected from the execution node and moved to the local
machine or a remote FTP server.

2. Controlling task execution
Task classes and AnekaTask define the computation logic of a task-based application.
The AnekaApplication class provides the basic feature for implementing the coordination logic of the application. In task programming, it assumes the form of AnekaApplication<AnekaTask, TaskManager>.
The operations provided for the task model are:
• Static and dynamic task submission
• Application state and task state monitoring
• Event-based notification of task completion or failure.

Static submission is a very common pattern in the case of task-based


applications, and it involves the creation of all the tasks that need to be executed
in one loop and their submission as a single bag. Dynamic submission of tasks is
a more efficient technique and involves the submission of tasks as a result of the
event-based notification mechanism implemented in the Aneka Application
class.

Listing 7.4 shows how to create and submit 400 Gauss tasks as a bag by using the static submission approach. Each task can be referenced using its unique identifier (WorkUnit.Id) through the indexer operator [] applied to the application class. In the case of static submission, the tasks are added to the application, and the method SubmitExecution() is called.
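A hedged sketch of the static submission pattern just described follows; the AddWorkUnit method name and the application setup are assumptions based on the common Aneka application APIs and should be checked against the SDK version in use.

using Aneka.Entity;   // assumed namespaces from the Aneka SDK
using Aneka.Tasks;

public class StaticSubmissionSketch
{
    public static void SubmitBag(AnekaApplication<AnekaTask, TaskManager> app)
    {
        // Create the whole bag of 400 tasks in one loop...
        for (int i = 0; i < 400; i++)
        {
            GaussTask gauss = new GaussTask();
            gauss.X = i;                         // initial-bag tasks use integer values of X
            AnekaTask task = new AnekaTask(gauss);
            app.AddWorkUnit(task);               // assumed API for adding a work unit to the bag
        }

        // ...and submit them as a single bag; the call returns as soon as
        // submission is complete, not when the tasks have finished.
        app.SubmitExecution();
    }
}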

A different scenario is constituted by dynamic submission, where tasks


are submitted as a result of other events that occur during the execution—for
example, the completion or the failure of previously submitted tasks or other
conditions that are not related to the interaction of Aneka.
Listing 7.5 extends the previous example and implements a dynamic task
submission strategy for refining the computation of Gaussian distribution.

To capture the failure and the completion of tasks, it is necessary to listen
to the events WorkUnitFailed and WorkUnitFinished. The event arguments expose a WorkUnit property that, if not null, gives access to the task instance. The event
handler for task failure simply dumps to the console the information that the task has failed, together with, if possible, additional information about the error that occurred. The event handler for task completion checks whether the task
completed was submitted within the original bag, and in this case submits
another task by using the ExecuteWorkUnit(AnekaTask task) method. To
discriminate tasks submitted within the initial bag and other tasks, the value of
GaussTask.X is used. If X contains a value with no fractional digits, it is an initial
task; otherwise, it is not. In designing the coordination logic of the application,
it is important to note that task submission follows an asynchronous execution pattern, which means that the method SubmitExecution, as well as the method ExecuteWorkUnit, returns when the submission of tasks is completed, not when the tasks themselves have completed. This requires the developer
to put in place the proper synchronization logic to let the main thread of the
application wait until all the tasks are terminated and the application is
completed.
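A sketch of the dynamic submission logic described for Listing 7.5 follows. The event-argument type WorkUnitEventArgs<AnekaTask> and the UserTask property used to reach the wrapped ITask are assumptions based on the common Aneka APIs; only the overall pattern (listen for completion and failure, then resubmit through ExecuteWorkUnit) is taken from the description above.

using System;
using Aneka.Entity;   // assumed namespaces from the Aneka SDK
using Aneka.Tasks;

public class DynamicSubmissionSketch
{
    private AnekaApplication<AnekaTask, TaskManager> app;

    public DynamicSubmissionSketch(AnekaApplication<AnekaTask, TaskManager> app)
    {
        this.app = app;
        // Listen to the completion and failure of work units.
        app.WorkUnitFailed += OnWorkUnitFailed;
        app.WorkUnitFinished += OnWorkUnitFinished;
    }

    private void OnWorkUnitFailed(object sender, WorkUnitEventArgs<AnekaTask> args)
    {
        // Dump the information that the task failed, with its identifier if available.
        if (args.WorkUnit != null)
            Console.WriteLine("Task " + args.WorkUnit.Id + " failed.");
    }

    private void OnWorkUnitFinished(object sender, WorkUnitEventArgs<AnekaTask> args)
    {
        // UserTask is assumed to expose the ITask instance wrapped by the AnekaTask.
        GaussTask gauss = (GaussTask)args.WorkUnit.UserTask;

        // Only tasks from the initial bag (X with no fractional digits) trigger
        // the dynamic submission of a refinement task.
        if (gauss.X == Math.Floor(gauss.X))
        {
            GaussTask refined = new GaussTask();
            refined.X = gauss.X + 0.5;
            app.ExecuteWorkUnit(new AnekaTask(refined));
        }
    }
}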

3. File management
Task-based applications normally deal with files to perform their
operations. Files may constitute input data for tasks, may contain the result of a
computation, or may represent
executable code or library dependencies. Any model based on the WorkUnit and
ApplicationBase classes has built-in support for file management. A fundamental
component for the management of files is the FileData class, which constitutes
the logic representation of physical files, as defined in the Aneka.Data.Entity
namespace (Aneka.Data.dll).
A FileData instance provides information about a file:
• Its nature: whether it is a shared file, an input file, or an output file
• Its path both in the local and in the remote file system, including a
different name
• A collection of attributes that provides other information.

4 Task libraries
Aneka provides a set of ready-to-use tasks for performing the most basic
operations for remote file management. These tasks are part of the
Aneka.Tasks.BaseTasks namespace, which is part of the Aneka.Tasks.dll library.
The following operations are implemented:
• File copy. The Local Copy Task performs the copy of a file on the remote
node; it takes a file as input and produces a copy of it under a different
name or path.

• Legacy application execution. The Execute Task allows executing
external and legacy applications by using the System.Diagnostics.Process
class. It requires the location of the executable file to run, and it is also
possible to specify command-line parameters.
• Substitute operation. The Substitute Task performs a search-and-
replace operation within a given file by saving the resulting file under a
different name.
• File deletion. The Delete Task deletes a file that is accessible through the
file system on the remote node.
• Timed delay. The Wait Task introduces a timed delay. This task can be
used in several scenarios; it can be used for profiling or for simulation of
the execution.
• Task composition. The Composite Task implements the composite
pattern and allows expressing a task as a composition of multiple tasks
that are executed in sequence.

5 Web services integration


The task submission Web service is an additional component that can be
deployed in any ASP.NET Web server and that exposes a simple interface for job
submission, which is compliant with the Aneka Application Model. The task Web
service provides an interface that is more compliant with the traditional way
fostered by grid computing. The reference scenario for Web-based submission
is depicted in Figure 7.9.
Users create a distributed application instance on the cloud; they can then submit jobs and query the status of the application or of a single job. It is up to the users to terminate the application when all the jobs are completed, or to abort it if there is no need to complete job execution.


Data Intensive Computing, Map-Reduce Programming


Q4) Data Intensive Computing
Data Intensive Computing is a class of parallel computing which uses data parallelism in order to process large volumes of data. The size of this data is typically in terabytes or petabytes. This large amount of data is generated each day and is referred to as Big Data.
In 2007, an IDC white paper sponsored by EMC Corporation estimated the amount of information then stored in digital form at 281 exabytes. One can only imagine how massive it would be today.
The figure revealed by IDC shows that the amount of data generated is beyond our capacity to analyse it. The methods generally used to solve the usual traditional problems in computational science cannot be used in this case.
In order to tackle the problem, companies are coming up with a tool or set of tools.
Data intensive computing characteristics
Data intensive computing has some characteristics which are different from
other forms of computing. They are:
• In order to achieve high performance in data intensive computing, it is
necessary to minimize the movement of data. This reduces system
overhead and increases performance by allowing the algorithms to
execute on the node where the data resides.
• The data intensive computing system utilizes a machine independent
approach where the run time system controls the scheduling, execution,
load balancing, communications and the movement of programs.
• Data intensive computing hugely focuses on reliability and availability of
data. Traditional large scale systems may be susceptible to hardware
failures, communication errors and software bugs, and data intensive
computing is designed to overcome these challenges.
• Data intensive computing is designed for scalability so it can
accommodate any amount of data and so it can meet the time critical
requirements. Scalability of the hardware as well as the software
architecture is one of the biggest advantages of data intensive computing.


Requirements and Expectations of Data-Intensive Clouds


Data-intensive computing refers to computing over large-scale data. Gorton describes types of applications and research issues for data-intensive systems. Such systems may either be pure data-intensive systems or data/compute-intensive systems. The former type of system devotes most of its time to data manipulation or data I/O, whereas in the latter type data computation is dominant. Normally, parallelization techniques and high-performance computing are adopted to meet the challenges related to data/compute-intensive systems.

With the growth of data-intensive computing, traditional differences


between data/compute intensive systems and pure data-intensive systems have
started to merge and both are collectively referred as data-intensive systems.
Major research issues for data-intensive systems include management,
handling, fusion, and analysis of data. Often, time-sensitive applications are also
deployed on data-intensive systems.

The Pacific Northwest National Laboratory has proposed a comprehensive


definition: Data Intensive computing is managing, analyzing, and understanding
data at volumes and rates that push the frontiers of current technologies.
A wide set of requirements and issues arise when data-intensive
applications are deployed on clouds. The cloud must be scalable and available.
It should also facilitate huge data analysis and massive input/output operations.
Considering the administrative challenges and the development requirements a

cloud should offer, we propose the following definition for data-intensive cloud
computing:

Data-intensive cloud computing involves the study of both programming techniques and platforms to solve data-intensive tasks, and the management and administration of the hardware and software that facilitate these solutions.
Depending upon its usage, a data-intensive cloud could either be deployed as a private cloud, supporting users of a specific organization, or as a public cloud, providing shared resources to a number of users. A data-intensive cloud entails many challenges and issues. These include data-centric issues, such as implementing efficient algorithms and techniques to store, manage, retrieve, and analyse the data, and communication-centric issues, such as dissemination of information, placement of replicas, data locality, and retrieval of data. Note that issues in the two categories may be interrelated. For instance, data locality often leads to faster processing of data.
Grossman and Gu discussed varieties of cloud infrastructures for data-intensive computing. Figure 1 illustrates the two architectural models for such a system: a cloud could provide EC2-like instances for data-intensive computing, or it could offer computing platforms (like MapReduce) to its users. In the former case, a user is required to select tools and a platform for computing, and the cloud provider is responsible for storage and computing power. The provider is also responsible for replication, fault tolerance, and consistency. In comparison, for platform-based cloud computing, application-specific solutions exist which provide enhanced performance.

Fig. 1 Architecture model of data-intensive cloud computing

Requirements and Expectations of Data-Intensive Clouds

A data intensive cloud system entails several requirements related to


scalability, availability and elasticity. Further, issues such as infrastructure
support, hardware issues, and software platforms are also important.
Depending upon the scope of an application and the type of services a cloud
provides, these requirements may vary for each application. Note that a data-
intensive cloud is different from a traditional cloud, in that the former is

capable of processing and managing massive amounts of data. However, in addition to the challenges related to data processing and management, a data-intensive system should also meet the requirements of a traditional cloud system, such as scalability, fault tolerance, and availability. We now describe significant requirements for data-intensive clouds, stated with respect to data-intensive computing.
1) Scalability
A data-intensive cloud should be able to support a large number of users
without any noticeable performance degradation. Large scaling may be
achieved through addition of commodity hardware.

2) Availability and Fault Tolerance


The strict requirement of availability is tied to the ability of the system to tolerate faults. Faults could occur at the infrastructure/physical layer or
they could also arise at the platform (or application) layer. As mentioned, in
analytical systems, fault tolerance denotes the capability of the system to
facilitate query execution with little interruption. Comparatively, in
transactional systems, ACID guarantees must be ensured [1]. Overall, the system
should have the ability to sustain both the transient failures (such as network
congestion, bandwidth limitation and CPU availability) and persistent failures
(such as network outages, power faults, and disk failures).

3) Flexibility and Efficient User Access Mechanism


A data-intensive cloud should facilitate a flexible development
environment in which desired tasks and queries should be easily implemented.
A significant requirement is to facilitate efficient mechanism for data access. For
intensive tasks, the framework should also support parallel and high
performance access and computing methods.

4) Elasticity
Elasticity refers to the capability of the cloud to utilize system resources
as per the needs and usage. This implies that more capacity can be added to
existing system [30]. The resources may shrink or grow according to the current
state of the cloud.

5) Sharing—Effective Resource Utilization


Many applications share clouds for their computation. This is especially true for a private cloud; for example, data is shared among multiple applications at Facebook. Sharing reduces the overhead of data duplication and yields better resource utilization. Efficient and effective mechanisms are needed to facilitate this sharing requirement.

6) Heterogeneous Environment
The cloud system should support heterogeneous infrastructure.
A homogeneous configuration is not always possible for data-intensive systems. In such an environment, issues such as differing computation power across cloud machines, varying disk speeds, and networking hardware with dissimilar capacity are not infrequent. Consequently, a cloud may encounter varying delays.

7) Data Placement and Data Locality


Big data systems have complex data placement requirements. Issues to be considered include data locality, fast data loading and query processing, efficient storage space utilization, reduced network overhead, the ability to support various work patterns, and low power consumption. Multiple copies of data sets
may be maintained to achieve fault tolerance, load balancing, availability, and
data locality. Consistency requirements vary with the type of application being
hosted on the cloud. It has also been suggested that data-intensive applications
with strong consistency requirements are less likely to be deployed on clouds.

8) Effective Data Handling


Fault tolerance should be aided by effective data handling. For instance,
many tasks in data-intensive computing are multi-stage. Handling of intermediate data is important for such tasks. A failure in an intermediate step of the workflow should not drastically affect system execution.

9) Effective Storage Mechanism


The storage mechanism should facilitate fast and efficient retrieval of
documents. Since data is distributed, effective utilization of disk is important.

10) Support For Large Data Sets


In a cloud environment, data-intensive systems should provide scalable
support for huge datasets. A cloud should be able to execute a large number of
queries with only a small latency. Considering the varieties of data-intensive computing, support for characteristics such as huge files and a large number of small files in a directory is also beneficial.
11) Privacy and Access Control
In cloud computing, data is outsourced and stored on cloud servers. With
this requirement, issues of data protection and data privacy arise. Although encryption may be used to protect sensitive data, it adds the additional cost of encryption and decryption to the system.

12) Billing
For a public cloud, an efficient billing mechanism is needed as it covers
the cost of cloud operations. A user may be charged on the basis of three
components.
These include (i) data storage, (ii) data access, and (iii) data computation.
The inclusion of these components in the billing may vary depending upon the
type of service a provider offers to its customers.

13) Power Efficiency


Data-intensive clusters consume large amounts of electrical power. Low-power solutions save infrastructure cost and ease cooling requirements. In a power-constrained environment, such solutions could also lead to enhanced capacity and increased computational power.

14) Efficient Network Setup


Cloud providers use over-provisioning for profit maximization. In a multi-user cloud environment, network problems such as congestion, bandwidth limitation, and excessive network delays could be induced; problems such as high packet loss and TCP Incast could also arise. A data-intensive cloud should be able to cope with these challenges. Effective bandwidth utilization, efficient downloading and uploading, and low-latency data access are critical requirements.

15) Efficiency
A data-intensive computing system must be efficient in fulfilling its core tasks.
Intensive tasks require multi-stage pipeline execution, intelligent workflows,
and effective distribution and retrieval capabilities. These requirements
collectively determine the efficiency of the system. With the diversity in data
intensive computing, algorithms and techniques also vary for each application.
For instance, some algorithms (such as page-rank or N-body computation)
require optimization for iterative computation.

Q5) What is Map Reduce Programming model? An application is generating a
big data. Suggest the Map Reduce computation workflow with neat diagram
to manage the big data. Write Map and Reduce functions for Word Counter
Problem.


MapReduce consists of 2 steps:


• Map Function – It takes a set of data and converts it into another set
of data, where individual elements are broken down into tuples (key-value pairs).
Example – (Map function in Word Count)


• Reduce Function – Takes the output from Map as an input and combines
those data tuples into a smaller set of tuples.

Example – (Reduce function in Word Count)
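A minimal, framework-agnostic sketch of the two functions for the Word Counter problem is shown below (the class name WordCountFunctions and the emit delegates are illustrative only; Aneka-specific versions appear later in Listings 8.2 and 8.4). The map function emits a (word, 1) pair for every word in a line, and the reduce function sums the counts emitted for each word.

using System;
using System.Collections.Generic;

public static class WordCountFunctions
{
    // Map: input key = offset of the line in the file, input value = the line of text.
    // Emits one (word, 1) pair for every word found in the line.
    public static void Map(long offset, string line, Action<string, int> emit)
    {
        char[] separators = " \t\r\n.,;:!?\"'()[]{}".ToCharArray();
        foreach (string word in line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            emit(word.ToLowerInvariant(), 1);
        }
    }

    // Reduce: input key = word, input values = the partial counts emitted for that word.
    // Emits the total number of occurrences of the word.
    public static void Reduce(string word, IEnumerable<int> counts, Action<string, int> emit)
    {
        int sum = 0;
        foreach (int count in counts)
        {
            sum += count;
        }
        emit(word, sum);
    }
}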

Q6) What is data intensive computing?


Data-intensive computing focuses on a class of applications that deal with a large amount of data. Several application fields, ranging from
computational science to social networking, produce large volumes of data that
need to be efficiently stored, made accessible, indexed, and analysed.
Distributed computing is definitely of help in addressing these challenges by
providing more scalable and efficient storage architectures and a better
performance in terms of data computation and processing. This chapter
characterizes the nature of data-intensive computing and presents an overview
of the challenges introduced by production of large volumes of data and how
they are handled by storage systems and computing models. It describes
MapReduce, which is a popular programming model for creating data-intensive
applications and their deployment on clouds.

What is data-intensive computing?


Data-intensive computing is concerned with production, manipulation, and
analysis of large-scale data in
the range of hundreds of megabytes (MB) to petabytes (PB) and beyond.
Dataset is commonly used to identify a collection of information elements that
is relevant to one or more
applications. Datasets are often maintained in repositories, which are
infrastructures supporting the storage,
retrieval, and indexing of large amounts of information.
To facilitate classification and search, relevant bits of information, called
metadata, are attached to datasets.
Data-intensive computations occur in many application domains.
Computational science is one of the most popular ones. People conducting
scientific simulations and
experiments are often keen to produce, analyze, and process huge volumes of
data. Hundreds of gigabytes of
data are produced every second by telescopes mapping the sky; the collection
of images of the sky easily
reaches the scale of petabytes over a year.
Bioinformatics applications mine databases that may end up containing
terabytes of data.
Earthquake simulators process a massive amount of data, which is produced as
a result of recording the
vibrations of the Earth across the entire globe.

1 Characterizing data-intensive computations


2 Challenges ahead
3 Historical perspective
1 The early age: high-speed wide-area networking
2 Data grids
3 Data clouds and “Big Data”
4 Databases and data-intensive computing

1) Characterizing data-intensive computations


Data-intensive applications not only deal with huge volumes of data but also exhibit compute-intensive properties.
Figure 8.1 identifies the domain of data-intensive computing in the two upper
quadrants of the graph.
Data-intensive applications handle datasets on the scale of multiple terabytes
and petabytes.


Challenges ahead
The huge amount of data produced, analyzed, or stored imposes
requirements on the supporting infrastructures
and middleware that are hardly found in the traditional solutions.
Moving terabytes of data becomes an obstacle for high-performing
computations.
Data partitioning, content replication and scalable algorithms help in
improving the performance.
Open challenges in data-intensive computing given by Ian Gorton et al. are:
1. Scalable algorithms that can search and process massive datasets.
2. New metadata management technologies that can handle complex,
heterogeneous, and distributed data
sources.
3. Advances in high-performance computing platforms aimed at providing a
better support for accessing
in-memory multiterabyte data structures.
4. High-performance, highly reliable, petascale distributed file systems.
5. Data signature-generation techniques for data reduction and rapid
processing.
6. Software mobility techniques that are able to move the computation to where the data are located.
7. Interconnection architectures that provide better support for filtering multigigabyte data streams coming
from high-speed networks and scientific instruments.
8. Software integration techniques that facilitate the combination of software
modules running on different
platforms to quickly form analytical pipelines.

Historical perspective
Data-intensive computing involves the production, management, and
analysis of large volumes of data.
Support for data-intensive computations is provided by harnessing storage and networking technologies, algorithms, and infrastructure software all together.

1) The early age: high-speed wide-area networking


In 1989, the first experiments in high-speed networking as a support for
remote visualization of scientific
data led the way.
Two years later, the potential of using high-speed wide area networks for
enabling high-speed, TCP/IP-based
distributed applications was demonstrated at Supercomputing 1991 (SC91).
The Kaiser project leveraged the Wide Area Large Data Object (WALDO) system, which was used to provide the following capabilities:
1. automatic generation of metadata;
2. automatic cataloguing of data and metadata while processing the data in real time;
3. facilitation of cooperative research by providing local and remote users access
to data; and
4. mechanisms to incorporate data into databases and other documents.
The Distributed Parallel Storage System (DPSS) was developed and later used to support TerraVision, a terrain visualization application that lets users explore and navigate a tri-dimensional real landscape.
The Clipper project had the goal of designing and implementing a collection of independent, architecturally consistent service components to support data-intensive computing. The challenges addressed by the Clipper project include
management of computing resources, generation or consumption of high-rate
and high-volume data flows,
human interaction management, and aggregation of resources.

2) Data grids
Huge computational power and storage facilities could be obtained by
harnessing heterogeneous resources
across different administrative domains.
Data grids emerge as infrastructures that support data-intensive computing.
A data grid provides services that help users discover, transfer, and manipulate
large datasets stored in
distributed repositories as well as create and manage copies of them.
Data grids offer two main functionalities:
● high-performance and reliable file transfer for moving large amounts of data,
and
● scalable replica discovery and management mechanisms.
Data grids mostly provide storage and dataset management facilities as support
for scientific experiments that
produce huge volumes of data.
Datasets are replicated by infrastructure to provide better availability.
Data grids have their own characteristics and introduce new challenges:
1. Massive datasets. The size of datasets can easily be on the scale of gigabytes,
terabytes, and beyond. It is
therefore necessary to minimize latencies during bulk transfers, replicate
content with appropriate
strategies, and manage storage resources.
2. Shared data collections. Resource sharing includes distributed collections of
data. For example,
repositories can be used to both store and read data.
3. Unified namespace. Data grids impose a unified logical namespace in which to locate data collections and
resources. Every data element has a single logical name, which is eventually
mapped to different physical
filenames for the purpose of replication and accessibility.
4. Access restrictions. Even though one of the purposes of data grids is to
facilitate sharing of results and
data for experiments, some users might want to ensure confidentiality for their
data and restrict access to
them to their collaborators. Authentication and authorization in data grids
involve both coarse-grained and
fine-grained access control over shared data collections.
As a result, several scientific research fields, including high-energy physics,
biology, and astronomy, leverage
data grids.

3) Data clouds and “Big Data”


Together with the diffusion of cloud computing technologies that support
data-intensive computations, the term

Big Data has become popular. Big Data characterizes the nature of data-
intensive computations today and
currently identifies datasets that grow so large that they become complex to
work with using on-hand database
management tools.
In general, the term Big Data applies to datasets of which the size is beyond the
ability of commonly used
software tools to capture, manage, and process within a tolerable elapsed time.
Therefore, Big Data sizes are a
constantly moving target, currently ranging from a few dozen terabytes to
many petabytes of data in a single
dataset.
Cloud technologies support data-intensive computing in several ways:
1. By providing a large amount of compute instances on demand, which can be
used to process and analyze
large datasets in parallel.
2. By providing a storage system optimized for keeping large blobs of data and
other distributed data store
architectures.
3. By providing frameworks and programming APIs optimized for the processing
and management of large
amounts of data.
A data cloud is a combination of these components.
Ex 1: MapReduce framework, which provides the best performance for
leveraging the Google File System on
top of Google’s large computing infrastructure.
Ex 2: Hadoop system, the most mature, large, and open-source data cloud. It
consists of the Hadoop Distributed
File System (HDFS) and Hadoop’s implementation of MapReduce.
Ex 3: Sector, consists of the Sector Distributed File System (SDFS) and a compute
service called Sphere that
allows users to execute arbitrary user-defined functions (UDFs) over the data
managed by SDFS.
Ex 4: Greenplum uses a shared-nothing massively parallel processing (MPP)
architecture based on commodity
hardware.

4) Databases and data-intensive computing


Distributed databases are a collection of data stored at different sites of
a computer network. Each site might
expose a degree of autonomy, providing services for the execution of local
applications, but also participating
in the execution of a global application.
A distributed database can be created by splitting and scattering the data of an
existing database over different
sites or by federating together multiple existing databases. These systems are
very robust and provide
distributed transaction processing, distributed query optimization, and efficient
management of resources.

Q7) Technologies for data-intensive computing


Data-intensive computing concerns the development of applications that
are mainly focused on processing large
quantities of data.
Therefore, storage systems and programming models constitute a natural
classification of the technologies
supporting data-intensive computing.

Storage systems
1. High-performance distributed file systems and storage clouds
2. NoSQL systems
Programming platforms
1. The MapReduce programming model.
2. Variations and extensions of MapReduce.
3. Alternatives to MapReduce.
Storage systems
Traditionally, database management systems constituted the de facto storage solution. Due to the explosion of unstructured data in the form of blogs, Web pages, software logs, and sensor readings, the relational model in its original formulation does not seem to be the preferred solution for supporting data analytics on a large scale. Some factors contributing to this change are:
A. Growing of popularity of Big Data. The management of large quantities of
data is no longer a rare case but instead has become common in several fields:
scientific computing, enterprise applications, media entertainment, natural
language processing, and social network analysis.
B. Growing importance of data analytics in the business chain. The
management of data is no longer considered a cost but a key element of
business profit. This situation arises in popular social networks such as
Facebook, which concentrate their focus on the management of user profiles,
interests, and connections among people.
C. Presence of data in several forms, not only structured. As previously
mentioned, what constitutes relevant information today exhibits a
heterogeneous nature and appears in several forms and formats.

D. New approaches and technologies for computing. Cloud computing
promises access to a massive
amount of computing capacity on demand. This allows engineers to design
software systems that
incrementally scale to arbitrary degrees of parallelism.

1. High-performance distributed file systems and storage clouds


Distributed file systems constitute the primary support for data
management. They provide an interface whereby to store information in the
form of files and later access them for read and write operations.
a. Lustre. The Lustre file system is a massively parallel distributed file system
that covers the needs of a small workgroup of clusters to a large-scale computing
cluster. The file system is used by several of the Top 500 supercomputing systems. Lustre is designed to provide access to petabytes (PBs) of storage to serve thousands of clients with an I/O throughput of hundreds of gigabytes per
second (GB/s). The system is composed of a metadata server that contains the
metadata about the file system and a collection of object storage servers that
are in charge of providing storage.
b. IBM General Parallel File System (GPFS). GPFS is the high-performance
distributed file system developed by IBM that provides support for the RS/6000
supercomputer and Linux computing clusters. GPFS is a multiplatform distributed file system built over several years of academic research and provides advanced
recovery mechanisms. GPFS is built on the concept of shared disks, in which a
collection of disks is attached to the file system nodes by means of some
switching fabric. The file system makes this infrastructure transparent to users
and stripes large files over the disk array by replicating portions of the file to
ensure high availability.
c. Google File System (GFS). GFS is the storage infrastructure that supports the
execution of distributed
applications in Google’s computing cloud.
GFS is designed with the following assumptions:
1. The system is built on top of commodity hardware that often fails.
2. The system stores a modest number of large files; multi-GB files are
common and should be treated
efficiently, and small files must be supported, but there is no need to
optimize for that.
3. The workloads primarily consist of two kinds of reads: large streaming
reads and small random reads.
4. The workloads also have many large, sequential writes that append
data to files.
5. High-sustained bandwidth is more important than low latency.

The architecture of the file system is organized into a single master, which
contains the metadata of the entire file system, and a collection of chunk
servers, which provide storage space. From a logical point of view the
system is composed of a collection of software daemons, which implement
either the master server or the chunk server.
d. Sector. Sector is the storage cloud that supports the execution of data-
intensive applications defined according to the Sphere framework. It is a user
space file system that can be deployed on commodity hardware across a wide-
area network. The system’s architecture is composed of four nodes: a security
server, one or more master nodes, slave nodes, and client machines. The
security server maintains all the information about access control policies for
user and files, whereas master servers coordinate and serve the I/O requests of
clients, which ultimately interact with slave nodes to access files. The protocol
used to exchange data with slave nodes is UDT, which is a lightweight
connection-oriented protocol.
e. Amazon Simple Storage Service (S3). Amazon S3 is the online storage service
provided by Amazon. The system offers a flat storage space organized into
buckets, which are attached to an Amazon Web Services (AWS) account. Each
bucket can store multiple objects, each identified by a unique key. Objects are
identified by unique URLs and exposed through HTTP, thus allowing very simple
get-put semantics.

2. NoSQL systems
The term Not Only SQL (NoSQL) was coined in 1998 to identify a set of UNIX
shell scripts and commands to operate on text files containing the actual data.
In this original sense, NoSQL was not a relational database but a collection of scripts that allow users to manage most of the simplest and more common database tasks by using text files as information stores.
Two main factors have determined the growth of the NoSQL movement:
1. simple data models are enough to represent the information used by
applications, and
2. the quantity of information contained in unstructured formats has
grown.

Let us now examine some prominent implementations that support data-


intensive applications.
a. Apache CouchDB and MongoDB.
Apache CouchDB and MongoDB are two examples of document stores.
Both provide a schema-less store whereby the primary objects are documents
organized into a collection of key-value fields. The value of each field can be of
type string, integer, float, date, or an array of values.

The databases expose a RESTful interface and represent data in JSON format.
Both allow querying and indexing data by using the MapReduce programming
model, expose JavaScript as a base language for data querying and manipulation
rather than SQL, and support large files as documents.
b. Amazon Dynamo.
The main goal of Dynamo is to provide an incrementally scalable and
highly available storage system. This goal helps in achieving reliability at a
massive scale, where thousands of servers and network components build an
infrastructure serving 10 million requests per day. Dynamo provides a simplified
interface based on get/put semantics, where objects are stored and retrieved
with a unique identifier (key). The architecture of the Dynamo system, shown in
Figure 8.3, is composed of a collection of storage peers organized in a ring that
shares the key space for a given application. The key space is partitioned among
the storage peers, and the keys are replicated across the ring, avoiding adjacent
peers. Each peer is configured with access to a local storage facility where original objects and replicas are stored. Each node provides facilities for distributing the updates among the ring and for detecting failures and unreachable nodes.
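A small sketch of ring-based key placement of this kind is shown below. It is only an illustration of consistent-hash-style placement under assumed peer positions and an MD5-based hash; it is not Dynamo's actual implementation.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class RingPlacement
{
    // Maps a key to a position on the ring using the first four bytes of an MD5 digest.
    static uint Position(string key)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(key));
            return BitConverter.ToUInt32(digest, 0);
        }
    }

    // Returns the peer responsible for a key: the first peer whose ring position is
    // greater than or equal to the key's position, wrapping around the ring.
    // Replicas would then be placed on peers that follow on the ring.
    public static string FindPeer(string key, IDictionary<string, uint> peerPositions)
    {
        uint keyPosition = Position(key);
        var ordered = peerPositions.OrderBy(p => p.Value).ToList();
        foreach (var peer in ordered)
        {
            if (peer.Value >= keyPosition) return peer.Key;
        }
        return ordered.First().Key; // wrap around the ring
    }
}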
c. Google Bigtable.
Bigtable provides storage support for several Google applications that
expose different types of workload: from throughput-oriented batch-processing
jobs to latency-sensitive serving of data to end users. Bigtable’s key design goals are wide applicability, scalability, high performance, and high availability. To
achieve these goals, Bigtable organizes the data storage in tables of which the
rows are distributed over the distributed file system supporting the middleware,
which is the Google File System. From a logical point of view, a table is a
multidimensional sorted map indexed by a key that is represented by a
string of arbitrary length. A table is organized into rows and columns; columns
can be grouped into column families, which allow for specific optimizations for better access control and for the storage and indexing of data. Bigtable APIs also allow
more complex operations such as single row transactions and advanced data
manipulation.
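A rough sketch of such a logical model is given below: a sorted map of rows in which each row maps "family:qualifier" column names to values. It only illustrates the data model described above and does not reflect Bigtable's storage layer, timestamps, or APIs.

using System;
using System.Collections.Generic;

// Illustrative wide-column model: a sorted map of rows, where each row maps
// "family:qualifier" column names to raw byte[] values.
public class SimpleWideColumnTable
{
    private readonly SortedDictionary<string, SortedDictionary<string, byte[]>> rows =
        new SortedDictionary<string, SortedDictionary<string, byte[]>>(StringComparer.Ordinal);

    public void Put(string rowKey, string family, string qualifier, byte[] value)
    {
        if (!rows.TryGetValue(rowKey, out SortedDictionary<string, byte[]> row))
        {
            row = new SortedDictionary<string, byte[]>(StringComparer.Ordinal);
            rows[rowKey] = row;
        }
        row[family + ":" + qualifier] = value;
    }

    public byte[] Get(string rowKey, string family, string qualifier)
    {
        return rows.TryGetValue(rowKey, out SortedDictionary<string, byte[]> row)
            && row.TryGetValue(family + ":" + qualifier, out byte[] value) ? value : null;
    }
}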


Figure 8.4 gives an overview of the infrastructure that enables Bigtable.


The service is the result of a collection of processes that coexist with other
processes in a cluster-based environment. Bigtable identifies two kinds of
processes: master processes and tablet server processes. A tablet server is
responsible for serving the requests for a given tablet that is a contiguous
partition of rows of a table. Each server can manage multiple tablets (commonly
from 10 to 1,000). The master server is responsible for keeping track of the
status of the tablet servers and of the allocation of tablets to tablet servers. The
server constantly monitors the tablet servers to check whether they are alive,
and in case they are not reachable, the allocated tablets are reassigned and
eventually partitioned to other servers.
d. Apache Cassandra.
The system is designed to avoid a single point of failure and offer a highly
reliable service. Cassandra was initially developed by Facebook; now it is part of
the Apache incubator initiative. Currently, it provides storage support for several
very large Web applications such as Facebook itself, Digg, and Twitter.
The data model exposed by Cassandra is based on the concept of a table that is
implemented as a distributed multidimensional map indexed by a key. The
value corresponding to a key is a highly structured object and constitutes the
row of a table. Cassandra organizes the row of a table into columns, and sets of
columns can be grouped into column families. The APIs provided by the system
to access and manipulate the data are very simple: insertion, retrieval, and

deletion. The insertion is performed at the row level; retrieval and deletion can
operate at the column level.
e. Hadoop HBase.
HBase is designed by taking inspiration from Google Bigtable; its main goal
is to offer real-time read/write operations for tables with billions of rows and
millions of columns by leveraging clusters of commodity hardware. The internal
architecture and logic model of HBase is very similar to Google Bigtable, and the
entire system is backed by the Hadoop Distributed File System (HDFS).

Programming platforms
Programming platforms for data-intensive computing provide higher-
level abstractions, which focus on the processing of data and move into the
runtime system the management of transfers, thus making the data always
available where needed This is the approach followed by the MapReduce
programming platform, which expresses the computation in the form of two
simple functions—map and reduce—and hides the complexities of managing
large and numerous data files into the distributed file system supporting the
platform. In this section, we discuss the characteristics of MapReduce and
present some variations of it.
1. The MapReduce programming model.
MapReduce expresses the computational logic of an application in two simple
functions: map and reduce. Data transfer and management are completely
handled by the distributed storage infrastructure (i.e., the Google File System),

which is in charge of providing access to data, replicating files, and eventually
moving them where needed. The MapReduce model is expressed in the form of two functions, which are defined as follows:

map(k1, v1) → list(k2, v2)
reduce(k2, list(v2)) → list(v2)

The map function reads a key-value pair and produces a list of key-value
pairs of different types. The reduce function reads a pair composed of a key and
a list of values and produces a list of values of the same type. The types
(k1, v1, k2, v2) used in the expression of the two functions provide hints as to how these two functions are connected and are executed to carry out the computation of a MapReduce job: the output of map tasks is aggregated together by grouping the values according to their corresponding keys and constitutes the input of reduce tasks that, for each of the keys found, reduce the list of attached values to a single value. Therefore, the input of a MapReduce computation is
expressed as a collection of key-value pairs < k1,v1 >, and the final output is
represented by a list of values: list(v2).
Figure 8.5 depicts a reference workflow characterizing MapReduce
computations. As shown, the user submits a collection of files that are
expressed in the form of a list of < k1,v1> pairs and specifies the map and reduce
functions. These files are entered into the distributed file system that supports
MapReduce and, if necessary, partitioned in order to be the input of map tasks.
Map tasks generate intermediate files that store collections of < k2, list(v2) >
pairs, and these files are saved into the distributed file system. These files
constitute the input of reduce tasks, which finally produce output files in the
form of list(v2).


The computation model expressed by MapReduce is very straightforward and


allows greater productivity for people who have to code the algorithms for processing huge quantities of data. In general, any computation that can be expressed in the form of two major stages can be represented in terms of a MapReduce computation.
These stages are:
1. Analysis. This phase operates directly on the data input file and corresponds
to the operation performed by the map task. Moreover, the computation at this
stage is expected to be embarrassingly parallel, since map tasks are executed
without any sequencing or ordering.
2. Aggregation. This phase operates on the intermediate results and is
characterized by operations that are aimed at aggregating, summing, and/or
elaborating the data obtained at the previous stage to present the data in
their final form. This is the task performed by the reduce function.
Figure 8.6 gives a more complete overview of a MapReduce infrastructure,
according to the implementation proposed by Google. As depicted, the user
submits the execution of MapReduce jobs by using the client libraries that are
in charge of submitting the input data files, registering the map and reduce
functions, and returning control to the user once the job is completed. A generic
distributed infrastructure (i.e., a cluster) equipped with job-scheduling
capabilities and distributed storage can be used to run MapReduce applications.
Two different kinds of processes are run on the distributed infrastructure:
a master process and a worker process. The master process is in charge of
controlling the execution of map and reduce tasks, partitioning, and

reorganizing the intermediate output produced by the map task in order to feed
the reduce tasks. The master process generates the map tasks and assigns input
splits to each of them by balancing the load. The worker processes are used to
host the execution of map and reduce tasks and provide basic I/O facilities
that are used to interface the map and reduce tasks with input and output files.
Worker processes have input and output buffers that are used to optimize the
performance of map and reduce tasks. In particular, output buffers for map
tasks are periodically dumped to disk to create intermediate files.
Intermediate files are partitioned using a user-defined function to evenly split
the output of map tasks.

2. Variations and extensions of MapReduce.


MapReduce constitutes a simplified model for processing large quantities
of data and imposes constraints on the way distributed algorithms should be
organized to run over a MapReduce infrastructure. Therefore, a series of
extensions to and variations of the original MapReduce model have been
proposed. They aim at extending the MapReduce application space and
providing developers with an easier interface for designing distributed
algorithms. We briefly present a collection of MapReduce-like frameworks and
discuss how they differ from the original MapReduce model.
A. Hadoop.
B. Pig.
C. Hive.
D. Map-Reduce-Merge.

E. Twister.

A. Hadoop.
Apache Hadoop is a collection of software projects for reliable and
scalable distributed computing. The initiative consists mostly of two projects: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. The former is an implementation of the Google File System; the latter provides the same features and abstractions as Google MapReduce.

B. Pig.
Pig is a platform that allows the analysis of large datasets. Developed as
an Apache project, Pig consists of a high-level language for expressing data
analysis programs, coupled with infrastructure for evaluating these
programs. The Pig infrastructure’s layer consists of a compiler for a high-level
language that produces a sequence of MapReduce jobs that can be run on top
of distributed infrastructures.

C. Hive.
Hive is another Apache initiative that provides a data warehouse
infrastructure on top of Hadoop MapReduce. It provides tools for easy data
summarization, ad hoc queries, and analysis of large datasets stored in Hadoop files. Hive’s major advantages reside in the ability to scale out, since
it is based on the Hadoop framework, and in the ability to provide a data
warehouse infrastructure in environments where there is already a Hadoop
system running.

D. Map-Reduce-Merge.
Map-Reduce-Merge is an extension of the MapReduce model, introducing a
third phase to the standard MapReduce pipeline—the Merge phase—that
allows efficiently merging data already partitioned and sorted (or hashed) by
map and reduce modules. The Map-Reduce-Merge framework simplifies the
management of heterogeneous related datasets and provides an abstraction
able to express the common relational algebra operators as well as several join
algorithms.

E. Twister.
Twister is an extension of the MapReduce model that allows the creation
of iterative executions of MapReduce jobs. With respect to the normal MapReduce pipeline, Twister proposes the following extensions:
1. Configure Map
2. Configure Reduce
3. While Condition Holds True Do
a. Run MapReduce
b. Apply Combine Operation to Result
c. Update Condition
4. Close
Twister provides additional features such as the ability for map and reduce tasks
to refer to static and in-memory data; the introduction of an additional phase
called combine, run at the end of the MapReduce job, that aggregates the
output together.
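The iterative pattern listed above can be sketched as a simple driver loop. The runMapReduce and combine delegates below are placeholders standing in for a job submission and the combine phase; this is not Twister's actual API.

using System;

public static class IterativeMapReduceDriver
{
    // Generic driver loop for the iterative pattern above: configure once, then
    // repeatedly run the job, apply the combine operation, and re-check the condition.
    public static TState Run<TState>(
        TState state,
        Func<TState, TState> runMapReduce,      // placeholder for submitting one MapReduce job
        Func<TState, TState> combine,           // placeholder for the combine/aggregation phase
        Func<TState, bool> conditionHoldsTrue,  // loop condition evaluated on the combined result
        int maxIterations = 100)
    {
        int iteration = 0;
        while (conditionHoldsTrue(state) && iteration++ < maxIterations)
        {
            state = runMapReduce(state);  // 3a. run MapReduce
            state = combine(state);       // 3b. apply the combine operation to the result
            // 3c. the condition is re-evaluated on the updated state at the top of the loop
        }
        return state;
    }
}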

3. Alternatives to MapReduce.
a. Sphere.
b. All-Pairs.
c. DryadLINQ.

a. Sphere.
Sphere is the distributed processing engine that leverages the Sector
Distributed File System (SDFS). Sphere implements the stream processing
model (Single Program, Multiple Data) and allows developers to express the
computation in terms of user-defined functions (UDFs), which are run against
the distributed infrastructure. Sphere is built on top of Sector’s API for data
access. UDFs are expressed in terms of programs that read and write streams. A
stream is a data structure that provides access to a collection of data segments
mapping one or more files in the SDFS. The execution of UDFs is achieved
through Sphere Process Engines (SPEs), each of which is assigned a given stream segment. A Sphere client sends a request for processing to the master
node, which returns the list of available slaves, and the client will choose the
slaves on which to execute Sphere processes.

b. All-Pairs.
It provides a simple abstraction—in terms of the All-pairs function—that
is common in many scientific computing domains:
All-pairs(A:set; B:set; F:function) -> M:matrix
Ex 1: field of biometrics, where similarity matrices are composed as a result of
the comparison of several images that contain subject pictures.
Ex 2: applications and algorithms in data mining.
The All-pairs function can be easily solved by the following algorithm:
1. For each $i in A
2. For each $j in B
3. Submit job F $i $j
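A direct, in-memory sketch of this nested-loop algorithm is shown below; the actual All-Pairs engine distributes the F comparisons as batch jobs across the cluster rather than running them locally, as described next.

using System;
using System.Collections.Generic;

public static class AllPairsSketch
{
    // All-pairs(A: set, B: set, F: function) -> M: matrix
    // Applies F to every (a, b) pair and stores the result in M[i, j].
    public static TResult[,] Compute<TA, TB, TResult>(
        IList<TA> a, IList<TB> b, Func<TA, TB, TResult> f)
    {
        var m = new TResult[a.Count, b.Count];
        for (int i = 0; i < a.Count; i++)
        {
            for (int j = 0; j < b.Count; j++)
            {
                m[i, j] = f(a[i], b[j]);  // e.g., a similarity score between two images
            }
        }
        return m;
    }
}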
The execution of a distributed application is controlled by the engine and
develops in four stages:
(1) model the system;
(2) distribute the data;
(3) dispatch batch jobs; and
(4) clean up the system.

c. DryadLINQ.
Dryad is a Microsoft Research project that investigates programming
models for writing parallel and distributed programs to scale from a small cluster
to a large data centre. In Dryad, developers can express distributed applications
as a set of sequential programs that are connected by means of channels.
A Dryad computation is expressed in terms of a directed acyclic graph in which the vertices are the sequential programs and the edges represent the channels connecting such programs. Dryad is considered a superset of the MapReduce model, since its application model allows expressing graphs representing MapReduce computations.

Q4) Aneka MapReduce programming


Aneka provides an implementation of the MapReduce abstractions
introduced by Google and implemented by Hadoop.

Introducing the MapReduce programming model


1 Programming abstractions
2 Runtime support
3 Distributed file system support
Example application
1 Parsing Aneka logs
2 Mapper design and implementation
3 Reducer design and implementation
4 Driver program
5 Running the application

Introducing the MapReduce programming model


The MapReduce Programming Model defines the abstractions and
runtime support for developing MapReduce applications on top of Aneka.
Figure 8.7 provides an overview of the infrastructure supporting MapReduce in
Aneka. The application instance is specialized, with components that identify the
map and reduce functions to use. These functions are expressed in terms of

Mapper and Reducer classes that are extended from the Aneka
MapReduce APIs.

The runtime support is composed of three main elements:


1. MapReduce Scheduling Service, which plays the role of the master process in
the Google and Hadoop implementation.
2. MapReduce Execution Service, which plays the role of the worker process in
the Google and Hadoop implementation.
3. A specialized distributed file system that is used to move data files.
Client components, namely the MapReduce Application, are used to submit the
execution of a MapReduce job, upload data files, and monitor it.
The management of data files is transparent: local data files are automatically
uploaded to Aneka, and output files are automatically downloaded to the client
machine if requested. In the following sections, we introduce these major
components and describe how they collaborate to execute MapReduce jobs.

1 Programming abstractions
Aneka executes any piece of user code within a distributed application. Task creation is the responsibility of the infrastructure once the user has defined the map and reduce functions. Therefore, the Aneka MapReduce APIs provide developers with base classes for developing Mapper and Reducer types and use a specialized type of application class—MapReduceApplication—that supports the needs of this programming model.

Figure 8.8 provides an overview of the client components defining the
MapReduce programming model. Three classes are of interest for application
development: Mapper<K,V>, Reducer<K,V>, and
MapReduceApplication<M,R>. The other classes are internally used to
implement all the functionalities required by the model and expose simple
interfaces that require minimum amounts of coding for implementing
the map and reduce functions and controlling the job submission. Mapper<K,V>
and Reducer<K,V> constitute the starting point of the application design and
implementation. The submission and execution of a MapReduce job is performed through the class MapReduceApplication<M,R>, which provides the interface to the Aneka Cloud to support the MapReduce programming model. This class exposes
two generic types: M and R. These two placeholders identify the specific types
of Mapper<K,V> and Reducer<K,V> that will be used by the application.
Listing 8.1 shows in detail the definition of the Mapper<K,V> class and of the
related types that developers should be aware of for implementing the map
function.
Listing 8.2 shows the implementation of the Mapper<K,V> component for the Word Counter sample. This sample counts the frequency of words in a set of large text files. The text files are divided into lines, each of which will become the value
component of a key-value pair, whereas the key will be represented by the
offset in the file where the line begins.
Listing 8.3 shows the definition of Reducer<K,V> class. The implementation of a
specific reducer requires specializing the generic class and overriding the
abstract method: Reduce(IReduceInputEnumerator<V> input).
Listing 8.4 shows how to implement the reducer function for word-counter
example.
Listing 8.5 shows the interface of MapReduceApplication<M,R>.
Listing 8.6 displays collection of methods that are of interest in this class for
execution of MapReduce jobs.
Listing 8.7 shows how to create a MapReduce application for running the word-
counter example defined by the
previous WordCounterMapper and WordCounterReducer classes.


using Aneka.MapReduce.Internal;
namespace Aneka.MapReduce
{
/// Interface IMapInput<K,V>. Extends IMapInput and provides strongly-typed
/// version of the extended interface.
public interface IMapInput<K,V>: IMapInput
{
/// Property <i>Key</i> returns the key of key/value pair.
K Key { get; }
/// Property <i>Value</i> returns the value of key/value pair.
V Value { get; }
}
/// Delegate MapEmitDelegate. Defines the signature of the method that is used to emit
/// the intermediate results generated by the mapper.
public delegate void MapEmitDelegate(object key, object value);
/// Class Mapper. Extends MapperBase and provides a reference
/// implementation that can be further
/// extended in order to define the specific mapper for a given application.
public abstract class Mapper<K,V> : MapperBase
{
/// Maps the given input and emits the intermediate results of the Map operation
/// by using the provided emit delegate.
public void Map(IMapInput input, MapEmitDelegate emit) { ... }
/// Gets the type of the <i>key</i> component of a <i>key-value</i> pair.
/// <returns>A Type instance containing the metadata
public override Type GetKeyType(){ return typeof(K); }
/// Gets the type of the <i>value</i> component of a <i>key-value</i> pair.
/// <returns>A Type instance containing the metadata
public override Type GetValueType(){ return typeof(V); }
#region Template Methods
/// Function Map is overridden by users to define a map function.
protected abstract void Map(IMapInput<K, V> input);
#endregion
}
}
LISTING 8.1 Map Function APIs.
using Aneka.MapReduce;
namespace Aneka.MapReduce.Examples.WordCounter
{
/// Class WordCounterMapper. Extends Mapper<K,V> and provides an
/// implementation of the map function for the Word Counter sample.
public class WordCounterMapper: Mapper<long,string>
{
/// Reads the source and splits into words. For each of the words found
/// emits the word as a key with a value of 1.
protected override void Map(IMapInput<long,string> input)
{
// we don’t care about the key, because we are only interested on
// counting the word of each line.
string value = input.Value;
string[] words = value.Split(" \t\n\r\f\"\'|!-=()[]<>:{}.#".ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
// we emit each word without checking for repetitions. The word becomes
// the key and the value is set to 1, the reduce operation will take care
// of merging occurrences of the same word and summing them.
foreach(string word in words)
{
this.Emit(word, 1);
}
}
}
}
LISTING 8.2 Simple Mapper <K,V> Implementation.

using Aneka.MapReduce.Internal;
namespace Aneka.MapReduce

{
/// Delegate ReduceEmitDelegate. Defines the signature of a method
public delegate void ReduceEmitDelegate(object value);
/// Class <i>Reducer</i>. Extends the ReducerBase class
public abstract class Reducer<K,V> : ReducerBase
{
/// Performs the <i>reduce</i> phase of the <i>map-reduce</i> model.
public void Reduce(IReduceInputEnumerator input, ReduceEmitDelegate emit)
{ ... }
/// Gets the type of the <i>key</i> component of a <i>key-value</i> pair.
public override Type GetKeyType(){return typeof(K);}
/// Gets the type of the <i>value</i> component of a <i>key-value</i> pair.
public override Type GetValueType(){return typeof(V);}
#region Template Methods
/// Reduces the collection of values that are exposed by
/// <paramref name="source"/> into a single value.
protected abstract void Reduce(IReduceInputEnumerator<V> input);
#endregion
}
}
LISTING 8.3 Reduce Function APIs.

using Aneka.MapReduce;
namespace Aneka.MapReduce.Examples.WordCounter
{
/// Class <b><i>WordCounterReducer</i></b>. Reducer implementation for the
/// Word Counter application.
public class WordCounterReducer: Reducer<string,int>
{
/// Iterates over all the values of the enumerator and sums them up
/// before emitting the sum to the output file.
protected override void Reduce(IReduceInputEnumerator<int> input)
{
int sum = 0;
while(input.MoveNext())
{
int value = input.Current;
sum += value;
}

this.Emit(sum);
}
}
}
LISTING 8.4 Simple Reducer <K,V> Implementation.

using Aneka.MapReduce.Internal;
namespace Aneka.MapReduce
{
/// Class <b><i>MapReduceApplication</i></b>. Defines a distributed application
/// based on the MapReduce Model. It extends the ApplicationBase<M> and specializes
/// it with the MapReduceManager<M,R> application manager.
public class MapReduceApplication<M, R> :
ApplicationBase<MapReduceManager<M, R>>
where M: MapReduce.Internal.MapperBase
where R: MapReduce.Internal.ReducerBase
{
/// Default value for the Attempts property.
public const int DefaultRetry = 3;
/// Default value for the Partitions property.
public const int DefaultPartitions = 10;
/// Default value for the LogFile property.
public const string DefaultLogFile = "mapreduce.log";
/// List containing the result files identifiers.
private List<string> resultFiles = new List<string>();
/// Property group containing the settings for the MapReduce application.
private PropertyGroup mapReduceSetup;
/// Gets or sets an integer representing the number of partitions for the key space.
public int Partitions { get { ... } set { ... } }
/// Gets or sets a boolean value indicating whether to combine the results
/// after the map phase in order to decrease the number of reducers used in the
/// reduce phase.
public bool UseCombiner { get { ... } set { ... } }
/// Gets or sets a boolean indicating whether to synchronize the reduce phase.
public bool SynchReduce { get { ... } set { ... } }
/// Gets or sets a boolean indicating whether the source files required by the
/// application are already uploaded to the storage or not.
public bool IsInputReady { get { ... } set { ... } }
/// Gets or sets the number of attempts used to run failed tasks.
public int Attempts { get { ... } set { ... } }
/// Gets or sets a string value containing the path for the log file.
public string LogFile { get { ... } set { ... } }

/// Gets or sets a boolean indicating whether application should download the
/// result files on the local client machine at the end of the execution or not.
public bool FetchResults { get { ... } set { ... } }
/// Creates a MapReduceApplication<M,R> instance and configures it with
/// the given configuration.
public MapReduceApplication(Configuration configuration) :
base("MapReduceApplication", configuration){ ... }
/// Creates a MapReduceApplication<M,R> instance and configures it with
/// the given configuration.
public MapReduceApplication(string displayName, Configuration configuration)
: base(displayName,
configuration) { ... }
// here follows the private implementation…
}
}
LISTING 8.5 MapReduceApplication<M,R>.

namespace Aneka.Entity
{
/// Class <b><i>ApplicationBase<M></i></b>. Defines the base class for the
/// application instances for all the programming models supported by Aneka.
public class ApplicationBase<M> where M : IApplicationManager, new()
{
/// Gets the application unique identifier attached to this instance.
public string Id { get { ... } }
/// Gets the unique home directory for the AnekaApplication<W,M>.
public string Home { get { ... } }
/// Gets the current state of the application.
public ApplicationState State{get{ ... }}
/// Gets a boolean value indicating whether the application is terminated.
public bool Finished { get { ... } }
/// Gets the underlying IApplicationManager that is managing the execution of the
/// application instance on the client side.
public M ApplicationManager { get { ... } }
/// Gets, sets the application display name.
public string DisplayName { get { ... } set { ... } }
/// Occurs when the application instance terminates its execution.
public event EventHandler<ApplicationEventArgs> ApplicationFinished;
/// Creates an application instance with the given settings and sets the
/// application display name to null.

public ApplicationBase(Configuration configuration): this(null, configuration){ ...
}
/// Creates an application instance with the given settings and display name.
public ApplicationBase(string displayName, Configuration configuration){ ... }
/// Starts the execution of the application instance on Aneka.
public void SubmitExecution() { ... }
/// Stops the execution of the entire application instance.
public void StopExecution() { ... }
/// Invoke the application and wait until the application finishes.
public void InvokeAndWait() { this.InvokeAndWait(null); }
/// Invoke the application and wait until the application finishes, then invokes
/// the given callback.
public void InvokeAndWait(EventHandler<ApplicationEventArgs> handler) { ... }
/// Adds a shared file to the application.
public virtual void AddSharedFile(string file) { ... }
/// Adds a shared file to the application.
public virtual void AddSharedFile(FileData fileData) { ... }
/// Removes a file from the list of the shared files of the application.
public virtual void RemoveSharedFile(string filePath) { ... }
// here come the private implementation.
}
}
LISTING 8.6 ApplicationBase<M>

using System.IO;
using Aneka.Entity;
using Aneka.MapReduce;
namespace Aneka.MapReduce.Examples.WordCounter
{
/// Class <b><i>Program<M></i></b>. Application driver for the Word Counter sample.
public class Program
{
/// Reference to the configuration object.
private static Configuration configuration = null;
/// Location of the configuration file.
private static string confPath = "conf.xml";
/// Processes the arguments given to the application and either runs it or shows the help.
private static void Main(string[] args)
{
try

{
Logger.Start();
// get the configuration
configuration = Configuration.GetConfiguration(confPath);
// configure MapReduceApplication
MapReduceApplication<WordCounterMapper, WordCounterReducer> application =
new MapReduceApplication<WordCounterMapper, WordCounterReducer>("WordCounter", configuration);
// invoke and wait for result
application.InvokeAndWait(new
EventHandler<ApplicationEventArgs>(OnDone));
}
catch(Exception ex)
{
Usage();
IOUtil.DumpErrorReport(ex, "Aneka WordCounter Demo - Error Log");
}
finally
{
Logger.Stop();
}
}
/// Hooks the ApplicationFinished event and processes the results if the
/// application has been successful.
private static void OnDone(object sender, ApplicationEventArgs e) { ... }
/// Displays a simple informative message explaining the usage of the application.
private static void Usage() { ... }
}
}
LISTING 8.7 WordCounter Job.

2 Runtime support
The runtime support for the execution of MapReduce jobs comprises the
collection of services that deal with scheduling and executing MapReduce tasks.
These are the MapReduce Scheduling Service and the MapReduce Execution Service.

Job and Task Scheduling. The scheduling of jobs and tasks is the responsibility of the MapReduce Scheduling Service,

which covers the same role as the master process in the Google MapReduce
implementation. The architecture of the Scheduling Service is organized into
two major components: the MapReduceSchedulerService and the MapReduceScheduler. The main role of the service wrapper is to translate messages coming from the Aneka runtime or the client applications into calls or events directed to the scheduler component, and vice versa. The relationship of the two components is depicted in Figure 8.9. The core functionalities for job and
task scheduling are implemented in the MapReduceScheduler class. The
scheduler manages multiple queues for several operations, such as uploading
input files into the distributed file system; initializing jobs before scheduling;
scheduling map and reduce tasks; keeping track of unreachable nodes;
resubmitting failed tasks; and reporting execution statistics.

Task Execution. The execution of tasks is controlled by the MapReduce
Execution Service. This component plays the role of the worker process in the
Google MapReduce implementation. The service manages the execution of map
and reduce tasks and performs other operations, such as sorting and merging
intermediate files. The service is internally organized as described in Figure
8.10. There are three major components that coordinate together for executing
tasks:
1. MapReduceSchedulerService,
2. ExecutorManager, and
3. MapReduceExecutor.
The MapReduceSchedulerService interfaces the ExecutorManager with the
Aneka middleware; the ExecutorManager is in charge of keeping track of the
tasks being executed, of delegating the execution of each task to the
MapReduceExecutor, and of sending the statistics about the execution back to
the Scheduler Service.

3 Distributed file system support


Aneka supports the MapReduce model by means of a distributed file system
implementation. Distributed file system implementations guarantee high
availability and better efficiency by means of replication and distribution.
The original MapReduce implementation assumes the existence of a distributed
and reliable storage; hence, the use of a distributed file system for
implementing the storage layer is natural. Aneka provides the capability of
interfacing with different storage implementations, and it maintains the same
flexibility for the integration of a distributed file system.
The level of integration required by MapReduce requires the ability to perform
the following tasks:
● Retrieving the location of files and file chunks
● Accessing a file by means of a stream
The first operation is useful to the scheduler for optimizing the scheduling of
map and reduce tasks according to the location of data; the second operation is
required for the usual I/O operations to and from data files. On top of these
low-level interfaces, the MapReduce programming model offers classes to read
from and write to files in a sequential manner: the classes SeqReader and
SeqWriter. They provide sequential access for reading and writing key-value
pairs, and they expect a specific file format, which is described in Figure 8.11.
An Aneka MapReduce file is composed of a header, used to identify the file, and
a sequence of record blocks, each storing a key-value pair. The header is
composed of 4 bytes: the first 3 bytes represent the character sequence SEQ
and the fourth byte identifies the version of the file. The record block is
composed as follows: the first 8 bytes are used to store two integers
representing the length of the rest of the block and the length of the key section,
which is immediately following. The remaining part of the block stores the data
of the value component of the pair. The SeqReader and SeqWriter classes are
designed to read and write files in this format by transparently handling the file
format information and translating key and value instances to and from their
binary representation.
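To make this layout concrete, the following is a minimal sketch of a writer for the format just described; it is not part of the Aneka API, and the byte order, the version number, and the sample key and value arrays are assumptions made only to illustrate the header ("SEQ" plus a version byte) and the record block (two integers for the block and key lengths, followed by the key and value bytes).

using System.IO;
using System.Text;

// Minimal sketch (assumption: not the actual Aneka implementation) of the
// on-disk layout described above for an Aneka MapReduce sequence file.
public static class SequenceFileSketch
{
    public static void WriteSample(string path, byte[] key, byte[] value)
    {
        using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
        {
            // header: 3 bytes for the character sequence "SEQ" + 1 byte for the version
            writer.Write(Encoding.ASCII.GetBytes("SEQ"));
            writer.Write((byte)1); // hypothetical version number

            // record block: two 4-byte integers (8 bytes total) ...
            writer.Write(key.Length + value.Length); // length of the rest of the block (assumed: key + value bytes)
            writer.Write(key.Length);                // length of the key section
            writer.Write(key);                       // key bytes
            writer.Write(value);                     // value bytes (the remaining part of the block)
        }
    }
}

In practice, the SeqWriter and SeqReader classes shown in Listing 8.8 handle this serialization and deserialization transparently.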

Listing 8.8 shows the interface of the SeqReader and SeqWriter classes. The
SeqReader class provides an enumerator-based approach through which it is
possible to access the key and the value sequentially by calling the
NextKey() and the NextValue() methods, respectively. It is also possible to access
the raw byte data of keys and values by using the NextRawKey() and
NextRawValue(). HasNext() returns a Boolean, indicating whether there are
more pairs to read or not.

namespace Aneka.MapReduce.DiskIO
{
/// Class <b><i>SeqReader</i></b>. This class implements a file reader for the sequence
/// file, which is a standard file split used by MapReduce.NET to store a partition
/// of a fixed size of a data file.
public class SeqReader
{
/// Creates a SeqReader instance and attaches it to the given file.

public SeqReader(string file) : this(file, null, null) { ... }
/// Creates a SeqReader instance, attaches it to the given file, and sets the
/// internal buffer size to bufferSize.
public SeqReader(string file, int bufferSize) : this(file,null,null,bufferSize) { ... }
/// Creates a SeqReader instance, attaches it to the given file, and provides
/// metadata information about the content of the file in the form of keyType and valueType.
public SeqReader(string file, Type keyType, Type valueType) : this(file, keyType,
valueType,
SequenceFile.DefaultBufferSize) { ... }
/// Creates a SeqReader instance, attaches it to the given file, and provides
/// metadata information about the content of the file in the form of keyType
and valueType.
public SeqReader(string file, Type keyType, Type valueType, int bufferSize){ ... }
/// Sets the metadata information about the keys and the values contained in
the data file.
public void SetType(Type keyType, Type valueType) { ... }
/// Checks whether there is another record in data file and moves current file
pointer to its beginning.
public bool HaxNext() { ... }
/// Gets the object instance corresponding to the next key in the data file.
public object NextKey() { ... }
/// Gets the object instance corresponding to the next value in the data file.
public object NextValue() { ... }
/// Gets the raw bytes that contain the value of the serialized instance of the current key.
public BufferInMemory NextRawKey() { ... }
/// Gets the raw bytes that contain the value of the serialized instance of the
current value.
public BufferInMemory NextRawValue() { ... }
/// Gets the position of the file pointer as an offset from its beginning.
public long CurrentPosition() { ... }
/// Gets the size of the file attached to this instance of SeqReader.
public long StreamLength() { ... }
/// Moves the file pointer to the given position. If the value of position is zero or
/// negative, it returns the current position of the file pointer.
public long Seek(long position) { ... }
/// Closes the SeqReader instance and releases all resources that have been allocated to read from the file.
public void Close() { ... }

// private implementation follows
}
/// Class SeqWriter. This class implements a file writer for the sequence file, which is
/// a standard file split used by MapReduce.NET to store a partition of a fixed size of a
/// data file. This class provides an interface to add a sequence of key-value pairs
/// incrementally.
public class SeqWriter
{
/// Creates a SeqWriter instance for writing to file. This constructor initializes
/// the instance with the default value for the internal buffers.
public SeqWriter(string file) : this(file, SequenceFile.DefaultBufferSize){ ... }
/// Creates a SeqWriter instance, attaches it to the given file, and sets the
/// internal buffer size to bufferSize.
public SeqWriter(string file, int bufferSize) { ... }
/// Appends a key-value pair to the data file split.
public void Append(object key, object value) { ... }
/// Appends a key-value pair to the data file split.
public void AppendRaw(byte[] key, byte[] value) { ... }
/// Appends a key-value pair to the data file split.
public void AppendRaw(byte[] key, int keyPos, int keyLen,
byte[] value, int valuePos, int valueLen) { ... }
/// Gets the length of the internal buffer or 0 if no buffer has been allocated.
public long Length() { ... }
/// Gets the length of data file split on disk so far.
public long FileLength() { ... }
/// Closes the SeqWriter instance and releases all the resources that have been allocated to write to the file.
public void Close() { ... }
// private implementation follows
}
}
LISTING 8.8 SeqReader and SeqWriter Classes.
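To see how the two classes are meant to be used together, here is a small, hypothetical usage sketch based only on the method signatures shown in Listing 8.8; the file name and the key and value types are assumptions made for illustration, not code taken from the Aneka samples.

using Aneka.MapReduce.DiskIO;

// Hypothetical usage sketch based on the interfaces in Listing 8.8.
public static class SequenceFileUsage
{
    public static void Demo()
    {
        // write a small split of (word, count) pairs
        SeqWriter writer = new SeqWriter("words.seq");
        writer.Append("cloud", 3);
        writer.Append("aneka", 1);
        writer.Close();

        // read the pairs back, declaring the expected key and value types
        SeqReader reader = new SeqReader("words.seq", typeof(string), typeof(int));
        while (reader.HaxNext())   // spelling follows the interface in Listing 8.8
        {
            object key = reader.NextKey();
            object value = reader.NextValue();
            System.Console.WriteLine("{0}\t{1}", key, value);
        }
        reader.Close();
    }
}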

Listing 8.9 shows a practical use of the SeqReader class by implementing the
callback used in the word-counter example. To visualize the results of the
application, we use the SeqReader class to read the content of the output files
and dump it into a proper textual form that can be visualized with any text
editor, such as the Notepad application.

using System.IO;
using Aneka.Entity;

using Aneka.MapReduce;
namespace Aneka.MapReduce.Examples.WordCounter
{
/// Class Program. Application driver for the Word Counter sample.
public class Program
{
/// Reference to the configuration object.
private static Configuration configuration = null;
/// Location of the configuration file.
private static string confPath = "conf.xml";
/// Processes the arguments given to the application and according
/// to the parameters read runs the application or shows the help.
private static void Main(string[] args)
{
try
{
    Logger.Start();
    // get the configuration
    Program.configuration = Configuration.GetConfiguration(confPath);
    // configure MapReduceApplication
    MapReduceApplication<WordCountMapper, WordCountReducer> application =
        new MapReduceApplication<WordCountMapper, WordCountReducer>("WordCounter", configuration);
    // invoke and wait for result
    application.InvokeAndWait(new EventHandler<ApplicationEventArgs>(OnDone));
    // alternatively we can use the following call
}
catch (Exception ex)
{
    // show the usage and log the error if the submission fails
    Program.Usage();
    IOUtil.DumpErrorReport(ex, "Aneka WordCounter Demo - Error Log");
}
finally
{
    Logger.Stop();
}
}
/// Hooks the ApplicationFinished events and process the results
/// if the application has been successful.

private static void OnDone(object sender, ApplicationEventArgs e)
{
    if (e.Exception != null)
    {
        IOUtil.DumpErrorReport(e.Exception, "Aneka WordCounter Demo - Error");
    }
    else
    {
        string outputDir = Path.Combine(configuration.Workspace, "output");
        try
        {
            FileStream resultFile = new FileStream("WordResult.txt", FileMode.Create, FileAccess.Write);
            StreamWriter textWriter = new StreamWriter(resultFile);
            DirectoryInfo sources = new DirectoryInfo(outputDir);
            FileInfo[] results = sources.GetFiles();
            foreach (FileInfo result in results)
            {
                SeqReader seqReader = new SeqReader(result.FullName);
                seqReader.SetType(typeof(string), typeof(int));
                while (seqReader.HaxNext() == true)
                {
                    object key = seqReader.NextKey();
                    object value = seqReader.NextValue();
                    textWriter.WriteLine("{0}\t{1}", key, value);
                }
                seqReader.Close();
            }
            textWriter.Close();
            resultFile.Close();
            // clear the output directory
            sources.Delete(true);
            Program.StartNotepad("WordResult.txt");
        }
        catch (Exception ex)
        {
            IOUtil.DumpErrorReport(ex, "Aneka WordCounter Demo - Error");
        }
    }
}
/// Starts the notepad process and displays the given file.

private static void StartNotepad(string file) { ... }
/// Displays a simple informative message explaining the usage of the
application.
private static void Usage() { ... }
}
}
LISTING 8.9 WordCounter Job.


UNIT-IV
Cloud Platforms in Industry
Q1) Amazon Web Services
o AWS stands for Amazon Web Services.
o The AWS service is provided by Amazon, which uses distributed IT
infrastructure to provide different IT resources on demand. It provides
different services such as infrastructure as a service (IaaS), platform
as a service (PaaS), and packaged software as a service (SaaS).
o Amazon launched AWS, a cloud computing platform, to allow different
organizations to take advantage of reliable IT infrastructure.
Uses of AWS
o A small manufacturing organization can focus on its own expertise and expand
its business by leaving IT management to AWS.
o A large enterprise spread across the globe can utilize AWS to deliver
training to its distributed workforce.
o An architecture consulting company can use AWS to get high-compute
rendering of construction prototypes.
o A media company can use AWS to provide different types of content,
such as e-books or audio files, to users worldwide.
Pay-As-You-Go
Based on the concept of Pay-As-You-Go, AWS provides its services to customers.
AWS provides services to customers when required, without any prior
commitment or upfront investment. Pay-As-You-Go enables customers to
procure services from AWS such as:
o Computing
o Programming models
o Database storage
o Networking


Advantages of AWS
1) Flexibility
o We can get more time for core business tasks due to the instant
availability of new features and services in AWS.
o It provides effortless hosting of legacy applications. AWS does not require
learning new technologies, and migrating applications to AWS provides
advanced computing and efficient storage.
o AWS also offers the choice of whether or not to run applications and
services together. We can also choose to run a part of the IT
infrastructure in AWS and the remaining part in our own data centres.
2) Cost-effectiveness
AWS requires no upfront investment, no long-term commitment, and minimal
expense compared to traditional IT infrastructure, which requires a huge
investment.
3) Scalability/Elasticity
With AWS, autoscaling and elastic load balancing automatically scale resources
up or down when demand increases or decreases, respectively. AWS techniques
are ideal for handling unpredictable or very high loads. For this reason,
organizations enjoy the benefits of reduced cost and increased user
satisfaction.
4) Security
o AWS provides end-to-end security and privacy to customers.

o AWS has a virtual infrastructure that offers optimum availability while
managing full privacy and isolation of their operations.
o Customers can expect a high level of physical security because of Amazon's
several years of experience in designing, developing, and maintaining
large-scale IT operation centers.
o AWS ensures the three aspects of security, i.e., Confidentiality, integrity,
and availability of user's data.
History of AWS

o 2003: In 2003, Chris Pinkham and Benjamin Black presented a paper on
how Amazon's own internal infrastructure should look. They suggested
selling it as a service and prepared a business case for it in a six-page
document, and the company decided to proceed with it.
o 2004: SQS, which stands for "Simple Queue Service," was officially launched in
2004 by a team in Cape Town, South Africa.
o 2006: AWS (Amazon Web Services) was officially launched.
o 2007: In 2007, over 180,000 developers had signed up for the AWS.
o 2010: In 2010, amazon.com retail web services were moved to the AWS,
i.e., amazon.com is now running on AWS.
o 2011: AWS suffered some major problems. Some EBS (Elastic Block Store)
volumes became stuck and were unable to serve read and write requests.
It took two days for the problem to be resolved.
o 2012: AWS hosted its first customer event, the re:Invent conference, at
which new products were launched. In the same year, another major problem
occurred in AWS that affected many popular sites such as Pinterest,
Reddit, and Foursquare.
o 2013: In 2013, certifications were launched. AWS started a certifications
program for software engineers who had expertise in cloud computing.
o 2014: AWS committed to achieve 100% renewable energy usage for its
global footprint.
o 2015: AWS revenue reached $6 billion USD per annum and was growing 90%
every year.
o 2016: By 2016, revenue had doubled and reached $13 billion USD per annum.
o 2017: In 2017, AWS re:Invent released a host of artificial intelligence
services, and AWS revenue doubled again, reaching $27 billion USD per annum.
o 2018: In 2018, AWS launched a Machine Learning Specialty certification,
heavily focused on automating artificial intelligence and machine learning.
Features of AWS

The following are the features of AWS:


o Flexibility
o Cost-effective
o Scalable and elastic
o Secure
o Experienced
1) Flexibility
o The difference between AWS and traditional IT models is flexibility.

o The traditional models used to deliver IT solutions require large
investments in new architecture, programming languages, and operating
systems. Although these investments are valuable, it takes time to adopt
new technologies, which can also slow down your business.
o The flexibility of AWS allows us to choose which programming models,
languages, and operating systems are better suited for their project, so
we do not have to learn new skills to adopt new technologies.
o Flexibility means that migrating legacy applications to the cloud is easy,
and cost-effective. Instead of re-writing the applications to adopt new
technologies, you just need to move the applications to the cloud and tap
into advanced computing capabilities.
o Building applications in AWS is like building applications using existing
hardware resources.
o Larger organizations run in a hybrid mode, i.e., some pieces of the
application run in their data center, and other portions of the application
run in the cloud.
o The flexibility of AWS is a great asset that helps organizations deliver
products with up-to-date technology on time, enhancing overall productivity.
2) Cost-effective
o Cost is one of the most important factors that need to be considered in
delivering IT solutions.
o For example, developing and deploying an application can incur a low
cost, but after successful deployment there is a need for hardware and
bandwidth. Owning your own infrastructure can incur considerable costs,
such as power, cooling, real estate, and staff.
o The cloud provides on-demand IT infrastructure that lets you consume
only the resources you actually need. In AWS, you are not limited to a set
amount of resources such as storage, bandwidth, or computing resources,
as it is very difficult to predict the requirements of every resource.
Therefore, we can say that the cloud provides flexibility by maintaining
the right balance of resources.
o AWS provides no upfront investment, long-term commitment, or
minimum spend.
o You can scale up or scale down as the demand for resources increases or
decreases respectively.
o AWS allows you to access resources almost instantly. It gives you the
ability to respond to changes more quickly, and whether the changes are
large or small, this means that we can take up new opportunities to meet
business challenges that could increase revenue and reduce cost.
3) Scalable and elastic
o In a traditional IT organization, scalability and elasticity were calculated
with investment and infrastructure while in a cloud, scalability and
elasticity provide savings and improved ROI (Return On Investment).
o Scalability in aws has the ability to scale the computing resources up or
down when demand increases or decreases respectively.
o Elasticity in AWS is defined as the distribution of incoming application
traffic across multiple targets, such as Amazon EC2 instances, containers,
IP addresses, and Lambda functions.
o Elastic load balancing and scalability automatically scale your AWS
computing resources to meet unexpected demand and scale them down
automatically when demand decreases.
o The AWS cloud is also useful for short-term jobs, mission-critical jobs,
and jobs repeated at regular intervals.
4) Secure
o AWS provides a scalable cloud-computing platform that provides
customers with end-to-end security and end-to-end privacy.
o AWS incorporates security into its services and provides documentation
describing how to use the security features.
o AWS maintains the confidentiality, integrity, and availability of your
data, which is of the utmost importance to AWS.
Physical security: Amazon has many years of experience in designing,
constructing, and operating large-scale data centers. The AWS infrastructure is
housed in AWS-controlled data centers throughout the world. The data
centers are physically secured to prevent unauthorized access.
Secure services: Each service provided by the AWS cloud is secure.
Data privacy: A personal and business data can be encrypted to maintain data
privacy.

AWS Global Infrastructure


o AWS is a cloud computing platform which is globally available.
o The global infrastructure consists of regions around the world in which AWS
data centers are located; it is a collection of high-level IT services, as
shown below.
o As of December 2018, AWS was available in 19 regions and 57 availability
zones, with 5 more regions and 15 more availability zones announced for 2019.


The following are the components that make up the AWS infrastructure:
o Availability Zones
o Region
o Edge locations
o Regional Edge Caches

Availability zone as a Data Center


o An availability zone is a facility that can be somewhere in a country or in
a city. Inside this facility, i.e., a data center, we can have multiple servers,
switches, load balancers, and firewalls. The things which interact with the
cloud sit inside the data centers.
o An availability zone can consist of several data centers, but if they are
close together, they are counted as one availability zone.
Region
o A region is a geographical area. Each region consists of two or more
availability zones.
o A region is a collection of data centers which are completely isolated from
other regions.
o The availability zones within a region are connected to each other through
links.

o Availability zones are connected through redundant and isolated metro


fibers.

Edge Locations
o Edge locations are the endpoints for AWS used for caching content.
o Edge locations consist of CloudFront, Amazon's Content Delivery Network
(CDN).
o There are many more edge locations than regions; currently, there are over
150 edge locations.
o An edge location is not a region but a small location that AWS has. It is
used for caching content.
o Edge locations are mainly located in most of the major cities to distribute
the content to end users with reduced latency.
o For example, if a user accesses your website from Singapore, the request is
redirected to the edge location closest to Singapore, where the cached data
can be read.
Regional Edge Cache
o AWS announced a new type of edge location in November 2016, known
as a Regional Edge Cache.
o Regional Edge cache lies between CloudFront Origin servers and the edge
locations.
o A regional edge cache has a larger cache than an individual edge location.
o Data is removed from the cache at the edge location while it is retained
at the Regional Edge Caches.
o When requested data is no longer available at the edge location, the edge
location retrieves the cached data from the Regional Edge Cache instead of
the origin servers, which have higher latency.
S3-101
o S3 is one of the first services produced by AWS.
o S3 stands for Simple Storage Service.
o S3 provides developers and IT teams with secure, durable, highly scalable
object storage.
o It is easy to use with a simple web services interface to store and retrieve
any amount of data from anywhere on the web.
What is S3?
o S3 is a safe place to store the files.
o It is Object-based storage, i.e., you can store the images, word files, pdf
files, etc.
o The files which are stored in S3 can be from 0 Bytes to 5 TB.
o It has unlimited storage means that you can store the data as much you
want.
o Files are stored in buckets. A bucket is like a folder available in S3 that
stores the files.
o S3 is a universal namespace, i.e., bucket names must be unique globally.
Each bucket has a DNS address; therefore, the bucket must have a unique name
to generate a unique DNS address.
If you create a bucket, the URL looks like this:
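For example (a representative virtual-hosted-style form only; the exact URL depends on the region and addressing style, and "mybucket" is a hypothetical bucket name): https://mybucket.s3.amazonaws.com/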

o If you upload a file to an S3 bucket, you will receive an HTTP 200 code,
which means that the upload was successful.
Advantages of Amazon S3

o Create Buckets: First, we create a bucket and provide a name for it. Buckets
are the containers in S3 that store the data. Buckets must have a unique name
to generate a unique DNS address.
o Storing data in buckets: A bucket can be used to store an unlimited amount
of data. You can upload as many files as you want into an Amazon S3 bucket;
there is no maximum limit on the number of files. Each object can contain up
to 5 TB of data, and each object can be stored and retrieved using a unique
developer-assigned key.

o Download data: You can also download your data from a bucket and can
also give permission to others to download the same data. You can
download the data at any time whenever you want.
o Permissions: You can also grant or deny access to others who want to
download or upload the data from your Amazon S3 bucket.
Authentication mechanism keeps the data secure from unauthorized
access.
o Standard interfaces: S3 provides standard REST and SOAP interfaces, which
are designed to work with any development toolkit.
o Security: Amazon S3 offers security features by preventing unauthorized
users from accessing your data.
S3 is a simple key-value store
S3 is object-based. Objects consist of the following:
o Key: It is simply the name of the object. For example, hello.txt,
spreadsheet.xlsx, etc. You can use the key to retrieve the object.
o Value: It is simply the data, which is made up of a sequence of bytes. It is
the actual data inside the file.
o Version ID: The version ID uniquely identifies the object. It is a string
generated by S3 when you add an object to the S3 bucket.
o Metadata: It is data about the data you are storing: a set of name-value
pairs with which you can store information regarding an object. Metadata can
be assigned to the objects in an Amazon S3 bucket.
o Subresources: Subresource mechanism is used to store object-specific
information.
o Access control information: You can put the permissions individually on
your files.
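As an illustration of this key-value model, the following is a minimal sketch using the AWS SDK for .NET (assuming the AWSSDK.S3 NuGet package and already-configured credentials and region); the bucket name, key, and content are hypothetical placeholders, and error handling is omitted.

using System;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

// Hypothetical sketch: store and retrieve a key-value object in S3.
public static class S3Sketch
{
    public static async Task RunAsync()
    {
        var client = new AmazonS3Client(); // uses the default credentials and region

        // put an object: the key identifies it, the body is its value
        await client.PutObjectAsync(new PutObjectRequest
        {
            BucketName = "mybucket",          // hypothetical, globally unique name
            Key = "hello.txt",                // object key
            ContentBody = "Hello from S3"     // object value
        });

        // get the object back using the same bucket and key
        using (GetObjectResponse response = await client.GetObjectAsync("mybucket", "hello.txt"))
        using (var reader = new System.IO.StreamReader(response.ResponseStream))
        {
            Console.WriteLine(await reader.ReadToEndAsync());
        }
    }
}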
EC2:-
o EC2 stands for Amazon Elastic Compute Cloud.
o Amazon EC2 is a web service that provides resizable compute capacity in
the cloud.
o Amazon EC2 reduces the time required to obtain and boot new server
instances to minutes. In earlier days, if you needed a server, you had to
place a purchase order and have cabling done to get a new server, which was a
very time-consuming process. Now Amazon provides EC2, a virtual machine in
the cloud, which has completely changed the industry.
o You can scale the compute capacity up and down as the computing
requirements change.
o Amazon EC2 changes the economics of computing by allowing you to pay
only for the resources that you actually use. Previously, you would buy a
physical server with more CPU and RAM capacity than you needed and commit to
it over a five-year term, so you had to plan five years in advance; people
spent a lot of capital on such investments. EC2 allows you to pay only for
the capacity you actually use.
o Amazon EC2 provides the developers with the tools to build resilient
applications that isolate themselves from some common scenarios.
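To make the idea of "resizable compute capacity" concrete, here is a minimal sketch using the AWS SDK for .NET (assuming the AWSSDK.EC2 NuGet package and configured credentials); the AMI identifier and the other parameters are hypothetical placeholders, not values from this document.

using System.Threading.Tasks;
using Amazon.EC2;
using Amazon.EC2.Model;

// Hypothetical sketch: programmatically launch a single EC2 instance.
public static class Ec2Sketch
{
    public static async Task LaunchAsync()
    {
        var client = new AmazonEC2Client(); // uses the default credentials and region

        var response = await client.RunInstancesAsync(new RunInstancesRequest
        {
            ImageId = "ami-12345678",          // hypothetical AMI identifier
            InstanceType = InstanceType.T2Micro,
            MinCount = 1,
            MaxCount = 1
        });

        // the response describes the reservation and the instances launched in it
        foreach (Instance instance in response.Reservation.Instances)
        {
            System.Console.WriteLine("Launched instance: " + instance.InstanceId);
        }
    }
}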
EC2 Pricing Options

On Demand
o It allows you to pay a fixed rate by the hour or even by the second with
no commitment.
o Linux instances are billed by the second and Windows instances by the hour.
o On Demand is perfect for the users who want low cost and flexibility of
Amazon EC2 without any up-front investment or long-term commitment.
o It is suitable for the applications with short term, spiky or unpredictable
workloads that cannot be interrupted.
o It is useful for the applications that have been developed or tested on
Amazon EC2 for the first time.
o On Demand instance is recommended when you are not sure which
instance type is required for your performance needs.

Reserved
o It is a way of making a reservation with Amazon or we can say that we
make a contract with Amazon. The contract can be for 1 or 3 years in
length.
o With a Reserved Instance, you make a contract, meaning you pay some amount
upfront, so it gives you a significant discount on the hourly charge for an
instance.
o It is useful for applications with steady state or predictable usage.
o It is used for those applications that require reserved capacity.
o Users can make up-front payments to reduce their total computing costs.
For example, if you pay everything up front on a 3-year contract, you get the
maximum discount; if you do not pay everything up front and take a one-year
contract, you will not get as large a discount as with a fully paid 3-year
contract.
Types of Reserved Instances:
o Standard Reserved Instances
o Convertible Reserved Instances
o Scheduled Reserved Instances

Standard Reserved Instances


o It provides a discount of up to 75% off the On-Demand price, for example
when you pay everything up front on a 3-year contract.
o It is useful when your application is at a steady state.
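As a rough worked example with hypothetical figures (not AWS list prices): if the On-Demand rate for an instance were $0.10 per hour, a 75% discount would bring it to about $0.025 per hour, i.e., roughly $219 instead of $876 for a full year (8,760 hours) of continuous use.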
Convertible Reserved Instances
o It provides a discount of up to 54% off the On-Demand price.
o It provides the feature that has the capability to change the attributes of
RI as long as the exchange results in the creation of Reserved Instances of
equal or greater value.
o Like Standard Reserved Instances, it is also useful for the steady state
applications.

Scheduled Reserved Instances
o Scheduled Reserved Instances are available to launch within the specified
time window you reserve.
o It allows you to match your capacity reservation to a predictable recurring
schedule that only requires a fraction of a day, a week, or a month.

Spot Instances
o It allows you to bid whatever price you want for instance capacity,
providing better savings if your applications have flexible start and end
times.
o Spot Instances are useful for those applications that have flexible start
and end times.
o It is useful for those applications that are feasible at very low compute
prices.
o It is useful for those users who have an urgent need for large amounts of
additional computing capacity.
o EC2 Spot Instances are offered at a discounted price compared to On-Demand
prices.
o Spot Instances are used to optimize your costs on the AWS cloud and scale
your application's throughput up to 10X.
o EC2 Spot Instances will continue to exist until you terminate these
instances.

Dedicated Hosts
o A dedicated host is a physical server with EC2 instance capacity which is
fully dedicated to your use.
o The physical EC2 server is the dedicated host, which can help you reduce
costs by allowing you to use your existing server-bound software licenses,
for example VMware, Oracle, or SQL Server licenses that you can bring over to
AWS and use on the Dedicated Host.
o Dedicated Hosts are used to address compliance requirements and reduce
costs by allowing you to use your existing server-bound software licenses.
o It can be purchased as a Reservation for up to 70% off On-Demand price.

Q2) What is Google App Engine?
Google App Engine (GAE) is a platform-as-a-service product that
provides web app developers and enterprises with access to
Google's scalable hosting and tier 1 internet service.
GAE requires that applications be written in Java or Python, store data in
Google Bigtable and use the Google query language. Noncompliant applications
require modification to use GAE.
GAE provides more infrastructure than other scalable hosting services,
such as Amazon Elastic Compute Cloud (EC2). GAE also eliminates some system
administration and development tasks to make writing scalable applications
easier.
Google provides GAE free up to a certain amount of use for the following
resources:
• processor (CPU)
• storage
• application programming interface (API) calls
• concurrent requests
Users exceeding the per-day or per-minute rates can pay for more of these
resources.

What are GAE's key features?


Key features of GAE include the following:
API selection. GAE has several built-in APIs, including the following five:
• Blobstore for serving large data objects;
• GAE Cloud Storage for storing data objects;
• Page Speed Service for automatically speeding up webpage load
times;
• URL Fetch Service to issue HTTP requests and receive responses for
efficiency and scaling; and
• Memcache for a fully managed in-memory data store.
Managed infrastructure. Google manages the back-end infrastructure for users.
This approach makes GAE a serverless platform and simplifies API management.
Several programming languages. GAE supports a number of languages,
including Go, PHP, Java, Python, Node.js, .NET and Ruby. It also supports
custom runtimes.
Support for legacy runtimes. GAE supports legacy runtimes, which are versions
of programming languages no longer maintained. Examples include Python 2.7,
Java 8 and Go 1.11.
Application diagnostics. GAE lets users record data and run diagnostics on
applications to gauge performance.
Security features. GAE enables users to define access policies with the GAE
firewall and managed Secure Sockets Layer/Transport Layer Security certificates
for free.
Traffic splitting. GAE lets users route requests to different application versions.
Versioning. Applications in Google App Engine function as a set
of microservices that refer back to the main source code. Every time code is
deployed to a service with the corresponding GAE configuration files, a version
of that service is created.
Google App Engine benefits and challenges
GAE extends the benefits of cloud computing to application development, but it
also has drawbacks.
Benefits of GAE
• Ease of setup and use. GAE is fully managed, so users can write code
without considering IT operations and back-end infrastructure. The
built-in APIs enable users to build different types of applications.
Access to application logs also facilitates debugging and monitoring in
production.
• Pay-per-use pricing. GAE's billing scheme only charges users daily for
the resources they use. Users can monitor their resource usage and
bills on a dashboard.
• Scalability. Google App Engine automatically scales as workloads
fluctuate, adding and removing application instances or application
resources as needed.
• Security. GAE supports the ability to specify a range of
acceptable Internet Protocol (IP) addresses. Users
can allowlist specific networks and services and blocklist specific IP
addresses.
GAE challenges
• Lack of control. Although a managed infrastructure has advantages, if
a problem occurs in the back-end infrastructure, the user is dependent
on Google to fix it.
• Performance limits. CPU-intensive operations are slow and expensive
to perform using GAE. This is because one physical server may be
serving several separate, unrelated app engine users at once who need
to share the CPU.
• Limited access. Developers have limited, read-only access to the GAE
filesystem.
• Java limits. Java apps cannot create new threads and can only use a
subset of the Java runtime environment standard edition classes.
Examples of Google App Engine
One example of an application created in GAE is an Android messaging app that
stores user log data. The app can store user messages and write event logs to
the Firebase Realtime Database and use it to automatically synchronize data
across devices.
Java servers in the GAE flexible environment connect to Firebase and receive
notifications from it. Together, these components create a back-end streaming
service to collect messaging log data.
GAE can be used in many different application contexts. Additional sample
application code in GitHub includes the following:
• a Python application that uses Blobstore;
• a program that uses MySQL connections from GAE to Google Cloud
Platform SQL; and
• code that shows how to set up unit tests in GAE.

Q3) What is Microsoft Azure?


Azure is a cloud computing platform which was launched by Microsoft in
February 2010. It is an open and flexible cloud platform which helps in
development, data storage, service hosting, and service management. The
Azure tool hosts web applications over the internet with the help of Microsoft
data centers.
Types of Azure Clouds
There are mainly three types of clouds in Microsoft Azure:

1. PaaS
2. SaaS
3. IaaS


Azure as IaaS
IaaS (Infrastructure as a Service) is the foundational cloud platform layer. This
Azure service is used by IT administrators for processing, storage, networks or
any other fundamental computer operations. It is one of the Azure topics to
learn that allows users to run arbitrary software.
Advantages:

• It offers efficient design time portability


• It is advisable for the application which needs complete control
• IaaS offers quick transition of services to clouds
• The apparent benefit of IaaS is that it frees you from the concerns of
setting up many physical or virtual machines.
• Helps you to access, monitor and manage datacenters
Disadvantages of Iaas:

• Plenty of security risks from unpatched servers


• Some companies have defined processes for testing and updating on-premises
server vulnerabilities. This cannot be done with Azure.


Azure as PaaS
PaaS is a computing platform which includes an operating system,
programming language execution environment, database or web services. This
Azure service is used by developers and application providers.
As its name suggests, this platform is provided to the client to develop and
deploy software. It is one of the Azure basic concepts which allows the client to
focus on application development instead of worrying about hardware and
infrastructure. It also takes care of operating systems, networking and servers
issues.

Advantages:

• The total cost is low as the resources are allocated on demand and servers
are automatically added or subtracted.
• Azure is less vulnerable because servers are automatically checked
for all known security issues
• The entire process is not visible to the developer, so it does not have a
risk of a data breach
Disadvantages:

• Portability issues can occur when you use PaaS services


• There may be different environment at Azure, so the application needs to
adapt accordingly.
Azure As SaaS
SaaS (Software as a Service) is software which is centrally hosted and
managed. A single version of the application is used for all customers. You
can scale out to multiple instances, which helps you ensure the best
performance in all locations. The software is licensed through a monthly or
annual subscription. MS Exchange, Office, and Dynamics are offered as SaaS.

Azure key Concepts
• Regions: Azure is a global cloud platform which is available across various
regions around the world. When you request a service, application, or VM in
Azure, you are first asked to specify a region. The selected region represents
the datacenter where your application runs.
• Datacenter: In Azure, you can deploy your applications into a variety of
data centers around the globe. So, it is advisable to select a region which is
closer to most of your customers. It helps you to reduce latency in network
requests.
• Azure portal: The Azure portal is a web-based application which can be used
to create, manage and remove Azure resources and services. It is located
at https://portal.azure.com.
• Resources: An Azure resource is an individual computer, networking data or
app hosting service which is charged individually. Some common resources are
virtual machines (VM), storage accounts, or SQL databases.
• Resource groups: An Azure resource group is a container which holds related
resources for an Azure solution. It may include every resource or just the
resources which you want to manage.
• Resource Manager templates: A Resource Manager template is a JSON file which
defines one or more resources to deploy to a resource group. It also
establishes dependencies between deployed resources.
• Automation: Azure allows you to automate the process of creating, managing
and deleting resources by using PowerShell or the Azure command-line
interface (CLI).
• Azure PowerShell: PowerShell is a set of modules that offer cmdlets to
manage Azure. In most cases, you can use the cmdlets to perform the same tasks
which you perform in the Azure portal.
• Azure command-line interface (CLI): The Azure CLI is a tool that you can use
to create, manage, and remove Azure resources from the command line.
• REST APIs: Azure is built on a set of REST APIs that help you perform the
same operations that you do in the Azure portal UI. They allow your Azure
resources and apps to be manipulated via any third-party software application.

Azure Domains (Components)

Compute
It offers computing operations like app hosting, development, and
deployment in Azure Platform. It has the following components:

• Virtual Machine: Allows you to deploy any language, workload in any


operating system
• Virtual Machine Scale Sets: Allows you to create thousands of similar
virtual machines in minutes
• Azure Container Service: Creates a container hosting solution which is
optimized for Azure. You scale and orchestrate applications using Kubernetes,
DC/OS, Swarm, or Docker.
• Azure Container Registry: This service store and manage container images
across all types of Azure deployments
• Functions: Lets you write code without worrying about the infrastructure and
provisioning of servers, even when your function call rate scales up.
• Batch: Batch processing helps you scale to tens, hundreds, or thousands
of virtual machines and execute compute pipelines.
• Service Fabric: Simplifies microservice-based application development and
lifecycle management. It supports Java, PHP, Node.js, Python, and Ruby.
Storage
Azure Storage is a cloud storage solution for modern applications. It is
designed to meet its customers' demand for scalability. It allows you to
store and process hundreds of terabytes of data. It has the following
components:

• Blob Storage: Azure Blob storage is a service which stores unstructured


data in the cloud as objects/blobs. You can store any type of text or binary
data, such as a document, media file, or application installer.
• Queue Storage: It provides cloud messaging between application
components. It delivers asynchronous messaging to establish
communication between application components.
• File Storage: Using Azure File storage, you can quickly migrate legacy
applications that rely on file shares to Azure, without costly rewrites.
• Table Storage: Azure Table storage stores semi-structured NoSQL data in
the cloud. It provides a key/attribute store with a schema-less design
Database
This category includes Database as a Service (DBaaS) which offers SQL and
NoSQL tools. It also includes databases like Azure Cosmos DB and Azure
Database for PostgreSQL. It has the following components:

• SQL Database: It is a relational database service in the Microsoft cloud


based on the market-leading Microsoft SQL Server engine.
• DocumentDB: It is a fully managed NoSQL database service which is built
for fast and predictable performance and ease of development.
• Redis Cache: It is a secure and highly advanced key-value store. It stores
data structures like strings, hashes, lists, etc.
Content Delivery Network
Content Delivery Network (CDN) caches static web content at strategically
placed locations. This helps you deliver content to users at higher speed. It
has the following components:

• VPN Gateway: VPN Gateway sends encrypted traffic across a public


connection.
• Traffic Manager: It helps you control and distribute user traffic for
services like Web Apps, VMs, and cloud services in different datacenters.
• Express Route: Helps you to extend your on-premises networks into the
Microsoft cloud over a dedicated private connection to Microsoft Azure,
Office 365, and CRM Online.

Security + Identity Services
It provides capabilities to identify and respond to cloud security threats.
It also helps you to manage encryption keys and other sensitive assets. It has the
following components:

• Key Vault: Azure Key Vault allows you to safeguard cryptographic keys and
helps you to create secrets used by cloud applications and services.
• Azure Active Directory: Azure Active Directory and identity management
service. This includes multi-factor authentication, device registration, etc.
• Azure AD B2C: Azure AD B2C is a cloud identity management solution for
your consumer-facing web and mobile applications. It allows you to scale to
hundreds of millions of consumer identities.

Enterprise Integration Services:

• Service Bus: Service Bus is an information delivery service which works on


the third-party communication system.
• SQL Server Stretch Database: This service helps you migrate cold data
securely and transparently to the Microsoft Azure cloud.
• Azure AD Domain Services: It offers managed domain services like domain
join, group policy, LDAP, etc., with authentication that is compatible with
Windows Server Active Directory.
• Multi-Factor Authentication: Azure Multi-Factor Authentication (MFA) is
two-step verification. It helps protect access to data and applications while
offering a simple sign-in process.

Monitoring + Management Services


These services allow easy management of Azure deployment.

• Azure Resource Manager: It makes it easy for you to manage and visualize
the resources in your app. You can even control who in your organization can
act on the resources.
• Automation: Microsoft Azure Automation is a way to automate manual,
long-running, error-prone, and frequently repeated tasks. These tasks are
commonly performed in cloud and enterprise environments.

Azure Networking

• Virtual Network: Performs network isolation and segmentation. It offers
filtering and routing of network traffic.
• Load Balancer: Offers high availability and network performance for any
application. It load-balances Internet traffic to virtual machines.
• Application Gateway: It is a dedicated virtual appliance that offers an
Application Delivery Controller (ADC) as a service.
• Azure DNS: Azure DNS hosting service offers name resolution using
Microsoft Azure infrastructure.

Web and Mobile Services:

• Web Apps: Web Apps allows you to build and host websites in the
programming language of your choice without the need to manage its
infrastructure.
• Mobile Apps: Mobile Apps Service offers a highly scalable, globally
available mobile app development platform for users.
• API Apps: API apps make it easier to develop, host and consume APIs in
the cloud and on-premises.
• Logic Apps: Logic Apps helps you to simplify and implement scalable
integrations and workflows in the cloud. It provides a visual designer to
create and automate your process as a series of steps known as a workflow.

• Notification Hubs: Azure Notification Hubs offers an easy-to-use, multi-


platform, scaled-out push engine
• Event Hubs: Azure Event Hubs is a data streaming platform which can
manage millions of events per second. Data sent to an event hub can be
transformed and stored using any real-time analytics provider or
batching/storage adapters.
• Azure Search: It is a cloud search-as-a-service solution which offers server
and infrastructure management. It offers ready-to-use service that you
can populate with your data. This can be used to add search to your web
or mobile application.

Migration
Migration tools help an organization estimate workload migration costs.
It also helps to perform the migration of workloads from your local data centers
to the Azure cloud.

Applications of Azure
Now in this Azure for beginners tutorial, we will learn the applications of
Azure.
Microsoft Azure is used in a broad spectrum of applications like:

• Infrastructure Services
• Mobile Apps
• Web Applications
• Cloud Services
• Storage, Backup, and Recovery
• Data Management
• Media Services

Advantages of Azure
Now in this MS Azure tutorial, we will cover the advantages of Azure.
Here, are the advantages of using Azure:

• Azure infrastructure will cost-effectively enhance your business


continuity strategy
• It allows you to access the application without buying a license for the
individual machine
• Windows Azure offers the best solution for your data needs, from SQL
database to blobs to tables
• Offers scalability, flexibility, and cost-effectiveness
• Helps you to maintain consistency across clouds with familiar tools and
resources
• Allows you to extend data center with a consistent management toolset
and familiar development and identity solutions.
• You can deploy premium virtual machines in minutes which also include
Linux and Windows servers
• Helps you to scale your IT resources up and down based on your needs

• You are not required to run the high-powered and high-priced computer
to run cloud computing’s web-based applications.
• You will not require processing power or hard disk space if you are using
Azure
• Cloud computing offers virtually limitless storage
• If your personal computer or laptop crashes, all your data is still out there
in the cloud, and it is still accessible
• Sharing documents leads directly to better collaboration
• If you change your device your computers, applications and documents
follow you through the cloud

Disadvantages of Azure

• Cloud computing is not possible if you can’t connect to the Internet


• Azure is a web-based application which requires a lot of bandwidth to
download, as do large documents
• Web-based applications can sometimes be slower compared to accessing
a similar software program on your desktop PC

Q5) Business Applications of Cloud.


Business Applications
Business applications are based on cloud service providers. Today, every
organization requires the cloud business application to grow their business. It
also ensures that business applications are 24X7 available to users.
There are the following business applications of cloud computing -
i. MailChimp
MailChimp is an email publishing platform which provides various
options to design, send, and save templates for emails.
ii. Salesforce
Salesforce platform provides tools for sales, service, marketing, e-
commerce, and more. It also provides a cloud development platform.

iii. Chatter
Chatter helps us to share important information about the organization
in real time.
iv. Bitrix24

Bitrix24 is a collaboration platform which provides communication,
management, and social collaboration tools.
v. Paypal
Paypal offers the simplest and easiest online payment mode using a
secure internet account. Paypal accepts the payment through debit cards, credit
cards, and also from Paypal account holders.
vi. Slack
Slack stands for Searchable Log of all Conversation and Knowledge. It
provides a user-friendly interface that helps us to create public and private
channels for communication.
vii. Quickbooks
Quickbooks works on the terminology "Run Enterprise anytime,
anywhere, on any device." It provides online accounting solutions for the
business. It allows more than 20 users to work simultaneously on the same
system.

Scientific applications of cloud computing:-


With cloud computing, it becomes possible to simulate various scientific
mysteries and turn these into means of better discoveries. These experiments
also make scientific research much safer. With the use of data, it becomes much
safer as no animals are hurt in the process.
The experiments are purely based on data, figures, and information, and as such,
they do not hurt anyone in the process. They are also easier to manage and, with
the power of the cloud, take less time. The results from these experiments are
also easier to present to interested parties and other stakeholders as they are
all in digital form.
Sharing them on boards on the internet and during meetings is simple and takes
very little time to spread the findings from a scientific experiment to the other
parties involved.
The fact that modern experiments that are carried out on the cloud are
entirely virtual means that they are also very effective. Carrying them out is
clean and does not leave a lot of hazardous materials lying around.
It is also much safer and more effective at finding medicines and breakthroughs
in science as opposed to working in the laboratory and experimenting on lab rats
and other guinea pigs. With the power of the cloud, scientists do not have to
worry about getting the result.
The cloud computer can be left to chug out a solution to an equation that
it has been presented with while the scientists go on with their other activities.
The power of the cloud is also useful whenever the researcher is gathering
materials for their experiments. During their studies, the researchers will require
somewhere to store their learning materials. Connecting between various
references is also possible, and the researcher will realize they take less time
whenever they are working out complex equations.
Cloud storage makes it possible to store all the findings from the research and
make backups of the datasets and other valuable programs used in the course
of the research. The researcher has access to much more computation power
with the cloud than they were used to in the past. Additionally, the use of the
cloud makes it possible for scientists to produce more accurate and valuable
research for the real world.
Most of the modern applications used in scientific experiments are known
to require a lot of computational resources. They also occupy a lot of space on
the computers they are installed on. With the cloud, the scientist has a better
opportunity to offload their computations to a more powerful computer.
Whenever the researcher uses the infrastructure as a service, they are free to
specify the amount of computing power and storage space they require to carry
out their experiment. As such, they will be able to create powerful enough
machines for the various tasks that they will have to complete before they can
come up with conclusions and inferences from their experiments.
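To make the idea of specifying compute power and storage concrete, the
following is a minimal sketch using the AWS SDK for Python (boto3); the AMI ID,
instance type, volume size, and region are illustrative placeholders rather than
recommendations, and any other IaaS provider's API could be used in the same way.

import boto3

# A minimal sketch: provision one compute-optimized VM with a 500 GiB data disk.
# The AMI ID, instance type, and sizes below are placeholders, not recommendations.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",    # placeholder machine image
    InstanceType="c5.4xlarge",          # 16 vCPUs, 32 GiB RAM for the simulation
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 500, "VolumeType": "gp3"},  # room for datasets and results
    }],
)
print("Launched instance:", response["Instances"][0]["InstanceId"])

When the long-running computation finishes, the instance can be stopped or
terminated so that the researcher pays only for the hours actually used.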
Long-running computations require computers that stay on for much longer.
These computers also need to be monitored for uptime, and a cloud machine is one
you can leave on for an entire month without any programs stopping or services
shutting down. The program you left running will still be running when you connect
to the machine again, so a computer running on the cloud can stay on for much
longer, and when carrying out scientific research you will be able to accomplish a
lot more.
The cloud also ensures that the scientist is more organized. Most of the
complex calculations will be carried out in the cloud while the rest of the work,
such as documentation and getting organized, will take place on their local
machine.
Their workflow will be significantly enhanced by using the cloud, which will be
one of the main ways to ensure that they can come up with findings much faster.
The cloud computer will enable the scientist to develop faster results and
solutions to their complex equation. It is responsible for reducing the time
between coming up with theories and calculating their proof.


For the scientific community, the use of the cloud makes it possible to innovate
faster and create better, more effective solutions to most of the world’s
problems. It also makes progress faster in various fields since the research takes
less time, and as a result, the pace of innovation increases.
The researchers and scientists are fully equipped to carry out their work
on cloud computers, and they are responsible for driving innovation forward.
When it comes to developing vaccines and medicines for some of the most
dangerous diseases globally, the use of cloud computing is a huge help.
Simulations are also reducing the dependency of the researcher on
physical laboratories to carry out their experiments. With the power of cloud
computing, virtual experiments are now possible. These are more efficient, and
the results from them are more accurate and more valuable for research.
Cloud computing makes it possible for these simulations to be carried out on a
broader scale. As such, it becomes possible to simulate events taking place all
across the globe and, as such, be able to come up with measures that will reduce
the spread of certain diseases while also ensuring that medication gets to the
most critical areas of the world first.
With modern cloud computing, most of the work that used to take
scientists years to accomplish is now taking months. This is because most of the
calculations that need to be done and the figures that have to be confirmed are
all computed on a larger scale which means that the scientist will make
inferences and conclusions in less time.
The use of the cloud for research has also brought about new insights into many
research areas. For instance, combinations and permutations of any scientific
approach mean that it is now possible to develop unique new solutions to some
of the world’s most complicated problems.
Keeping track of experiments has also been made much easier, and with this in
place, scientists can make better progress. No longer do they have blockades
and impediments standing in their way. They can innovate faster, and with this
in place, they will also get to come up with solutions for some of the most
complicated problems in their area of interest much faster.
The use of the cloud also means that most of the data is converted into a
digital form, which means sharing it among the research community will be
made even more effective. It takes less time to distribute anything in digital
format, and research findings are no different.
Using Cloud Storage to Backup Research Findings
A lot of data is generated during research and scientific work. For one, the
research and experiments are known to take up much space on the computers
they run on. Additionally, they are known to generate many results that help
further the work of the researchers and make sure that they can improve on
their work and even create better outcomes.
The main challenge lies in the preservation of all this information. The
scientists will need to have a means of storing the data from all their research
work that will be safe and reliable.
Cloud storage is one of the most reliable means of storing information
on the internet. It can be used to keep backups of information and research
findings from experiments. The cloud offers massive amounts of space, making it
one of the most effective ways of storing data.
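As a hedged illustration, the sketch below backs up a folder of result files to
cloud object storage using boto3 and Amazon S3; the bucket name, local folder, and
key prefix are hypothetical, and other providers offer equivalent object-storage APIs.

import boto3
from pathlib import Path

# A minimal sketch: copy every result file into an object-storage bucket as a backup.
# The bucket name and prefix are hypothetical.
s3 = boto3.client("s3")
bucket = "research-backups"

for path in Path("results").glob("*.csv"):
    key = f"experiment-42/{path.name}"          # hypothetical experiment prefix
    s3.upload_file(str(path), bucket, key)      # arguments: Filename, Bucket, Key
    print(f"Backed up {path} to s3://{bucket}/{key}")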

Consumer applications of cloud

Consumer clouds are cloud computing offerings targeted toward individuals
for personal use, such as Dropbox or iCloud. End users interact with consumer
clouds through highly interactive applications. When you store photos or documents
in an app like Dropbox, you are interacting with their cloud, hence the term
consumer cloud.
1. Evernote. Evernote is a cloud-based note-taking application that syncs notes,
documents, and web clippings across a user's devices, and it is one of the few
consumer cloud apps regularly described as a thing of beauty.
2. iCloud. The establishment of Apple’s stronghold in devices, and the services
that support them, was deliberate, systematic, and in almost every aspect of its
execution, brilliant. The exception was MobileMe, a service whose frequent slip-
ups and uncharacteristically dramatic failures led Steve Jobs to openly declare
its launch “not our finest hour.”
3. Spotify. The reason for the decline, if not yet outright collapse, of the global
recording industry is that it has not been meaningful or desirable for
consumers to own music. The industry’s principal delivery system for music,
even to this date, remains a container that consumers no longer want; and the
system that consumers prefer, and which a majority of them now actually use,
is something that the industry has yet to truly embrace. Streaming services like
Spotify, Last.fm, and Pandora are more convenient than music ownership and, for
many users today, more interesting than radio.
4. Do.com. Nothing more thoroughly demonstrates the rapidly changing state
of the applications market in general than the fact that Microsoft Outlook’s
greatest competition in over a decade comes from something that isn’t really an
e-mail client. Do.com from Salesforce includes the level and ease of functionality
for file sharing and collaboration that enterprises may have already attached to
Outlook by way of add-ons, but which aren’t available for everyday Outlook
users.
5. Audiobox.fm. My wife and I are both Pandora fans, although last year I found
it ironic that both of us had been working – albeit without admitting it to
ourselves – to make Pandora play music we actually already owned. Audiobox.fm
addresses exactly that need: it stores your own music collection in the cloud and
streams it back to any of your devices.
Top 7 applications of cloud computing.
1. Online Data Storage
Cloud Computing allows storage and access to data like files, images, audio, and
videos on the cloud storage. In this age of big data, storing huge volumes of
business data locally requires more and more space and escalating costs. This is
where cloud storage comes into play, where businesses can store and access
data using multiple devices.
The interface provided is easy to use, convenient, and has the benefits of high
speed, scalability, and integrated security.
2. Backup and Recovery
Cloud service providers offer safe storage and backup facility for data and
resources on the cloud. In a traditional computing system, data backup is a
complex problem, and often, in case of a disaster, data can be permanently lost.
But with cloud computing, data can be easily recovered with minimal damage in
case of a disaster.
3. Big Data Analysis
One of the most important applications of cloud computing is its role in
extensive data analysis. The extremely large volume of big data makes it
impossible to store using traditional data management systems. Due to the
unlimited storage capacity of the cloud, businesses can now store and analyze
big data to gain valuable business insights.
4. Testing and Development
Cloud computing applications provide the easiest approach for testing and
development of products. With traditional methods, setting up such an environment
would be time-consuming and expensive because of the IT resources, infrastructure,
and manpower needed. However, with cloud computing,
businesses get scalable and flexible cloud services, which they can use for
product development, testing, and deployment.
5. Antivirus Applications
With cloud computing comes cloud antivirus software, which is hosted in the
cloud, from where it monitors the organization’s systems for viruses and malware
and fixes them. Earlier, organizations had to install antivirus software within
their own systems to detect security threats.
6. E-commerce Application
Ecommerce applications in the cloud enable users and e-businesses to respond
quickly to emerging opportunities. They offer business leaders a new way to get
things done with minimal cost and in minimal time. E-businesses use cloud
environments to manage customer data, product data, and other operational
systems.
7. Cloud Computing in Education
E-learning, online distance learning programs, and student information portals
are some of the key changes brought about by applications of cloud computing
in the education sector. This new learning environment offers students, teachers,
and researchers an attractive setting for learning, teaching, and experimenting,
in which they can connect to the cloud of their institution and access data and
information.


Q9) Energy Efficiency in Cloud Computing


Cloud computing is internet-based computing that provides metering-based
services to consumers. It means accessing data from a centralized pool of
compute resources that can be ordered and consumed on demand. It also
provides computing resources through virtualization over the internet.
The data center is the most prominent element in cloud computing; it contains a
collection of servers on which business information is stored and applications
run. A data center, which includes servers, cables, air conditioning, networking
equipment, etc., consumes a great deal of power and releases a huge amount of
carbon dioxide (CO2) into the environment. One of the most important challenges
faced in cloud computing is the optimization of energy utilization. Hence the
concept of green cloud computing came into existence.
There are multiple techniques and algorithms used to minimize energy
consumption in the cloud; a sketch of the bin-packing idea behind VM
consolidation is given after the lists below.

Techniques include:
1. Dynamic Voltage and Frequency Scaling (DVFS)
2. Virtual Machine (VM) Migration
3. VM Consolidation
Algorithms are:
1. Maximum Bin Packing
2. Power Expand Min-Max and Minimization of Migrations
3. Highest Potential Growth
The main purpose of all these approaches is to optimize energy utilization in
the cloud.
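The sketch below packs VMs onto as few hosts as possible using a first-fit-decreasing
heuristic so that idle hosts can be switched off or put into a low-power state; the
CPU demands and host capacity are made-up numbers, and real consolidation algorithms
also account for memory, SLA constraints, and migration costs.

def consolidate(vm_demands, host_capacity):
    """Place VM CPU demands onto hosts using first-fit decreasing bin packing."""
    hosts = []                                      # each host is a list of placed demands
    for demand in sorted(vm_demands, reverse=True): # consider the largest VMs first
        for host in hosts:
            if sum(host) + demand <= host_capacity: # first host with enough spare capacity
                host.append(demand)
                break
        else:
            hosts.append([demand])                  # no host fits: power on a new one
    return hosts

vms = [0.6, 0.3, 0.5, 0.2, 0.4, 0.1]                # normalized CPU demand per VM
placement = consolidate(vms, host_capacity=1.0)
print(len(placement), "active hosts:", placement)   # remaining hosts can be switched off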
Cloud Computing as per NIST is, “Cloud Computing is a model for enabling
ubiquitous, convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage, applications
and services) that can be rapidly provisioned and released with minimal
management effort or service provider interaction.” Nowadays, most business
enterprises and individual IT companies are opting for the cloud in order
to share business information.

The main expectation of a cloud service consumer is to have a reliable
service. To satisfy consumers’ expectations, several data centers are established
all over the world, and each data center contains thousands of servers. Even a
small workload on a server consumes around 50% of its peak power. Cloud service
providers ensure reliable, load-balanced services to consumers around the world
by keeping servers ON all the time. To satisfy this SLA, the provider has to
supply power continuously to the data centers, which leads to a huge amount of
energy utilization by the data center and simultaneously increases the cost of
investment.
The major challenge is to utilize energy efficiently and hence develop an
eco-friendly cloud computing environment.
Idle servers and resources in a data center waste a huge amount of energy.
Energy is also wasted when a server is overloaded. Techniques such as load
balancing, VM virtualization, VM migration, resource allocation, and job
scheduling are used to solve the problem. It has also been found that transporting
data between data centers and home computers can consume even larger
amounts of energy than storing it.

Q10) Market-Based Management of Clouds


As consumers rely on Cloud providers to supply all their computing needs,
they will require specific QoS to be maintained by their providers in order to
meet their objectives and sustain their operations. Cloud providers will need to
consider and meet different QoS parameters of each individual consumer as
negotiated in specific SLAs. To achieve this, Cloud providers can no longer
continue to deploy traditional system-centric resource management
architectures that provide no incentive for them to share their resources
and that regard all service requests as being of equal importance. Instead,
market-oriented resource management is necessary to regulate the supply and
demand of Cloud resources at market equilibrium, provide feedback in terms of
economic incentives for both Cloud consumers and providers, and promote
QoS-based resource allocation mechanisms that differentiate service requests
based on their utility. Figure shows the high-level architecture for supporting
market-oriented resource allocation in Data Centers and Clouds.


There are basically four main entities involved:


Users/Brokers:
Users or brokers acting on their behalf submit service requests from
anywhere in the world to the Data Center and Cloud to be processed.
SLA Resource Allocator:
The SLA Resource Allocator acts as the interface between the Data
Center/Cloud service provider and external users/brokers. It requires the
interaction of the following mechanisms to support SLA-oriented resource
management:

• Service Request Examiner and Admission Control: When a service request is first
submitted, the Service Request Examiner and Admission Control mechanism
interprets the submitted request for QoS requirements before determining
whether to accept or reject the request. Thus, it ensures that there is no
overloading of resources whereby many service requests cannot be fulfilled
successfully due to limited resources available. It also needs the latest status
information regarding resource availability (from VM Monitor mechanism) and
workload processing (from Service Request Monitor mechanism) in order to
make resource allocation decisions effectively. Then, it assigns requests to VMs
and determines resource entitlements for allocated VMs.
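A minimal sketch of the kind of decision this mechanism makes is given below;
the request fields, VM capacities, and response-time estimates are illustrative
and do not correspond to any particular product's API.

def admit(request, vms):
    """Return the VM chosen for the request, or None to reject it."""
    for vm in vms:
        enough_cpu = vm["free_cpu"] >= request["cpu"]
        enough_mem = vm["free_mem"] >= request["mem"]
        fast_enough = vm["est_response_ms"] <= request["max_response_ms"]
        if enough_cpu and enough_mem and fast_enough:
            vm["free_cpu"] -= request["cpu"]        # reserve the resource entitlement
            vm["free_mem"] -= request["mem"]
            return vm
    return None                                     # reject: accepting would overload resources

vms = [{"id": "vm-1", "free_cpu": 2, "free_mem": 4,  "est_response_ms": 80},
       {"id": "vm-2", "free_cpu": 8, "free_mem": 16, "est_response_ms": 40}]
request = {"cpu": 4, "mem": 8, "max_response_ms": 50}
print(admit(request, vms))                          # vm-2 satisfies the QoS requirements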

Pricing:
The Pricing mechanism decides how service requests are charged. For
instance, requests can be charged based on submission time (peak/off-peak),
pricing rates (fixed/changing) or availability of resources (supply/demand).
Pricing serves as a basis for managing the supply and demand of computing
resources within the Data Center and facilitates in prioritizing resource
allocations effectively.
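As a hedged sketch, the peak/off-peak and supply/demand ideas can be combined as
follows; all rates, hours, and thresholds are made-up figures.

def price_per_cpu_hour(hour_of_day, utilization):
    """Charge more during peak business hours and when resources are scarce."""
    rate = 0.10                                     # $/CPU-hour, off-peak base rate
    if 9 <= hour_of_day < 18:                       # peak submission time
        rate *= 1.5
    if utilization > 0.8:                           # high demand relative to supply
        rate *= 1.25
    return rate

# The Accounting mechanism would multiply metered usage by the applicable rate.
usage_cpu_hours = 12
cost = usage_cpu_hours * price_per_cpu_hour(hour_of_day=14, utilization=0.85)
print(f"Charged: ${cost:.2f}")                      # 12 * 0.10 * 1.5 * 1.25 = $2.25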

Accounting:
The Accounting mechanism maintains the actual usage of resources by
requests so that the final cost can be computed and charged to the users. In
addition, the maintained historical usage information can be utilized by the
Service Request Examiner and Admission Control mechanism to improve
resource allocation decisions.

VM Monitor:
The VM Monitor mechanism keeps track of the availability of VMs and
their resource entitlements.

Dispatcher:
The Dispatcher mechanism starts the execution of accepted service requests
on allocated VMs.

Service Request Monitor:
The Service Request Monitor mechanism keeps track of the execution progress
of service requests.

VMs:
Multiple VMs can be started and stopped dynamically on a single physical
machine to meet accepted service requests, hence providing maximum
flexibility to configure various partitions of resources on the same physical
machine to different specific requirements of service requests. In addition,
multiple VMs can concurrently run applications based on different operating
system environments on a single physical machine, since the VMs are completely
isolated from one another.

• Physical Machines:
The Data Center comprises multiple computing servers that provide
resources to meet service demands.
Commercial offerings of market-oriented Clouds must be able to:
• support customer-driven service management based on customer profiles and
requested service requirements;
• define computational risk management tactics to identify, assess, and manage
risks involved in the execution of applications with regard to service
requirements and customer needs;
• derive appropriate market-based resource management strategies that encompass
both customer-driven service management and computational risk management to
sustain SLA-oriented resource allocation;
• incorporate autonomic resource management models that effectively self-manage
changes in service requirements to satisfy both new service demands and existing
service obligations; and
• leverage VM technology to dynamically assign resource shares according to
service requirements.

Q11) FEDERATED CLOUDS/INTER CLOUDS

Characterization and definition


The terms cloud federation and Inter Cloud, often used interchangeably,
convey the general meaning of an aggregation of cloud computing providers
that have separate administrative domains.
It is important to clarify what these two terms mean and how they apply
to cloud computing.
The term federation implies the creation of an organization that
supersedes the decisional and administrative power of the single entities and
that acts as a whole.
Within a cloud computing context, the word federation does not have
such a strong connotation but implies that there are agreements between the
various cloud providers, allowing them to leverage each other’s services in a
privileged manner.
A definition of the term cloud federation was given by Reuven Cohen,
founder and CTO of Enomaly Inc.
“Cloud federation manages consistency and access controls when two or more
independent geographically distinct Clouds share either authentication, files,
computing resources, command and control or access to storage resources.”


This definition is broad enough to include all the different expressions of
cloud service aggregations that are governed by agreements between cloud
providers, rather than composed by the user.
Inter Cloud is a term that is often used interchangeably to express the
concept of Cloud federation. It was introduced by Cisco for expressing a
composition of clouds that are interconnected by means of open standards to
provide a universal environment that leverages cloud computing services. By
mimicking the Internet, often referred to as the “network of networks,” Inter
Cloud represents a “Cloud of Clouds” and therefore expresses the same concept
of federating together clouds that belong to different administrative
organizations. Whereas this is in many cases acceptable, some practitioners and
experts, like Ellen Rubin, founder and VP of Products at Cloud Switch, prefer
to give different connotations to the two terms:

The primary difference between the Inter Cloud and federation is that
the Inter Cloud is based on future standards and open interfaces, while
federation uses a vendor version of the control plane. With the Inter Cloud
vision, all Clouds will have a common understanding of how applications
should be deployed. Eventually workloads submitted to a Cloud will include
enough of a definition (resources, security, service level, geo-location, etc.)
that the Cloud is able to process the request and deploy the application. This
will create the true utility model, where all the requirements are met by the
definition and the application can execute “as is” in any Cloud with the
resources to support it.

Therefore, the term Inter Cloud refers mostly to a global vision in which
interoperability among different cloud providers is governed by standards, thus
creating an open platform where applications can shift workloads and freely
compose services from different sources. On the other hand, the concept of a
cloud federation is more general and includes ad hoc aggregations between
cloud providers on the basis of private agreements and proprietary interfaces.


WHY CLOUD FEDERATION?

CLOUD FEDERATION STACK

Creating a cloud federation involves research and development at different
levels: conceptual, logical and operational, and infrastructural.

The figure provides a comprehensive view of the challenges faced in designing
and implementing an organizational structure that coordinates cloud services
belonging to different administrative domains and makes them operate within the
context of a single unified service middleware.

Each cloud federation level presents different challenges and operates at
a different layer of the IT stack. It then requires the use of different approaches
and technologies. Taken together, the solutions to the challenges faced at each
of these levels constitute a reference model for a cloud federation.

CONCEPTUAL LEVEL

The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution with respect to the use of services leased by
single cloud providers. In this level it is important to clearly identify the
advantages for either service providers or service consumers in joining a
federation and to delineate the new opportunities that a federated environment
creates with respect to the single-provider solution.

Elements of concern at this level are:

· Motivations for cloud providers to join a federation.


· Motivations for service consumers to leverage a federation.
· Advantages for providers in leasing their services to other providers.
· Obligations of providers once they have joined the federation.
· Trust agreements between providers.
· Transparency versus consumers.
Among these aspects, the most relevant are the motivations of both service
providers and consumers in joining a federation.

LOGICAL & OPERATIONAL LEVEL

The logical and operational level of a federated cloud identifies and
addresses the challenges in devising a framework that enables the aggregation
of providers that belong to different administrative domains within a context of
a single overlay infrastructure, which is the cloud federation.

At this level, policies and rules for interoperation are defined. Moreover, this is
the layer at which decisions are made as to how and when to lease a service to—
or to leverage a service from— another provider.

The logical component defines a context in which agreements among
providers are settled and services are negotiated, whereas the operational
component characterizes and shapes the dynamic behaviour of the federation
as a result of the single providers’ choices.
This is the level where MOCC (market-oriented cloud computing) is implemented
and realized. It is important at this level to address the following challenges:

• How should a federation be represented?


• How should we model and represent a cloud service, a cloud provider, or an
agreement?
• How should we define the rules and policies that allow providers to join a
federation?
• What are the mechanisms in place for settling agreements among providers?
• What are provider’s responsibilities with respect to each other?
• When should providers and consumers take advantage of the federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which fraction of
resources should we lease?
The logical and operational level provides opportunities for both academia
and industry.

INFRASTRUCTURE LEVEL

The infrastructural level addresses the technical challenges involved in
enabling heterogeneous cloud computing systems to interoperate seamlessly.

It deals with the technology barriers that keep cloud computing systems
belonging to different administrative domains separate. These barriers can be
overcome by means of standardized protocols and interfaces.

At this level it is important to address the following issues:

• What kind of standards should be used?


• How should interfaces and protocols be designed for interoperation?
• Which are the technologies to use for interoperation?
• How can we realize a software system, design platform components, and
services enabling interoperability?

Interoperation and composition among different cloud computing vendors are
possible only by means of open standards and interfaces. Moreover, interfaces
and protocols change considerably at each layer of the Cloud Computing
Reference Model.

Q12) What is the definition of a third-party service provider?
A third-party service provider is generally defined as an external person
or company who provides a service or technology as part of a contract. In the IT
space, a third-party service provider typically provides a technology used to
store, process, and/or transmit data that enhances an organization’s
operational efficiency.

What is an example of a third-party service provider?


Since any “as-a-Service” technology solution is a third-party service provider,
nearly every organization uses at least one, if not more.

Software-as-a-Service Providers (SaaS)


Software-as-a-Service is the most commonly recognized third-party
service provider because people use them at work and home. A SaaS
provider offers an application delivered through the internet and often uses a
subscription pricing model. One way to think about SaaS services is that they are
like rental furniture; you only get the pieces you need for the length of time you
want them.

Some typical SaaS service providers include:

• Google Suite
• O365
• GoToMeeting
• Salesforce
Platform-as-a-Service (PaaS)
Platform-as-a-Service is a little trickier. A PaaS provider offers a cloud-
based location where organizations can build their own software without
worrying about maintenance like operating systems, software updates, storage,
or infrastructure. They also use a subscription pricing model. To follow the rental
analogy, you can consider PaaS services as similar to renting a furnished house;
they give you everything you need to do what you need to do so that you don’t
have to think about it.

Some typical PaaS providers include:

• Google App Engine


• Apache Stratos

• Force.com
• SAP Cloud Platform
Infrastructure-as-a-Service (IaaS)
Distinguishing between Infrastructure-as-a-Service and Platform-as-a-
Service can be a little more complicated because many of the services overlap.
An IaaS provider sells computing infrastructure such as servers, storage,
networking firewalls/security, and data center services using a subscription-
based model. Finally, to bring the analogy to completion, IaaS services offer you
a rental house without the furniture.

Some typical IaaS providers include:


• Amazon Web Services
• Microsoft Azure
• Google Cloud
• IBM Cloud
• Oracle Cloud
Why do you need third-party service providers?
Since most companies focus on their primary product, building their version of
an existing technology becomes a financial and operational burden. The choice
to purchase public cloud services rather than build a private cloud offers a good
example.
When an organization chooses to build a private cloud, it needs to consider the
following costs:
• Hardware costs: the number of servers you need to purchase
• Maintenance costs: the time the IT department will spend keeping it
functional and updated
• Capacity: the amount of current and future resources you need to
optimize the use and cost
• Compute power: the amount of power necessary to meet current needs
and additional power for high-use periods
• Access: the ability for off-site users to access remotely
Many companies find it difficult to estimate these costs. Additionally, even if
they estimate correctly, building a private cloud comes with operational costs
that can be difficult to quantify.
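The sketch below shows a rough back-of-the-envelope comparison of the annual cost
of building a private cloud versus renting equivalent public cloud capacity; every
figure is hypothetical, and real estimates would include many more factors
(networking, facilities, software licences, discounts for reserved capacity, and so on).

def private_cloud_annual_cost(servers, server_price, lifetime_years,
                              admin_salary, power_cooling_per_server):
    hardware = servers * server_price / lifetime_years   # amortized hardware cost
    return hardware + admin_salary + servers * power_cooling_per_server

def public_cloud_annual_cost(instances, hourly_rate, hours_per_year=8760):
    return instances * hourly_rate * hours_per_year      # pay-as-you-go, always on

build = private_cloud_annual_cost(servers=20, server_price=8000, lifetime_years=4,
                                  admin_salary=60000, power_cooling_per_server=900)
rent = public_cloud_annual_cost(instances=20, hourly_rate=0.30)
print(f"Private cloud: ${build:,.0f}/year   Public cloud: ${rent:,.0f}/year")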

THE END
