CLOUD COMPUTING LAB MANUAL
1. Installation and configuration of Hadoop/Eucalyptus etc.
Hadoop is a highly scalable, fault-tolerant distributed system for data storage and processing. The
scalability is the result of a self-healing, high-bandwidth clustered storage layer, known by the
acronym HDFS (Hadoop Distributed File System), and a specific fault-tolerant distributed
processing framework, known as MapReduce.
Why Hadoop as part of IT?
It processes and analyzes a variety of new and older data to extract meaningful business insight.
Traditionally, data moves to the computation node; in Hadoop, data is processed where the data
resides. The types of questions Hadoop helps answer are:
Event analytics: what series of steps led to a purchase or registration
Large scale web click stream analytics
Revenue assurance and price optimizations
Financial risk management and affinity engine
Many others... The Hadoop cluster or cloud is disruptive in the data center. Some grid software
resource managers can be integrated with Hadoop. The main advantage is that Hadoop jobs can
then be submitted in an orderly way from within the data center. See below the integration with
Oracle Grid Engine.
What types of data do we handle today?
Human-generated data that fits well into relational tables or arrays. Examples are conventional
transactions – purchase/sale, inventory/manufacturing, employment status change, etc. This is the
core data managed by OLTP relational DBMS everywhere. In the last decade, humans generated
other kinds of data as well, like text, documents (text or otherwise), pictures, videos, slideware.
Traditional relational databases are a poor home for this kind of data because:
It often deals with opinions or aesthetic judgments; there is little concept of perfect accuracy.
There is little concept of perfect completeness.
There is also little concept of perfectly, unarguably accurate query results; different people
will have different opinions as to what constitutes good results for a search.
There are no clear-cut binary answers; documents can have differing degrees of relevancy.
Another type of data is machine-generated data: machines that humans created produce
unstoppable streams of data, for example:
1. Computer logs
2. Satellite telemetry (espionage or science)
3. GPS outputs
4. Temperature and environmental sensors
5. Industrial sensors
6. Video from security cameras
7. Outputs from medical devices
8. Seismic and geophysical sensors
According to Gartner, enterprise data will grow 650% by 2014. 85% of this data will be
“unstructured data”, and this segment has a CAGR of 62% per year, far higher than that of
transactional data.
Example of Hadoop usage
Netflix (NASDAQ: NFLX) is a service offering online flat rate DVD and Blu-ray disc rental-by-
mail and video streaming in the United States. It has over 100,000 titles and 10 million subscribers.
The company has 55 million discs and, on average, ships 1.9 million DVDs to customers each day.
Netflix offers Internet video streaming, enabling the viewing of films directly on a PC or TV at
home.
Netflix’s movie recommendation algorithm uses Hive (running on top of Hadoop, HDFS, and
MapReduce) for query processing and business intelligence. Netflix collects all streaming logs
from its website using Honu. They parse 0.6 TB of data running on 50 nodes, with the data stored
on Amazon S3. All data is processed for business intelligence using a tool called MicroStrategy.
Hadoop challenges
Traditionally, Hadoop was aimed at developers. But the wide adoption and success of Hadoop
depends on business users, not developers.
Commercial distributions will have to make it even easier for business analysts to use Hadoop.
Templates for business scripts are a start, but getting away from scripting altogether should be the
long-term goal for the business user segment. This has not happened yet. Nevertheless, Cloudera is
trying to win the business user segment, and if it succeeds it will create an enterprise Hadoop
market.
To best illustrate, here is a quote from the Yahoo! Hadoop development team:
“The way Yahoo! uses Hadoop is changing. Previously, most Hadoop users at Yahoo! were
researchers. Researchers are usually hungry for scalability and features, but they are fairly tolerant
of failures. Few scientists even know what "SLA" means, and they are not in the habit of counting
the number of nines in your uptime. Today, more and more of Yahoo! production applications have
moved to Hadoop. These mission-critical applications control every aspect of Yahoo!'s operation,
from personalizing user experience to optimizing ad placement. They run 24/7, processing many
terabytes of data per day. They must not fail. So we are looking for software engineers who want to
help us make sure Hadoop works for Yahoo! and the numerous Hadoop users outside Yahoo!”
Hadoop Integration with resource management cloud software
One such example is Oracle Grid Engine 6.2 Update 5. Cycle Computing also announced an
integration with Hadoop. It reduces the cost of running Apache Hadoop applications by enabling
them to share resources with other data center applications, rather than having to maintain a
dedicated cluster for running Hadoop applications. Here is a relevant customer quote:
“The Grid Engine software has dramatically lowered for us the cost of data intensive, Hadoop
centered, computing. With its native understanding of HDFS data locality and direct support for
Hadoop job submission, Grid Engine allows us to run Hadoop jobs within exactly the same
scheduling and submission environment we use for traditional scalar and parallel loads. Before we
were forced to either dedicate specialized clusters or to make use of convoluted, ad hoc integration
schemes; solutions that were both expensive to maintain and inefficient to run. Now we have the
best of both worlds: high flexibility within a single, consistent and robust, scheduling system.”
Getting Started with Hadoop
Hadoop is an open source implementation of the MapReduce algorithm and a distributed file
system. Hadoop is primarily developed in Java. Writing a Java application will give you much
more control and presumably improved performance. However, Hadoop can also be used from
other environments, including scripting languages, via “streaming”. Streaming applications simply
read data from stdin and write their output to stdout.
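To make the streaming contract concrete, here is a minimal word-count mapper sketch in Java: it
reads lines from stdin and emits tab-separated word/count pairs on stdout. The class name and
tokenization are illustrative choices, not part of any Hadoop distribution.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Minimal streaming mapper: reads text lines from stdin and
    // emits one "word<TAB>1" pair per token on stdout.
    public class StreamMapper {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        System.out.println(word + "\t1");
                    }
                }
            }
        }
    }

A program like this would be paired with a reducer that sums the counts per word, and both would
be passed to the Hadoop streaming jar as the -mapper and -reducer executables; the exact
invocation depends on your Hadoop version.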
Installing Hadoop
To install Hadoop, you will need to download Hadoop Common (also referred to as Hadoop Core)
from http://hadoop.apache.org/common/. The binaries are available as open source under an
Apache License. Once you have downloaded Hadoop Common, follow the installation and
configuration instructions.
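To give a flavor of that configuration for a single-node (pseudo-distributed) setup on the Hadoop
1.x line, conf/core-site.xml points clients at the local HDFS instance. The host and port below are
conventional examples, not required values:

    <!-- conf/core-site.xml: illustrative pseudo-distributed setup -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

After editing the configuration files, the file system is typically formatted once with
bin/hadoop namenode -format and the daemons started with bin/start-all.sh; script and property
names differ between releases, so follow the instructions for your exact version.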
Hadoop With Virtual Machine
If you have no experience playing with Hadoop, there is an easier way to install and experiment
with Hadoop. Rather than installing a local copy of Hadoop, install a virtual machine from Yahoo!
The virtual machine comes with Hadoop pre-installed and pre-configured and is almost ready to use.
The virtual machine is available from their Hadoop tutorial. This tutorial includes well documented
instructions for running the virtual machine and running Hadoop applications. The virtual machine,
in addition to Hadoop, includes Eclipse IDE for writing Java based Hadoop applications.
Hadoop Cluster
By default, Hadoop distributions are configured to run on a single machine, and the Yahoo! virtual
machine is a good way to get going. However, the power of Hadoop comes from its inherent
distributed nature, and deploying distributed computing on a single machine misses its very point.
For any serious processing with Hadoop, you’ll need many more machines. Amazon’s Elastic
Compute Cloud (EC2) is perfect for this. An alternative option to running Hadoop on EC2 is to use
the Cloudera distribution. And of course, you can set up your own cluster of Hadoop by following
the Apache instructions.
Resources
There is a large, active developer community that has created many higher-level tools and
languages, such as HBase, Hive, and Pig. Cloudera offers a supported distribution.
2. Service deployment & usage over the cloud.
In the Management Portal, click Cloud Services. Then click the name of the cloud service to open
the dashboard.
1. Click Quick Start (the icon to the left of Dashboard) to open the Quick Start page, shown
below. (You can also deploy your cloud service by using Upload on the dashboard.)
2. If you haven't installed the Windows Azure SDK, click Install Azure SDK to open the
Windows Azure Downloads page, and then download the SDK for the language in which
you prefer to develop your code.
On the downloads page, you can also install client libraries and source code for developing
web apps in Node.js, Java, PHP, and other languages, which you can deploy as scalable
Windows Azure cloud services.
Note: For cloud services created earlier (previously known as hosted services), you'll need to
make sure the guest operating systems on the virtual machines (role instances) are
compatible with the Windows Azure SDK version you install. For more information, see the
Windows Azure SDK release notes.
3. Click either New Production Deployment or New Staging Deployment.
If you'd like to test your cloud service in Windows Azure before deploying it to production,
you can deploy to staging. In the staging environment, the cloud service's globally unique
identifier (GUID) identifies the cloud service in URLs (GUID.cloudapp.net). In the
production environment, the friendlier DNS prefix that you assign is used (for example,
myservice.cloudapp.net). When you're ready to promote your staged cloud service to
production, use Swap to redirect client requests to that deployment.
When you select a deployment environment, Upload a Package opens.
4. In Deployment name, enter a name for the new deployment - for example,
MyCloudServicev1.
5. In Package, use Browse to select the service package file (.cspkg) to use.
6. In Configuration, use Browse to select the service configuration file (.cscfg) to use. (A
sample configuration file is sketched after these steps.)
7. If the cloud service will include any roles with only one instance, select the Deploy even if
one or more roles contain a single instance check box to enable the deployment to
proceed.
Windows Azure can only guarantee 99.95 percent access to the cloud service during
maintenance and service updates if every role has at least two instances. If needed, you can
add additional role instances on the Scale page after you deploy the cloud service. For more
information, see Service Level Agreements.
8. Click OK (checkmark) to begin the cloud service deployment.
You can monitor the status of the deployment in the message area. Click the down arrow to
hide the message.
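For reference, the service configuration file mentioned in step 6 is a small XML document with a
fixed schema. The sketch below is an illustrative minimal .cscfg; the service name, role name,
setting, and instance count are placeholders rather than values from this lab:

    <?xml version="1.0" encoding="utf-8"?>
    <ServiceConfiguration serviceName="MyCloudService"
        xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
      <Role name="WebRole1">
        <!-- Two instances satisfy the 99.95 percent SLA noted in step 7. -->
        <Instances count="2" />
        <ConfigurationSettings>
          <Setting name="ExampleSetting" value="example" />
        </ConfigurationSettings>
      </Role>
    </ServiceConfiguration>

Increasing the Instances count here (or later on the Scale page) is how additional role instances
are added.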
To verify that your deployment completed successfully
1. Click Dashboard.
2. Under quick glance, click the site URL to open your cloud service in a web browser.
3. Management of cloud resources.
In theory, resources based on cloud computing services should be no different from the resources
in your own environment, except that they live remotely. Ideally, you have a complete view of the
cloud computing resources you use today or may want to use in the future.
In most cloud environments, the customer is able to access only the services they’re entitled to use.
Entire applications may be used on a cloud services basis. Development tools are sometimes cloud
based. In fact, testing and monitoring environments can be based on the cloud.
Performance management is all about how your software services run effectively inside your own
environment and through the cloud.
If you start to connect software that runs in your own data center directly to software that runs in the
cloud, you create a potential bottleneck at the point of connection.
Services connected between the cloud and your computing environment can impact performance if
they aren’t well planned. This is especially likely to be the case if there are data translations or
specific protocols to adhere to at the cloud gateway.
As a customer, your ability to directly control the resources will be much lower in the cloud.
Therefore,
The connection points between various services must be monitored in real time (a minimal
monitoring sketch follows this list). A breakdown may impact your ability to provide a
business process to your customers.
There must be expanded bandwidth at connection points.
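To make real-time monitoring of a connection point concrete, the sketch below polls a service
endpoint and flags slow or failed responses. The URL, latency threshold, and polling interval are
placeholder assumptions; a production monitor would add alerting and error handling:

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Illustrative connection-point monitor: measures round-trip
    // latency to a cloud endpoint and flags slow or failed responses.
    public class ConnectionMonitor {
        public static void main(String[] args) throws Exception {
            URL endpoint = new URL("http://myservice.cloudapp.net/health"); // placeholder URL
            long thresholdMs = 500; // placeholder latency budget
            while (true) {
                long start = System.currentTimeMillis();
                HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
                conn.setConnectTimeout(2000);
                int status = conn.getResponseCode();
                long elapsed = System.currentTimeMillis() - start;
                if (status != 200 || elapsed > thresholdMs) {
                    System.err.println("ALERT: status=" + status + ", latency=" + elapsed + " ms");
                }
                conn.disconnect();
                Thread.sleep(10000); // poll every 10 seconds
            }
        }
    }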
With Software as a Service (SaaS), a customer expects provisioning (to request a resource for
immediate use) of extra services to be immediate, automatic, and effortless. The cloud service
provider is responsible for maintaining an agreed-on level of service and provisions resources
accordingly.
The normal situation in a data center is that software workloads vary throughout the day, week,
month, and year. So the data center has to be built for the maximum possible workload, with a little
bit of extra capacity thrown in to cover unexpectedly high peaks.
Service management in this context covers all the data center operations activities. This broad
discipline considers the necessary techniques and tools for managing services, by both cloud
providers and internal data center managers, across physical, IT, and virtual environments.
Service management encompasses many different disciplines, including
Configuration management
Asset management
Network management
Capacity planning
Service desk
Root cause analysis
Workload management
Patch and update management
The cloud itself is a service management platform. Well-designed cloud service portfolios include a
tight integration of the core service management capabilities and well-defined interfaces.
4. Using existing cloud characteristics & service models
Cloud Adoption Strategy Services
The Cloud Adoption Strategy Services recognize the importance of developing a
cloud strategy with expert guidance and preparing a business justification based on
requirements, business and financial needs, and other success factors. These
services include:
Cloud Strategy Workshop for XaaS Adoption: Cisco Cloud Adoption Strategy
Workshop utilizes a collaborative discussion process to examine and evaluate
industry-leading practices around cloud adoption, as well as identify the areas of
interest and importance for you to successfully adopt a cloud model. This two- to
four-hour session, which can be virtual or face to face with Cisco cloud subject
matter experts, as well as partners if applicable, helps you to frame and
understand your current situation, challenges, implications, and benefits before
migration to a cloud model. The workshop also will introduce a cloud migration
approach recommended by Cisco for your environment and your business needs
and goals.
Cloud Strategy and Business Justification Service for XaaS Adoption: Cloud
Adoption Strategy and Business Justification Service introduces an interview format
and process as a forum for assessing and evaluating your application, network,
compute, and storage architectures. The emphasis of the service is on the
application portfolio as well as gathering success factors; cloud use cases; and
financial, business, and technology requirements from your business and IT teams.
In addition to collecting these requirements, the service provides the business case
and justification for a cloud migration and discovers any business and/or technical
effects that would result from implementing the cloud model. Finally, this service
helps define the risk and dependency analysis for your cloud model.
These services help you understand:
Your state of readiness to adopt a cloud model and the challenges, implications,
and benefits of adopting a cloud model
How a cloud model aligns to or affects your business, IT goals, and operational
objectives
How a cloud model affects business partners, such as vendors, suppliers,
resellers, and customers
How a cloud model can enable the ability to provide new user services with
optimal service delivery or consume new business services from others
5. Cloud Security Management.
Cloud Security Controls
Cloud security architecture is effective only if the correct defensive implementations are in place.
An efficient cloud security architecture should recognize the issues that will arise with security
management. Security management addresses these issues with security controls. These
controls are put in place to safeguard any weaknesses in the system and reduce the effect of an
attack. While there are many types of controls behind a cloud security architecture, they can usually
be found in one of the following categories:
Deterrent Controls
These controls are set in place to prevent any purposeful attack on a cloud system. Much like a
warning sign on a fence or a property, these controls do not reduce the actual vulnerability of a
system.
Preventative Controls
These controls upgrade the strength of the system by managing the vulnerabilities. The preventative
control will safeguard vulnerabilities of the system. If an attack were to occur, the preventative
controls are in place to cover the attack and reduce the damage and violation to the system's
security.
Corrective Controls
Corrective controls are used to reduce the effect of an attack. Unlike the preventative controls, the
corrective controls take action as an attack is occurring.
Detective Controls
Detective controls are used to detect any attacks that may be occurring to the system. In the event of
an attack, the detective control will signal the preventative or corrective controls to address the
issue.
Dimensions of cloud security
Correct security controls should be implemented according to asset, threat, and vulnerability risk
assessment matrices. While cloud security concerns can be grouped into any number of dimensions
(Gartner names seven, while the Cloud Security Alliance identifies fourteen areas of concern),
these dimensions have been aggregated into three general areas: Security and Privacy, Compliance,
and Legal or Contractual Issues.
Security and privacy
Identity management
Every enterprise will have its own identity management system to control access to
information and computing resources. Cloud providers either integrate the customer’s identity
management system into their own infrastructure, using federation or SSO technology, or
provide an identity management solution of their own.
Physical and personnel security
Providers ensure that physical machines are adequately secure and that access to these
machines as well as all relevant customer data is not only restricted but that access is
documented.
Availability
Cloud providers assure customers that they will have regular and predictable access to their
data and applications.
Application security
Cloud providers ensure that applications available as a service via the cloud are secure by
implementing testing and acceptance procedures for outsourced or packaged application code.
This also requires that application security measures be in place in the production environment.
Privacy
Finally, providers ensure that all critical data (credit card numbers, for example) are masked
and that only authorized users have access to data in its entirety. Moreover, digital identities
and credentials must be protected as should any data that the provider collects or produces
about customer activity in the cloud.
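As a simple illustration of masking, the sketch below redacts all but the last four digits of a card
number before it is logged or displayed. This is an illustrative, hypothetical helper only; a real
deployment would rely on proper tokenization or encryption of data at rest:

    // Illustrative data-masking helper: hides all but the last four
    // digits of a card number. Not a substitute for tokenization or
    // encryption.
    public class CardMasker {
        public static String mask(String cardNumber) {
            String digits = cardNumber.replaceAll("\\D", ""); // keep digits only
            if (digits.length() <= 4) {
                return digits; // too short to mask meaningfully
            }
            return "****-****-****-" + digits.substring(digits.length() - 4);
        }

        public static void main(String[] args) {
            System.out.println(mask("4111 1111 1111 1234")); // prints ****-****-****-1234
        }
    }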
Legal issues
In addition, providers and customers must consider legal issues, such as contracts and e-discovery,
and the related laws, which may vary by country.
Legal and contractual issues
Aside from the security and compliance issues enumerated above, cloud providers and their
customers will negotiate terms around liability (stipulating how incidents involving data loss or
compromise will be resolved, for example), intellectual property, and end-of-service (when data and
applications are ultimately returned to the customer).
Public records
Legal issues may also include records-keeping requirements in the public sector, where many
agencies are required by law to retain and make available electronic records in a specific fashion.
This may be determined by legislation, or law may require agencies to conform to the rules and
practices set by a records-keeping agency. Public agencies using cloud computing and storage must
take these concerns into account.