
20IT019  DATA ANALYTICS AND CLOUD LABORATORY  (L T P C: 0 0 3 1.5)

COURSE OBJECTIVES
The course will enable students to obtain practical experience with data analytics
algorithms and to become familiar with the development of web services and
applications in a cloud framework.

COURSE CONTENT:
List of Experiments
1. Find the procedure to run virtual machines of different configurations. Check how
many virtual machines can be utilized at a particular time.
There are several ways to run a virtual machine with different configurations, depending
on the virtualization software being used. Here are some general steps for running a
virtual machine with different configurations:

1. Install virtualization software such as VirtualBox, VMware, or Hyper-V.
2. Create a new virtual machine, specifying the desired configuration, such as the amount
of RAM, the number of CPU cores, and the disk space.
3. Install an operating system on the virtual machine, such as Windows or Linux.
4. Configure the virtual machine's network settings, such as assigning it a static IP address or
connecting it to a virtual network.
5. Start the virtual machine and access it via the virtualization software's management
interface or through remote access protocols like RDP or SSH.

The number of virtual machines that can be utilized at a particular time depends on the
resources available on the host machine, such as CPU, RAM, and disk space. It also
depends on the specific virtualization software being used and its capabilities: some
virtualization software allows you to run many virtual machines simultaneously on the
same host, while other software limits the number of running virtual machines.

You can check how many virtual machines can be utilized at a particular time by
comparing the host machine's free resources against the resources allocated to each
virtual machine. For example, in VirtualBox the running virtual machines are listed in
the VirtualBox Manager window, and the host machine's resource usage can be checked
with the operating system's own tools (such as Task Manager on Windows, or top and
free on Linux).

2. Find the procedure to attach a virtual block to a virtual machine and check whether it
holds its data even after the release of the virtual machine.
The procedure to attach a virtual block to a virtual machine can vary depending on the
virtualization software being used. Here are some general steps for attaching a virtual
block to a virtual machine in VirtualBox:

1. Open VirtualBox and select the virtual machine to which you want to attach the virtual
block.
2. Click the "Settings" button to open the virtual machine's settings.
3. Go to the "Storage" tab.
4. Select an existing storage controller (for example, the SATA controller), or add a new
one with the "Add Controller" button.
5. Click the "Add Hard Disk" icon next to the controller (the exact labels vary slightly
between VirtualBox versions).
6. In the disk selector, either create a new virtual hard disk or add an existing disk file,
select it, and click "Choose".
7. Click "OK" to close the settings window.
8. Start the virtual machine and check whether it recognizes the new virtual block; it
should be available as an additional disk (a new disk may need to be partitioned and
formatted before first use).

Once the virtual block is attached, the virtual machine treats it like a regular physical
disk and can use it to store data. The data stored on the virtual block persists even after
the virtual machine is released, as long as the virtual disk file itself is not deleted.

To check that the data holds after the release of the virtual machine, store a test file on
the attached disk, shut the virtual machine down and release (delete) it without deleting
the disk file, then attach the same disk file to another virtual machine and verify that the
file is still there.

Alternatively, you can use the export feature of the virtualization software to create a
copy of the virtual machine, including the virtual block, and then start the copy and
check the data.
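A simple way to run this check is to write a marker file to the attached disk before
releasing the virtual machine and to read it back after re-attaching the disk. A minimal
sketch, assuming the attached block device is mounted inside the guest at the
hypothetical mount point /mnt/vblock:

from pathlib import Path

# Hypothetical mount point of the attached virtual block inside the guest.
marker = Path("/mnt/vblock/persistence_check.txt")

# Step 1 (before releasing the VM): write a marker file to the disk.
marker.write_text("written before the VM was released\n")

# Step 2 (after re-attaching the disk to a VM): verify the data survived.
print("Marker still present:", marker.exists())
print(marker.read_text())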

3. Install a C compiler in the virtual machine and execute a sample program.


The procedure to install a C compiler in a virtual machine can vary depending on the
operating system being used. Here are some general steps for installing a C compiler in a
Linux-based virtual machine:

1. Start the virtual machine and log in as an administrator or a user with root/sudo access.
2. Open a terminal window.
3. Update the package manager by running the command "sudo apt-get update" (for
Ubuntu/Debian) or "sudo yum update" (for Fedora/CentOS).
4. Install the C compiler by running the command "sudo apt-get install gcc" (for
Ubuntu/Debian) or "sudo yum install gcc" (for Fedora/CentOS).
5. Verify the installation by running the command "gcc --version".

To execute a sample program, follow these steps:

1. Open a terminal window in the virtual machine.
2. Create a new file named "example.c" using a text editor like nano or vim, and type in or
copy a sample C program into the file (a minimal example is shown below).
3. Compile the program by running the command "gcc example.c -o example".
4. Run the program by running the command "./example".

You should now see the program's output displayed in the terminal window.
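A minimal sample program for step 2 could be the classic hello-world (any small C
program will do):

/* example.c - a minimal sample program for testing the compiler */
#include <stdio.h>

int main(void)
{
    printf("Hello from the virtual machine!\n");
    return 0;
}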

The above steps are for a Linux-based virtual machine and may vary depending on the
specific operating system and version of the virtual machine.

A Windows-based virtual machine can instead use a compiler such as MinGW, Visual
Studio, or GCC for Windows: download and run the installer, then compile and run C
programs from the command prompt or from an integrated development environment
(IDE).

4. Show virtual machine migration from one node to another based on a certain
condition.
Virtual machine migration is the process of moving a running virtual machine from one
physical host to another with little or no interruption to the running services. The exact
procedure varies with the virtualization software being used. Here are some general
steps for migrating a virtual machine using the live migration feature in VMware
vSphere:

1. Log in to the vSphere web client and navigate to the host or cluster where the virtual
machine is currently running.
2. Right-click on the virtual machine and select "Migrate."
3. Select "Change host" and select the destination host or cluster where you want to migrate
the virtual machine.

4. Select the migration type. To migrate the virtual machine without interrupting
its services, it must be migrated while powered on (live migration, called
vMotion in vSphere).
5. Select the storage where the virtual machine's files should be located on the
destination host.
6. Click "Next" and review the migration settings, then click "Finish" to start the
migration process.

Virtual machine migration can also be triggered by certain conditions, such as
resource utilization, power consumption, or a specific time schedule; this is known
as automatic or scheduled migration.

To set up condition-based migration, you can use the vSphere Distributed Resource
Scheduler (DRS), which automatically balances virtual machine workloads across the
hosts in a cluster. DRS uses current resource usage, resource reservations, and
constraints to determine the best host for each virtual machine.

You can also use vSphere HA (High Availability), which automatically restarts virtual
machines on other hosts in the event of a host failure.

In summary, virtual machine migration can be done manually, or automatically based
on certain conditions, using specific features of the virtualization software. Migration
can be performed with minimal interruption to the running services and can be used to
balance workloads, increase resource utilization, and ensure high availability.
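To make the condition-based idea concrete, here is a small sketch of DRS-style
rebalancing logic in Python. It is illustrative only: the hosts are plain dictionaries,
the 80% threshold is an example value, and migrate() is a stand-in for a real
live-migration call to the hypervisor's API.

CPU_THRESHOLD = 0.80  # example condition: migrate when CPU usage exceeds 80%

# Toy host inventory; a real implementation would query the hypervisor.
hosts = [
    {"name": "node1", "cpu_usage": 0.92, "vms": ["vm1", "vm2", "vm3"]},
    {"name": "node2", "cpu_usage": 0.35, "vms": ["vm4"]},
]

def migrate(vm, source, target):
    # Stand-in for a live-migration API call.
    source["vms"].remove(vm)
    target["vms"].append(vm)
    print(f"Migrating {vm}: {source['name']} -> {target['name']}")

def rebalance(hosts):
    # Move one VM off any host whose CPU usage exceeds the threshold.
    for host in hosts:
        if host["cpu_usage"] > CPU_THRESHOLD and host["vms"]:
            target = min(hosts, key=lambda h: h["cpu_usage"])  # least-loaded host
            if target is not host:
                migrate(host["vms"][0], host, target)

rebalance(hosts)  # prints: Migrating vm1: node1 -> node2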

5. Find the procedure to install a storage controller and interact with it.

To set up a storage controller in cloud computing, you can follow these general steps:

1. Choose a cloud provider and create an account if you do not already have one.
2. Select the appropriate storage service for your needs, such as an object storage service or
a block storage service.
3. Create a storage container or volume, depending on the service you have chosen.
4. Configure any necessary settings, such as access controls or performance tiers.
5. Obtain the credentials needed to interact with the storage service, such as access keys or
connection strings.
6. Use a programming language or command-line tool to interact with the storage service,
such as the AWS SDK for your language of choice or the AWS Command Line Interface
(CLI).

7. Once the storage controller is installed, you can use it to create, read, update, and
delete data stored in the storage container or volume.
8. Configure backup and replication policies and monitor the storage usage.

It is important to note that the specific steps and tools you will use will depend on the cloud
provider and storage service you have chosen.
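As one concrete example, the sketch below interacts with an object storage service
(Amazon S3) using the AWS SDK for Python, boto3. It assumes boto3 is installed and
AWS credentials have been configured (for example with aws configure); the bucket
name is a placeholder and must be globally unique.

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-12345"  # placeholder; bucket names must be unique

# Create a storage container (outside us-east-1, create_bucket also needs a
# CreateBucketConfiguration argument with the region's LocationConstraint).
s3.create_bucket(Bucket=bucket)

# Write an object, then read it back.
s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"stored in the cloud")
obj = s3.get_object(Bucket=bucket, Key="hello.txt")
print(obj["Body"].read().decode())

# Clean up.
s3.delete_object(Bucket=bucket, Key="hello.txt")
s3.delete_bucket(Bucket=bucket)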
6. Find the procedure to set up a one-node Hadoop cluster.

7. Mount the one-node Hadoop cluster using FUSE.

8. Write a word count program to demonstrate the use of Map and Reduce tasks.
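The manual gives no listing for this experiment. A minimal sketch for Hadoop
Streaming, where the mapper and reducer are ordinary programs that read stdin and
write tab-separated key/value pairs, could look like the following (the file name
wordcount.py is an assumption):

import sys

def mapper():
    # Map task: emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Reduce task: sum the counts per word; input must be sorted by key,
    # which Hadoop's shuffle phase (or a local sort) guarantees.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()

The same script serves as both tasks: with Hadoop Streaming it is passed as
-mapper "python wordcount.py map" and -reducer "python wordcount.py reduce", and it
can be tested locally with:
cat input.txt | python wordcount.py map | sort | python wordcount.py reduce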
9. Implementation of Regression Techniques (Linear, Multiple and Logistic).
Linear Regression

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# load the Iris dataset
iris = pd.read_csv("iris.csv")

# define X and y (petal_length is the target, so it must not be a feature)
X = iris[['sepal_length', 'sepal_width', 'petal_width']]
y = iris['petal_length']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# create a linear regression model
linreg = LinearRegression()

# fit the model to the training data
linreg.fit(X_train, y_train)

# make predictions on the testing set
y_pred = linreg.predict(X_test)

# evaluate the model
print("Mean squared error:", metrics.mean_squared_error(y_test, y_pred))

Multiple Regression

# Note: the Boston Housing dataset was removed from scikit-learn (version 1.2),
# so this example uses the California Housing dataset instead.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset
housing = fetch_california_housing()

# Define the feature matrix and the target vector
X = housing.data
y = housing.target

# Create a Linear Regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Print the coefficients
print(model.coef_)

# Print the mean squared error and the R-squared score
print("Mean squared error: %.2f" % mean_squared_error(y, y_pred))
print("R-squared score: %.2f" % r2_score(y, y_pred))

Logistic Regression

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
data = pd.read_csv('data.csv')

# Split the data into features and target
X = data.drop('target', axis=1)
y = data['target']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create the logistic regression model (max_iter raised to help convergence)
model = LogisticRegression(max_iter=1000)

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

The same approach can be applied to the Iris dataset to classify the species:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# load the Iris dataset
iris = pd.read_csv("iris.csv")

# define X and y
X = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = iris['species']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# create a logistic regression model (max_iter raised to help convergence)
logreg = LogisticRegression(max_iter=1000)

# fit the model to the training data
logreg.fit(X_train, y_train)

# make predictions on the testing set
y_pred = logreg.predict(X_test)

# evaluate the model
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

10. Implementation of Decision Tree learning.


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# load the dataset
data = pd.read_csv("data.csv")

# define X and y
X = data[['feature1', 'feature2', 'feature3', 'feature4']]
y = data['target']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# create a Decision Tree model
dtree = DecisionTreeClassifier()

# fit the model to the training data
dtree.fit(X_train, y_train)

# make predictions on the testing set
y_pred = dtree.predict(X_test)

# evaluate the model
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

11. Implementation of Random Forest.


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# load the dataset
data = pd.read_csv("data.csv")

# define X and y
X = data[['feature1', 'feature2', 'feature3', 'feature4']]
y = data['target']

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# create a Random Forest model with 100 trees
rforest = RandomForestClassifier(n_estimators=100)

# fit the model to the training data
rforest.fit(X_train, y_train)

# make predictions on the testing set
y_pred = rforest.predict(X_test)

# evaluate the model
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

12. Implementation of Clustering (K-Means, Hierarchical).


K-Means Clustering

from sklearn.cluster import KMeans
import pandas as pd

# load dataset
data = pd.read_csv("data.csv")

# perform k-means clustering
kmeans = KMeans(n_clusters=3)
kmeans.fit(data)

# print cluster labels
print(kmeans.labels_)

# predict the cluster of new data (new rows must have the same number of
# features as the training data; three features are assumed here)
new_data = [[1, 2, 3], [4, 5, 6]]
predictions = kmeans.predict(new_data)
print(predictions)

Hierarchical Clustering

from sklearn.cluster import AgglomerativeClustering
from sklearn import datasets

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data

# Perform hierarchical clustering
agg_clustering = AgglomerativeClustering(n_clusters=3)
agg_clustering.fit(X)

# Print the cluster labels
print(agg_clustering.labels_)

# The linkage criterion can also be set explicitly, e.g. Ward linkage:
agg_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')

# A dendrogram visualizes the merge hierarchy of the clustering:
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Perform linkage
Z = linkage(X, method='ward')

# Plot dendrogram
dendrogram(Z)
plt.show()

13. Implementation of Association Rule Mining.


from apyori import apriori

# Data set
transactions = [
    ['milk', 'bread', 'butter'],
    ['milk', 'bread', 'butter', 'cheese'],
    ['milk', 'bread', 'eggs'],
    ['milk', 'bread', 'eggs', 'cheese'],
    ['milk', 'bread', 'butter', 'cheese', 'eggs'],
]

# Association rule mining
rules = apriori(transactions, min_support=0.5, min_confidence=0.7, min_lift=1)

# Print results: each record can contain several directed rules
# (ordered statistics) over the same item set
for item in rules:
    for stat in item.ordered_statistics:
        if not stat.items_base:
            continue  # skip rules with an empty left-hand side
        lhs = ', '.join(stat.items_base)
        rhs = ', '.join(stat.items_add)
        print("Rule: " + lhs + " -> " + rhs)
        print("Support: " + str(item.support))
        print("Confidence: " + str(stat.confidence))
        print("Lift: " + str(stat.lift))
        print("=====================================")

This program uses the apriori function from the apyori library to perform association
rule mining on the given dataset of transactions. The minimum support, confidence, and
lift values are set to 0.5, 0.7, and 1, respectively. Each resulting record is unpacked into
its directed rules, and each rule's items, support, confidence, and lift are printed.

14. Implementation of k-nearest neighbour’s algorithm.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# KNN classification with 5 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Evaluate the model
accuracy = knn.score(X_test, y_test)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Make predictions
predictions = knn.predict(X_test)
print("Predictions:", predictions)

This program uses the load_iris function from the sklearn.datasets module to load the iris
dataset. It then splits the dataset into training and test sets using the train_test_split
function. The KNeighborsClassifier class from the sklearn.neighbors module is then used
to perform KNN classification on the training data with 5 nearest neighbors. The model's
accuracy is then evaluated on the test data, and predictions are made on the test data.

15. Implementation of classification using SVM.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm

# Load iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# SVM classification with a linear kernel and regularization parameter C=1
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X_train, y_train)

# Evaluate the model
accuracy = clf.score(X_test, y_test)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Make predictions
predictions = clf.predict(X_test)
print("Predictions:", predictions)

This program uses the load_iris function from the sklearn.datasets module to load the iris dataset. It
then splits the dataset into training and test sets using the train_test_split function. The SVC class
from the sklearn.svm module is then used to perform SVM classification on the training data with a
linear kernel and a regularization parameter C of 1. The model's accuracy is then evaluated on the
test data, and predictions are made on the test data.

COURSE OUTCOMES:

CO1: Build and assess data-based models.

CO2: Perform data analyses with professional statistical software.

CO3: Demonstrate skill in data management.

CO4: Use cloud tool kits.

CO5: Design and implement applications on the Cloud.

REFERENCES:
1. Bart Baesens, “Analytics in a Big Data World: The Essential Guide to Data
Science and its Applications”, Wiley Publication, 1st Edition, 2014.
2. Subhashini Chellappan, Seema Acharya, “Big Data and Analytics”, Wiley
Publication, 2nd Edition, 2019.
3. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, “Mining of Massive
Datasets”, Cambridge University Press, 2nd Edition, 2014.
4. Rajkumar Buyya, Christian Vecchiola, S. Thamarai Selvi, “Mastering Cloud
Computing”, McGraw Hill Education, 1st Edition, 2017.
5. Rajiv Misra, Yashwant Singh Patel, “Cloud and Distributed Computing:
Algorithms and Systems”, Wiley, 1st Edition, 2020.
