ML Lab Manual (CSE)
LAB MANUAL
MACHINE LEARNING
Contents
I. Course Outcomes
II. Syllabus
CO   Course Outcomes
CO1  Compare machine learning algorithms based on their advantages and limitations, and choose the best one for a given situation.
PROGRAM OUTCOMES (POs): CO-PO/PSO mapping (columns PO1-PO12, then PSO i-ii)

      PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PO12  i     ii
CO1   3     3     2     3     3     2     1     2     2     0     1     3     3     3
CO2   2     3     2     3     3     2     1     2     2     0     1     3     2     2
CO3   3     2     3     3     2     1     2     2     0     1     3     2     2     3
CO4   3     2     3     3     2     1     2     2     0     1     3     2     2     2
CO5   3     2     3     3     2     1     2     2     0     1     3     2     3     2
CO6   3     2     3     3     2     1     2     2     0     1     3     2     2     3
Avg   2.83  2.3   2.6   3     2.3   1.3   1.6   2     0.6   0.6   2.3   2.3   2.3   2.5
Machine Learning Lab Manual
Course Objective: The objective of this lab is to give an overview of the various
machine learning techniques and to demonstrate them using Python.
Course Outcomes: After completing the course, the student will be able to attain the outcomes listed in the Course Outcomes table above.
List of Experiments
1. The probability that it is Friday and that a student is absent is 3%. Since there are
5 school days in a week, the probability that it is Friday is 20%. What is the
probability that a student is absent given that today is Friday? Apply Bayes' rule in
Python to get the result. (Ans: 15%)
4. Given the following data, which specifies classifications for nine combinations of
VAR1 and VAR2, predict a classification for a case where VAR1=0.906 and
VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student
is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans:
15%)
probAbsentFriday = 0.03
probFriday = 0.2
# Bayes' rule: P(Absent|Friday) = P(Friday|Absent) * P(Absent) / P(Friday)
# Since P(Friday|Absent) * P(Absent) = P(Friday ∩ Absent) = 0.03,
# this reduces to P(Absent|Friday) = P(Friday ∩ Absent) / P(Friday)
bayesResult = probAbsentFriday / probFriday
print(bayesResult * 100)
Output: 15.0
Python Select from MySQL Table
You’ll learn the following MySQL SELECT operations from Python, using a MySQL driver
module such as MySQL Connector/Python or PyMySQL (the example below uses PyMySQL).
• Execute the SELECT query and process the result set returned by the query in
Python.
• Use Python variables in a where clause of a SELECT query to pass dynamic
values.
• Use fetchall(), fetchmany(), and fetchone() methods of a cursor class to fetch all or
limited rows from a table.
Next, prepare a SQL SELECT query to fetch rows from a table. You can select all
rows or a limited set of rows, as required. If a WHERE condition is used, it
decides which rows are fetched.
For example, SELECT col1, col2, …, colN FROM MySQL_table WHERE id = 10;
returns the row whose id column equals 10 (not the tenth row).
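A minimal sketch of passing a Python variable into a WHERE clause, assuming a hypothetical students table in the College database used later in this section:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='pass', db='College')
cur = conn.cursor()
# Pass the dynamic value as a query parameter (safer than string formatting)
student_id = 10
cur.execute("SELECT name, branch FROM students WHERE id = %s", (student_id,))
print(cur.fetchone())  # fetchone() returns a single row, or None if no match
conn.close()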
Iterate over the row list using a for loop and access each row individually
(access each row's column data using a column name or an index number).
import pymysql

def mysqlconnect():
    # Connect to the MySQL database
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password="pass",
        db='College',
    )
    cur = conn.cursor()
    # Run a simple query and fetch the whole result set
    cur.execute("select @@version")
    output = cur.fetchall()
    print(output)
    conn.close()

mysqlconnect()
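A sketch of the row iteration described above, again assuming a hypothetical students table:

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='pass', db='College')
cur = conn.cursor()
cur.execute("SELECT id, name, branch FROM students")
for row in cur.fetchall():
    # With the default cursor, each row is a tuple indexed by column position
    print(row[0], row[1], row[2])
conn.close()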
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Loading data and splitting it into train and test sets
irisData = load_iris()
X_train, X_test, y_train, y_test = train_test_split(irisData.data, irisData.target, random_state=42)
knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
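A quick check of the fitted model, continuing from the code above (a sketch; the exact numbers depend on the train/test split):

# Predict classes for a few held-out samples and report the mean accuracy
print(knn.predict(X_test[:5]))
print(knn.score(X_test, y_test))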
4. Given the following data, which specifies classifications for nine combinations of
VAR1 and VAR2, predict a classification for a case where VAR1=0.906 and
VAR2=0.606, using the result of k-means clustering with 3 means (i.e., 3 centroids).
import numpy as np
from sklearn.cluster import KMeans

# Known classifications for the nine (VAR1, VAR2) combinations
y = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1])
# X holds the nine (VAR1, VAR2) pairs from the data table; see the sketch below
kmeans = KMeans(n_clusters=3).fit(X)
kmeans.predict([[0.906, 0.606]])
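The manual's data table did not survive extraction here, so the following is a self-contained sketch. The nine (VAR1, VAR2) pairs are assumptions taken from the version of this exercise commonly circulated; they are consistent with the y labels above, but substitute your own table if it differs:

import numpy as np
from sklearn.cluster import KMeans

# Assumed (VAR1, VAR2) values for the nine cases
X = np.array([[1.713, 1.586], [0.180, 1.786], [0.353, 1.240],
              [0.940, 1.566], [1.486, 0.759], [1.266, 1.106],
              [1.540, 0.419], [0.459, 1.799], [0.773, 0.186]])

# Group the nine points around 3 centroids
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
# Assign the new case to its nearest centroid
print(kmeans.predict([[0.906, 0.606]]))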
5. The following training examples map descriptions of individuals onto high, medium and
low credit-worthiness.
The input attributes are (from left to right) income, recreation, job, status, age-group and
home-owner. Find the unconditional probability of 'golf' and the conditional probability of
'single' given 'medRisk' in the dataset.
totalRecords = 10
numberGolfRecreation = 4
# Unconditional probability of 'golf'
probGolf = numberGolfRecreation / totalRecords
print("Unconditional probability of golf:", probGolf)
# Bayes' rule: P(single|medRisk) = P(medRisk|single) * P(single) / P(medRisk),
# which reduces to P(medRisk ∩ single) / P(medRisk)
numberMedRiskSingle = 2
numberMedRisk = 3
probMedRiskSingle = numberMedRiskSingle / totalRecords
probMedRisk = numberMedRisk / totalRecords
conditionalProbability = probMedRiskSingle / probMedRisk
print("Conditional probability of single given medRisk:", conditionalProbability)
Output:
Unconditional probability of golf: 0.4
Conditional probability of single given medRisk: 0.6666666666666667
Regression
What Is Regression?
Regression analysis is one of the most important fields in statistics and machine learning. There
are many regression methods available. Linear regression is one of them.
For example, you can observe several employees of some company and try to understand how
their salaries depend on the features, such as experience, level of education, role, city they work
in, and so on.
This is a regression problem where data related to each employee represent one observation. The
presumption is that the experience, education, role, and city are the independent features, while
the salary depends on them.
Generally, in regression analysis, you consider some phenomenon of interest and have a
number of observations. Each observation has two or more features. Following the assumption
that (at least) one of the features depends on the others, you try to establish a relation among them.
In other words, you need to find a function that maps some features or variables to others sufficiently well.
The dependent features are called the dependent variables, outputs, or responses.
The independent features are called the independent variables, inputs, or predictors.
Linear Regression
Linear regression is probably one of the most important and widely used regression techniques.
It’s among the simplest regression methods. One of its main advantages is the ease of
interpreting results.
When implementing linear regression of some dependent variable 𝑦 on the set of independent
variables 𝐱 = (𝑥₁, …, 𝑥ᵣ), where 𝑟 is the number of predictors, you assume a linear relationship
between 𝑦 and 𝐱: 𝑦 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵣ𝑥ᵣ + 𝜀. This equation is the regression equation. 𝛽₀, 𝛽₁,
…, 𝛽ᵣ are the regression coefficients, and 𝜀 is the random error.
Linear regression calculates the estimators of the regression coefficients or simply the predicted
weights, denoted with 𝑏₀, 𝑏₁, …, 𝑏ᵣ. They define the estimated regression function 𝑓(𝐱) = 𝑏₀ +
𝑏₁𝑥₁ + ⋯ + 𝑏ᵣ𝑥ᵣ. This function should capture the dependencies between the inputs and output
sufficiently well.
When implementing simple linear regression, you typically start with a given set of input-output
(𝑥-𝑦) pairs (green circles). These pairs are your observations. For example, the leftmost
observation (green circle) has the input 𝑥 = 5 and the actual output (response) 𝑦 = 5. The next
one has 𝑥 = 15 and 𝑦 = 20, and so on.
The estimated regression function (black line) has the equation 𝑓(𝑥) = 𝑏₀ + 𝑏₁𝑥. Your goal is to
calculate the optimal values of the predicted weights 𝑏₀ and 𝑏₁ that minimize the sum of squared
residuals (SSR) and determine the estimated regression function. The value of 𝑏₀, also called the
intercept, shows the point where the estimated regression line crosses the 𝑦 axis. It is the value of
the estimated response 𝑓(𝑥) for 𝑥 = 0. The value of 𝑏₁ determines the slope of the estimated regression line.
The predicted responses (red squares) are the points on the regression line that correspond to the
input values. For example, for the input 𝑥 = 5, the predicted response is 𝑓(5) = 8.33 (represented
with the leftmost red square).
The residuals (vertical dashed gray lines) can be calculated as 𝑦ᵢ - 𝑓(𝐱ᵢ) = 𝑦ᵢ - 𝑏₀ - 𝑏₁𝑥ᵢ for 𝑖 = 1,
…, 𝑛. They are the distances between the green circles and red squares. When you implement
linear regression, you are actually trying to minimize these distances and make the red squares as
close to the predefined green circles as possible.
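For simple linear regression, minimizing the SSR has a closed-form solution. With 𝑥̄ and 𝑦̄ denoting the means of the observed inputs and outputs, the optimal weights are
𝑏₁ = Σᵢ(𝑥ᵢ − 𝑥̄)(𝑦ᵢ − 𝑦̄) / Σᵢ(𝑥ᵢ − 𝑥̄)² and 𝑏₀ = 𝑦̄ − 𝑏₁𝑥̄.
This is exactly what the estimate_coef function in the program below computes.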
The package scikit-learn is a widely used Python library for machine learning, built on top of
NumPy and some other packages. It provides the means for preprocessing data, reducing
dimensionality, implementing regression, classification, clustering, and more. Like NumPy, scikit-
learn is also open source.
If you want to implement linear regression and need the functionality beyond the scope of scikit-
learn, you should consider statsmodels. It’s a powerful Python package for the estimation of
statistical models, performing tests, and more. It’s open source as well.
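As a minimal sketch of scikit-learn's linear-regression API, using the six observations described earlier (only the first two pairs, 𝑥 = 5, 𝑦 = 5 and 𝑥 = 15, 𝑦 = 20, appear in the text above; the remaining values are assumed from the same well-known example):

import numpy as np
from sklearn.linear_model import LinearRegression

# Inputs must be a 2-D array of shape (n_samples, n_features)
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

# Fit the model and inspect the estimated weights
model = LinearRegression().fit(x, y)
print("intercept (b0):", model.intercept_)
print("slope (b1):", model.coef_)
print("f(5):", model.predict([[5]]))  # about 8.33, as described above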
There are five basic steps when you're implementing linear regression:
1. Import the packages and classes you need.
2. Provide data to work with, and eventually do appropriate transformations.
3. Create a regression model and fit it with the existing data.
4. Check the results of the model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # means of the x and y vectors
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as a scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1] * x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

def main():
    # observations (the manual's original y values were not preserved;
    # the y array below is example data)
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \nb_1 = {}".format(b[0], b[1]))
    # plotting the regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output (as printed in the manual's original run; the example y data above will give different coefficients):
Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix

# Load the labelled messages and map the pos/neg labels to 1/0
msg = pd.read_csv('document.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Build the document-term matrix from the training messages
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())

# Train a multinomial naive Bayes classifier and evaluate it
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
document.csv:
He is my sworn enemy,neg
My boss is horrible,neg
I love to dance,pos
Output:
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.6666666666666666
Precision: 0.6666666666666666
Confusion Matrix:
[[1 1]
[1 2]]
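To see where these numbers come from, read the confusion matrix with rows as the true classes and columns as the predicted classes, in the order [neg, pos]: TN = 1, FP = 1, FN = 1, TP = 2. Then accuracy = (TP + TN)/total = 3/5 = 0.6, recall = TP/(TP + FN) = 2/3 ≈ 0.667, and precision = TP/(TP + FP) = 2/3 ≈ 0.667, matching the printed metrics.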
import numpy

def cal_pop_fitness(equation_inputs, pop):
    # The fitness function calculates the sum of products between each input
    # and its corresponding weight.
    fitness = numpy.sum(pop * equation_inputs, axis=1)
    return fitness

def select_mating_pool(pop, fitness, num_parents):
    # Selecting the best individuals in the current generation as parents for
    # producing the offspring of the next generation.
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        # Exclude this solution from being selected again.
        fitness[max_fitness_idx] = -99999999999
    return parents

def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    # The point at which crossover takes place between two parents.
    # Usually, it is at the center.
    crossover_point = numpy.uint8(offspring_size[1] / 2)
    for k in range(offspring_size[0]):
        # Indices of the two parents to mate.
        parent1_idx = k % parents.shape[0]
        parent2_idx = (k + 1) % parents.shape[0]
        # The new offspring takes the first half of its genes from the first
        # parent and the second half from the second parent.
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

def mutation(offspring_crossover, num_mutations=1):
    # Mutation changes num_mutations genes in each offspring at random.
    mutations_counter = numpy.uint8(offspring_crossover.shape[1] / num_mutations)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            # Add a random value to the selected gene.
            random_value = numpy.random.uniform(-1.0, 1.0, 1)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover
# The helper functions defined above (cal_pop_fitness, select_mating_pool,
# crossover, mutation) are assumed to be in the same file.
import numpy

"""
The target is to maximize this equation:
    y = w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
    where (x1,x2,x3,x4,x5,x6) = (4,-2,3.5,5,-11,-4.7)
What are the best values for the 6 weights w1 to w6?
We are going to use the genetic algorithm to find the best possible values
after a number of generations.
"""

# Inputs of the equation.
equation_inputs = [4, -2, 3.5, 5, -11, -4.7]
# Number of weights to optimize.
num_weights = len(equation_inputs)

"""
Genetic algorithm parameters:
    Population size
    Mating pool size
"""
sol_per_pop = 8
num_parents_mating = 4

# Defining the population size and creating the initial population.
pop_size = (sol_per_pop, num_weights)
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
print(new_population)

best_outputs = []
num_generations = 1000
for generation in range(num_generations):
    print("Generation : ", generation)
    # Measuring the fitness of each chromosome in the population.
    fitness = cal_pop_fitness(equation_inputs, new_population)
    print("Fitness")
    print(fitness)
    # Track the best result in the current population.
    best_outputs.append(numpy.max(numpy.sum(new_population * equation_inputs, axis=1)))
    # Selecting the best parents in the population for mating.
    parents = select_mating_pool(new_population, fitness,
                                 num_parents_mating)
    print("Parents")
    print(parents)
    # Generating the next generation using crossover.
    offspring_crossover = crossover(parents,
                                    offspring_size=(pop_size[0]-parents.shape[0], num_weights))
    print("Crossover")
    print(offspring_crossover)
    # Adding some variation to the offspring using mutation.
    offspring_mutation = mutation(offspring_crossover, num_mutations=2)
    print("Mutation")
    print(offspring_mutation)
    # Creating the new population from the parents and the offspring.
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation
# Getting the best solution after all generations have finished.
# At first, the fitness is calculated for each solution in the final generation.
fitness = cal_pop_fitness(equation_inputs, new_population)
# Then return the index of that solution corresponding to the best fitness.
best_match_idx = numpy.where(fitness == numpy.max(fitness))
print("Best solution : ", new_population[best_match_idx, :])
print("Best solution fitness : ", fitness[best_match_idx])

import matplotlib.pyplot
matplotlib.pyplot.plot(best_outputs)
matplotlib.pyplot.xlabel("Iteration")
matplotlib.pyplot.ylabel("Fitness")
matplotlib.pyplot.show()
Output (abridged; the weight and fitness arrays printed for each generation are omitted):
Generation : 0
Fitness
Parents
Crossover
Mutation
...
Generation : 999
Fitness
Parents
Crossover
Mutation
The best solution and its fitness are printed last, followed by the fitness-vs-iteration plot.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, recall_score, precision_score, confusion_matrix

# Load the labelled messages and map the pos/neg labels to 1/0
msg = pd.read_csv('document.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Build the document-term matrix
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())

# The manual omits the classifier definition here; a multinomial naive Bayes
# model is assumed, as in the earlier program.
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
document.csv:
He is my sworn enemy,neg
My boss is horrible,neg
I love to dance,pos
Output:
Accuracy Metrics:
Accuracy: 0.8
Recall: 1.0
Precision: 0.75
Confusion Matrix:
[[1 1]
[0 3]]