
L. D. College of Engineering

Lab Manual for

Artificial Intelligence and Machine Learning
(Subject Code: 3161710)

B.E. 6th Semester - IC

Name of Faculty: Dr. Dipesh Makwana & Prof. Kruti Joshi

Enrolment No: 190280117004

Name of Student: Dhrumil Bhavsar


Index

Sr. No.   Name of Experiment

1.   Implementation of linear regression
2.   Implementation of gradient descent with single variable
3.   Implementation of gradient descent with multi-variable
4.   Polynomial Regression
5.   Logistic Regression
6.   An Implementation of Artificial Neural Networks using Back propagation
7.   Implementation of SVM with simple features
8.   Implementing K-means clustering algorithm
9.   Introduction to Python Programming for Machine Learning
10.  Write a program for the concept of decision tree to develop a piecewise linear model and test it as well.
11.  Write a program for kNN algorithm for classification of IRIS dataset
12.  Write a program using Bayes algorithm for email classification (spam or non-spam) for the open-sourced data set from the UC Irvine Machine Learning Repository
13.  Write a program using SVM on IRIS dataset and carry out classification.
14.  Write a program using SVM algorithm for Boston house price prediction dataset to predict price of houses from certain features


ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING – 3161710

SEMESTER VI

PRACTICAL – 1

AIM: Implementation of linear regression using MATLAB.

THEORY:

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables.

Linear Regression

Linear regression is a basic and commonly used type of predictive analysis which usually works on continuous data.

A linear regression graph is made by generating a scatter plot between the independent and dependent variables and drawing a regression line, which is the line closest to most of the points on the plot. This line has minimum error and will be used for prediction on newer data. This line is a linear representation of the plot and is described by:

y = B1x + B0 + E0
Where,

y = Dependent Variable

x = Independent Variable

B1 = Slope of the regression line

B0 = Y intercept

E0 = error in data of the variables


Example of a Linear Regression plot

Types of Linear Regression:

Based on the number of independent variables, there are two types of linear regression:


1. Simple Linear Regression:

In simple linear regression, the dependent variable depends only on a single independent variable.

For simple linear regression, the form of the model is:
Y = β0 + β1X

Here,
 Y is a dependent variable.
 X is an independent variable.
 β0 and β1 are the regression coefficients.
 β0 is the intercept or the bias that fixes the offset to a line.
 β1 is the slope or weight that specifies the factor by which X has an
impact on Y.

The following 3 cases are possible:


 
Case-01: β1 < 0
 
 It indicates that variable X has negative impact on Y.
 If X increases, Y will decrease and vice-versa.


Case-02: β1 = 0
 
 It indicates that variable X has no impact on Y.
 If X changes, there will be no change in Y.

Case-03: β1 > 0
 
 It indicates that variable X has positive impact on Y.
 If X increases, Y will increase and vice-versa.


2. Multiple Linear Regression:

In multiple linear regression, the dependent variable depends on more than one independent variable.

For multiple linear regression, the form of the model is:

Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn

Here,
 Y is a dependent variable.
 X1, X2, …., Xn are independent variables.
 β0, β1,…, βn are the regression coefficients.
 βj (1<=j<=n) is the slope or weight that specifies the factor by which Xj has an
impact on Y.
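For illustration, here is a small sketch (in Python/NumPy rather than MATLAB; the house-price data and features below are made up) of fitting such a multiple linear regression model by least squares:

import numpy as np

# Hypothetical data: house price (Y) from size and number of bedrooms (X1, X2)
X1 = np.array([50, 70, 80, 100, 120])      # size in square metres
X2 = np.array([1, 2, 2, 3, 3])             # number of bedrooms
Y  = np.array([150, 210, 230, 300, 350])   # price in thousands

# Design matrix with a column of ones for the intercept beta0
A = np.column_stack([np.ones(len(Y)), X1, X2])

# Least-squares estimate of [beta0, beta1, beta2]
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta)                                 # regression coefficients

# Prediction for a new house: Y = beta0 + beta1*X1 + beta2*X2
print(beta[0] + beta[1]*90 + beta[2]*2)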

MATLAB Code/Program:
x = [10;15;20;25;30;35;40;45;50;55;60];
y = [365;387;451;499;567;609;677;725;777;808;989];

plot (x,y, 'b.');

xlabel ('Time (hours)');


ylabel ('Output of Process (gm/mol)');

X = [ones(length(x),1) x];

[A,~,~,~,STATS] = regress(y,X);

hold on

xplot = [min(x), max(x)];


yplot = A(1) +A(2)*xplot;
plot (xplot,yplot, 'r');
legend('Data', 'Model');


Output:

Example 1:
The table below shows some data from the early days of the Italian clothing company
Benetton. Each row in the table shows Benetton’s sales for a year and the amount spent on
advertising that year. In this case, our outcome of interest is sales—it is what we want to
predict. If we use advertising as the predictor variable, linear regression estimates that Sales =
168 + 23 Advertising. That is, if advertising expenditure is increased by one million Euro, then
sales will be expected to increase by 23 million Euros, and if there was no advertising, we
would expect sales of 168 million Euros.


Code:
x = [23;26;30;34;43;48;52;57;58];
y = [651;762;856;1063;1190;1298;1421;1440;1518];

plot (x,y, 'b.');

xlabel ('Advertising (Million Euros)');


ylabel ('Sales (Million Euros)');

X = [ones(length(x),1) x];

[A,~,~,~,STATS] = regress(y,X);

hold on

xplot = [min(x), max(x)];


yplot = A(1)+A(2)*xplot;

plot(xplot,yplot, 'r');
legend('Data', 'Model');

Output:

Conclusion:
In this experiment we were able to implement linear regression
using MATLAB.


PRACTICAL – 2

AIM: Implementation of Gradient descent algorithm for single variable.

Theory:

Model Representation
First, the goal of most machine learning algorithms is to construct a model: a
hypothesis that can be used to estimate Y based on X. The hypothesis, or model,
maps inputs to outputs. So, for example, say I train a model based on a bunch of
housing data that includes the size of the house and the sale price. By training a
model, I can give you an estimate on how much you can sell your house for
based on its size. This is an example of a regression problem — given some
input, we want to predict a continuous output.
The hypothesis is usually presented as

hϴ(x) = ϴ0 + ϴ1x

The theta values ϴ0 and ϴ1 are the parameters.


Some quick examples of how we visualize the hypothesis:

Choosing ϴ0 = 1.5 and ϴ1 = 0 yields h(x) = 1.5 + 0x.

A slope of 0 means h(x) will always be the constant 1.5.
This looks like:


The goal of creating a model is to choose parameters, or theta values, so that h(x)
is close to y for the training data, x and y. So for this data

X = [1, 1, 2, 3, 4, 3, 4, 6, 4]

Y = [2, 1, 0.5, 1, 3, 3, 2, 5, 4]


Cost Function

We need a function that will minimize the parameters over our dataset. One common function that is often used is the mean squared error, which measures the difference between the estimator (the dataset) and the estimated value (the prediction). It looks like this:

J(ϴ) = (1/m) Σ (h(x(i)) − y(i))²

It turns out we can adjust the equation a little to make the calculation down the track a little simpler. We end up with:

J(ϴ) = (1/2m) Σ (h(x(i)) − y(i))²

Let's apply this cost function to the following data:


For now we will calculate some theta values and plot the cost function by hand. Since this function passes through (0, 0), we are only looking at a single parameter, the slope. From here on out, I'll refer to the cost function as J(ϴ).
For J(1), we get 0. No surprise — a slope of 1 yields a straight line that fits the data perfectly. How about J(0.5)?

The MSE function gives us a value of 0.58. Let’s plot both our values so far:

J(1) = 0
J(0.5) = 0.58
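As a quick check of the J(0.5) value, assume the sample data is the three points (1, 1), (2, 2), (3, 3) on the line y = x (the data set shown in the original figure is an assumption here):

# Hypothetical data assumed for this worked example: the line y = x sampled at x = 1, 2, 3
x = [1, 2, 3]
y = [1, 2, 3]

theta = 0.5                    # slope being evaluated; the intercept is 0
m = len(x)

# J(theta) = (1/2m) * sum of squared errors
J = sum((theta * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)
print(J)                       # 0.5833..., i.e. the 0.58 quoted above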


We will go ahead and calculate some more values of J(ϴ).

And if we join the dots together nicely,


We can see that the cost function is at a minimum when theta = 1. This makes
sense — our initial data is a straight line with a slope of 1 (the orange line in the
figure above).

Gradient Descent

We minimized J(ϴ) by trial and error above — just trying lots of values and visually inspecting the resulting graph. There must be a better way?
Cue gradient descent. Gradient descent is a general algorithm for minimizing a function, in this case the mean squared error cost function.
Gradient descent basically does what we were doing by hand — it changes the theta values, or parameters, bit by bit, until we hopefully arrive at a minimum.
We start by initializing theta0 and theta1 to any two values, say 0 for both, and go from there. Formally, the algorithm repeats the following update for each parameter until convergence:

ϴj := ϴj − α · ∂J(ϴ)/∂ϴj

where α, alpha, is the learning rate, or how quickly we want to move towards the minimum. If α is too large, however, we can overshoot.


MATLAB Code/Program:
clc
clear
close all
x = [0,0.14,0.18,0.28,0.37,0.45,0.56,0.69,0.78,1.00];
y = [0, 0.27,0.56,0.21,0.66,0.32,0.65,0.87,1.09, 0.51];
n = length(x);
b0 = 0;
b1 = 0;
plot(x,y, '.b');
hold on
for c = 1:500
    y0 = b1*x + b0;                % current predictions
    CF = sum((y0-y).^2)/(2*n);     % cost function J(theta)
    e = sum(y0-y);                 % gradient term for the intercept b0
    g = sum((y0-y).*x);            % gradient term for the slope b1

    b0 = b0 - (e*0.01);            % learning rate alpha = 0.01
    b1 = b1 - (g*0.01);

    Y = b1*x + b0;
    tem = plot(x,Y,'r');           % plot the current fit line

    pause(0.1);
    if(c~=500)
        delete(tem)                % erase the line before the next iteration
    end
end
Output:
Output:

Conclusion:
In this experiment we were able to implement Gradient Descent for
single variable using MATLAB.


PRACTICAL – 3

AIM: Implementation of gradient descent algorithm for multiple variables.

Theory:

In the case of multivariate linear regression, the output value depends on multiple input values. The relationship between the input values, the format of the different input values, and the range of the input values all play an important role in linear model creation and prediction.

h(x) = θ0 + θ1x1 + θ2x2 + ⋯ + θnxn

Where: x1, x2, …, xn are the multiple input values.

If we consider the house price example, then the factors affecting its price, like house size, number of bedrooms, location etc., are nothing but the input variables of the above hypothesis function.

Cost Function

Our cost function remains the same as in single-variable linear regression.

Gradient Descent Algorithm

The gradient descent algorithm keeps the same form as in univariate linear regression, but here we have to perform the update for all the theta values (number of theta values = number of features + 1). A vectorized sketch of this update is shown below.
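The following is a small illustrative sketch in Python/NumPy (the data is made up) of batch gradient descent for h(x) = θ0 + θ1x1 + θ2x2, updating all theta values together:

import numpy as np

# Hypothetical training data with two input features x1, x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]])
y = np.array([6.0, 5.0, 10.0, 14.0])

m, n_features = X.shape
A = np.column_stack([np.ones(m), X])   # add a column of ones for theta0
theta = np.zeros(n_features + 1)       # number of theta values = features + 1
alpha = 0.05                           # learning rate

for _ in range(2000):
    predictions = A @ theta            # h(x) for every sample at once
    errors = predictions - y
    gradient = (A.T @ errors) / m      # dJ/dtheta for all parameters together
    theta = theta - alpha * gradient   # simultaneous update of every theta

print(theta)                           # learned [theta0, theta1, theta2]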


MATLAB Code/Program:
clc
clear all
close all
figure;
x=[1,5,7,11,3];
y=[1,3,9,6,4];
format long
m=0;                          % slope
c=0;                          % intercept
u=[];
v=[];
plot(x,y,'bo','linewidth',3);
axis([0 12 0 10]);
hold on;
pause(1);
for ua=1:6                    % number of passes over the data
    for i=1:length(x)         % per-sample (stochastic) gradient descent
        predicted=m*x(i)+c;
        error=predicted-y(i);
        g=m-0.01*error*x(i);  % update slope
        m=g;
        k=c-0.01*error;       % update intercept
        c=k;
        u=[u m];
        v=[v c];
        e=g*x+k;              % current fit line
        rs=plot(x,e,'k','linewidth',3);
        axis([0 12 0 10]);
        pause(0.3);
        if(ua~=6 || i~=length(x))
            delete(rs);
        end
    end
end
Output:

Conclusion:

In this experiment we were able to implement Gradient Descent


for multi variable using MATLAB.


PRACTICAL – 4

AIM: Implementation of Polynomial regression.

Theory:
 Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth degree polynomial. The Polynomial Regression equation is given below:

y = b0 + b1x1 + b2x1^2 + b3x1^3 + ...... + bnx1^n


 It is also called the special case of Multiple Linear Regression in ML.
Because we add some polynomial terms to the Multiple Linear
regression equation to convert it into Polynomial Regression.
 It is a linear model with some modification in order to increase the
accuracy.
 The dataset used in Polynomial regression for training is of non-linear
nature.
 It makes use of a linear regression model to fit the complicated and
non-linear functions and datasets.
 Hence, "In Polynomial regression, the original features are converted
into Polynomial features of required degree (2,3,n) and then modelled
using a linear model."
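A small illustrative sketch of this idea in Python with scikit-learn (the data below is made up): the original feature x is converted into polynomial features and then fitted with an ordinary linear model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data for illustration
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + x.ravel() + np.random.normal(0, 0.3, 30)

# Convert the original feature into polynomial features of degree 2
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)                   # columns: 1, x, x^2

# Fit an ordinary linear model on the transformed features
model = LinearRegression()
model.fit(x_poly, y)

print(model.intercept_, model.coef_)             # b0 and the weights for x, x^2
print(model.predict(poly.transform([[2.0]])))    # prediction at x = 2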

Need for Polynomial Regression:

The need of Polynomial Regression in ML can be understood in the below


points:
 If we apply a linear model to a linear dataset, it gives a good result, as we saw in Simple Linear Regression. But if we apply the same model, without any modification, to a non-linear dataset, it produces a poor fit: the loss function increases, the error rate becomes high, and accuracy decreases.


 So, for such cases, where data points are arranged in a non-linear


fashion, we need the Polynomial Regression model. We can
understand it in a better way using the below comparison diagram of
the linear dataset and non-linear dataset.
 In the below image, we have taken a dataset which is arranged non-
linearly. So, if we try to cover it with a linear model, then we can
clearly see that it hardly covers any data point. On the other hand, a
curve is suitable to cover most of the data points, which is of the
Polynomial model.
 Hence, if the datasets are arranged in a non-linear fashion, then we
should use the Polynomial Regression model instead of Simple Linear
Regression.

 To understand the need for polynomial regression, let’s generate some


random dataset first.

 The data generated looks like:


 Let’s apply a linear regression model to this dataset.

 The plot of the best fit line is:

 We can see that the straight line is unable to capture the patterns in the
data. This is an example of under-fitting. Compute the RMSE and R²-
score of the linear line.
 To overcome under-fitting, we need to increase the complexity of the
model.
 To generate a higher order equation, we can add powers of the original features as new features. The linear model,

y = θ0 + θ1x

 can be transformed to

y = θ0 + θ1x + θ2x²


 This is still considered to be a linear model, as the coefficients/weights associated with the features are still linear. x² is only a feature. However, the curve that we are fitting is quadratic in nature.
 To convert the original features into their higher order terms we will use
the Polynomial Features class provided by scikit-learn. Next, we train the
model using Linear Regression.
 Fitting a linear regression model on the transformed features gives the
below plot.

 It is quite clear from the plot that the quadratic curve is able to fit the data
better than the linear line. Compute the RMSE and R²-score of the
quadratic plot.
 If we try to fit a cubic curve (degree=3) to the dataset, we can see that it
passes through more data points than the quadratic and the linear plots.


 Below is a comparison of fitting linear, quadratic and cubic curves on the

dataset.
 If we further increase the degree to 20, we can see that the curve passes
through more data points. Below is a comparison of curves for degree 3 and
20.


 For degree=20, the model is also capturing the noise in the data. This is an
example of over-fitting. Even though this model passes through most of
the data, it will fail to generalize on unseen data.
 To prevent over-fitting, we can add more training samples so that the
algorithm doesn’t learn the noise in the system and can become more
generalized.

How do we choose an optimal model?


 To answer this question we need to understand the bias vs variance trade-
off.
 The Bias vs Variance trade-off
 Bias refers to the error due to the model’s simplistic assumptions in fitting
the data. A high bias means that the model is unable to capture the patterns
in the data and this results in under-fitting.
 Variance refers to the error due to the complex model trying to fit the data.
High variance means the model passes through most of the data points and
it results in over-fitting the data.
 The below picture summarizes our learning.


 From the above picture we can observe that as the model complexity
increases, the bias decreases and the variance increase and vice-versa.
Ideally, a machine learning model should have low variance and low
bias. But practically it’s impossible to have both. Therefore, to achieve a
good model that performs well both on the train and unseen data, a trade-
off is made.

Advantages of using Polynomial Regression:

 Polynomial regression provides the best approximation of the relationship between the dependent and independent variable.
 A broad range of functions can be fit under it.
 Polynomial regression can fit a wide range of curvature.

Disadvantages of using Polynomial Regression:


 The presence of one or two outliers in the data can seriously affect the
results of the nonlinear analysis.
 These are too sensitive to the outliers.
 In addition, there are unfortunately fewer model validation tools for the
detection of outliers in nonlinear regression than there are for linear
regression.

MATLAB Code/Program:
clc
x = [1 3 5 6 7 2 5 4 9 7 8 3 5 6];
y = [10 20 40 87 62 56 71 22 29 91 29 35 23 30];
plot(x,y,'ro','linewidth', 2);
hold on
p5 = polyfit(x,y,5);
xc = 1:.1:10;
y5 = polyval(p5,xc);
plot(xc,y5, 'g.-','linewidth',3)
grid
legend('original data','5th order fit')

Output:


Conclusion:

In this experiment we were able to implement polynomial


regression using MATLAB.

PRACTICAL – 5

Aim: Implementation of logistic regression.

Theory:

Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The target or dependent variable is dichotomous in nature: it has only two possible classes.

• In simple words, the dependent variable is binary in nature having data coded
as either 1 or 0.


• Mathematically, a logistic regression model predicts P(Y=1) as a function of


X. It is one of the simplest ML algorithms that can be used for various
classification problems such as spam detection, Diabetes prediction, cancer
detection etc.

Curve of Logistic Regression

Type of Logistic Regression:

On the basis of the categories,

Logistic Regression can be classified into three types:

Binomial: In binomial Logistic regression, there can be only two possible types
of the dependent variables, such as 0 or 1, Pass or Fail, etc.

Multinomial: In multinomial Logistic regression, there can be 3 or more


possible unordered types of the dependent variable, such as "cat", "dogs", or
"sheep"

Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered


types of dependent variables, such as "low", "Medium", or "High".

Logistic Function (Sigmoid Function):


The sigmoid function is a mathematical function used to map the predicted


values to probabilities.

 It maps any real value into another value within a range of 0 and 1.
 The value of the logistic regression must be between 0 and 1, which
cannot go beyond this limit, so it forms a curve like the "S" form. The S-
form curve is called the Sigmoid function or the logistic function.
 In logistic regression, we use the concept of a threshold value, which decides the predicted class: values above the threshold tend to 1, and values below the threshold tend to 0.

Regression Models:

• Binary Logistic Regression Model − The simplest form of logistic regression


is binary or binomial logistic regression in which the target or dependent
variable can have only 2 possible types, either 1 or 0.

• Multinomial Logistic Regression Model − Another useful form of logistic


regression is multinomial logistic regression in which the target or dependent
variable can have 3 or more possible unordered types i.e. the types having no
quantitative significance.

Logistic Regression Equation:

The Logistic regression equation can be obtained from the Linear Regression
equation. The mathematical steps to get Logistic Regression equations are given
below:

o We know the equation of the straight line can be written as:

   y = b0 + b1x1 + b2x2 + ... + bnxn

o In Logistic Regression y can be between 0 and 1 only, so let's divide the above equation by (1 − y):

   y / (1 − y);  which is 0 for y = 0 and infinity for y = 1

o But we need a range between −[infinity] and +[infinity], so taking the logarithm of the equation, it becomes:

   log[ y / (1 − y) ] = b0 + b1x1 + b2x2 + ... + bnxn

The above equation is the final equation for Logistic Regression.

Binary Logistic Regression Model of ML

• The simplest form of logistic regression is binary or binomial logistic


regression in which the target or dependent variable can have only 2 possible
types either 1 or 0. It allows us to model a relationship between multiple
predictor variables and a binary/binomial target variable. In case of logistic
regression, the linear function is basically used as an input to another function
such as 𝑔 in the following relation:

hθ(x) = g(θᵀx),  where 0 ≤ hθ(x) ≤ 1

• Here, 𝑔 is the logistic or sigmoid function which can be given as follows:

g(z) = 1 / (1 + e^(−z)),  where z = θᵀx

• The sigmoid curve can be represented with the help of the following graph. We can see that the values on the y-axis lie between 0 and 1 and that the curve crosses the axis at 0.5.

The classes can be divided into positive or negative. The output, lying between 0 and 1, is the probability of the positive class. For our implementation, we interpret the output of the hypothesis function as positive if it is ≥ 0.5, otherwise negative.

• We also need to define a loss function to measure how well the algorithm
performs using the weights on functions, represented by theta as follows:

h = g(Xθ)

J(θ) = (1/m) · (−yᵀ log(h) − (1 − y)ᵀ log(1 − h))


• Now, after defining the loss function our prime goal is to minimize the loss
function. It can be done with the help of fitting the weights which means by
increasing or decreasing the weights. With the help of derivatives of the loss
function w.r.t each weight, we would be able to know what parameters should
have high weight and what should have smaller weight.

• The following gradient descent equation tells us how loss would change if we
modified the parameters −

∂J(θ)/∂θj = (1/m) Xᵀ (g(Xθ) − y)
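To make the loss and gradient above concrete, here is a small illustrative sketch in Python/NumPy (the data is made up) of a few gradient descent steps for binary logistic regression, using the 0.5 threshold mentioned earlier for prediction:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: a bias column plus one feature, and binary labels
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

m = len(y)
theta = np.zeros(X.shape[1])
alpha = 0.5                                              # learning rate

for _ in range(5000):
    h = sigmoid(X @ theta)                               # h = g(X theta)
    loss = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m   # J(theta), mirrors the formula above
    gradient = X.T @ (h - y) / m                         # dJ/dtheta
    theta = theta - alpha * gradient

print(theta, loss)
print((sigmoid(X @ theta) >= 0.5).astype(int))           # predicted classes with the 0.5 threshold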

MATLAB Code/Program:

x = rand (100, 1);


y = x > 0.5;
y (1:60) = x (1:60) > 0.3; % to avoid perfect separation
% fit model
mdl = fitglm (x, y, "distribution", "binomial");
xnew =linspace (0, 1, 1000)'; %test data
ynew = predict (mdl, xnew);
scatter (x,y);
hold on
plot(xnew,ynew)

Output:


Conclusion:

In this experiment we were able to implement logistic regression


using MATLAB.

PRACTICAL – 6

AIM: Implementation of Artificial Neural Network using Back


propagation.

Theory:

Artificial Neural Networks

• A neural network is a group of connected I/O units where each connection has
a weight associated with its computer programs. It helps you to build predictive
models from large databases. This model builds upon the human nervous

system. It helps you to conduct image understanding, human learning, computer


speech, etc.

Back-propagation

• Back-propagation is the essence of neural net training. It is the method of fine-


tuning the weights of a neural net based on the error rate obtained in the
previous epoch (i.e. iteration). Proper tuning of the weights allows you to
reduce error rates and to make the model reliable by increasing its
generalization.

• Back-propagation is a short form for "backward propagation of errors." It is a


standard method of training artificial neural networks. This method helps to
calculate the gradient of a loss function with respect to all the weights in the
network.

Working of Back-propagation: Simple Algorithm

Consider the following diagram:

1. Inputs X arrive through the preconnected path

2. Input is modeled using real weights W. The weights are usually randomly
selected.

3. Calculate the output for every neuron from the input layer, to the hidden
layers, to the output layer.

4. Calculate the error in the outputs

Error = Actual Output – Desired Output

5. Travel back from the output layer to the hidden layer to adjust the weights
such that the error is decreased.

6. Keep repeating the process until the desired output is achieved
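As an illustration of steps 1 to 6, the sketch below (in Python/NumPy, with made-up weights and a single training example) runs one forward pass and one weight update for a tiny 2-2-1 sigmoid network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny network: 2 inputs -> 2 hidden neurons -> 1 output
x = np.array([0.0, 1.0])          # one training input
target = 1.0                      # desired output
W1 = np.array([[0.1, 0.4],        # hidden-layer weights (2x2), chosen arbitrarily
               [0.8, 0.6]])
W2 = np.array([0.3, 0.9])         # output-layer weights (1x2)
lr = 0.5                          # learning rate

# Forward pass (step 3)
hidden = sigmoid(W1 @ x)
output = sigmoid(W2 @ hidden)

# Error at the output (step 4)
error = target - output

# Backward pass (step 5): deltas for the output and hidden neurons
delta_out = output * (1 - output) * error
delta_hidden = hidden * (1 - hidden) * W2 * delta_out

# Weight updates (step 5 continued); repeat steps 3-5 until the error is small (step 6)
W2 = W2 + lr * delta_out * hidden
W1 = W1 + lr * np.outer(delta_hidden, x)

print(output, error)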


Need of Back-propagation (Advantages)

Most prominent advantages of Back-propagation are:

• Back-propagation is fast, simple and easy to program.

• It has no parameters to tune apart from the number of inputs.

• It is a flexible method as it does not require prior knowledge about the


network.

• It is a standard method that generally works well.

• It does not need any special mention of the features of the function to be
learned.

Disadvantages of using Back-propagation

• The actual performance of back-propagation on a specific problem is


dependent on the input data.

• Back-propagation can be quite sensitive to noisy data

• You need to use the matrix-based approach for back-propagation instead of


mini-batch.

Types of Back-propagation Networks

Two Types of Back-propagation Networks are:

• Static Back-propagation

• Recurrent Back-propagation

Static back-propagation:

• It is one kind of back-propagation network which produces a mapping of a


static input for static output. It is useful to solve static classification issues like
optical character recognition.

Recurrent Back-propagation:

• Recurrent back-propagation is fed forward until a fixed value is achieved.


After that, the error is computed and propagated backward.

The main difference between both of these methods is: that the mapping is rapid
in static back propagation while it is non-static in recurrent back-propagation.

MATLAB Code:
input = [0 0; 0 1; 1 0; 1 1];
output = [0;1;1;0];
bias = [-1 -1 -1];
coeff = 1;
iterations = 9;
weights = [0 0 0; 1 3 4; 9 5 2];
for i = 1:iterations
    out = zeros(4,1);
    numIn = length(input(:,1));
    for j = 1:numIn
        % Forward pass: hidden neurons H1, H2 and the output neuron
        H1 = bias(1,1)*weights(1,1)+input(j,1)*weights(1,2)+input(j,2)*weights(1,3);
        x2(1) = sigma(H1);
        H2 = bias(1,2)*weights(2,1)+input(j,1)*weights(2,2)+input(j,2)*weights(2,3);
        x2(2) = sigma(H2);
        x3_1 = bias(1,3)*weights(3,1)+x2(1)*weights(3,2)+x2(2)*weights(3,3);
        out(j) = sigma(x3_1);
        % Backward pass: deltas for the output and hidden neurons
        delta3_1 = out(j)*(1-out(j))*(output(j)-out(j));
        delta2_1 = x2(1)*(1-x2(1))*weights(3,2)*delta3_1;
        delta2_2 = x2(2)*(1-x2(2))*weights(3,3)*delta3_1;

        for k = 1:3
            if k == 1   % bias weights
                weights(1,k) = weights(1,k)+coeff*bias(1,1)*delta2_1;
                weights(2,k) = weights(2,k)+coeff*bias(1,2)*delta2_2;
                weights(3,k) = weights(3,k)+coeff*bias(1,3)*delta3_1;
            else        % input and hidden-activation weights
                weights(1,k) = weights(1,k)+coeff*input(j,k-1)*delta2_1;
                weights(2,k) = weights(2,k)+coeff*input(j,k-1)*delta2_2;
                weights(3,k) = weights(3,k)+coeff*x2(k-1)*delta3_1;
            end
        end
    end
end
disp(out)


function y = sigma(x)
y = 1./(1+exp(-x))
end

Output:

aiml6

y =

0.5000

y =

0.7311

y =

0.9936

y =

0.5020

y =

0.5006

y =

0.1801

Conclusion:
In this experiment we were able to implement an Artificial Neural
Network using back-propagation in MATLAB.

PRACTICAL – 7

AIM: Implementation of Support Vector Machines using simple features.

Theory:


Support vector machines (SVMs) are powerful yet flexible supervised machine
learning algorithms which are used both for classification and regression. But
generally, they are used in classification problems.

An SVM model is basically a representation of different classes separated by a hyperplane in multidimensional space. The hyperplane is generated in an iterative manner by SVM so that the error can be minimized. The goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane.

The followings are important concepts in SVM −


 Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line will be defined with the help of these data points.
 Hyperplane − As we can see in the above diagram, it is a decision plane or space which divides a set of objects having different classes.
 Margin − It may be defined as the gap between two lines on the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.


The main goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane, and it can be done in the following two steps:

1. First, SVM will generate hyperplanes iteratively that segregate the classes in the best way.
2. Then, it will choose the hyperplane that separates the classes correctly.
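To see the support vectors and the margin in code, here is a small illustrative sketch using scikit-learn in Python (the points are made up); the MATLAB program below does the equivalent with fitcsvm:

import numpy as np
from sklearn import svm

# Hypothetical, linearly separable 2-D data with labels -1 and +1
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Linear SVM; a very large C approximates a hard margin
clf = svm.SVC(kernel='linear', C=1e6)
clf.fit(X, y)

print(clf.support_vectors_)          # the data points closest to the hyperplane
w = clf.coef_[0]                     # normal vector of the separating hyperplane
b = clf.intercept_[0]
print(w, b, 2.0 / np.linalg.norm(w)) # hyperplane parameters and margin width
print(clf.predict([[2.0, 2.0], [6.0, 6.0]]))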

MATLAB Code/Program:

rng(1); % For reproducibility


r = sqrt(rand(100,1)); % Radius
t = 2*pi*rand(100,1); % Angle
data1 = [r.*cos(t), r.*sin(t)]; % Points
r2 = sqrt(3*rand(100,1)+1); % Radius
t2 = 2*pi*rand(100,1); % Angle
data2 = [r2.*cos(t2), r2.*sin(t2)]; % points
figure;
plot(data1(:,1),data1(:,2),'g.','MarkerSize',15)
hold on
plot(data2(:,1),data2(:,2),'b.','MarkerSize',15)
ezpolar(@(x)1);ezpolar(@(x)2);
axis equal
hold off

%Put the data in one matrix and make a vector of class labels
data3 = [data1;data2];
theclass = ones(200,1);
theclass(1:100) = -1;

%Train the SVM Classifier
cl = fitcsvm(data3,theclass,'KernelFunction','rbf',...
    'BoxConstraint',Inf,'ClassNames',[-1,1]);

% Predict scores over the grid


d = 0.02;
[x1Grid,x2Grid] = meshgrid(min(data3(:,1)):d:max(data3(:,1)),...
min(data3(:,2)):d:max(data3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)];
[~,scores] = predict(cl,xGrid);

% Plot the data and the decision boundary


figure;
h(1:2) = gscatter(data3(:,1),data3(:,2),theclass,'rb','.');
hold on
ezpolar(@(x)1);
h(3) =
plot(data3(cl.IsSupportVector,1),data3(cl.IsSupportVector,2),'ko');
contour(x1Grid,x2Grid,reshape(scores(:,2),size(x1Grid)),[0 0],'k');
legend(h,{'-1','+1','Support Vectors'});
axis equal


hold off

Output:

Conclusion:
In this experiment we were able to implement support vector
machine for simple feature using MATLAB.

PRACTICAL – 8

Aim: Implementing K -means clustering algorithm.

Theory:


K-Means Clustering is an unsupervised learning algorithm that is used to solve


the clustering problems in machine learning or data science.

In this algorithm, the unlabeled dataset is classified into different clusters. Here
K defines the number of pre-defined clusters that need to be created in the
process, as if K=2, there will be two clusters, and for K=3, there will be three
clusters, and so on.

The algorithm takes the unlabeled dataset as input, divides the dataset into k-
number of clusters, and repeats the process until it does not find the best
clusters. The value of k should be predetermined in this algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for K center points or centroids by an iterative


process.
o Assigns each data point to its closest k-center. Those data points which
are near to the particular k-center, create a cluster.

Hence each cluster has datapoints with some commonalities, and it is away
from other clusters.

The below diagram explains the working of the K-means Clustering Algorithm:


The working of the K-Means algorithm is explained in the below steps:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select random K points or centroids. (It can be other from the input
dataset).

Step-3: Assign each data point to their closest centroid, which will form the
predefined K clusters.

Step-4: Calculate the variance and place a new centroid of each cluster.

Step-5: Repeat the third step, which means reassign each datapoint to the new
closest centroid of each cluster.

Step-6: If any reassignment occurs, then go to step-4.

Step-7: The model is ready.


Let's understand the above steps by considering the visual plots:

 Suppose we have two variables M1 and M2. The x-y axis scatter plot of
these two variables is given below:

o Let's take number k of clusters, i.e., K=2, to identify the dataset and to
put them into different clusters. It means here we will try to group
these datasets into two different clusters.
o We need to choose some random k points or centroid to form the
cluster. These points can be either the points from the dataset or any
other point. So, here we are selecting the below two points as k points,
which are not the part of our dataset. Consider the below image:


 Now we will assign each data point of the scatter plot to its closest K-
point or centroid. We will compute it by applying some mathematics that
we have studied to calculate the distance between two points. So, we will
draw a median between both the centroids. Consider the below image:

 From the above image, it is clear that points left side of the line is near to
the K1 or blue centroid, and points to the right of the line are close to the
yellow centroid. Let's color them as blue and yellow for clear
visualization.


 As we need to find the closest cluster, so we will repeat the process by


choosing a new centroid. To choose the new centroids, we will compute
the center of gravity of these centroids, and will find new centroids as
below:

 Next, we will reassign each datapoint to the new centroid. For this, we
will repeat the same process of finding a median line. The median will be
like below image:


 From the above image, we can see, one yellow point is on the left side of
the line, and two blue points are right to the line. So, these three points
will be assigned to new centroids.

 As reassignment has taken place, so we will again go to the step-4, which


is finding new centroids or K-points.
 We will repeat the process by finding the center of gravity of centroids,
so the new centroids will be as shown in the below image:


 As we got the new centroids so again will draw the median line and
reassign the data points. So, the image will be:

 We can see in the above image; there are no dissimilar data points on
either side of the line, which means our model is formed. Consider the
below image:


 As our model is ready, so we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:

 The performance of the K-means clustering algorithm depends upon


highly efficient clusters that it forms. But choosing the optimal number of
clusters is a big task. There are some different ways to find the optimal
number of clusters, but here we are discussing the most appropriate
method to find the number of clusters or value of K. The method is given
below:

Elbow Method

 The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster. The formula to calculate the value of
WCSS (for 3 clusters) is given below:

WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²

In the above formula of WCSS,

 Σ(Pi in Cluster1) distance(Pi, C1)²: it is the sum of the squares of the distances between each data point and its centroid within Cluster 1, and similarly for the other two terms.

 To measure the distance between data points and centroid, we can use
any method such as Euclidean distance or Manhattan distance.


 To find the optimal value of clusters, the elbow method follows the below
steps:

o It executes the K-means clustering on a given dataset for different K


values (ranges from 1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of
clusters K.
o The sharp point of bend or a point of the plot looks like an arm, then
that point is considered as the best value of K.

 Since the graph shows the sharp bend, which looks like an elbow, hence
it is known as the elbow method. The graph for the elbow method looks
like the below image:
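A small illustrative sketch of the elbow method in Python with scikit-learn (the data is synthetic); the WCSS value described above is available as the fitted model's inertia_ attribute:

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Synthetic data: two well-separated blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2) * 0.75 + [2, 2],
               rng.randn(100, 2) * 0.50 - [2, 2]])

wcss = []
for k in range(1, 11):                       # K values from 1 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                 # within-cluster sum of squares

plt.plot(range(1, 11), wcss, 'o-')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS')
plt.title('Elbow method')
plt.show()                                   # the sharp bend (the "elbow") suggests the best K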

Advantages of K-Means Clustering


 There’s a reason why top professionals prefer the K-Means clustering
algorithm. Some benefits it offers:
o It is a fast, robust, and easier to understand the algorithm.
o The end-efficiency is relatively high
o Offers phenomenal results when data sets are different from each
other. For higher variables values, K-Means works comparatively
quicker
o The clusters produced with K-Means are relatively tighter than
other clustering methods.


K-Means Algorithm Using MATLAB:


 K-Means is a largely used algorithm used by many professionals dealing
with data science, machine learning, artificial intelligence, cryptography,
and cyber security.
 The core objective of using this algorithm is to find out the centroid of
each cluster. The data given to a programmer is heterogeneous. Here is
the MATLAB code for plotting the centroid of each cluster and assign the
coordinates of each centroid:

MATLAB Code:
rng default; % For reproducibility
X = [randn(100,2)*0.75+ones(100,2);
     randn(100,2)*0.5-ones(100,2)];
opts = statset('Display','final');
[idx,C] = kmeans(X,4,'Distance','cityblock','Replicates',5,'Options',opts);
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12);
hold on;
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12);
plot(X(idx==3,1),X(idx==3,2),'g.','MarkerSize',12);
plot(X(idx==4,1),X(idx==4,2),'y.','MarkerSize',12);
plot(C(:,1),C(:,2),'kx','MarkerSize',15,'LineWidth',3);   % cluster centroids
legend('Cluster 1','Cluster 2','Cluster 3','Cluster 4','Centroids','Location','NW');
title('Cluster Assignments and Centroids');
hold off;
for i = 1:size(C,1)
    disp(['Centroid ', num2str(i), ': X1 = ', num2str(C(i,1)), '; X2 = ', num2str(C(i,2))]);
end

Output:


Results:
 The centroids obtained are as follows:
   o The value of X1 & X2 for Centroid 1: 1.3661; 1.7232
   o The value of X1 & X2 for Centroid 2: -1.015; -1.053
   o The value of X1 & X2 for Centroid 3: 1.6565; 0.36376
   o The value of X1 & X2 for Centroid 4: 0.35134; 0.85358

Conclusion:
In this experiment we were able to implement K-mean clustering
algorithm using MATLAB.


PRACTICAL – 9

Aim: Introduction to Python Programming for Machine Learning.

Problem Statement: Run the commands using Anaconda- Jupyter notebook.

Theory:

Important Libraries

1. NumPy : Numerical Python

 It is a useful component that makes Python one of the favourite languages for Data Science.
 It basically stands for Numerical Python and consists of
multidimensional array objects.
 By using NumPy, we can perform the following important operations −
 Mathematical and logical operations on arrays.
 Fourier transformation
 Operations associated with linear algebra.
We can also see NumPy as a replacement for MATLAB, because NumPy is mostly used along with SciPy (Scientific Python) and Matplotlib (plotting library).

2. Pandas

Pandas is an open-source Python Library used for high-performance data


manipulation and data analysis using its powerful data structures. 

With the help of Pandas, in data processing we can accomplish the following
five steps −

 Load
 Prepare
 Manipulate


 Model
 Analyze

Key Features of Pandas

 Fast and efficient DataFrame object with default and customized


indexing.
 Tools for loading data into in-memory data objects from different file
formats.
 Data alignment and integrated handling of missing data.
 Reshaping and pivoting of data sets.
 Label-based slicing, indexing and sub setting of large data sets.
 Columns from a data structure can be deleted or inserted.
 Group by data for aggregation and transformations.
 High performance merging and joining of data.
 Time Series functionality.

Pandas deals with the following data structures −

 Series
 DataFrame

3. Scipy: Scientific Python

The SciPy library of Python is built to work with NumPy arrays and provides
many user-friendly and efficient numerical practices such as routines for
numerical integration and optimization. Together, they run on all popular
operating systems, are quick to install and are free of charge. NumPy and SciPy
are easy to use, but powerful enough to depend on by some of the world's
leading scientists and engineers.

4. Scikit-learn

The following are some features of Scikit-learn that makes it so useful −


 It is built on NumPy, SciPy, and Matplotlib.
 It is open source.


 Wide range of machine learning algorithms covering major areas of


ML like classification, clustering, regression, dimensionality reduction,
model selection etc. can be implemented with the help of it.

5. Matplotlib
 Matplotlib is a python library used to create 2D graphs and plots by
using python scripts.
 It has a module named pyplot which makes things easy for plotting by
providing feature to control line styles, font properties, formatting axes
etc.
 It supports a very wide variety of graphs and plots namely - histogram,
bar charts, power spectra, error charts etc.
 It is used along with NumPy to provide an environment that is an
effective open source alternative for MatLab. It can also be used with
graphics toolkits like PyQt and wxPython.

Python Code:
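A minimal sketch of the kind of commands that could be run in the Jupyter notebook to exercise the libraries described above (the array and table values are arbitrary):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: create an array and perform some mathematical operations
a = np.array([1, 2, 3, 4, 5])
print(a.mean(), a.sum(), a ** 2)

# Pandas: build a small DataFrame and inspect it
df = pd.DataFrame({'hours': [10, 15, 20, 25],
                   'output': [365, 387, 451, 499]})
print(df.describe())

# Matplotlib: plot one column against another
plt.plot(df['hours'], df['output'], 'bo-')
plt.xlabel('hours')
plt.ylabel('output')
plt.show()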


Output :

Conclusion:


PRACTICAL – 10

Aim: Write a program for the concept of decision tree to develop a piecewise
linear model and test it as well.

Problem Statement: Generate a synthetic data set using the following function, and split it into training, validation, and testing sample points. Write a program for the concept of decision tree to develop a piecewise linear model and test it as well.

y = x/2 + sin(x) + ε,  where ε is random noise

Steps:
1. Import libraries
2. Prepare data
3. Split the data into training, validation and test sets
4. Fit model 
5. Evaluate the model.

Python Program:
1. Import libraries

import numpy as np
from sklearn import linear_model, datasets, tree
import matplotlib.pyplot as plt
%matplotlib inline

2. Prepare data:

number_of_samples = 100 
x = np.linspace(-np.pi, np.pi, number_of_samples) 
y = 0.5*x+np.sin(x)+np.random.random(x.shape) 
plt.scatter(x,y,color='black') #Plot y-vs-x in dots 
plt.xlabel('x-input feature') 
plt.ylabel('y-target values') 
plt.title('Fig 5: Data for linear regression') 
plt.show() 


3. Split the data into training, validation and test sets


 
random_indices = np.random.permutation(number_of_samples) 
#Training set 
x_train = x[random_indices[:70]] 
y_train = y[random_indices[:70]] 
#Validation set 
x_val = x[random_indices[70:85]] 
y_val = y[random_indices[70:85]] 
#Test set 
x_test = x[random_indices[85:]] 
y_test = y[random_indices[85:]] 

4. Fit the decision tree model to the data


 
maximum_depth_of_tree = np.arange(10)+1 
train_err_arr = [] 
val_err_arr = [] 
test_err_arr = [] 
 
for depth in maximum_depth_of_tree: 
     
    model = tree.DecisionTreeRegressor(max_depth=depth) 
    #sklearn takes the inputs as matrices. Hence, we reshape the arrays into column matrices 
    x_train_for_line_fitting = np.matrix(x_train.reshape(len(x_train),1)) 
    y_train_for_line_fitting = np.matrix(y_train.reshape(len(y_train),1)) 
 
    #Fit the line to the training data 
    model.fit(x_train_for_line_fitting, y_train_for_line_fitting) 
 
    #Plot the line 
    plt.figure() 
    plt.scatter(x_train, y_train, color='black') 
    plt.plot(x.reshape((len(x),1)),model.predict(x.reshape((len(x),1))),color='blue') 
    plt.xlabel('x-input feature') 
    plt.ylabel('y-target values') 
    plt.title('Line fit to training data with max_depth='+str(depth)) 
    plt.show() 
5. Evaluate the model.

    mean_train_error = np.mean( (y_train - model.predict(x_train.reshape(len(x_train),1)))**2 ) 


    mean_val_error = np.mean( (y_val - model.predict(x_val.reshape(len(x_val),1)))**2 ) 
    mean_test_error = np.mean( (y_test - model.predict(x_test.reshape(len(x_test),1)))**2 ) 
     
    train_err_arr.append(mean_train_error) 
    val_err_arr.append(mean_val_error) 
    test_err_arr.append(mean_test_error) 
 
    print ('Training MSE: ', mean_train_error, '\nValidation MSE: ', mean_val_error, '\nTest MSE:
', mean_test_error) 
     
plt.figure() 
plt.plot(train_err_arr,c='red') 
plt.plot(val_err_arr,c='blue') 
plt.plot(test_err_arr,c='green') 
plt.legend(['Training error', 'Validation error', 'Test error']) 
plt.title('Variation of error with maximum depth of tree') 
plt.show() 

Output :

Conclusion :


PRACTICAL – 11

Aim: Write a program for KNN algorithm for classification of IRIS dataset

Problem Statement: Write a program for kNN algorithm for classification of


IRIS dataset. 

Python Program:
1. Import Libraries

from __future__ import print_function 
 
import numpy as np 
from sklearn import datasets, neighbors, linear_model, tree 
from sklearn.decomposition import PCA 
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.datasets import load_iris, fetch_olivetti_faces 
from sklearn.model_selection import train_test_split 
from sklearn.decomposition import PCA as RandomizedPCA 
from sklearn.metrics import classification_report 
from sklearn.metrics import confusion_matrix 
import matplotlib.pyplot as plt 
from time import time 
%matplotlib inline 

2. Prepare dataset 

First, we will prepare the dataset. The dataset we choose is a modified version
of the Iris dataset. We choose only the first two input feature dimensions
viz sepal-length and sepal-width (both in cm) for ease of visualization. 
iris = load_iris() 
X = iris.data[:,:2] #Choosing only the first two input-features 
Y = iris.target 
 
number_of_samples = len(Y) 
 
print(number_of_samples) 
 
#Splitting into training and test sets 
random_indices = np.random.permutation(number_of_samples) 
#Training set 
num_training_samples = int(number_of_samples*0.75) 
x_train = X[random_indices[:num_training_samples]] 
y_train = Y[random_indices[:num_training_samples]] 
 
#Test set 


x_test = X[random_indices[num_training_samples:]] 
y_test = Y[random_indices[num_training_samples:]] 
 
#Visualizing the training data 
X_class0 = np.asmatrix([x_train[i] for i in range(len(x_train)) if y_
train[i]==0]) #Picking only the first two classes 
Y_class0 = np.zeros((X_class0.shape[0]),dtype=np.int) 
X_class1 = np.asmatrix([x_train[i] for i in range(len(x_train)) if y_
train[i]==1]) 
Y_class1 = np.ones((X_class1.shape[0]),dtype=np.int) 
X_class2 = np.asmatrix([x_train[i] for i in range(len(x_train)) if y_
train[i]==2]) 
Y_class2 = np.full((X_class2.shape[0]),fill_value=2,dtype=np.int) 
 
plt.scatter([X_class0[:,0]],[ X_class0[:,1]],color='red') 
plt.scatter([X_class1[:,0]],[ X_class1[:,1]],color='blue') 
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green') 
plt.xlabel('sepal length') 
plt.ylabel('sepal width') 
plt.legend(['class 0','class 1','class 2']) 
plt.title('Fig 1: Visualization of training data') 
plt.show() 

Note that the first class is linearly separable from the other two classes but the
second and third classes are not linearly separable from each other. 

3. K-nearest neighbour classifier algorithm 

Now that our training data is ready, we will jump right into the classification
task. Just to remind you, the K-nearest neighbor is a non-parametric learning
algorithm and does not learn a parameterized function that maps the input
to the output. Rather it looks up the training set every time it is asked to
classify a point and finds out the K nearest neighbors of the query point. The
class corresponding to majority of the points is output as the class of the
query point. 

model = neighbors.KNeighborsClassifier(n_neighbors = 10) # K = 10 
model.fit(x_train, y_train) 

4. Visualize the working of the algorithm 

Let's see how the algorithm works. We choose the first point in the test set as
our query point. 

query_point = np.array([5.9,2.9]) 
true_class_of_query_point = 1 
predicted_class_for_query_point = model.predict([query_point]) 
print("Query point: {}".format(query_point)) 
print("True class of query point: {}".format(true_class_of_query_point)) 
query_point.shape 

Let's visualize the point and its K = 10 nearest neighbors. 

neighbors_object = neighbors.NearestNeighbors(n_neighbors=10) 
neighbors_object.fit(x_train) 
distances_of_nearest_neighbors, indices_of_nearest_neighbors_of_query_point 
= neighbors_object.kneighbors([query_point]) 
nearest_neighbors_of_query_point = x_train[indices_of_nearest_neighbors_of_
query_point[0]] 
print("The query point is: {}\n".format(query_point)) 
print("The nearest neighbors of the query point are:\n {}\
n".format(nearest_neighbors_of_query_point)) 
print("The classes of the nearest neighbors are: {}\
n".format(y_train[indices_of_nearest_neighbors_of_query_point[0]])) 
print("Predicted class for query point:
{}".format(predicted_class_for_query_point[0])) 
 
plt.scatter([X_class0[:,0]], [X_class0[:,1]],color='red') 
plt.scatter([X_class1[:,0]], [X_class1[:,1]],color='blue') 
plt.scatter([X_class2[:,0]], [X_class2[:,1]],color='green') 
plt.scatter(query_point[0], query_point[1],marker='^',s=75,color='black') 
plt.scatter(nearest_neighbors_of_query_point[:,0], nearest_neighbors_of_que
ry_point[:,1],marker='s',s=150,color='yellow',alpha=0.30) 
plt.xlabel('sepal length') 
plt.ylabel('sepal width') 
plt.legend(['class 0','class 1','class 2']) 
plt.title('Fig 3: Working of the K-NN classification algorithm') 
plt.show() 

def evaluate_performance(model, x_test, y_test): 
    test_set_predictions = [model.predict(x_test[i].reshape((1,len(x_test[i
]))))[0] for i in range(x_test.shape[0])] 
    test_misclassification_percentage = 0 
    for i in range(len(test_set_predictions)): 
        if test_set_predictions[i]!=y_test[i]: 
            test_misclassification_percentage+=1 
    test_misclassification_percentage *= 100/len(y_test) 
    return test_misclassification_percentage 

5. Evaluate the performances on the validation and test sets 

print("Evaluating K-NN classifier:") 


test_err = evaluate_performance(model, x_test, y_test) 
print('test misclassification percentage = {}%'.format(test_err)) 


Output:

Conclusion:


Practical – 12

Name: Deepankar Patnaik


Enrollment: 190280117057

Aim: Write a program using Bayes algorithm for email classification (spam or
non-spam) for the open-sourced data set from the UC Irvine Machine Learning
Repository.

Problem Statement:
Write a program using Bayes algorithm for email classification (spam or non-
spam) for the open sourced data set from the UC Irvine Machine Learning
Repository

Python Program:
import numpy as np
from sklearn.model_selection import train_test_split

datafile = open('C:/Users/AntennaPC/Desktop/spambase.data','r')

# Download spambase.data from the MS Team of this course, save it, and
# give the file path from your pc

data = []
for line in datafile:
line = [float(element) for element in line.rstrip('\n').split(',')]
data.append(np.asarray(line))

num_features = 48
X = [data[i][:num_features] for i in range(len(data))]
y = [int(data[i][-1]) for i in range(len(data))]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,


random_state=42)

#Making likelihood estimations

#Find the two classes

X_train_class_0 = [X_train[i] for i in range(len(X_train)) if


y_train[i]==0]


X_train_class_1 = [X_train[i] for i in range(len(X_train)) if


y_train[i]==1]

#Find the class specific likelihoods of each feature

likelihoods_class_0 = np.mean(X_train_class_0, axis=0)/100.0


likelihoods_class_1 = np.mean(X_train_class_1, axis=0)/100.0

#Calculate the class priors

num_class_0 = float(len(X_train_class_0))
num_class_1 = float(len(X_train_class_1))

prior_probability_class_0 = num_class_0 / (num_class_0 + num_class_1)


prior_probability_class_1 = num_class_1 / (num_class_0 + num_class_1)

log_prior_class_0 = np.log10(prior_probability_class_0)
log_prior_class_1 = np.log10(prior_probability_class_1)

def calculate_log_likelihoods_with_naive_bayes(feature_vector, Class):


assert len(feature_vector) == num_features
log_likelihood = 0.0 #using log-likelihood to avoid underflow
if Class==0:
for feature_index in range(len(feature_vector)):
if feature_vector[feature_index] == 1: #feature present
log_likelihood +=
np.log10(likelihoods_class_0[feature_index])
elif feature_vector[feature_index] == 0: #feature absent
log_likelihood += np.log10(1.0 -
likelihoods_class_0[feature_index])
elif Class==1:
for feature_index in range(len(feature_vector)):
if feature_vector[feature_index] == 1: #feature present
log_likelihood +=
np.log10(likelihoods_class_1[feature_index])
elif feature_vector[feature_index] == 0: #feature absent
log_likelihood += np.log10(1.0 -
likelihoods_class_1[feature_index])
else:
raise ValueError("Class takes integer values 0 or 1")

return log_likelihood

def calculate_class_posteriors(feature_vector):
log_likelihood_class_0 =
calculate_log_likelihoods_with_naive_bayes(feature_vector, Class=0)
log_likelihood_class_1 =
calculate_log_likelihoods_with_naive_bayes(feature_vector, Class=1)

log_posterior_class_0 = log_likelihood_class_0 + log_prior_class_0


log_posterior_class_1 = log_likelihood_class_1 + log_prior_class_1


return log_posterior_class_0, log_posterior_class_1

def classify_spam(document_vector):
feature_vector = [int(element>0.0) for element in document_vector]
log_posterior_class_0, log_posterior_class_1 =
calculate_class_posteriors(feature_vector)
if log_posterior_class_0 > log_posterior_class_1:
return 0
else:
return 1

#Predict spam or not on the test set

predictions = []
for email in X_test:
predictions.append(classify_spam(email))

def evaluate_performance(predictions, ground_truth_labels):


correct_count = 0.0
for item_index in range(len(predictions)):
if predictions[item_index] == ground_truth_labels[item_index]:
correct_count += 1.0
accuracy = correct_count/len(predictions)
return accuracy
accuracy_of_naive_bayes = evaluate_performance(predictions, y_test)
print(accuracy_of_naive_bayes)

for i in range(100):
    print(predictions[i], y_test[i])

Output :

Conclusion :


Practical – 13

Aim: Write a program using SVM on IRIS dataset and carry out classification.

Problem Statement: Write a program using SVM on IRIS dataset and carry out
classification.

Program:
1. Import Libraries

from __future__ import division, print_function


import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
# from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

2. Prepare dataset 
iris = datasets.load_iris()
X = iris.data[:,:2]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y,


test_size=0.25, random_state=42)

3. Use Support Vector Machine with different kinds of kernels and


evaluate performance
def evaluate_on_test_data(model=None):
predictions = model.predict(X_test)
correct_classifications = 0
for i in range(len(y_test)):
if predictions[i] == y_test[i]:
correct_classifications += 1
accuracy = 100*correct_classifications/len(y_test) #Accuracy as
a percentage
return accuracy

kernels = ('linear','poly','rbf')
accuracies = []
for index, kernel in enumerate(kernels):
model = svm.SVC(kernel=kernel)
model.fit(X_train, y_train)
acc = evaluate_on_test_data(model)
accuracies.append(acc)


print("{} % accuracy obtained with kernel = {}".format(acc,


kernel))

4. Visualize the Visualize the decision boundaries


#Train SVMs with different kernels
svc = svm.SVC(kernel='linear').fit(X_train, y_train)
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7).fit(X_train, y_train)
poly_svc = svm.SVC(kernel='poly', degree=3).fit(X_train, y_train)

#Create a mesh to plot in


h = .02 # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
np.arange(y_min, y_max, h))

#Define title for the plots


titles = ['SVC with linear kernel',
'SVC with RBF kernel',
'SVC with polynomial (degree 3) kernel']

for i, clf in enumerate((svc, rbf_svc, poly_svc)):


# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
plt.figure(i)

Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot


Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

# Plot also the training points


plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.ocean)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title(titles[i])

plt.show()

5. Check the support vectors

#Checking the support vectors of the polynomial kernel (for example)

print("The support vectors are:\n", poly_svc.support_vectors_)


Output :

Conclusion :


Practical – 14

Name: Deepankar Patnaik


Enrollment: 190280117057

Aim: Write a program using SVM algorithm for Boston house price prediction
dataset to predict price of houses from certain features.

Problem Statement: Write a program using SVM algorithm for Boston house
price prediction dataset to predict price of houses from certain features.

Program:
1. Import Libraries

from __future__ import division, print_function


import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split
# from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt
%matplotlib inline

2. Load data from the Boston dataset


boston = datasets.load_boston()
X = boston.data
y = boston.target

X_train, X_test, y_train, y_test = train_test_split(X, y,


test_size=0.25, random_state=42)

3. Use Support Vector Machine with different kinds of kernels and


evaluate performance.
def evaluate_on_test_data(model=None):
predictions = model.predict(X_test)
sum_of_squared_error = 0
for i in range(len(y_test)):
err = (predictions[i]-y_test[i]) **2
sum_of_squared_error += err
mean_squared_error = sum_of_squared_error/len(y_test)
RMSE = np.sqrt(mean_squared_error)
return RMSE


kernels = ('linear','rbf')
RMSE_vec = []
for index, kernel in enumerate(kernels):
model = svm.SVR(kernel=kernel)
model.fit(X_train, y_train)
RMSE = evaluate_on_test_data(model)
RMSE_vec.append(RMSE)
print("RMSE={} obtained with kernel = {}".format(RMSE, kernel))

Output :

Conclusion:
