24CSPC212-PIC Lab Manual
YEAR/ SEMESTER: II / IV
EX.NO   DATE   LIST OF EXPERIMENTS   PAGE NO   MARKS   SIGN
CYCLE-I
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
2. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
3. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
Ex.No:1 IMPLEMENTATION OF THE CANDIDATE-ELIMINATION ALGORITHM
Date:
AIM:
To implement and demonstrate the Candidate-Elimination algorithm, for a given set of training data examples stored in a .CSV file, to output a description of the set of all hypotheses consistent with the training examples.
ALGORITHM:
Step:1 Load the training data from the .CSV file.
Step:2 Initialise the specific hypothesis S to the first training instance and the general hypothesis G to the maximally general hypothesis.
Step:3 For every positive example, generalise S just enough to cover it, replacing mismatching attribute values with '?'.
Step:4 For every negative example, specialise G just enough to exclude it, using the values retained in S.
Step:5 Remove from G any hypothesis that has stayed maximally general.
Step:6 Output the final S and G as the boundaries of the version space.
PROGRAM:
import numpy as np
import pandas as pd

# Loading data from a CSV file (raw string so the backslashes are not treated as escapes)
data = pd.DataFrame(data=pd.read_csv(r'E:\BALA\AI\Lab programs\pgms\dataset.csv'))
print(data)

# Separate the attribute columns (concepts) from the target column
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])
def learn(concepts, target):
    '''
    learn() implements the learning method of the Candidate-Elimination algorithm.
    Arguments:
        concepts - a data frame with all the features
        target - a data frame with corresponding output values
    '''
    # Initialise S0 with the first instance from concepts
    # .copy() makes sure a new list is created instead of just pointing to the same memory location
    specific_h = concepts[0].copy()
    print("\nInitialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    # The learning iterations
    for i, h in enumerate(concepts):
        # Checking if the hypothesis has a positive target
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                # Change values in S & G only if values change
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        # Checking if the hypothesis has a negative target
        if target[i] == "No":
            for x in range(len(specific_h)):
                # For a negative hypothesis, change values only in G
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("\nSteps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Find indices of rows that stayed maximally general, meaning they are unchanged
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        # Remove those rows from general_h
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    # Return final values
    return specific_h, general_h
s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")
print("\nFinal General_h:", g_final, sep="\n")
OUTPUT:
RESULT:
Thus the Candidate-Elimination algorithm, which outputs all hypotheses consistent with the training examples, was executed and verified successfully using Python.
Viva Questions:
1. What is Machine Learning?
Ex.No:2 IMPLEMENTATION OF DECISION TREE BASED ID3 ALGORITHM
Date:
AIM:
To build a decision tree using the ID3 algorithm and to classify a new sample using Python.
ALGORITHM:
Step:1 Observe the dataset. Import the necessary basic python libraries.
Step:2 Read the dataset.
Step:3 Calculate the Entropy of the whole dataset.
Step:4 Calculate the Entropy of the filtered dataset.
Step:5 Calculate the Information Gain for the feature (Outlook); a short illustration follows this list.
Step:6 Finding the most informative feature (feature with highest information gain).
Step:7 Adding a node to the tree.
Step:8 Perform ID3 algorithm and generate a tree.
Step:9 Finding unique classes of the label.
Step:10 Predicting from the tree.
Step:11 Evaluating the test dataset.
Step:12 Checking the test dataset.
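As a quick illustration of Steps 3 to 5, the entropy and information-gain computations can be sketched in plain Python (the helper names and the hand-typed Tennis columns below are illustrative, not part of the program that follows):

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p * log2(p)) over the class proportions p
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    # Gain(S, A) = H(S) - sum(|Sv|/|S| * H(Sv)) over the values v of feature A
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

play = ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
outlook = ['Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain']
print(information_gain(play, outlook))   # about 0.247 for the classic Tennis data

ID3 evaluates this gain for every remaining feature and splits on the largest value, which is what Step 6 does.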
PROGRAM:
import numpy as np
import math
import csv
def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)
class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""
def subtables(data, col, delete):
    # Split the data into one subtable per distinct value of the given column
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict
def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv
def create_node(data, metadata):
    # If all remaining rows share one label, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node
def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s
def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)
metadata, traindata = read_data(r"E:\BALA\AI\Lab programs\pgms\Tennisdata.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
OUTPUT:
RESULT:
Thus the program to implement the decision tree based ID3 algorithm using Python was executed and verified successfully.
Viva Questions:
1. What are data types in C? Why are they important?
2. What is the difference between int, float, and double data types?
5. What is the role of the sizeof() operator in determining data type sizes?
Ex.No:3 IMPLEMENTATION OF BACK PROPAGATION ALGORITHM TO BUILD AN ARTIFICIAL NEURAL NETWORK
Date:
AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and to test it using appropriate data sets.
ALGORITHM:
1. Inputs X arrive through the preconnected path.
2. Input is modeled using real weights W. The weights are usually selected randomly.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs.
5. Travel back from the output layer to the hidden layer to adjust the weights so that the error is decreased. Keep repeating the process until the desired output is achieved (a one-neuron numeric illustration follows this list).
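Before reading the full program, it may help to see Steps 3 to 5 carried out for a single sigmoid neuron with made-up numbers (this fragment is an illustration only and is independent of the program below):

import math

x, w, b, target, l_rate = 1.0, 0.5, 0.1, 1.0, 0.5
output = 1.0 / (1.0 + math.exp(-(w * x + b)))        # forward pass (Steps 1-3)
delta = (output - target) * output * (1.0 - output)  # error times sigmoid derivative (Step 4)
w -= l_rate * delta * x                              # adjust the weight against the error (Step 5)
b -= l_rate * delta
print(output, w, b)

The same error-times-derivative quantity appears as neuron['delta'] in the program.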
PROGRAM:
from math import exp
from random import random, seed

# Initialize a network with one hidden layer of random weights
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights':[random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights':[random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network
# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(neuron['output'] - expected[j])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                # Error was computed as (output - expected), so step against the gradient
                neuron['weights'][j] -= l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] -= l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i])**2 for i in range(len(outputs))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Test training on a small two-class dataset
seed(1)
dataset = [[2.7810836,2.550537003,0],
    [1.465489372,2.362125076,0],
    [3.396561688,4.400293529,0],
    [1.38807019,1.850220317,0],
    [3.06407232,3.005305973,0],
    [7.627531214,2.759262235,1],
    [5.332441248,2.088626775,1],
    [6.922596716,1.77106367,1],
    [8.675418651,-0.242068655,1],
    [7.673756466,3.508563011,1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)
OUTPUT:
RESULT:
Thus the Backpropagation algorithm to build an Artificial Neural Network was implemented successfully.
Ex.No:4 IMPLEMENTATION OF NAÏVE BAYESIAN CLASSIFIER FOR A
SAMPLE TRAINING DATASET AND TO COMPUTE ACCURACY
Date:
AIM:
To implement a Naïve Bayesian classifier for the Tennis data set and to compute its accuracy on a few test samples.
ALGORITHM:
Step:1 Convert the data set into a frequency table.
Step:2 Create a likelihood table by finding the probabilities, e.g. the probability of overcast is 0.29 and the probability of playing is 0.64.
Step:3 Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of prediction.
Here we have P(Sunny|Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P(Yes) = 9/14 = 0.64.
Now, P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which has the higher probability.
Step:4 Exit.
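The arithmetic in Step 3 can be checked in one line of Python; exact fractions give 0.60, while the rounded intermediate values above give 0.59:

print((3/9) * (9/14) / (5/14))   # P(Yes | Sunny) = 0.6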
PROGRAM:
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# Raw string so the backslashes in the Windows path are not treated as escape sequences
data = pd.read_csv(r"E:\BALA\AI\Lab programs\pgms\Tennis.csv")
print("The first 5 values of data is :\n",data.head())
print("The first 5 values of data is :\n",data.head())
# obtain Train data and Train output
X = data.iloc[:,:-1]
print("\nThe First 5 values of train data is\n",X.head())
y = data.iloc[:,-1]
print("\nThe first 5 values of Train output is\n",y.head())
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
print("\nNow the Train data is :\n",X.head())
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n",y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(X_train, y_train)
from sklearn.metrics import accuracy_score
print("Accuracy is:", accuracy_score(classifier.predict(X_test), y_test))
OUTPUT:
RESULT:
Thus the program to implement the Naïve Bayesian classifier and to compute its accuracy on a few test samples using Python was executed and verified successfully.
Viva Questions:
1. What are input and output statements in C?
Ex.No:5 CLASSIFICATION OF A SET OF DOCUMENTS USING NAÏVE BAYESIAN CLASSIFIER
Date:
AIM:
To classify a set of documents using the Naïve Bayesian classifier and to measure the accuracy and precision.
ALGORITHM:
Step:1 Import basic libraries.
Step:2 Importing the dataset.
Step:3 Data preprocessing.
Step:4 Training the model.
Step:5 Testing and evaluation of the model.
Step:6 Visualizing the model.
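Step 3's preprocessing converts the raw text into TF-IDF vectors before training; a toy illustration with three made-up documents (not part of the program below):

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

docs = ["the cat sat", "the dog sat", "the cat ran"]
counts = CountVectorizer().fit_transform(docs)    # raw term-frequency matrix
tfidf = TfidfTransformer().fit_transform(counts)  # reweighted by inverse document frequency
print(tfidf.shape)                                # (3 documents, 5 distinct terms)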
PROGRAM:
import numpy as np
from sklearn.datasets import fetch_20newsgroups

# An example subset of the 20 Newsgroups categories (any subset works)
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
twenty_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True)
print(len(twenty_train.data))
print(len(twenty_test.data))
print(twenty_train.target_names)
print("\n".join(twenty_train.data[0].split("\n")))
print(twenty_train.target[0])
OUTPUT:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(twenty_train.data)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)
X_train_tfidf.shape
mod = MultinomialNB()
mod.fit(X_train_tfidf, twenty_train.target)
X_test_tf = count_vect.transform(twenty_test.data)
X_test_tfidf = tfidf_transformer.transform(X_test_tf)
predicted = mod.predict(X_test_tfidf)
print(classification_report(twenty_test.target,predicted,target_names=twenty_test.target_names))
OUTPUT:
RESULT:
Thus the accuracy and precision were measured using the Naïve Bayesian classifier model.
Viva Questions:
Ex.No:6 CONSTRUCTION OF A BAYESIAN NETWORK TO DIAGNOSE CORONA
INFECTION USING STANDARD WHO DATA SET
Date:
AIM:
To construct a Bayesian network to diagnose corona infection using the standard WHO data set.
ALGORITHM:
This Naive Bayes implementation is broken down into the following parts:
Step1: Separate by Class.
Step2: Summarize Dataset.
Step3: Summarize Data by Class.
Step4: Gaussian Probability Density Function (a minimal sketch follows this list).
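Step 4's density can be sketched as follows (the function name is my own, not from the program below):

import math

def gaussian_pdf(x, mean, stdev):
    # P(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mean)^2 / (2 * sigma^2))
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

print(gaussian_pdf(1.0, 1.0, 1.0))   # 0.3989..., the peak of a unit Gaussian at its mean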
PROGRAM:
import numpy as np
import pandas as pd
from scipy.stats import randint
import matplotlib.pyplot as plt
import seaborn as sns   # used for the confusion-matrix heatmap below
from pandas import set_option
plt.style.use('ggplot')
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn import metrics
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
print(covid_19_data.info())
print()
cm = confusion_matrix(y_test, y_predict)
sns.heatmap(cm, annot=True, cmap='Blues')
print(classification_report(y_test, y_predict))
OUTPUT:
RESULT:
Thus the program to diagnose corona infection using a Bayesian network was implemented successfully using Python.
Ex.No:7 COMPARISON OF CLUSTERING IN EM ALGORITHM AND K-MEANS
ALGORITHM USING THE SAME DATA SETS
Date:
AIM:
To compare the clustering in EM algorithm and K-means algorithm using the same data sets.
ALGORITHM:
The K-means implementation is as follows:
Step1: Choose the number of clusters k.
Step2: Select k random points from the data as centroids.
Step3: Assign all the points to the closest cluster centroid.
Step4: Recompute the centroids of the newly formed clusters.
Step5: Repeat steps 3 and 4 until the centroids stop changing.
The EM (Expectation-Maximization) implementation is as follows:
1. Expectation step (E-step): estimate (guess) all missing values in the dataset, so that after completing this step there is no missing value.
2. Maximization step (M-step): use the data estimated in the E-step to update the parameters.
3. Repeat the E-step and M-step until the values converge (a two-cluster sanity check follows this list).
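scikit-learn's GaussianMixture, used in the program below, runs exactly this E-step/M-step loop internally; a two-cluster sanity check on generated data (illustrative only):

import numpy as np
from sklearn.mixture import GaussianMixture

pts = np.concatenate([np.random.normal(0, 1, (50, 1)), np.random.normal(5, 1, (50, 1))])
gm = GaussianMixture(n_components=2).fit(pts)   # EM runs until convergence
print(gm.means_.ravel())                        # close to the true means, 0 and 5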
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris

dataset=load_iris()
# print(dataset)
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
plt.show()
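To make the comparison quantitative rather than purely visual, the cluster assignments can be scored against the true labels. A short addition, assuming the variables defined in the program above:

from sklearn import metrics

print('KMeans ARI:', metrics.adjusted_rand_score(y.Targets, model.labels_))
print('GMM ARI   :', metrics.adjusted_rand_score(y.Targets, y_cluster_gmm))

An adjusted Rand index closer to 1 means the clustering agrees more closely with the true Iris species.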
OUTPUT:
RESULT:
Thus the program to compare clustering using the EM and K-means algorithms on the same data sets was performed successfully.
Ex.No:8 IMPLEMENTATION OF K-NEAREST NEIGHBOUR ALGORITHM TO
CLASSIFY THE IRIS DATA SET
Date:
AIM:
To implement the K-Nearest Neighbour algorithm to classify the Iris data set using Python.
ALGORITHM:
Step1: Load the Iris data set and split it into training and test sets.
Step2: Choose the number of neighbours k.
Step3: For each test sample, calculate its distance to every training sample.
Step4: Select the k training samples nearest to the test sample.
Step5: Assign the test sample the class most common among its k nearest neighbours (a toy illustration follows this list).
Step6: Repeat for all test samples and report the classification accuracy.
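A toy illustration of Steps 3 to 5 with made-up points (independent of the program below):

import numpy as np
from collections import Counter

train = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2]])
labels = ['setosa', 'setosa', 'virginica']
query = np.array([1.1, 1.0])
dists = np.linalg.norm(train - query, axis=1)               # Step 3: distance to every sample
nearest = np.argsort(dists)[:3]                             # Step 4: the k = 3 nearest indices
print(Counter(labels[i] for i in nearest).most_common(1))   # Step 5: majority vote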
PROGRAM:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

dataset=load_iris()
X_train,X_test,y_train,y_test=train_test_split(dataset["data"],dataset["target"],random_state=0)
kn=KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train,y_train)
for i in range(len(X_test)):
    x=X_test[i]
    x_new=np.array([x])
    prediction=kn.predict(x_new)
    print("TARGET=",y_test[i],dataset["target_names"][y_test[i]],"PREDICTED=",prediction,dataset["target_names"][prediction])
print(kn.score(X_test,y_test))
OUTPUT:
RESULT:
Thus the program for the K-Nearest Neighbour algorithm was implemented successfully using the Iris data set.
Viva Questions:
1. What are the different types of decision-making statements in C?
5. What is the ternary (?:) operator? How does it work as a decision-making tool?
Ex.No:9 IMPLEMENTATION OF THE NON-PARAMETRIC LOCALLY WEIGHTED
REGRESSION ALGORITHM IN ORDER TO FIT DATA POINT
Date:
AIM:
To implement the non-parametric Locally Weighted Regression algorithm in order to fit data points using Python.
ALGORITHM:
Step1: Read the given data sample to X and the curve (linear or non-linear) to Y.
Step2: Set the value for the smoothening parameter or free parameter, say τ.
Step3: Set the point of interest x0, drawn from X.
Step4: Determine the weight of each point using w(x, x0) = exp(-(x - x0)^2 / (2τ^2)).
Step5: Determine the value of the model parameter β using β(x0) = (XᵀWX)⁻¹XᵀWy.
Step6: Prediction = x0 * β.
PROGRAM:
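The manual leaves this program blank; what follows is a minimal NumPy sketch of the algorithm above, with variable names of my own choosing rather than an official solution:

import numpy as np
import matplotlib.pyplot as plt

def lwr_predict(x0, X, y, tau):
    # Weight every training point by its closeness to the query point x0 (Step 4)
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # beta(x0) = (X^T W X)^-1 X^T W y, then Prediction = x0 * beta (Steps 5 and 6)
    beta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return x0 @ beta

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.normal(scale=0.1, size=x.shape)
X = np.c_[np.ones_like(x), x]        # bias column plus the raw input
tau = 0.5                            # smoothening parameter
pred = np.array([lwr_predict(X[i], X, y, tau) for i in range(len(x))])
plt.scatter(x, y, s=10)
plt.plot(x, pred, color='red')
plt.show()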
OUTPUT:
RESULT:
Thus the non-parametric Locally Weighted Regression algorithm to fit data points was
implemented successfully.
Viva Questions:
1. What is recursion in C? How does it work?