ML Disha
• WEKA is a popular open-source machine learning and data mining software that provides
a collection of machine learning algorithms for data mining tasks.
• The name "WEKA" stands for "Waikato Environment for Knowledge Analysis," as it was
developed at the University of Waikato in New Zealand.
• User Interface: WEKA provides a graphical user interface (GUI) that allows users to
interact with machine learning algorithms, build models, and evaluate results visually.
• Data Preprocessing Tools: WEKA offers various tools for data preprocessing, including
options for handling missing values, transforming data, and selecting relevant features.
• Extensibility: WEKA is extensible, allowing users to implement and integrate their own
algorithms.
• Data Visualization: It provides visualization tools to help users understand the data and
model results.
Advantages of WEKA:
• Wide Range of Algorithms: WEKA provides a broad collection of algorithms for classification, regression, clustering, and association rule
mining. This diversity makes it suitable for various data mining and machine learning
applications.
• User-Friendly Interface: WEKA features a graphical user interface (GUI) that makes it
accessible to users with varying levels of technical expertise. The GUI facilitates the
exploration of algorithms, building and evaluating models, and experimenting with
different approaches.
• Data Preprocessing Tools: WEKA offers tools for data preprocessing, including handling
missing values, transforming data, and feature selection. These capabilities contribute to
the overall data preparation process.
Disadvantages of WEKA:
• Limited Scalability: While WEKA is suitable for small to medium-sized datasets, it may
face challenges with very large datasets due to limitations in scalability.
• Advanced Features: Advanced machine learning practitioners may find WEKA lacking
some of the more recent and sophisticated algorithms and features available in other
specialized tools and libraries.
• Steep Learning Curve for Advanced Features: While the basic functionalities are
user-friendly, mastering advanced features and customizing algorithms may require a
steeper learning curve for users who are new to the software.
• Limited Support for Deep Learning: WEKA has historically been focused on traditional
machine learning algorithms and may not provide extensive support for deep learning
techniques.
Minimum Hardware Requirement
Here are general guidelines for the minimum hardware requirements for running WEKA:
• Processor (CPU): A modern multi-core processor is recommended. The more cores, the
better, as certain machine learning tasks can benefit from parallel processing.
• Storage:
Disk Space: A few gigabytes of free disk space should be sufficient for installing WEKA and
storing datasets.
Solid State Drive (SSD): While not strictly necessary, using an SSD can enhance the speed of data
access and program loading.
• Java Runtime Environment (JRE): WEKA requires Java to be installed on your system.
Ensure that you have a compatible version of Java installed.
Installation steps
Windows Installation:
• Download WEKA: Visit the official WEKA website: WEKA Download Page
• Choose the version of WEKA you want to download (e.g., stable version).
• Java Installation: WEKA requires Java. If Java is not already installed on your system,
the installer may prompt you to download and install Java.
• Launch WEKA: After installation, you can launch WEKA from the Start menu or desktop
shortcut.
Experiment no- 2
4. Click Next.
6. It is recommended that you install for Just Me, which will install Anaconda Distribution to
just the current user account. Only select an install for All Users if you need to install for
all users’ accounts on the computer (which requires Windows Administrator privileges).
7. Click Next.
8. Select a destination folder to install Anaconda and click Next. Install Anaconda to a
directory path that does not contain spaces or Unicode characters. (For more information on
destination folders, see the Anaconda installation documentation.)
9. Choose whether to add Anaconda to your PATH environment variable or register Anaconda
as your default Python. We don't recommend adding Anaconda to your PATH environment
variable, since this can interfere with other software; instead, use Anaconda by opening
Anaconda Navigator or the Anaconda Prompt from the Start Menu. Unless you plan on
installing and running multiple versions of Anaconda or multiple versions of Python, accept
the default and leave the "Register Anaconda as my default Python" box checked.
10. Click Install. If you want to watch the packages Anaconda is installing, click Show
Details.
11. Click Next.
12. After a successful installation you will see the “Thanks for installing Anaconda” dialog
box:
13. If you wish to read more about Anaconda.org and how to get started with Anaconda, check
the boxes “Anaconda Distribution Tutorial” and “Learn more about Anaconda”. Click the
Finish button.
(b) Get familiarized with arff file format. Create an arff file on your system and save in the WEKA
installed drive of your system.
Structure of an ARFF file:
1. Header Section
2. Data Section
1. Header Section
This section contains information about the dataset, such as the relation (table) name, the
columns, and the type of each column. The header section has two parts: the relation
declaration and the attribute declarations.
@relation: used to give the table (relation) name
@attribute: used to declare a column name and its datatype
Datatypes:
nominal: a fixed set of allowed values, listed inside curly brackets (like constants)
string: a datatype which accepts only string values
numeric: a datatype which accepts integer or real values
Syntax:
@relation <tablename>
Example:
@relation "employee"
2. Data section
Data section is used to represents the data or entries for available columns. (according to the
order in header section data would be inserted).
data section starts with @data, and this section must be added after Header section. only single
record can be written in single line.
Syntax:
@data
<record1>
<record2>
<record N>
All records must be in the same format and order as the attributes defined in the header
section. For example:
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
file:
@relation "employee"
@attribute id numeric
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
Values are separated by commas (,), and a question mark (?) is used to represent an empty or
missing value for a particular column.
How to create and open an ARFF file: you need to have the WEKA tool installed on your
machine.
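For reference, a complete version of the employee file might look like the following sketch. The attribute names and types other than id are assumptions, chosen only to match the sample records shown above.

@relation "employee"
@attribute id numeric
@attribute name string
@attribute initial string
@attribute phone numeric
@attribute dept {IT, HR, MANAGEMENT}
@attribute dob date "dd-MM-yyyy"
@attribute city {rjt, amd, pbr}
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd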
Step 1: Open any text editor and paste the above code.
Step 3: Open the WEKA tool.
Step 6: The file is now loaded; click Edit in the Preprocess tab.
Step 7: The dataset is displayed as shown.
Experiment no- 3
(a) Execute the Linear Regression algorithm on WEKA with the help of a suitable data set.
When you select your data set, split it for training and testing as: i) Training 80% and
Testing 20%, ii) Training 60% and Testing 40%.
(b) Implement linear regression using Python.
(a) Execute the Linear Regression algorithm on WEKA with the help of a suitable data set,
splitting it for training and testing as: i) Training 80% and Testing 20%, ii) Training 60% and
Testing 40%.
When you start WEKA, the GUI chooser pops up and lets you choose four ways to work with
WEKA and your data. For all the examples in this article series, we will choose only the Explorer
option.
Figure 2. WEKA Explorer
Now that you're familiar with how to install and start up WEKA, let's get into our first data
mining technique: regression.
@RELATION house
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
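The @ATTRIBUTE declarations are not visible in this excerpt; they belong between the @RELATION and @DATA lines. A plausible header, assuming six numeric columns for house size, lot size, bedrooms, granite, bathroom, and selling price (the attribute names are assumptions), would be:

@ATTRIBUTE houseSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC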
Now that the data file has been created, it's time to create our regression model. Start WEKA, then
choose the Explorer. You'll be taken to the Explorer screen, with the Preprocess tab selected.
Select the Open File button and select the ARFF file you created in the section above. After
selecting the file, your WEKA Explorer should look similar to the screenshot in Figure 3.
Figure 3. WEKA with house data loaded
To create the model, click on the Classify tab. The first step is to select the model we want to build,
so WEKA knows how to work with the data and how to create the appropriate model: click the
Choose button, expand the functions branch, and select LinearRegression.
This tells WEKA that we want to build a regression model. As you can see from the other choices,
though, there are lots of possible models to build. Lots! This should give you a good indication of
how we are only touching the surface of this subject. Also of note: There is another choice called
SimpleLinearRegression in the same branch. Do not choose this because simple regression only
looks at one variable, and we have six.
Figure 4. Linear regression model in WEKA
Now that the desired model has been chosen, we have to tell WEKA where the data is that it should
use to build the model. Though it may be obvious to us that we want to use the data we supplied
in the ARFF file, there are actually different options, some more advanced than what we'll be using.
The other three choices are Supplied test set, where you can supply a different set of data to build
the model; Cross-validation, which lets WEKA build a model based on subsets of the supplied
data and then average them out to create a final model; and Percentage split, where WEKA uses
a given percentage of the supplied data to build the model and holds back the rest for testing.
Now we are ready to create our model. For this experiment, choose Percentage split and set it to
80% (Training 80%, Testing 20%); afterwards, repeat the run with 60% to obtain the 60/40 split.
Click Start. Figure 5 shows what the output should look like.
Figure 5. House price regression model in WEKA
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The values for x and y variables are training datasets for Linear Regression model representation.
Program: -
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Get dataset
df_sal = pd.read_csv('Salary_Data.csv')
df_sal.head()
# Describe data
df_sal.describe()
# Data distribution
plt.title('Salary Distribution Plot')
sns.distplot(df_sal['Salary'])
plt.show()
# Relationship between Salary and Experience
plt.scatter(df_sal['YearsExperience'], df_sal['Salary'], color = 'lightcoral')
plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.box(False)
plt.show()
# Splitting variables
X = df_sal.iloc[:, :1]   # independent variable (YearsExperience)
y = df_sal.iloc[:, 1:]   # dependent variable (Salary)
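The train/test split and model-fitting steps are not visible in this excerpt; a minimal sketch that defines the names used below (X_train, y_train, y_pred_train, regressor), with the split ratio as an assumption:

# Split the data into training and test sets (80/20 split assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the linear regression model on the training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict salaries for the training and test sets
y_pred_train = regressor.predict(X_train)
y_pred_test = regressor.predict(X_test)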
# Prediction on training set
plt.scatter(X_train, y_train, color = 'lightcoral')
plt.plot(X_train, y_pred_train, color = 'firebrick')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(['X_train/Pred(y_test)', 'X_train/y_train'], title = 'Sal/Exp', loc='best', facecolor='white')
plt.box(False)
plt.show()
# Regressor coefficients and intercept
print(f'Coefficient: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')
Experiment no- 4
(a) Execute Logistic Regression with the help of a properly identified data set. Analyse the result
and identify how well the model performed on the test set. Briefly describe the steps you followed
to analyse the data set.
(b) Implement Logistic Regression using python.
The logistic regression model relates the log-odds of the outcome to the predictors:
log(y / (1 − y)) = b0 + b1x1 + b2x2 + … + bnxn,
where y is the dependent variable and x1, x2, …, xn are explanatory variables.
Sigmoid Function:
Logistic Function – Sigmoid Function
• The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
• It maps any real value into a value between 0 and 1: σ(z) = 1 / (1 + e^(−z)). Since the output
of logistic regression must lie between 0 and 1 and cannot go beyond this limit, it forms a
curve shaped like an "S".
• The S-form curve is called the Sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which decides between class 0
and class 1: values above the threshold tend towards 1, and values below the threshold tend
towards 0.
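As a small illustration of the sigmoid and threshold idea (the input values below are purely illustrative):

import numpy as np

def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])
probabilities = sigmoid(z)
predicted_classes = (probabilities >= 0.5).astype(int)   # threshold at 0.5
print(probabilities)       # values between 0 and 1
print(predicted_classes)   # 0 below the threshold, 1 above it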
Program: -
# importing libraries
import numpy as np                 # linear algebra
import pandas as pd                # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
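The line that loads the data set is not visible here; assuming the Iris data set saved as iris.csv with a 'variety' column (the file name is an assumption), it would be something like:

df = pd.read_csv("iris.csv")   # assumed file name; must contain a 'variety' column
df.head()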
df.info() #gives information about the columns
print(df["variety"].value_counts())
sns.countplot(df["variety"])
plt.figure(figsize=(8,4))
# draws a heatmap with the correlation matrix calculated by df.corr() as input
sns.heatmap(df.corr(), annot=True, fmt=".0%")
plt.show()
from sklearn.linear_model import LogisticRegression     # for the Logistic Regression algorithm
from sklearn.model_selection import train_test_split    # to split the dataset for training and testing
from sklearn import metrics                              # for checking the model accuracy
X=df.iloc[:,0:4]
Y=df["variety"]
X.head()
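The split into training and test sets is not visible on this page; a sketch consistent with the variable names used below (the split ratio is an assumption):

# hold out part of the data for testing (ratio assumed)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)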
log = LogisticRegression()
log.fit(X_train,Y_train)
prediction=log.predict(X_test)
print('The accuracy of the Logistic Regression is',metrics.accuracy_score(prediction,Y_test))
Experiment no- 5
Execute the Naïve Bayes algorithm with a suitable data set and do a proper analysis of the result.
Also implement the Naïve Bayes algorithm using Python.
Naïve Bayes Classifier Algorithm
• Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional training dataset.
• The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms;
it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental
analysis, and classifying articles.
Why is it called Naïve Bayes?
The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, Which can be described
as:
Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent
of the occurrence of the other features. For example, if a fruit is identified on the basis of colour,
shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature
individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
• Bayes' theorem is also known as Bayes' Rule or Bayes' law; it is used to determine the
probability of a hypothesis given prior knowledge, and it depends on conditional probability.
• The formula for Bayes' theorem is:
P(A|B) = P(B|A) × P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a
hypothesis is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
Program: -
# importing libraries
import numpy as np                 # linear algebra
import pandas as pd                # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.model_selection import train_test_split    # to split the dataset for training and testing
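The data-loading step is not visible here; as in the previous experiment, it is presumably something like (the file name is an assumption):

df = pd.read_csv("iris.csv")   # assumed file name; the dataset has a 'variety' column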
df.shape #tells us about no. of rows and column [rows , columns]
(150, 5)
print(df["variety"].value_counts())
sns.countplot(df["variety"])
plt.figure(figsize=(8,4))
# draws a heatmap with the correlation matrix calculated by df.corr() as input
sns.heatmap(df.corr(), annot=True, fmt=".0%")
plt.show()
# We'll use seaborn's FacetGrid to color the scatterplot by species
sns.FacetGrid(df, hue="variety", height=5).map(plt.scatter, "sepal.length",
"sepal.width").add_legend()
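The feature/target split, train/test split, and classifier construction are not visible on this page; a sketch consistent with the names used below (the split ratio is an assumption):

X = df.iloc[:, 0:4]          # the four measurement columns as features
Y = df["variety"]            # species label as target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
gnb = GaussianNB()           # Gaussian Naive Bayes classifier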
gnb.fit(X_train, Y_train)
GaussianNB(priors=None, var_smoothing=1e-09)
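The prediction and evaluation steps are likewise not shown; a minimal sketch for analysing the result on the test set:

prediction = gnb.predict(X_test)
print('The accuracy of Naive Bayes is', metrics.accuracy_score(Y_test, prediction))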
Experiment no- 6
Program: -
# Load libraries
import pandas as pd
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier          # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split     # Import train_test_split function
from sklearn import metrics                              # Import scikit-learn metrics module for accuracy calculation

# load dataset
pima = pd.read_csv("diabetes.csv")
pima.head()
pima.describe()
pima.info()
print(pima["Outcome"].value_counts())
sns.countplot(pima["Outcome"])
feature_cols = ['Pregnancies', 'Glucose', 'BloodPressure',
'SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
X = pima[feature_cols]   # Features
y = pima.Outcome         # Target variable
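The remaining steps of the decision-tree program (split, fit, predict, evaluate) are not visible in this excerpt; a minimal sketch using the names defined above (the split ratio and random_state are assumptions):

# split the data, train the tree, and measure accuracy on the held-out set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))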
Experiment no- 7
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category
in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider
the diagram below, in which two different categories are classified using a decision boundary, or
hyperplane:
Program: -
# Load libraries
import pandas as pd
import seaborn as sns
from sklearn import svm # Import SVM Classifier
from sklearn.model_selection import train_test_split     # Import train_test_split function
from sklearn import metrics                              # Import scikit-learn metrics module for accuracy calculation

# load dataset
pima = pd.read_csv("diabetes.csv")
pima.head()
pima.describe()
pima.info()
print(pima["Outcome"].value_counts())
sns.countplot(pima["Outcome"])
#split dataset in features and target variable
feature_cols = ['Pregnancies', 'Glucose', 'BloodPressure',
'SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
X = pima[feature_cols]   # Features
y = pima.Outcome         # Target variable
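The training and evaluation steps are not visible here; a sketch consistent with the accuracy and precision values printed below (the kernel choice, split ratio, and random_state are assumptions, so the exact numbers from the original run may differ):

# split the data, train a support vector classifier, and evaluate it
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))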
Accuracy: 0.7922077922077922
Precision 0.7936507936507936
Experiment no- 8
Hence each cluster contains data points with some commonalities and is kept apart from the other
clusters. The diagram below explains the working of the K-means clustering algorithm:
• Step-2: Select K random points or centroids. (They need not come from the input dataset.)
• Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
• Step-4: Calculate the variance and place a new centroid for each cluster.
• Step-5: Repeat the third step, i.e. reassign each data point to the new closest centroid of its
cluster.
• Step-6: If any reassignment occurs, go to Step-4; otherwise go to FINISH.
• Step-7: The model is ready.
How to choose the value of "K number of clusters" in K-means Clustering?
• The performance of the K-means clustering algorithm depends on how good (compact and
well-separated) the clusters it forms are.
• But choosing the optimal number of clusters is a big task.
• There are some different ways to find the optimal number of clusters, but here we are
discussing the most appropriate method to find the number of clusters or value of K.
Elbow Method
• The Elbow method is one of the most popular ways to find the optimal number of clusters.
This method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of
Squares, which defines the total variations within a cluster.
• The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)² + Σ(Pi in Cluster2) distance(Pi, C2)² + Σ(Pi in Cluster3) distance(Pi, C3)²
where Ci is the centroid of cluster i and each sum runs over the points Pi assigned to that cluster.
• Since the graph shows a sharp bend that looks like an elbow, the method is known as the
elbow method. The graph for the elbow method looks like the image below:
Program: -
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
from sklearn.cluster import KMeans
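The data-loading step is not visible here; a sketch assuming the commonly used Mall_Customers.csv file (the file name and column positions are assumptions), whose Annual Income (k$) and Spending Score (1-100) columns match the axis labels used in the cluster plot below:

# load the customer data set (file name assumed)
dataset = pd.read_csv('Mall_Customers.csv')
# take the Annual Income and Spending Score columns as the feature matrix
x = dataset.iloc[:, [3, 4]].values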
# Using a for loop for iterations from 1 to 10
wcss_list = []   # initializing the list for the WCSS values
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()
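The step that trains the final model with the chosen number of clusters and produces y_predict is not visible here; a sketch consistent with the plotting code below (5 clusters, matching the five plotted clusters):

# train the final K-means model and get a cluster index for every data point
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)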
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')      # first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2')     # second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label = 'Cluster 3')       # third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')      # fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')   # fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend(loc='lower center')
mtp.show()
Experiment no- 9
What are Activation Functions in a Neural Network?
Activation functions are applied to the weighted sum of a neuron's inputs and bias and are used to
decide whether, and how strongly, the neuron is activated. They transform the presented data and
produce the output that the network passes on to the next layer. Activation functions are also
referred to as transfer functions in some literature. They can be either linear or nonlinear,
depending on the function they represent, and they are used to control the outputs of neural
networks across different domains.
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
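The import statements and the rest of this program are not visible in the excerpt; a minimal sketch that continues from the load above and uses the activation functions discussed earlier (the import line, layer sizes, and epoch count are assumptions):

# assumed import, needed for the keras calls above and below
from tensorflow import keras

# Scale pixel values to [0, 1] and flatten the 28x28 images to 784-length vectors
x_train_flatten = (x_train / 255).reshape(len(x_train), 28 * 28)
x_test_flatten = (x_test / 255).reshape(len(x_test), 28 * 28)

# ReLU activation in the hidden layer, softmax at the output over the 10 digit classes
model = keras.Sequential([
    keras.layers.Dense(100, input_shape=(784,), activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train_flatten, y_train, epochs=5)
model.evaluate(x_test_flatten, y_test)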
Experiment no- 10
# Normalizing the dataset
x_train = x_train/255
x_test = x_test/255
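The steps between normalization and the evaluate call are not visible here; a sketch that defines the names used below (x_test_flatten and model), assuming the same MNIST data and keras import as in Experiment 9 and a single sigmoid output layer (the layer size and epoch count are assumptions):

# flatten the normalized 28x28 images into 784-length vectors
x_train_flatten = x_train.reshape(len(x_train), 28 * 28)
x_test_flatten = x_test.reshape(len(x_test), 28 * 28)

# a single dense layer with sigmoid activation over the 10 digit classes
model = keras.Sequential([
    keras.layers.Dense(10, input_shape=(784,), activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train_flatten, y_train, epochs=5)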
model.evaluate(x_test_flatten, y_test)