B-56 Sanket Jambhulkar MLA-3


G. H. RAISONI COLLEGE OF ENGINEERING, NAGPUR

(An Autonomous Institute affiliated to RTM Nagpur University)
Department of Computer Science & Engineering
Session: Summer 2023

Date:
Practical Details: Practical No. 3
Student Details:
Roll Number: 56
Name: Sanket Jambhulkar
Semester: 5th
Section: B
Branch: CSE
Subject: MLA

Aim: Write a Python program to classify the given dataset using Logistic Regression and
evaluate the model.

Tools: PIMA diabetes Dataset, Python, Kaggle Jupyter Notebook

Theory: Introduction:
Logistic Regression is a fundamental algorithm for binary classification tasks. Despite its
name, it is a classification algorithm rather than a regression one. This section covers the
concept of Logistic Regression, its underlying principles, how it is used for classification
tasks, and how the performance of a Logistic Regression model is evaluated.

1) Logistic Regression:

• Logistic Regression is a statistical method used for predicting the probability of a binary
  outcome.
• It models the probability that a given input belongs to a particular class.
• The logistic function (sigmoid function) maps a linear combination of the input features to
  the range (0, 1), so its output can be read as a probability.
• Mathematically, the logistic function is expressed as:
  σ(z) = 1 / (1 + e^(-z)), where z = w^T x + b, w is the weight vector, x is the feature
  vector, and b is the bias term.
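
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the original practical: the function names `sigmoid` and `predict_probability` and the example weights, features and bias are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function σ(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(w, x, b):
    """Probability of class 1 for feature vector x, using z = w^T x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical weights, features and bias, chosen only for illustration
w = [0.5, -0.25]
x = [2.0, 4.0]
b = 0.0

# z = 0.5*2 - 0.25*4 + 0 = 0, and σ(0) = 0.5
print(predict_probability(w, x, b))
```

A value of z = 0 sits exactly on the decision boundary, which is why the probability comes out as 0.5.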

2) Training Process:

• In the training process, the Logistic Regression model learns the optimal weights and
  bias that minimize a predefined loss function, typically the logistic loss or cross-entropy
  loss.
• This process involves iterative optimization algorithms such as gradient descent, where
  the model iteratively updates the weights and bias to minimize the loss function.
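
The training loop described above can be sketched with NumPy as follows. This is an illustrative batch gradient descent under simple assumptions (fixed learning rate, no regularization, a toy one-feature dataset); it is not how scikit-learn's LogisticRegression is implemented, and the names `train_logistic_regression` and `cross_entropy` are hypothetical.

```python
import numpy as np

def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        error = p - y                            # gradient of the loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples      # update weights
        b -= lr * error.mean()                   # update bias
    return w, b

def cross_entropy(X, y, w, b):
    """Mean cross-entropy loss; eps guards against log(0)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy data (made up): negative feature values are class 0, positive are class 1
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

loss_before = cross_entropy(X, y, np.zeros(1), 0.0)
w, b = train_logistic_regression(X, y)
loss_after = cross_entropy(X, y, w, b)

print("loss before training:", loss_before)
print("loss after training:", loss_after)
```

With all parameters at zero every prediction is 0.5, so the starting loss is ln 2; each update moves the weights in the direction that lowers it.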

3) Classification:

• After training, the Logistic Regression model uses the learned parameters to predict the
  probability that a given input belongs to the positive class (class 1).
• If the predicted probability is greater than a predefined threshold (usually 0.5), the input
  is classified as belonging to the positive class; otherwise, it is classified as belonging to
  the negative class (class 0).
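
The thresholding rule above can be illustrated with a short sketch. The helper name `classify` and the probability values are made up for the example.

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Return 1 where the predicted probability exceeds the threshold, else 0."""
    return (np.asarray(probabilities) > threshold).astype(int)

# Hypothetical predicted probabilities for four inputs
probs = [0.10, 0.50, 0.51, 0.90]

print(classify(probs))        # [0 0 1 1]
print(classify(probs, 0.05))  # [1 1 1 1]
```

Lowering the threshold labels more inputs as positive, which generally raises recall at the cost of more false positives.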

4) Model Evaluation:

• Several metrics are commonly used to evaluate the performance of a Logistic Regression
  model, including accuracy, precision, recall, F1-score, and area under the ROC curve
  (AUC-ROC).
• Accuracy measures the proportion of correctly classified instances out of the total
  instances.
• Precision measures the proportion of true positive predictions among all positive
  predictions.
• Recall measures the proportion of true positive predictions among all actual positive
  instances.
• F1-score is the harmonic mean of precision and recall and provides a balanced measure
  of a model's performance.
• AUC-ROC measures the area under the Receiver Operating Characteristic curve and
  provides a comprehensive evaluation of the model's ability to discriminate between
  positive and negative instances across different threshold values.
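
The first four metrics above follow directly from the counts of true and false positives and negatives. The sketch below computes them by hand on a small set of made-up labels; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)  # every metric is 0.75 for this example
```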

Predict diabetes using the Logistic Regression Classifier

Importing necessary libraries

scikit-learn ships several built-in datasets, and if you already know which one you want
you can load it directly from the package. The following example imports the diabetes
dataset with load_diabetes(). It contains data from 442 diabetic patients, with ten baseline
features such as BMI, age, blood pressure and blood sugar (glucose) level, and a target that
is a quantitative measure of disease progression one year after baseline. Note that this is
not the PIMA dataset listed under Tools, and that its target is a numeric score rather than
a binary label.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.datasets import load_diabetes

Load the diabetes dataset

diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
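
Before training, it can help to inspect what was loaded. The sketch below prints the shape and feature names and then derives a binary label from the numeric target. The median split is an assumption made here purely so that a two-class problem exists; it is not part of the original practical, and the variable name `y_binary` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

print(X.shape)                 # (442, 10)
print(diabetes.feature_names)  # age, sex, bmi, bp and six serum measurements
print(len(np.unique(y)))       # number of distinct target values

# Assumption for illustration: label a patient 1 when the progression
# score is above the median, otherwise 0
y_binary = (y > np.median(y)).astype(int)
print(np.bincount(y_binary))   # class counts for labels 0 and 1
```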

Splitting the dataset into training and testing sets

To understand model performance, dividing the dataset into a training set and a test set is a
good strategy.
Let's split the dataset by using the function train_test_split(). You need to pass 3
parameters: features, target, and test_size. Here test_size=0.2 holds out 20% of the records
for testing. Additionally, you can set random_state to fix the seed of the random shuffle, so
that the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
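
The effect of test_size and random_state can be checked on a tiny made-up array. The names `X_demo` and `y_demo` are hypothetical and stand in for any feature matrix and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

first = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
second = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = first
print(X_tr.shape, X_te.shape)               # (8, 2) (2, 2)
print(np.array_equal(first[1], second[1]))  # True: same seed, same split
```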

Initialize and train the logistic regression model

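Logistic Regression is a statistical analysis method that Machine Learning has adopted.
It is used when the dependent variable is dichotomous (binary), meaning it has only two
possible outcomes: for example, whether a person will survive an accident or not, or whether
a student will pass an exam or not. The target loaded above is not binary, so scikit-learn
appears to treat each distinct target value as a separate class, which is consistent with the
predictions below being drawn from a handful of values such as 71, 178 and 200.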
model = LogisticRegression()
model.fit(X_train, y_train)
LogisticRegression()
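
For comparison, the sketch below fits the same estimator on a two-class version of the target, which is the setting Logistic Regression is designed for. The median split used to create `y_binary`, the stratified split and max_iter=1000 are assumptions made for this illustration and are not taken from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Assumed labelling: 1 if the progression score is above the median, else 0
y_binary = (y > np.median(y)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print(clf.classes_)                       # [0 1]
proba = clf.predict_proba(X_test)[:, 1]   # probability of class 1
labels = clf.predict(X_test)              # hard 0/1 predictions
print(proba[:5])
print(labels[:5])
print("test accuracy:", clf.score(X_test, y_test))
```

With a binary target, predict_proba returns one column per class, and the second column is the probability of the positive class that the theory section refers to.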

Predicting on the test set

y_pred = model.predict(X_test)
y_pred

array([200., 178., 178., 178., 178., 200., 178., 200., 71., 200., 200.,
71., 71., 178., 71., 71., 178., 178., 71., 178., 200., 200.,
71., 178., 200., 178., 178., 178., 71., 71., 200., 200., 71.,
178., 200., 200., 71., 178., 178., 71., 71., 71., 71., 178.,
200., 200., 71., 71., 71., 200., 71., 71., 71., 71., 200.,
200., 200., 178., 200., 71., 200., 200., 71., 71., 200., 200.,
200., 178., 71., 71., 71., 178., 178., 71., 71., 178., 200.,
200., 178., 200., 200., 71., 71., 200., 71., 71., 71., 71.,
200.])

Evaluating the model

The classification report is one of the performance evaluation summaries for a
classification-based machine learning model. It displays the model's precision, recall, F1
score and support, and gives a better understanding of the overall performance of the trained
model. To read a classification report, you need to know the metrics it displays; they are
described under Model Evaluation above. The code below does not print a classification
report: it computes the mean squared error, mean absolute error and R-squared between the
true and predicted target values.
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)

print("Mean Absolute Error:", mae)
print("R-squared:", r2)

Mean Squared Error: 5691.91011235955
Mean Absolute Error: 61.640449438202246
R-squared: 0.07431994826369315
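
The classification report described in the text can be produced with scikit-learn once the target is binary. The sketch below is self-contained and rests on the same assumption as before: the numeric target is split at its median into two classes, here given the hypothetical names "low" and "high". No output is shown because the figures depend on the installed scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)  # assumed two-class labelling

X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=["low", "high"])

print("Accuracy:", accuracy)
print(report)  # precision, recall, f1-score and support per class
```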

Plotting ROC and Precision-Recall curves

Predicting probabilities instead of class labels gives a classifier flexibility. This
flexibility comes from the way probabilities can be interpreted using different thresholds,
which lets the operator of the model trade off the errors it makes, such as the number of
false positives against the number of false negatives. This matters when the cost of one
type of error outweighs the cost of the other.
Two diagnostic tools that help interpret probabilistic forecasts for binary (two-class)
classification problems are ROC curves and Precision-Recall curves. The code below plots
predicted against actual target values rather than either curve.
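import matplotlib.pyplot as plt

# Scatter plot of predicted against true target values
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
# Dashed diagonal marks where the prediction equals the true value
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], 'k--', lw=4)
plt.xlabel('True Values')
plt.ylabel('Predicted Values')
plt.title('Predicted vs Actual Values')
plt.show()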
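
The two curves named in this section can be drawn from predicted probabilities as sketched below. This again assumes a median split of the target to obtain two classes, which is not in the original notebook. The Agg backend and the output file name roc_pr_curves.png are choices made so the script also runs without a display; in a notebook you would drop the backend line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)  # assumed two-class labelling

X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]   # probability of the positive class

fpr, tpr, _ = roc_curve(y_test, scores)
precision, recall, _ = precision_recall_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(12, 5))
ax_roc.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
ax_roc.plot([0, 1], [0, 1], "k--", label="chance")  # random-guess baseline
ax_roc.set_xlabel("False Positive Rate")
ax_roc.set_ylabel("True Positive Rate")
ax_roc.set_title("ROC curve")
ax_roc.legend()

ax_pr.plot(recall, precision)
ax_pr.set_xlabel("Recall")
ax_pr.set_ylabel("Precision")
ax_pr.set_title("Precision-Recall curve")

fig.tight_layout()
fig.savefig("roc_pr_curves.png")  # hypothetical output file name
```

Each point on either curve corresponds to one threshold applied to the same probabilities, which is how the plots expose the trade-off between false positives and false negatives.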

Confusion matrix

import matplotlib.pyplot as plt
import seaborn as sns

# Define number of bins and bin edges
num_bins = 10
bin_edges = np.linspace(min(min(y_test), min(y_pred)),
                        max(max(y_test), max(y_pred)), num_bins + 1)

# Create bins
y_test_bins = np.digitize(y_test, bin_edges)
y_pred_bins = np.digitize(y_pred, bin_edges)

# Create confusion matrix
conf_matrix = np.zeros((num_bins, num_bins))
for i in range(len(y_test)):
    # Adjust for potential boundary cases
    actual_index = min(y_test_bins[i], num_bins) - 1
    pred_index = min(y_pred_bins[i], num_bins) - 1
    conf_matrix[actual_index, pred_index] += 1

# Plot confusion matrix heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='.0f', cmap='Blues', cbar=False)
plt.xlabel('Predicted Bin')
plt.ylabel('Actual Bin')
plt.title('Confusion Matrix (Binned)')
plt.xticks(np.arange(num_bins) + 0.5, np.arange(1, num_bins + 1))
plt.yticks(np.arange(num_bins) + 0.5, np.arange(1, num_bins + 1))
plt.show()
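
For a two-class problem no binning is needed, because scikit-learn can build the confusion matrix directly from the labels. The sketch below uses made-up labels rather than the notebook's predictions.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for eight samples
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# Rows are actual classes, columns are predicted classes, in label order 0, 1
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
# [[3 1]
#  [1 3]]
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

The four cells are the same counts from which accuracy, precision, recall and F1-score are derived.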

Conclusion: Logistic Regression is a powerful algorithm for binary classification tasks,
providing interpretable results and efficient computation. Understanding its principles
and the process of evaluating its performance is crucial for effectively applying it to
real-world datasets. By comprehensively evaluating the model's performance, we can
assess its suitability for the given task and make informed decisions about its
deployment.
