0% found this document useful (0 votes)

22 views

ML Practical 04

The document discusses analyzing a diabetes dataset using Python and various machine learning techniques. It loads and explores the dataset, cleans missing values, scales the data, splits it into training and test sets, and evaluates a k-nearest neighbors model on the data to find the optimal number of neighbors.

Uploaded by

whitehouse2202

Available Formats

Download as TXT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views

ML Practical 04

Uploaded by

whitehouse2202

Available Formats

Download as TXT, PDF, TXT or read online on Scribd

You are on page 1/ 20

10/18/23, 2:51 PM

Practical 04

Importing Libraries

In [1]:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

sns.set()

import warnings

warnings. filterwarnings(’ ignore

#matplotlib inline

Loading The Dataset

In [2]:

diabetes_data = pd.read_csv('diabetes.csv')

diabetes_data

ML _Practical_04 - Jupyter Notebook

Out[2]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI Pedigree Age Outcome
0 6 148 72 35 0 336 0627 50 1
1 1 85 66 29 0 266 0351 3 0
2 8 183 64 0 0 233 0872 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2288 33 1
763 10 101 76 48 180 32.9 0171 863 0
764 2 122 70 27 0 36.38 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
768 1 126 60 [1] 0 301 0349 47 1
767 1 a3 70 3 0 304 0315 23 0
768 rows x 9 columns
In [3]:
#Print the first 5 rows of the daotaframe.
diabetes_data.head()
out[3]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI Pedigree Age Outcomes
0 6 148 72 35 0 336 0.627 50 1
1 1 85 66 29 0 268 0.351 31 0
2 8 183 64 o 0 233 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2288 33 1

Check Information About Dataset

localhost:8888/notebooks/ML_Practical 04.ipynb

1M
10/18/23, 2:51 PM

In [4]:

diabetes_data.info({verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, @ to 767
Data columns (total 9 columns):

# Column

Pregnancies
Glucose

BloodPressure 768 non-null

SkinThickness 768 non-null

BMI
Pedigree
Age
Outcome
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

In [5]:

}
1
2
3
4 Insulin
5
6
7
8

Non-Null Count
768 non-null
768 non-null

768 non-null
768 non-null
768 non-null
768 non-null
768 non-null

diabetes_data.describe()

float64
float64
inte4
inte4

ML _Practical_04 - Jupyter Notebook

Out[5]:
Pregnancles Glucose BloodPressure SkinThickness Insulin BMI Pedigree Age Outcome
count 768.000000 768.000000 766.000000 768.000000 768.000000 768.000000 768.000000
768.000000 768.000000
mean 3.845052 120.894531 69.105469 20.536458 79.7990479 31.992578 0.471876
33.240885 0.348958
std 3.369578 31.972618 19.355807 15.952218 115.244002 7.834160 0.331329 11.760232
0476951
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.078000 21.000000
0.000000
25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 0.243750 24.000000
0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 0.372500 28.000000
0.000000
75% 6.000000 140.250000 80.000000 32,000000 127.250000 36.600000 0.626250 41.000000
1.000000
max 17.000000 199.000000 122.000000 99.000000 B846.000000 67.100000 2.420000
81.000000 1.000000
In [6]:
diabetes_data.describe().T
Out[6]:
count mean std min 25% 50% 75% max
Pregnancies 768.0 3.845052 3.369578 0.000 1.00000 3.0000 6.00000 17.00
Glucose 768.0 120.894531 31.972618 0.000 99.00000 117.0000 140.25000 199.00
BloodPressure 768.0 69.105469 19.355807 0.000 62.00000 72.0000 80.00000 122.00
SkinThickness 768.0 20.536458 15952218 0.000 0.00000 23.0000 32.00000 99.00
Insulin 768.0 79.799479 116.244002 0.000 0.00000 30.5000 127.25000 846.00
BMI 768.0 31.992578 7.884160 0.000 27.30000 32.0000 36.60000 67.10
Pedigree 768.0 0.471876 0.331320 0.078 0.24375 0.3725 0.62625 242
Age 768.0 33.240885 11760232 21.000 24.00000 29.0000 41.00000 81.00
Outcome 768.0 0.348958 0.476951 0.000 0.00000 0.0000 1.00000 1.00

Checking Null Values

In [7]:

diabetes_data_copy = diabetes_data.copy(deep = True)

diabetes_data_copy[['Glucose', 'BloodPressure’, 'SkinThickness', 'Insulin','BMI']]

= diabetes_data_copy[[ 'Glucose', 'BloodPressure’, 'SkinThicl
##t showing the count of Nans
print (diabetes_data_copy.isnull().sum())

Pregnancies

Glucose

BloodPressure
SkinThickness

Insulin

BMI

Pedigree

Age

Outcome

dtype: int64
227
374

localhost:8888/notebooks/ML_Practical 04.ipynb

2111
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

To fill these Nan values the data distribution needs to be understood

In [8]: |
p = diabetes_data.hist(figsize = (2e,20))

Pregnancies Glucose BloodPressure

250

200
150
150
125 150
100
100
75 100
] : 1 ]
, i 1. . = _N J . | __ mm -——
00 25 50 [1] 25 50 75 100 125 150 20 40 60 80 1

75 100 125 150 175 175 200 0 00 120

8
8

SkinThickness Insulin BMI

500
250
200
400
200
150
300
150
100 200
100
] ) ] i
0 = 0 He , mm HN -—
0 20 40 80 100 0 200 400 600 800 0 10 20 30 40 50 70
Pedigree Age Outcome
300 500
300
250
250 #00,
200
200 300
150
150
200
100 100
100
, | , -_ ,

0.0 05 10 15 20 25 20 30 40 70 80 0.0 0.2 04 06 08 1.0

Aiming to impute nan values for the columns in accordance with their distribution
4 4
In [9]: |

diabetes_data_copy[ 'Glucose'].fillna(diabetes_data_copy[ 'Glucose'].mean(),

inplace = True)
diabetes_data_copy[ 'BlocdPressure'].fillna(diabetes_data_copy[ 'BloodPressure'].me
an(), inplace = True)
diabetes_data_copy[ 'SkinThickness'].fillna(diabetes_data_copy[ 'SkinThickness'].me
dian(), inplace = True)
diabetes_data_copy[ 'Insulin'].fillna(diabetes_data_copy[ 'Insulin'].median(),
inplace = True)
diabetes_data_copy[ 'BMI'].fillna(diabetes_data_copy[ 'BMI'].median(), inplace =
True)

| >

Plotting after Nan removal

localhost:8888/notebooks/ML_Practical 04.ipynb 3M
10/18/23, 2:51 PM

In [10]:

p = diabetes_data_copy.hist(figsize = (20,20))

Pregnancies

,
. bs
00 25 50

75 100 125 150 175

SkinThickness

20 40

Pedigree

200
150
I
, CE =
00 05 1.0 15 20

Identify Shape Of Dataset

In [11]:

diabetes_data.shape

out[11]:
(768, 9)

Data Type Analysis

localhost:8888/notebooks/ML_Practical 04.ipynb

BS]

300
0

|
40 60

100

ML _Practical_04 - Jupyter Notebook

Glucose

120

Insulin

140

160

180 200

BloodPressure

I's
20 40 80 80 100 120

0.0

20 30

Outcome

4/11
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

In [12]:

sns.countplot(y=diabetes_data.dtypes ,data=diabetes_data)
plt.xlabel( "count of each data type")

plt.ylabel(“"data types")

plt.show()

int64

data types

float64

0 1 2 3 4 5 6 7

count of each data type

checking the balance of the data

In [13]:

color_wheel = {1: "#@8392cf",

2: "#7bce43"}

colors = diabetes_data[ "Outcome”].map(lambda x: color_wheel.get(x + 1))

print(diabetes_data.Outcome.value_counts())
p=diabetes_data.Outcome.value_counts().plot(kind="bar")

[] 500
1 268
Name: Outcome, dtype: int64

400

100

Heatmap for unclean data

localhost:8888/notebooks/ML_Practical 04.ipynb

5111
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

In [14]:
plt.figure(figsize=(12,10))

# on this line I just set the size of figure to 12 by 16.

p=sns.heatmap(diabetes_data.corr(), annot=True,cmap ='RdY1Gn'})

Pregnancies ; ; 0.018

Glucose 0.15 0.057

BloodPressure . i . 0.041

0.075

SkinThickness { 0.057

Insulin

BMI 0.018

0.034

@
4

Pedigree

Age

Outcome

Pregnancies
Glucose
BloodPressure
SkinThickness
Insulin

BMI
Pedigree
Qutcome

Heatmap for clean data

localhost:8888/notebooks/ML_Practical 04.ipynb 6/11

10/18/23, 2:51 PM

In [15]:

plt.figure(figsize=(12,10))
# on this line I just set the size of figure to 12 by 16.
p=sns.heatmap(diabetes_data_copy.corr(), annot=True,cmap ='RdYlGn'}

Pregnancies

Glucose

BloodPressure

SkinThickness

Insulin

BMI

Pedigree

Age

Outcome

0.082

ML _Practical_04 - Jupyter Notebook

0.045

0.034

[2 © [2] [2] [= — [4] ©

@ In] = 7] = = ©
§ 8 = g 3 a 2 <
3 wn = c 5
2 Io} g 8 == B
[=] 0 = o
I) ° =
aT 8 =
@ [7]
Scaling the data
In [16]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X = pd.DataFrame(sc_X.fit_transform(diabetes_data_copy.drop(["Outcome"],axis =
1),),
columns=[ ‘Pregnancies’, 'Glucose', 'BloodPressure’, ‘'SkinThickness', ‘Insulin’,
'BMI', 'DiabetesPedigreeFunction’, 'Age'])
In [17]:
X.head()
out[17]:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI
DiabetesPedigresFunction Age
0 0.639947 0.865108 -0.033518 0.670643 -0.181541 0.166619 0.468492 1.425995
1 0.844885 -1.206162 -0.529859 -0.012301 -0.181541 -0.852200 -0.365061 -0.190672
2 1.233880 2.015813 -0.695306 -0.012301 -0.181541 -1.332500 0.604397 -0.105584
3 -0.844885 -1.074852 -0.529859 -0.695245 -0.540642 -0.633881 -0.920763 -1.041549
4 -1.141852 0.503458 -2.680669 0.670643 0.316566 1.549303 5.484909 -0.020496

localhost:8888/notebooks/ML_Practical 04.ipynb

Qutcome

-06

-04

7M
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

In [18]: |

#X = diabetes_data.drop( "Outcome", axis = 1)

y = diabetes_data_copy.Outcome

Splitting the dataset

In [19]: |

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=1/3,random_state=42,
stratify=y)

In [20]: |

from sklearn.neighbors import KNeighborsClassifier

test_scores = []
train_scores = []
for i in range(1,15):
knn = KNeighborsClassifier(i)
knn.fit(X_train,y_train)
train_scores.append{knn.score(X_train,y_train)}
test_scores.append(knn.score(X_test,y_test))

In [21]: |

## score that comes from testing on the same datapoints that were used for training

max_train_score = max(train_scores)

train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]

print('Max train score {} % and k = {}'.format(max_train_score*100,list{map(lambda

x: x+1, train_scores_ind)}))

Max train score 79.6875 %¥ and k = [1]

In [22]: |

## score that comes from testing on the datapoints that were split in the beginning
to be used for testing solely
max_test_score = max(test_scores)

test_scores_ind = [1 for 1, v in enumerate(test_scores) if v == max_test_score]

print('Max test score {} %¥ and k = {}'.format(max_test_score*100,list(map(lambda

x: x+1, test_scores_ind))))

Max test score 73.4375 ¥ and k = [1]

Result Visualisation

In [23]: |

plt.figure(figsize=(12,5))
p = sns.lineplot(data=(range(1,15),train_scores),marker="'*',label="'Train Score')
p = sns.lineplot(data=(range(1,15),test_scores),marker="0"',label="Test Score’)
14 —.— Train Score ®
— - = Train Score |
y===) be

L )
—e— Test Score
10

~o~ Test Score

a —
co sa

The best result is captured at k = 11 hence 11 is used for the final mode

localhost:8888/notebooks/ML_Practical 04.ipynb 8/11

10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

Setup a knn classifier with k neighbor

In [24]:
knn = KNeighborsClassifier(11)

knn.fit(X_train,y_train)
knn.score(X_test,y_test)

out[24]:
0.765625

In [25]:

print(train_scores)

[8.796875]
Model Performance Analysis

confusion Matrix

In [26]:

#import confusion_matrix

from sklearn.metrics import confusion_matrix

#let us get the predictions using the classifier we had fit above
y_pred = knn.predict(X_test)

confusion _matrix(y_test,y_pred)

pd.crosstab(y_test, y pred, rownames=['True'], colnames=['Predicted'],

margins=True)

out[26]:

Predicted c 1 Al
True

0 142 25 167

1 35 54 89

All 177 79 256

localhost:8888/notebooks/ML_Practical 04.ipynb

9/11
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

In [27]:

y_pred = knn.predict(X_test)
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)

p = sns.heatmap(pd.DataFrame(cnf matrix), annot=True, cmap="Y1GnBu" ,fmt='g’)

plt.title('Confusion matrix’, y=1.1)

plt.ylabel('Actual label’)
plt.xlabel('Predicted label")

out[27]:
Text(.5, 20.049999999999997, 'Predicted label')

Confusion matrix

a 35 54

[4] 1

Predicted label

Classification Report (Report which includes Precision, Recall and F1-Score)

In [28]:
#import classification report

from sklearn.metrics import classification_report

print(classification_report(y_test,y_pred))

precision recall fl-score support

] 0.80 9.85 2.83 167

1 0.68 9.61 9.64 89

accuracy 8.77 256

macro avg 0.74 8.73 8.73 256
weighted avg 0.76 2.77 8.76 256

In [29]:

from sklearn.metrics import roc_curve

y_pred_proba = knn.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

localhost:8888/notebooks/ML_Practical 04.ipynb
140

120

100

10/11
10/18/23, 2:51 PM ML _Practical_04 - Jupyter Notebook

In [30]:

plt.plot([@,1],[0,1], 'k--"}
plt.plot(fpr,tpr, label="Knn')
plt.xlabel(’fpr'}

plt.ylabel('tpr'})
plt.title('Knn{n_neighbors=11) ROC curve")
plt.show()

Knn{n_neighbors=11) ROC curve

tpr

0.4

0.0

0.0 02 04 0.6 0.8 1.0

Area under ROC curve

In [31]:

from sklearn.metrics import roc_auc_score

roc_auc_score(y_test,y_pred_proba)

Out[31]:

Hyper Parameter optimization

In [32]:

#import GridSearchCV
from sklearn.model_ selection import GridSearchcv

#In case of classifier Like knn the parameter to be tuned is n_neighbors

param_grid = {'n_neighbors':np.arange(1,50)}

knn = KNeighborsClassifier()

knn_cv= GridSearchCv(knn,param_grid,cv=5)

knn_cv.fit(X,y)
print(“"Best Score:
print("Best Parameters:

+ str(knn_cv.best_score_))
" + str(knn_cv.best_params_))
Best Score:9.7721848251252015
Best Parameters: {'n_neighbors': 25}

localhost:8888/notebooks/ML_Practical 04.ipynb

11/11

Step-By-Step-Diabetes-Classification-Knn-Detailed-Copy1 - Jupyter Notebook
No ratings yet
Step-By-Step-Diabetes-Classification-Knn-Detailed-Copy1 - Jupyter Notebook
12 pages
Fallopian Tube Presentation
100% (3)
Fallopian Tube Presentation
41 pages
Med Surg Test 4 Study Guide
100% (2)
Med Surg Test 4 Study Guide
29 pages
Diabetes_Prediction_1704256341
No ratings yet
Diabetes_Prediction_1704256341
17 pages
diabetes-prediction-using-machine-learning
No ratings yet
diabetes-prediction-using-machine-learning
16 pages
Documentation Code
No ratings yet
Documentation Code
20 pages
Diabetes EDA and Kears Modeling
No ratings yet
Diabetes EDA and Kears Modeling
26 pages
Data Pre-Processing
No ratings yet
Data Pre-Processing
22 pages
Project
No ratings yet
Project
8 pages
SVM - RF - Diabetes - CSV - 26 - 6 - 2023.ipynb - Colaboratory
No ratings yet
SVM - RF - Diabetes - CSV - 26 - 6 - 2023.ipynb - Colaboratory
8 pages
ML Data Preprocessing in Python
No ratings yet
ML Data Preprocessing in Python
9 pages
healthcare-project-simplilearn- Week1
No ratings yet
healthcare-project-simplilearn- Week1
6 pages
Logistic - Ipynb - Colaboratory
No ratings yet
Logistic - Ipynb - Colaboratory
6 pages
AML Sessional 1 Students
No ratings yet
AML Sessional 1 Students
16 pages
Diabetes
No ratings yet
Diabetes
97 pages
ML Proj Diabetes.pptx
No ratings yet
ML Proj Diabetes.pptx
51 pages
ADS Exp-1
No ratings yet
ADS Exp-1
3 pages
Ml4.ipynb - Colab
No ratings yet
Ml4.ipynb - Colab
3 pages
Diabetes
No ratings yet
Diabetes
10 pages
Diabetes and Glucose Correlation - IBM Machine Learning Training Project
No ratings yet
Diabetes and Glucose Correlation - IBM Machine Learning Training Project
10 pages
Diabetes Prediction
No ratings yet
Diabetes Prediction
1 page
Project 10 Movie Recommendation - Ipynb - Colaboratory
No ratings yet
Project 10 Movie Recommendation - Ipynb - Colaboratory
6 pages
Logidtic_Regression_ASSIGNMENT
No ratings yet
Logidtic_Regression_ASSIGNMENT
13 pages
Capstone Project 2
No ratings yet
Capstone Project 2
15 pages
Exp 5
No ratings yet
Exp 5
7 pages
Diabetes Prediction System
No ratings yet
Diabetes Prediction System
4 pages
Experiment 4
No ratings yet
Experiment 4
5 pages
Diabetes Prediction Using Machine Learning
No ratings yet
Diabetes Prediction Using Machine Learning
20 pages
Diabetes
No ratings yet
Diabetes
7 pages
KNN For Classification
No ratings yet
KNN For Classification
4 pages
24MCB0021_VL2024250505870_AST03
No ratings yet
24MCB0021_VL2024250505870_AST03
4 pages
Project 190
No ratings yet
Project 190
6 pages
KNN - Jupyter Notebook
No ratings yet
KNN - Jupyter Notebook
5 pages
KNN For Classification
No ratings yet
KNN For Classification
5 pages
Logistic Regression
No ratings yet
Logistic Regression
12 pages
مختار النعيري - The Course Work Submission (1)
No ratings yet
مختار النعيري - The Course Work Submission (1)
31 pages
Univariate and Multivariate Analysis - Jupyter Notebook
No ratings yet
Univariate and Multivariate Analysis - Jupyter Notebook
5 pages
RA2111003011432
No ratings yet
RA2111003011432
3 pages
Pima Indian Diabetes Data Analysis in Python - Canopus Business Management Group
No ratings yet
Pima Indian Diabetes Data Analysis in Python - Canopus Business Management Group
21 pages
ML Minor May
No ratings yet
ML Minor May
5 pages
Lab Manual - MachineLearningLaboratory-DR.vaishnavi (1)
No ratings yet
Lab Manual - MachineLearningLaboratory-DR.vaishnavi (1)
71 pages
Data Science Practical 9
No ratings yet
Data Science Practical 9
6 pages
diabetes_test report
No ratings yet
diabetes_test report
62 pages
Logistic Regression 205
No ratings yet
Logistic Regression 205
8 pages
ML Practical 3D
No ratings yet
ML Practical 3D
4 pages
Unit5 - Logistic Regression
No ratings yet
Unit5 - Logistic Regression
4 pages
20BCE7620 AP2021228000397 Experiment-6 Removed
No ratings yet
20BCE7620 AP2021228000397 Experiment-6 Removed
19 pages
Pima Indians Diabetes Database Analysis - Kaggle
No ratings yet
Pima Indians Diabetes Database Analysis - Kaggle
37 pages
vertopal.com_python2025
No ratings yet
vertopal.com_python2025
25 pages
My Code
No ratings yet
My Code
7 pages
5
No ratings yet
5
5 pages
Diabetic Prediction Using LogicalRegression
No ratings yet
Diabetic Prediction Using LogicalRegression
9 pages
Import As From Import From Import From Import From Import From Import From Import From Import From Import From Import From Import Import As
No ratings yet
Import As From Import From Import From Import From Import From Import From Import From Import From Import From Import From Import Import As
8 pages
# Diabetes: Pandas PD Numpy NP Seaborn Sns
No ratings yet
# Diabetes: Pandas PD Numpy NP Seaborn Sns
4 pages
Cia 2 ML 2348352
No ratings yet
Cia 2 ML 2348352
6 pages
AI Diabetes Expt 10
No ratings yet
AI Diabetes Expt 10
2 pages
Project 3 - Diabetes Prediction.ipynb - Colab
No ratings yet
Project 3 - Diabetes Prediction.ipynb - Colab
4 pages
DAL Experiment Outputs 6to10
No ratings yet
DAL Experiment Outputs 6to10
16 pages
K-Nearest Neighbors For Diabetes Prediction: Malik Yousaf (F2020019038) Ahsan Rauf (F2020019057)
No ratings yet
K-Nearest Neighbors For Diabetes Prediction: Malik Yousaf (F2020019038) Ahsan Rauf (F2020019057)
15 pages
healthcare-project-simplilearn- Week2
No ratings yet
healthcare-project-simplilearn- Week2
8 pages
Macro Economics: A Simplified Detailed Edition for Students Understanding Fundamentals of Macroeconomics
From Everand
Macro Economics: A Simplified Detailed Edition for Students Understanding Fundamentals of Macroeconomics
Hesbon R.M
No ratings yet
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst
From Everand
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst
Dean Abbott
No ratings yet
Potassium Chloride
50% (2)
Potassium Chloride
2 pages
Biochemistry "Lipids Test"
No ratings yet
Biochemistry "Lipids Test"
8 pages
Estradiol Valerate 2mg + Nongestrel 0.5mg (PROGYLUTON)
No ratings yet
Estradiol Valerate 2mg + Nongestrel 0.5mg (PROGYLUTON)
12 pages
JCM 09 01616 v2
No ratings yet
JCM 09 01616 v2
12 pages
Vitros XT 3400 KR
No ratings yet
Vitros XT 3400 KR
2 pages
HSG TIẾNG ANH 2021-2022 LẦN 2 -converted
No ratings yet
HSG TIẾNG ANH 2021-2022 LẦN 2 -converted
19 pages
NUMC 101B Endocrine System Laboratory Exercise
100% (1)
NUMC 101B Endocrine System Laboratory Exercise
17 pages
Gzoo111-Rs-Finals Olfu Reviewer
No ratings yet
Gzoo111-Rs-Finals Olfu Reviewer
4 pages
60 TH Male Spermatic Cord Hydrocele
No ratings yet
60 TH Male Spermatic Cord Hydrocele
2 pages
Diabetes Mellitus NCP
No ratings yet
Diabetes Mellitus NCP
6 pages
CHEM45U - GROUP1 - Case Study On Enzymes
No ratings yet
CHEM45U - GROUP1 - Case Study On Enzymes
49 pages
PCOS
No ratings yet
PCOS
32 pages
Miracle Molecule Cheat Sheet for Muscle Building Channel
No ratings yet
Miracle Molecule Cheat Sheet for Muscle Building Channel
13 pages
Dr. Apt. Yudi Wicaksono, S.Si., M.Si
No ratings yet
Dr. Apt. Yudi Wicaksono, S.Si., M.Si
22 pages
Radioactive Isotopes
No ratings yet
Radioactive Isotopes
23 pages
Complete Download Medical Biochemistry 5th Edition John W. Baynes PDF All Chapters
100% (2)
Complete Download Medical Biochemistry 5th Edition John W. Baynes PDF All Chapters
47 pages
Bio D the-Natural-Choice-For-Vitamin-D Brochure en v02 0317
No ratings yet
Bio D the-Natural-Choice-For-Vitamin-D Brochure en v02 0317
4 pages
21.iron Deficiency Anemia
No ratings yet
21.iron Deficiency Anemia
10 pages
Anatomy & Physiology (Chapter 19 - Reproductive System)
No ratings yet
Anatomy & Physiology (Chapter 19 - Reproductive System)
30 pages
Effects of Hormones in The Body
No ratings yet
Effects of Hormones in The Body
54 pages
Microsoft Word - Libs - Task Oigsci 11 - 0610 - 22 2020
No ratings yet
Microsoft Word - Libs - Task Oigsci 11 - 0610 - 22 2020
7 pages
St. Thomas Academy: Pob. 3, City of Sto. Tomas, Batangas SY 2021 - 2022
No ratings yet
St. Thomas Academy: Pob. 3, City of Sto. Tomas, Batangas SY 2021 - 2022
4 pages
Esteves-Stanford-2024-Exercise-As-A-Tool-To-Mitigate-Metabolic-Disease-1-7 Es
No ratings yet
Esteves-Stanford-2024-Exercise-As-A-Tool-To-Mitigate-Metabolic-Disease-1-7 Es
14 pages
Science Reviewer 3rd Quarter For Grade 10 Biology
No ratings yet
Science Reviewer 3rd Quarter For Grade 10 Biology
19 pages
Surgery Review Manual
No ratings yet
Surgery Review Manual
158 pages
Drugs Hypo Hyper
No ratings yet
Drugs Hypo Hyper
5 pages
BIOCHEMISTRY OF PROTEINS &amino Acids-INTRODUCTION
No ratings yet
BIOCHEMISTRY OF PROTEINS &amino Acids-INTRODUCTION
35 pages
Body Generators
No ratings yet
Body Generators
16 pages

ML Practical 04

Uploaded by

ML Practical 04

Uploaded by

10/18/23, 2:51 PM

import matplotlib.pyplot as plt

warnings. filterwarnings(’ ignore

Loading The Dataset

ML _Practical_04 - Jupyter Notebook

Check Information About Dataset

BloodPressure 768 non-null

ML _Practical_04 - Jupyter Notebook

Checking Null Values

diabetes_data_copy = diabetes_data.copy(deep = True)

diabetes_data_copy[['Glucose', 'BloodPressure’, 'SkinThickness', 'Insulin','BMI']]

To fill these Nan values the data distribution needs to be understood

Pregnancies Glucose BloodPressure

75 100 125 150 175 175 200 0 00 120

SkinThickness Insulin BMI

0.0 05 10 15 20 25 20 30 40 70 80 0.0 0.2 04 06 08 1.0

diabetes_data_copy[ 'Glucose'].fillna(diabetes_data_copy[ 'Glucose'].mean(),

Plotting after Nan removal

75 100 125 150 175

Identify Shape Of Dataset

Data Type Analysis

ML _Practical_04 - Jupyter Notebook

count of each data type

checking the balance of the data

color_wheel = {1: "#@8392cf",

colors = diabetes_data[ "Outcome”].map(lambda x: color_wheel.get(x + 1))

Heatmap for unclean data

# on this line I just set the size of figure to 12 by 16.

Glucose 0.15 0.057

Heatmap for clean data

localhost:8888/notebooks/ML_Practical 04.ipynb 6/11

ML _Practical_04 - Jupyter Notebook

[2 © [2] [2] [= — [4] ©

#X = diabetes_data.drop( "Outcome", axis = 1)

Splitting the dataset

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier

train_scores_ind = [i for i, v in enumerate(train_scores) if v == max_train_score]

print('Max train score {} % and k = {}'.format(max_train_score*100,list{map(lambda

Max train score 79.6875 %¥ and k = [1]

test_scores_ind = [1 for 1, v in enumerate(test_scores) if v == max_test_score]

print('Max test score {} %¥ and k = {}'.format(max_test_score*100,list(map(lambda

Max test score 73.4375 ¥ and k = [1]

~o~ Test Score

localhost:8888/notebooks/ML_Practical 04.ipynb 8/11

Setup a knn classifier with k neighbor

from sklearn.metrics import confusion_matrix

pd.crosstab(y_test, y pred, rownames=['True'], colnames=['Predicted'],

All 177 79 256

p = sns.heatmap(pd.DataFrame(cnf matrix), annot=True, cmap="Y1GnBu" ,fmt='g’)

plt.title('Confusion matrix’, y=1.1)

Classification Report (Report which includes Precision, Recall and F1-Score)

from sklearn.metrics import classification_report

precision recall fl-score support

] 0.80 9.85 2.83 167

1 0.68 9.61 9.64 89

accuracy 8.77 256

from sklearn.metrics import roc_curve

Knn{n_neighbors=11) ROC curve

0.0 02 04 0.6 0.8 1.0

Area under ROC curve

from sklearn.metrics import roc_auc_score

Hyper Parameter optimization

#In case of classifier Like knn the parameter to be tuned is n_neighbors

You might also like