Heart Failure Prediction EDA & Modeling

This document predicts heart failure mortality from a clinical dataset using machine learning models. It gives background on heart failure, loads and explores a dataset with 299 entries and 13 columns, and then conducts exploratory data analysis and builds models that predict death events with about 95% accuracy and a 93% F1 score.



Heart Failure Prediction

What is Heart Failure?

Heart failure is a condition in which the heart muscle does not pump blood as well as it should to meet the body's demands. Blood is the fluid that circulates throughout the body, supplying oxygen to all of its parts.

Cardiovascular diseases (CVDs) are the number one cause of death globally, taking an estimated 17.9 million lives each year and accounting for 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 12 clinical features that can be used to predict mortality from heart failure.


People with cardiovascular disease, or who are at high cardiovascular risk (due to one or more risk factors such as hypertension, diabetes, hyperlipidaemia, or already established disease), need early detection and management, and a machine learning model can be of great help here.

The dataset contains patient-level information such as age, sex, blood pressure, smoking status, diabetes, and ejection fraction.

Library
In [379]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
colors = ['#97C1A9','#FFFFFF']

import imblearn
from collections import Counter
from imblearn.over_sampling import SMOTE

from sklearn.preprocessing import MinMaxScaler,StandardScaler


from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_auc_score
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import precision_recall_curve

import warnings
warnings.filterwarnings("ignore")

Dataset
In [380]:

data = pd.read_csv('/content/drive/MyDrive/projek/heart_failure_clinical_records_dataset.csv')


In [381]:

data.head()

Out[381]:

age anaemia creatinine_phosphokinase diabetes ejection_fraction high_blood_pressure

0 75.0 0 582 0 20 1

1 55.0 0 7861 0 38 0

2 65.0 0 146 0 20 0

3 50.0 1 111 0 20 0

4 65.0 1 160 1 20 0

Dataset Attributes

Age : age [years]


anaemia : Decrease of red blood cells or hemoglobin (boolean)
creatinine_phosphokinase : Level of the CPK enzyme in the blood (mcg/L)
diabetes : If the patient has diabetes (boolean)
ejection_fraction : Percentage of blood leaving the heart at each contraction (percentage)

high_blood_pressure : If the patient has hypertension (boolean)


platelets : Platelets in the blood (kiloplatelets/mL)
serum_creatinine : Level of serum creatinine in the blood (mg/dL)
serum_sodium : Level of serum sodium in the blood (mEq/L)
sex : Woman or man (binary)
smoking : If the patient smokes or not (boolean)
time : Follow-up period (days)
DEATH_EVENT : If the patient died during the follow-up period (boolean)


Data Info
In [382]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 299 non-null float64
1 anaemia 299 non-null int64
2 creatinine_phosphokinase 299 non-null int64
3 diabetes 299 non-null int64
4 ejection_fraction 299 non-null int64
5 high_blood_pressure 299 non-null int64
6 platelets 299 non-null float64
7 serum_creatinine 299 non-null float64
8 serum_sodium 299 non-null int64
9 sex 299 non-null int64
10 smoking 299 non-null int64
11 time 299 non-null int64
12 DEATH_EVENT 299 non-null int64
dtypes: float64(3), int64(10)
memory usage: 30.5 KB

In [383]:

data.shape

Out[383]:

(299, 13)

In [384]:

data.columns

Out[384]:

Index(['age', 'anaemia', 'creatinine_phosphokinase', 'diabetes',


'ejection_fraction', 'high_blood_pressure', 'platelets',
'serum_creatinine', 'serum_sodium', 'sex', 'smoking', 'time',
'DEATH_EVENT'],
dtype='object')


In [385]:

data.describe().T

Out[385]:

count mean std min 25% 50%

age 299.0 60.833893 11.894809 40.0 51.0 60.0

anaemia 299.0 0.431438 0.496107 0.0 0.0 0.0

creatinine_phosphokinase 299.0 581.839465 970.287881 23.0 116.5 250.0

diabetes 299.0 0.418060 0.494067 0.0 0.0 0.0

ejection_fraction 299.0 38.083612 11.834841 14.0 30.0 38.0

high_blood_pressure 299.0 0.351171 0.478136 0.0 0.0 0.0

platelets 299.0 263358.029264 97804.236869 25100.0 212500.0 262000.0

serum_creatinine 299.0 1.393880 1.034510 0.5 0.9 1.1

serum_sodium 299.0 136.625418 4.412477 113.0 134.0 137.0

sex 299.0 0.648829 0.478136 0.0 0.0 1.0

smoking 299.0 0.321070 0.467670 0.0 0.0 0.0

time 299.0 130.260870 77.614208 4.0 73.0 115.0

DEATH_EVENT 299.0 0.321070 0.467670 0.0 0.0 0.0

In [386]:

data.isnull().mean()*100

Out[386]:

age 0.0
anaemia 0.0
creatinine_phosphokinase 0.0
diabetes 0.0
ejection_fraction 0.0
high_blood_pressure 0.0
platelets 0.0
serum_creatinine 0.0
serum_sodium 0.0
sex 0.0
smoking 0.0
time 0.0
DEATH_EVENT 0.0
dtype: float64

EDA


In [387]:

data['age'] = data['age'].astype(int)
data['platelets'] = data['platelets'].astype(int)
df = data.copy(deep = True)

In [388]:

df.loc[df['DEATH_EVENT']==0,'Status']='Survived'
df.loc[df['DEATH_EVENT']==1,'Status']='Not Survived'

In [389]:

col = list(data.columns)
categorical_features = []
numerical_features = []
for i in col:
    if len(data[i].unique()) > 6:
        numerical_features.append(i)
    else:
        categorical_features.append(i)

print('Categorical Features :',*categorical_features)


print('Numerical Features :',*numerical_features)

Categorical Features : anaemia diabetes high_blood_pressure sex smoking DEATH_EVENT
Numerical Features : age creatinine_phosphokinase ejection_fraction platelets serum_creatinine serum_sodium time


Target - Death Event


In [390]:

sns.set(style='white')

fig = plt.subplots(1,2,figsize = (13,4))


plt.subplot(1,2,1)
df['Status'].value_counts().plot.pie(explode=[0.1,0.1], autopct='%1.1f%%', shadow=True, colors=colors)

plt.subplot(1,2,2)
ax=sns.countplot(data=df, x='Status',palette = colors,edgecolor = 'k')
ax.bar_label(ax.containers[0])

plt.suptitle('Death Event')

Out[390]:

Text(0.5, 0.98, 'Death Event')

The dataset is very small (299 observations).

The dataset is imbalanced, with roughly a 2:1 ratio of No Death Event to Death Event cases.
Visualizations and predictions will therefore be biased towards No Death Event cases.


Categorical Features
In [391]:

# Categorical Plot
def catplot(df,x):
    sns.set(style='white')
    fig = plt.subplots(1,3,figsize = (15,4))

    plt.subplot(1,3,1)
    df[x].value_counts().plot.pie(explode=[0.1,0.1], autopct='%1.1f%%', shadow=True, textprops={'fontsize': 12}, colors=colors)

    plt.subplot(1,3,2)
    ax=sns.histplot(data=df,x=x,kde = True,color=colors[0],edgecolor = 'k')
    ax.bar_label(ax.containers[0])
    # ax.set_xlim(-1,2)
    # ax.set_xticks(range(-1,2))

    plt.subplot(1,3,3)
    ax=sns.countplot(data=df, x=x, hue='Status',palette = colors,edgecolor = 'k')
    for container in ax.containers:
        ax.bar_label(container)
    tit = x + ' vs Death Event'
    plt.suptitle(tit)

In [392]:

# for i in range(len(categorical_features)):
# catplot(df,categorical_features[i])

Anemia

In [393]:

catplot(df,'anaemia')


Diabetes

In [394]:

catplot(df,'diabetes')

high_blood_pressure

In [395]:

catplot(df,'high_blood_pressure')

sex

In [396]:

catplot(df,'sex')


smoking

In [397]:

catplot(df,'smoking')

Summary

Categorical Features Insight :

All the graphs show a similar pattern.

There are more cases in the male population.

Categorical Features Summary :

anaemia : Anaemia ≈ No Anaemia
diabetes : Diabetes ≈ No Diabetes
high_blood_pressure : No High Blood Pressure > High Blood Pressure (needs more data)
sex : Male > Female
smoking : No Smoking > Smoking

General Information

anaemia : Anaemia increases the chances of heart failure.
diabetes : Diabetes increases the chances of heart failure.
high blood pressure : High blood pressure increases the chances of heart failure.
sex : Males are prone to heart failure slightly more often than females.
smoking : Smoking increases the chances of heart failure.

These claims can be cross-checked against the data, as in the sketch below.
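A minimal sketch of that cross-check, assuming the raw `data` frame loaded earlier (feature names as in this dataset): compute the DEATH_EVENT rate within each category.

# Cross-check the categorical claims: share of deaths within each category.
for feature in ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']:
    rates = data.groupby(feature)['DEATH_EVENT'].mean().round(3)
    print(feature)
    print(rates.to_string(), '\n')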


Numerical Features
In [398]:

# Numerical Plot
def numplot(df,x,scale):
    sns.set(style='whitegrid')
    fig = plt.subplots(2,1,figsize = (15,11))

    plt.subplot(2,1,1)
    ax=sns.histplot(data=df, x=x, kde=True,color=colors[0],edgecolor = 'k')
    ax.bar_label(ax.containers[0])
    tit=x + ' distribution'
    plt.title(tit)

    plt.subplot(2,1,2)
    tar=x + '_group'
    Tstr= str(scale)
    tit2=x + ' vs Death Event ( ' + Tstr + ' : 1 )'
    df[tar] = [ int(i / scale) for i in df[x]]
    ax=sns.countplot(data=df, x=tar, hue='Status',palette = colors,edgecolor = 'k')
    for container in ax.containers:
        ax.bar_label(container)
    plt.title(tit2)

age

In [399]:

numplot(df,'age',5)


Creatinine Phosphokinase

In [400]:

numplot(df,'creatinine_phosphokinase',100)


ejection_fraction

In [401]:

numplot(df,'ejection_fraction',10)


platelets

In [402]:

numplot(df,'platelets',10**5)


serum_creatinine

In [403]:

numplot(df,'serum_creatinine',1)


serum_sodium

In [404]:

numplot(df,'serum_sodium',5)


time

In [405]:

numplot(df,'time',10)

Summary

Numerical Features Insight :

Cases of DEATH_EVENT start appearing from the age of 45, with peaks at 45, 50, 60, 65, and 70.
High numbers of DEATH_EVENT cases are observed for ejection_fraction values from 20 to 60.
serum_creatinine values from 0.6 to 3.0 have a higher probability of leading to a DEATH_EVENT.
serum_sodium values from 127 to 145 point towards a DEATH_EVENT due to heart failure.
DEATH_EVENT cases are high for creatinine_phosphokinase values between 0 and 500 (groups 0-5 on the 100:1 plot).
platelets values between 0 and 400,000 (groups 0-4 on the 10^5:1 plot) are prone to heart failure leading to DEATH_EVENT.
For the time feature, values from 0 to 60 days (groups 0-6 on the 10:1 plot) have a higher probability of leading to a DEATH_EVENT.

Numerical Features Summary (ranges with the most cases) :

age : 50 - 70
creatinine_phosphokinase : 0 - 500
ejection_fraction : 20 - 40
platelets : 200,000 - 300,000
serum_creatinine : 1 - 2
serum_sodium : 130 - 140
time : 0 - 50

General Information

age : Aging in general increases the risk of heart failure.
creatinine_phosphokinase : Values above 120 mcg/L are considered high.
ejection_fraction : The normal range is 55% - 70%; values below 55% are prone to heart failure.
platelets : Both low and very high platelet counts are associated with heart failure.
serum_creatinine : Most heart-failure cases fall in the 0.8 - 1.7 mg/dL range.
serum_sodium : Values towards the low end of the normal range (below about 135 mEq/L) are associated with worse heart-failure outcomes.
time : The ideal follow-up period is about 14 days; longer intervals can lead to worse outcomes.

The binning sketch below shows how such ranges can be checked against the data.
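A minimal sketch of that check, assuming the raw `data` frame; the bin edges are illustrative only, not taken from the notebook.

# Bin a numerical feature and look at the DEATH_EVENT rate per bin.
bins = [0, 30, 40, 55, 100]   # illustrative ejection_fraction cut points
groups = pd.cut(data['ejection_fraction'], bins=bins)
print(data.groupby(groups)['DEATH_EVENT'].agg(['count', 'mean']).round(3))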

Feature Engineering

Scaling

In [406]:

mms = MinMaxScaler() # Normalization


ss = StandardScaler() # Standardization

# Normalization
df['age'] = mms.fit_transform(df[['age']])
df['creatinine_phosphokinase'] = mms.fit_transform(df[['creatinine_phosphokinase']])
df['ejection_fraction'] = mms.fit_transform(df[['ejection_fraction']])
df['serum_creatinine'] = mms.fit_transform(df[['serum_creatinine']])
df['time'] = mms.fit_transform(df[['time']])

# Standardization
df['platelets'] = ss.fit_transform(df[['platelets']])
df['serum_sodium'] = ss.fit_transform(df[['serum_sodium']])
df.head()

Out[406]:

age anaemia creatinine_phosphokinase diabetes ejection_fraction high_blood_pressure

0 0.636364 0 0.071319 0 0.090909

1 0.272727 0 1.000000 0 0.363636

2 0.454545 0 0.015693 0 0.090909

3 0.181818 1 0.011227 0 0.090909

4 0.454545 1 0.017479 1 0.090909

5 rows × 21 columns
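As a design note, the scalers above are fit on the full dataset for EDA convenience. A leakage-averse variant (a sketch only, not what this notebook does) keeps the same choice of transformations but fits them inside a pipeline on the training split alone:

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Same feature split as above: min-max for skewed counts/durations,
# standardization for roughly symmetric features.
minmax_cols = ['age', 'creatinine_phosphokinase', 'ejection_fraction',
               'serum_creatinine', 'time']
standard_cols = ['platelets', 'serum_sodium']

preprocess = ColumnTransformer(
    [('minmax', MinMaxScaler(), minmax_cols),
     ('standard', StandardScaler(), standard_cols)],
    remainder='passthrough')   # binary features pass through unchanged

pipe = Pipeline([('preprocess', preprocess),
                 ('clf', LogisticRegression(random_state=1))])
# pipe.fit(X_train, y_train) would then learn the scaling from training data only.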


Correlation

In [407]:

corr = data.corrwith(data['DEATH_EVENT']).sort_values(ascending = False).to_frame()


corr.columns = ['DEATH_EVENT']

plt.subplots(figsize = (5,5))
sns.heatmap(corr,annot = True,cmap = colors,linewidths = 0.4,linecolor = 'black');

plt.title('DEATH_EVENT Correlation');

Insight :

Features such as high_blood_pressure, anaemia, creatinine_phosphokinase, diabetes, sex, smoking, and platelets show little to no correlation with DEATH_EVENT.

We will create 2 models :

Based on a statistical screen (the correlation above; a sketch of a more formal test follows below), we drop the following features : high_blood_pressure, anaemia, creatinine_phosphokinase, diabetes, sex, smoking, and platelets.
Based on the General Information, we drop the following features : sex and platelets.
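One way such a statistical screen might look (a sketch only; the notebook itself does not show this test) is a chi-square test of independence between each binary feature and DEATH_EVENT:

from scipy.stats import chi2_contingency

# Small p-values would suggest an association with DEATH_EVENT.
for feature in ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']:
    table = pd.crosstab(data[feature], data['DEATH_EVENT'])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f'{feature}: chi2 = {chi2:.2f}, p-value = {p:.3f}')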


In [408]:

df1=data.copy()
df2=data.copy()

# Dataset for model based on Statistical Test :


df1 = df1.drop(columns = ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking', 'creatinine_phosphokinase', 'platelets'])

# Dataset for model based on General Information :


df2 = df2.drop(columns = ['sex','platelets'])

Data Balancing

In [409]:

over = SMOTE()

f1 = df1.iloc[:,:5].values
t1 = df1.iloc[:,5].values
f1, t1 = over.fit_resample(f1, t1)
Counter(t1)

Out[409]:

Counter({1: 203, 0: 203})

In [410]:

over = SMOTE()

f2 = df2.iloc[:,:10].values
t2 = df2.iloc[:,10].values
f2, t2 = over.fit_resample(f2, t2)
Counter(t2)

Out[410]:

Counter({1: 203, 0: 203})

Model
In [411]:

x_train1, x_test1, y_train1, y_test1 = train_test_split(f1, t1, test_size = 0.15, random_state = 1)

x_train2, x_test2, y_train2, y_test2 = train_test_split(f2, t2, test_size = 0.15, random_state = 1)


In [412]:

def model(classifier,x_train,y_train,x_test,y_test):
    sns.set(rc={'figure.figsize':(5,3)})
    sns.set(style='whitegrid')
    classifier.fit(x_train,y_train)
    prediction = classifier.predict(x_test)
    cv = RepeatedStratifiedKFold(n_splits = 10,n_repeats = 3,random_state = 1)
    print("Cross Validation Score : ",'{0:.2%}'.format(cross_val_score(classifier,x_train,y_train,cv = cv).mean()))
    print("ROC_AUC Score : ",'{0:.2%}'.format(roc_auc_score(y_test,prediction)))
    # plot_roc_curve(classifier, x_test,y_test)  # deprecated in newer scikit-learn
    RocCurveDisplay.from_estimator(classifier, x_test,y_test)
    plt.title('ROC_AUC_Plot')
    plt.show()

def model_evaluation(classifier,x_test,y_test):

    # Confusion Matrix
    cm = confusion_matrix(y_test,classifier.predict(x_test))
    names = ['True Neg','False Pos','False Neg','True Pos']
    counts = [value for value in cm.flatten()]
    percentages = ['{0:.2%}'.format(value) for value in cm.flatten()/np.sum(cm)]
    labels = [f'{v1}\n{v2}\n{v3}' for v1, v2, v3 in zip(names,counts,percentages)]
    labels = np.asarray(labels).reshape(2,2)
    sns.heatmap(cm,annot = labels,cmap = 'Blues',fmt ='')

    # Classification Report
    print(classification_report(y_test,classifier.predict(x_test)))


XGB Classifier

In [413]:

classifier_xgb = XGBClassifier(random_state=1)

model(classifier_xgb,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_xgb,x_test1,y_test1)

Cross Validation Score : 94.59%


ROC_AUC Score : 88.56%

precision recall f1-score support

0 0.86 0.89 0.87 27


1 0.91 0.88 0.90 34

accuracy 0.89 61
macro avg 0.88 0.89 0.88 61
weighted avg 0.89 0.89 0.89 61


In [414]:

model(classifier_xgb,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_xgb,x_test2,y_test2)

Cross Validation Score : 94.24%


ROC_AUC Score : 89.27%

precision recall f1-score support

0 0.96 0.81 0.88 27


1 0.87 0.97 0.92 34

accuracy 0.90 61
macro avg 0.91 0.89 0.90 61
weighted avg 0.91 0.90 0.90 61


LGBMClassifier

In [415]:

classifier_lgbm = LGBMClassifier(random_state=1)

model(classifier_lgbm,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_lgbm,x_test1,y_test1)

Cross Validation Score : 94.60%


ROC_AUC Score : 84.86%

precision recall f1-score support

0 0.85 0.81 0.83 27


1 0.86 0.88 0.87 34

accuracy 0.85 61
macro avg 0.85 0.85 0.85 61
weighted avg 0.85 0.85 0.85 61


In [416]:

model(classifier_lgbm,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_lgbm,x_test2,y_test2)

Cross Validation Score : 93.84%


ROC_AUC Score : 91.50%

precision recall f1-score support

0 0.92 0.89 0.91 27


1 0.91 0.94 0.93 34

accuracy 0.92 61
macro avg 0.92 0.92 0.92 61
weighted avg 0.92 0.92 0.92 61


Logistic Regression

In [417]:

classifier_lr = LogisticRegression(random_state = 1)

model(classifier_lr,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_lr,x_test1,y_test1)

Cross Validation Score : 90.15%


ROC_AUC Score : 80.07%

precision recall f1-score support

0 0.78 0.78 0.78 27


1 0.82 0.82 0.82 34

accuracy 0.80 61
macro avg 0.80 0.80 0.80 61
weighted avg 0.80 0.80 0.80 61


In [418]:

model(classifier_lr,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_lr,x_test2,y_test2)

Cross Validation Score : 90.57%


ROC_AUC Score : 86.71%

precision recall f1-score support

0 0.85 0.85 0.85 27


1 0.88 0.88 0.88 34

accuracy 0.87 61
macro avg 0.87 0.87 0.87 61
weighted avg 0.87 0.87 0.87 61


Support Vector Classifier

In [419]:

classifier_svc = SVC()

model(classifier_svc,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_svc,x_test1,y_test1)

Cross Validation Score : 88.89%


ROC_AUC Score : 87.47%

precision recall f1-score support

0 0.81 0.93 0.86 27


1 0.93 0.82 0.87 34

accuracy 0.87 61
macro avg 0.87 0.87 0.87 61
weighted avg 0.88 0.87 0.87 61


In [420]:

model(classifier_svc,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_svc,x_test2,y_test2)

Cross Validation Score : 82.44%


ROC_AUC Score : 83.39%

precision recall f1-score support

0 0.81 0.81 0.81 27


1 0.85 0.85 0.85 34

accuracy 0.84 61
macro avg 0.83 0.83 0.83 61
weighted avg 0.84 0.84 0.84 61


Gradient Boosting Classifier

In [421]:

classifier_grad = GradientBoostingClassifier(random_state=1)

model(classifier_grad,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_grad,x_test1,y_test1)

Cross Validation Score : 94.18%


ROC_AUC Score : 86.33%

precision recall f1-score support

0 0.88 0.81 0.85 27


1 0.86 0.91 0.89 34

accuracy 0.87 61
macro avg 0.87 0.86 0.87 61
weighted avg 0.87 0.87 0.87 61


In [422]:

# Note: this cell re-evaluates the SVC classifier on dataset 2 (classifier_grad is not run here).
model(classifier_svc,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_svc,x_test2,y_test2)

Cross Validation Score : 82.44%


ROC_AUC Score : 83.39%

precision recall f1-score support

0 0.81 0.81 0.81 27


1 0.85 0.85 0.85 34

accuracy 0.84 61
macro avg 0.83 0.83 0.83 61
weighted avg 0.84 0.84 0.84 61


Random Forest Classifier

In [423]:

classifier_rdf = RandomForestClassifier(random_state=1)

model(classifier_rdf,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_rdf,x_test1,y_test1)

Cross Validation Score : 95.13%


ROC_AUC Score : 84.48%

precision recall f1-score support

0 0.88 0.78 0.82 27


1 0.84 0.91 0.87 34

accuracy 0.85 61
macro avg 0.86 0.84 0.85 61
weighted avg 0.85 0.85 0.85 61


In [424]:

# Note: this cell also re-evaluates the SVC classifier on dataset 2 (classifier_rdf is not run here).
model(classifier_svc,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_svc,x_test2,y_test2)

Cross Validation Score : 82.44%


ROC_AUC Score : 83.39%

precision recall f1-score support

0 0.81 0.81 0.81 27


1 0.85 0.85 0.85 34

accuracy 0.84 61
macro avg 0.83 0.83 0.83 61
weighted avg 0.84 0.84 0.84 61

Result




Dataset 1 vs Dataset 2 (summary plots not reproduced in this export)

From these results, Dataset 2 gives better results overall and LGBM is the best model.
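A minimal sketch of how the per-model scores could be gathered into one comparison table, reusing the objects defined above (the notebook itself summarizes them graphically):

# Collect cross-validation and ROC AUC scores for each model on both datasets.
models = {'XGB': classifier_xgb, 'LGBM': classifier_lgbm, 'LogReg': classifier_lr,
          'SVC': classifier_svc, 'GradBoost': classifier_grad, 'RandForest': classifier_rdf}
splits = {'Dataset 1': (x_train1, x_test1, y_train1, y_test1),
          'Dataset 2': (x_train2, x_test2, y_train2, y_test2)}
cv = RepeatedStratifiedKFold(n_splits = 10, n_repeats = 3, random_state = 1)

rows = []
for name, clf in models.items():
    for tag, (xtr, xte, ytr, yte) in splits.items():
        clf.fit(xtr, ytr)
        rows.append({'model': name, 'data': tag,
                     'cv_score': cross_val_score(clf, xtr, ytr, cv = cv).mean(),
                     'roc_auc': roc_auc_score(yte, clf.predict(xte))})
print(pd.DataFrame(rows).round(3))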

Hyperparameter Tuning
In [425]:

# pip install flaml

In [426]:

# from flaml import AutoML

# automl = AutoML()
# settings = {
#     "time_budget": 1200,              # total running time in seconds
#     "metric": 'roc_auc',              # primary optimization metric
#     "estimator_list": ['lgbm'],       # list of ML learners; we tune LightGBM in this example
#     "task": 'classification',         # task type
#     "log_file_name": '/content/drive/MyDrive/heart_lg2.log',   # flaml log file
#     "seed": 1,                        # random seed
# }
# automl.fit(X_train=x_train2, y_train=y_train2, **settings)


In [427]:

# I used FLAML hyperparameter tuning for 20 minutes and got these results

classifier_lgbm = LGBMClassifier(colsample_bytree=0.26649620250942635,
learning_rate=0.02058909150877934, max_bin=127,
min_child_samples=7, n_estimators=184, num_leaves=48,
reg_alpha=0.004090180440029941, reg_lambda=0.0009765625,
verbose=-1)
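The values above were presumably read off the fitted AutoML object. If the commented-out FLAML cell were run, retrieving them might look like this (attribute names per FLAML's AutoML API; shown as an assumption, since the tuning run itself is not included in this notebook):

# automl.best_estimator        # name of the best learner, e.g. 'lgbm'
# automl.best_config           # dict of tuned hyperparameters
# automl.model.estimator       # the fitted LGBMClassifier itself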


In [428]:

model(classifier_lgbm,x_train1,y_train1,x_test1,y_test1)
model_evaluation(classifier_lgbm,x_test1,y_test1)

Cross Validation Score : 94.36%


ROC_AUC Score : 93.36%

precision recall f1-score support

0 0.93 0.93 0.93 27


1 0.94 0.94 0.94 34

accuracy 0.93 61
macro avg 0.93 0.93 0.93 61
weighted avg 0.93 0.93 0.93 61


In [429]:

model(classifier_lgbm,x_train2,y_train2,x_test2,y_test2)
model_evaluation(classifier_lgbm,x_test2,y_test2)

Cross Validation Score : 96.03%


ROC_AUC Score : 95.21%

precision recall f1-score support

0 0.93 0.96 0.95 27


1 0.97 0.94 0.96 34

accuracy 0.95 61
macro avg 0.95 0.95 0.95 61
weighted avg 0.95 0.95 0.95 61

The Final Results

After tuning with FLAML, the results on both datasets improved.


Dataset 1 :

Cross Validation Score : 94%


ROC_AUC Score : 93%

Dataset 2 :

Cross Validation Score : 96%


ROC_AUC Score : 95%

