DIABETES
PREDICTION
using MACHINE
LEARNING
GOWTHAM KUMAR
CONTENTS
• Introduction
• Literature review 1
• Literature review 2
• Literature review 3
• Comparison
• Proposed System
• Methodology
• Dataset
• Conclusion
Introduction
• Diabetes is one of the most common chronic diseases worldwide, and
its prevalence keeps rising because of changing lifestyles,
unhealthy eating habits and excess weight.
• It is identified when blood glucose is higher than the normal level.
• Two main types of diabetes: Type 1 and Type 2
• Type 1 Diabetes: Your immune system mistakenly attacks
and destroys the insulin-making cells in your body.
• Type 2 Diabetes: Your body doesn't use insulin properly,
and the condition often develops slowly.
CAUSES
• Insulin Resistance
• Lifestyle Factors: Poor diet, lack of physical activity,
obesity, and stress
• Age: Risk increases with age, especially after 45.
• Family History: A family history of diabetes can be a
contributing factor.
SYMPTOMS
• Frequent Urination
• Slow Healing
• Excessive Thirst
• Excessive Hunger
• Fatigue
• Blurred Vision
LITERATURE REVIEW-1
Title of the paper: Diabetes Disease Prediction Using Machine Learning Algorithms
About: 2020 IEEE EMBS Conference on Biomedical Engineering and Sciences (IECBES)
Area of work: Used in medical field
Dataset: Pima Indians Diabetes Database from National Institute of Diabetes and Digestive and Kidney Diseases
Algorithm: K-Nearest Neighbor, Decision Tree, Random Forest, Naive Bayes, Support Vector Machine
Methodology: Dataset collection--->Data pre-processing--->Setting classification metrics--->Applying machine learning (training set, testing set)--->K-fold cross validation
Result/Accuracy: 76% (KNN), 73% (SVM), 74% (Naive Bayes), 72% (Decision Tree), 70% (Random Forest)
Advantages: Early detection of diabetes plays a vital role in making decisions on lifestyle changes in high-risk patients, which in turn reduces complications.
Future Proposals: The authors plan to create a diabetes dataset in collaboration with a hospital or a medical institute and will try to achieve better results.
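The K-fold cross validation step in this methodology can be sketched as follows. This is a minimal illustration using only the Python standard library and a plain k-nearest-neighbour vote; the function names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`), the fold count and the toy points are assumptions made for the example, not details taken from the paper.

```python
import math
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors training rows closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, n_neighbors=3):
    """Mean accuracy over k folds, each fold used once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Made-up, well-separated toy points (not rows from the Pima dataset).
toy_X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
         [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_accuracy = cross_val_accuracy(toy_X, toy_y, k=5)
```

Every sample is used for testing exactly once, so the averaged score depends less on one lucky or unlucky split than a single train/test division does.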
LITERATURE REVIEW-2
Title of the paper: Diabetes Disease Prediction Using Machine Learning Algorithms with Feature Selection and Dimensionality Reduction
About: 2021 7th International Conference on Advanced Computing and Dimensionality Reduction
Area of work: Used in medical field
Dataset: Pima Indians Diabetes Database from National Institute of Diabetes and Digestive and Kidney Diseases
Methodology: Dataset collection--->Data pre-processing--->Feature Selection--->Dimensionality Reduction--->Classifier (Support Vector Machine & Random Forest)--->Analysis
Algorithm: Random Forest, Support Vector Machine
Result/Accuracy: 77.3% (SVM), 83% (Random Forest)
Advantages: Understanding the dataset and pre-processing it to increase efficiency. Performing feature selection and dimensionality reduction to increase overall performance.
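The slide does not say which feature selection method this paper used. As one simple possibility, features can be ranked by the absolute Pearson correlation between each column and the outcome, keeping only the strongest ones. The sketch below assumes that filter approach; `pearson`, `rank_features` and the toy values are hypothetical, and the dimensionality reduction step is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, outcome):
    """Return feature names sorted by |correlation with outcome|, strongest first."""
    scores = {name: abs(pearson(values, outcome)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy values, not rows from the Pima dataset.
toy_columns = {
    "Glucose": [85, 90, 150, 160, 95, 170],
    "BloodPressure": [70, 72, 70, 72, 70, 72],
    "Age": [25, 50, 30, 45, 35, 40],
}
toy_outcome = [0, 0, 1, 1, 0, 1]

ranking = rank_features(toy_columns, toy_outcome)
top_two = ranking[:2]  # keep only the two strongest features
```

A correlation filter only captures linear, one-feature-at-a-time relationships, so it is a starting point rather than a complete selection strategy.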
LITERATURE REVIEW-3
Title of the paper: Prediction of Diabetes Using Machine Learning Algorithms in Healthcare
About: Proceedings of the 24th International Conference on Automation & Computing, Newcastle University, Newcastle upon Tyne, UK, 2018
Area of work: Used in medical field
Dataset: Pima Indians Diabetes Database from National Institute of Diabetes and Digestive and Kidney Diseases
Methodology: Dataset collection--->Data pre-processing--->Feature Selection--->Split training data (70%) and testing data (30%)--->Machine learning algorithm--->Train model--->Analysis
Algorithm: K-Nearest Neighbor, Decision Tree, Random Forest, Naive Bayes, Support Vector Machine, Logistic Regression
Result/Accuracy: 77% (SVM and KNN), 74% (LR and NB), 71% (DT and RF)
Future Proposals: Future work will focus on integrating other methods into the model to tune its parameters for better accuracy.
COMPARISON
PAPER NAME | YEAR | DATASET | ALGORITHM | ACCURACY
Diabetes Disease Prediction Using Machine Learning Algorithms | 2020 | Pima Indians Diabetes Database from National Institute of Diabetes and Digestive and Kidney Diseases | KNN | 76%
Diabetes Disease Prediction Using Machine Learning Algorithms with Feature Selection and Dimensionality Reduction | 2021 | Same dataset | Random Forest | 83%
Prediction of Diabetes Using Machine Learning Algorithms in Healthcare | 2018 | Same dataset | SVM, KNN | 77%
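For reference, the accuracy figures compared here are the share of test patients whose predicted class matches the true class. A minimal sketch of that calculation for 0/1 labels follows; the function names and the example labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / len(y_true)


# Hypothetical labels: 1 = diabetic, 0 = not diabetic.
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
example_accuracy = accuracy(example_true, example_pred)
```

Looking at the four counts separately is useful in a medical setting, because a missed diabetic patient (false negative) and a false alarm (false positive) have different consequences even when the accuracy is the same.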
PROPOSED SYSTEM
• Utilizing the Pima Indians Diabetes Database from the National
Institute of Diabetes and Digestive and Kidney Diseases for
diabetes prediction.
• Employing the Support Vector Machine (SVM) algorithm for building
the predictive model
• Developing a user-friendly web application for inputting user data
and displaying prediction results.
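To give an idea of what the SVM step involves, here is a rough sketch of a linear soft-margin SVM trained by stochastic subgradient descent on the hinge loss. It assumes a linear kernel, a fixed learning rate and hypothetical names (`train_linear_svm`, `predict`); the slides do not specify the kernel or the library, and a real implementation would more likely use an existing one such as scikit-learn's `SVC`.

```python
import random


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200, seed=0):
    """Fit weights w and bias b on the regularised hinge loss.

    X is a list of feature rows, y a list of 0/1 labels.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    signs = [1 if label == 1 else -1 for label in y]  # SVM uses -1/+1 targets
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = signs[i] * score
            # Shrink w (regularisation), then correct it if the margin is violated.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * signs[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * signs[i]
    return w, b


def predict(w, b, x):
    """Return 1 (diabetic) if x falls on the positive side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


# Made-up, already-centred toy points (not rows from the Pima dataset).
toy_X = [[-2, -1], [-1, -2], [-2, -2], [2, 1], [1, 2], [2, 2]]
toy_y = [0, 0, 0, 1, 1, 1]
w, b = train_linear_svm(toy_X, toy_y)
```

In the web application, the trained `w` and `b` (or the fitted library model) would be loaded once, and each submitted form would be turned into a feature row and passed to a function like `predict`.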
METHODOLOGY
DATASET COLLECTION--->DATASET VISUALISATION--->DATA PREPROCESSING--->SPLIT DATASET: TRAIN (80%), TEST (20%)--->APPLY MACHINE LEARNING ALGORITHM (SVM)--->DEPLOY THE MODEL INTO WEB APPLICATION
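The preprocessing and 80/20 split steps of this pipeline might look like the sketch below. Standardising the features is an assumed preprocessing choice (the slides do not list the exact steps), and the helper names and seed are hypothetical.

```python
import random
import statistics


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_features):
    """Compute per-column mean and standard deviation from the training rows."""
    columns = list(zip(*train_features))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(features, means, stds):
    """Standardise rows with statistics learned from the training set only."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in features]


# With 768 rows, an 80/20 split gives 614 training and 154 test rows.
train_ids, test_ids = split_train_test(range(768))
```

Fitting the scaler on the training rows and reusing the same means and standard deviations for the test rows (and later for web form input) keeps information from the test set out of training.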
DATASET
Pima Indians Diabetes Database from National Institute of
Diabetes and Digestive and Kidney Diseases
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
ABOUT THE DATASET
• This dataset is originally from the National Institute of Diabetes and
Digestive and Kidney Diseases.
• The objective of the dataset is to diagnostically predict whether or not a
patient has diabetes, based on certain diagnostic measurements included
in the dataset
• All patients here are females at least 21 years old of Pima Indian heritage.
• 768 rows in total, one per patient
• 9 columns: 8 features and 1 class variable
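A small loader can check this structure when the file is read. The sketch below uses the column names from the Kaggle CSV; the file name `diabetes.csv`, the function name `load_dataset` and the two sample rows are assumptions for illustration.

```python
import csv
import io

# Column names as published in the Kaggle CSV: 8 features plus the class variable.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_dataset(handle):
    """Read the CSV from an open text handle into (features, labels)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    features, labels = [], []
    for row in reader:
        features.append([float(row[name]) for name in EXPECTED_COLUMNS[:-1]])
        labels.append(int(row["Outcome"]))
    return features, labels


# Two illustrative rows in the file's format. The real file would be opened
# with open("diabetes.csv", newline="") and should yield 768 rows.
sample = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,"
    "DiabetesPedigreeFunction,Age,Outcome\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)
X, y = load_dataset(sample)
```

After loading the full file, `len(X) == 768` and `len(X[0]) == 8` are quick checks that the row and column counts stated above hold.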
Feature labels
CLASS LABEL
• Class label is Outcome
• It takes one of two values: 0 or 1
• 0-->The person is not diabetic
• 1-->The person is diabetic
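Before training, it can help to check how the two class values are distributed, since accuracy is harder to interpret when one class dominates. A minimal sketch, with a hypothetical `describe_outcomes` helper and made-up labels:

```python
from collections import Counter

# Meaning of the two Outcome values, as defined above.
LABEL_NAMES = {0: "not diabetic", 1: "diabetic"}


def describe_outcomes(labels):
    """Return the share of each class as a fraction of all labels."""
    counts = Counter(labels)
    total = len(labels)
    return {LABEL_NAMES[value]: counts.get(value, 0) / total for value in LABEL_NAMES}


# Made-up labels for illustration only.
shares = describe_outcomes([0, 0, 1, 0])
```

The same `LABEL_NAMES` mapping can be reused by the web application to turn the model's 0/1 output into the text shown to the user.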
Conclusion
• The web application provides an accessible and user-friendly platform for
individuals to assess their diabetes risk.
• The SVM model aims to achieve high accuracy in predicting diabetes
• Timely identification of individuals at risk
• Enabling early intervention through lifestyle changes or medical treatments
• Reducing long term healthcare costs
THANK
YOU