Disease Prediction Using Machine Learning
ISSN No:-2456-2165
Abstract:- Machine learning (ML) is an artificial intelligence (AI) technique that improves the predictive ability of software applications without requiring explicit programming. Machine learning algorithms use past data to predict new outcomes. In this paper, we used four machine learning algorithms: Logistic Regression, Support Vector Machine, Decision Tree and Gradient Boosting. We applied these algorithms to five datasets from the healthcare domain to predict kidney disease, liver disease, breast cancer, heart disease and diabetes.

Keywords:- Machine Learning; kidney; heart; liver; breast cancer; diabetes; svm; logistic regression; decision tree; gradient boosting.

I. INTRODUCTION

Machine learning can be defined as the use of statistical models and probabilistic algorithms to answer questions so that we can make informed decisions based on our data. Machines make this possible by filtering out useful pieces of information and piecing them together based on patterns to get accurate results. Machine learning algorithms fall into three types: supervised, unsupervised and reinforcement learning algorithms.

In supervised learning, the training data is known, i.e., labeled. Because the data is labeled, the learning is supervised, i.e., directed towards successful execution. Supervised learning algorithms are of two types: regression and classification. Examples of supervised algorithms include decision trees, linear regression, logistic regression and support vector machines.

The second type of machine learning algorithm is unsupervised learning. In unsupervised learning, the training data is unlabeled, meaning that no target output is provided and the algorithm must find structure in the data by itself. Clustering is a type of unsupervised learning. Examples of unsupervised algorithms include singular value decomposition, K-means clustering, Apriori, hierarchical clustering and principal component analysis.

The third type of machine learning algorithm is reinforcement learning. The algorithm explores its environment through trial and error and learns which actions result in higher rewards. The agent, the environment and the actions are the three major components that make up reinforcement learning. Examples of reinforcement learning algorithms include Q-learning, R-learning and TD-learning.

… individual much better. Some of the healthcare domains where machine learning is being used are cancer detection, laser treatment, heart disease prediction, liver disease prediction, kidney disease prediction and diabetes prediction (whether a person is diabetic or not).

II. DATASET

In this paper, five datasets are used, each for predicting one disease. These datasets are available free of cost on the Kaggle website.

A. Liver Disease Prediction Dataset
The liver dataset contains 11 attributes and 583 rows of data. It includes columns such as the patient's gender and age, and other attributes such as bilirubin levels and total proteins. To predict whether or not the patient has liver disease, the dataset has a column titled 'Dataset'; this attribute is used as the target in the prediction process.

B. Diabetes Prediction Dataset
The diabetes dataset contains 9 attributes and 768 rows of data. It includes columns such as pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, age and outcome. To predict whether or not the patient has diabetes, the dataset has a column titled 'outcome'.

C. Kidney Disease Prediction Dataset
The kidney dataset contains 26 attributes and 400 rows of data. It includes columns such as age, blood pressure, sugar levels and red blood cells. To predict whether or not the patient has kidney disease, the dataset has a column titled 'class', which defines the class of the disease.

D. Breast Cancer Prediction Dataset
The breast cancer dataset contains 33 attributes and 570 rows of data. Its columns describe the size and shape of the tumor cells. To predict whether or not the patient has breast cancer, the dataset has a column titled 'diagnosis', which describes whether the cancer is benign or malignant.

E. Heart Disease Prediction Dataset
The heart dataset contains 14 attributes and 303 rows of data. It includes columns such as age, gender and chest pain type. To predict whether or not the patient has heart disease, the dataset has a column titled 'output'.
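Each dataset described in the Dataset section pairs a set of attribute columns with a single label column that the models learn to predict. The following is a minimal sketch, assuming each dataset is stored as a CSV file with a header row, of how that label column could be separated from the features before training. `TARGET_COLUMNS` and `split_features_and_target` are hypothetical helpers, not the authors' code; the column names are copied from the dataset descriptions and may differ in case or spelling from the headers of the actual Kaggle files.

```python
import csv
import io

# Hypothetical mapping from dataset to its label column, using the column
# titles given in the dataset descriptions (real CSV headers may differ).
TARGET_COLUMNS = {
    "liver": "Dataset",
    "diabetes": "outcome",
    "kidney": "class",
    "breast_cancer": "diagnosis",
    "heart": "output",
}


def split_features_and_target(csv_text, target):
    """Read CSV text and return (feature_rows, labels).

    Each feature row is a dict of every column except the target column;
    labels holds the target values in the same row order. Values stay as
    strings, exactly as the csv module reads them.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or target not in reader.fieldnames:
        raise KeyError(f"target column {target!r} not found")
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(target))  # remove the label from the features
        features.append(row)
    return features, labels


# Usage with a tiny made-up sample in the diabetes layout.
sample = "age,glucose,outcome\n50,148,1\n31,85,0\n"
X, y = split_features_and_target(sample, TARGET_COLUMNS["diabetes"])
```

Keeping the label name in one mapping means the same training code can be reused across all five datasets, with only the dictionary key changing.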
… below describes the architecture of the proposed system.
Fig. 1: System Architecture

Table 1: Results for Breast Cancer Prediction
As shown above, the gradient boosting classifier and the support vector machine achieved the highest accuracy scores.
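The breast cancer results rank the classifiers by accuracy score. The paper does not state how the score was computed, so the following is only a minimal sketch assuming the usual definition: the fraction of test samples whose predicted label equals the true label. The helpers `accuracy` and `best_models` are hypothetical and written from scratch here, and the labels and predictions are made-up toy values, not the data behind Table 1.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_models(y_true, predictions):
    """Return each model's accuracy and the names tied for the highest score."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    top = max(scores.values())
    return scores, [name for name, score in scores.items() if score == top]


# Hypothetical toy example: M = malignant, B = benign.
y_true = ["M", "B", "B", "M", "B"]
predictions = {
    "model_a": ["M", "B", "M", "M", "B"],  # 4 of 5 correct
    "model_b": ["M", "B", "B", "M", "B"],  # 5 of 5 correct
    "model_c": ["B", "B", "M", "M", "B"],  # 3 of 5 correct
}
scores, winners = best_models(y_true, predictions)
```

Returning every name that ties for the top score matters here, since more than one classifier can reach the same highest accuracy.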
A. Logistic Regression
Logistic regression is one of the most popular machine learning algorithms. It is a supervised learning algorithm that predicts a categorical dependent variable from a set of independent variables. The categorical output could be 'Yes' or 'No', or 'True' or 'False'; however, rather than giving the exact class, the model outputs a probabilistic value between 0 and 1. Although its name contains "regression", logistic regression is a classification algorithm.
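Logistic regression turns a weighted sum of the inputs, z = w·x + b, into a probability with the sigmoid function σ(z) = 1 / (1 + e^(-z)), which maps any real number into (0, 1). That probability is then thresholded (commonly at 0.5) to give the 'Yes'/'No' class. The paper does not describe its implementation, so the following is a minimal pure-Python sketch trained by batch gradient descent on the mean log-loss. The learning rate, epoch count, 0.5 threshold and the toy data (assumed already standardized, i.e. centered near zero) are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Map a real number to the interval (0, 1)."""
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    X is a list of feature lists and y a list of 0/1 labels.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # For the log-loss, the gradient with respect to z is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Probability that sample x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(w, b, x, threshold=0.5):
    """Convert the probability into a 0/1 class label."""
    return 1 if predict_proba(w, b, x) >= threshold else 0


# Toy one-feature example (standardized values): positive values are class 1.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would normally be used instead; the sketch only makes the sigmoid and threshold steps explicit.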