

Heart Disease Prediction using Data Mining Techniques

Article · February 2021


2 authors:

Pratiksha Shetgaonkar, National Institute of Technology Karnataka
Shailendra Aswale, SRIEIT - Goa University



Published by : International Journal of Engineering Research & Technology (IJERT)
http://www.ijert.org ISSN: 2278-0181
Vol. 10 Issue 02, February-2021

Heart Disease Prediction using Data Mining Techniques
Pratiksha Shetgaonkar (SRIEIT-Goa)
Dr. Shailendra Aswale (SRIEIT-Goa)

Abstract— The heart is the most crucial and critical organ of the human body, and life depends completely on its efficient working and functioning. Heart disease is one of the major causes of mortality in today's world and remains one of the most serious health issues of our day; it is said to be the primary cause of death globally. It is often difficult for medical professionals to predict heart disease in time. Nowadays, the health sector holds a lot of valuable hidden facts and information that could prove very helpful in making predictive decisions, especially in medicine. Data mining is a technique used to analyze vast datasets and derive significant, useful results with the help of AI-based techniques. This article uses three such AI-based methods, namely Decision Tree, Naïve Bayes, and Neural Network, to predict cardiovascular (heart) disease. Each method is evaluated on several parameters, with optimizations for better accuracy, and the accuracies of the methods are then compared. The most accurate technique is then implemented to predict whether or not a man or a woman will have coronary heart disease. This technique can be used by medical practitioners for early prediction of the disease so that the patient can receive timely care.

Keywords—Data Mining, Artificial Intelligence, Heart Disease, Prediction

I. INTRODUCTION
Cardiovascular disease has become one of the most widespread diseases in the world. It is estimated to have caused around 17.9 million deaths in 2017, which constitutes about 15% of all natural deaths [13]. Cardiovascular disease is a chronic condition that can be detected in its initial stages by measuring various health parameters such as blood pressure, cholesterol level, heart rate, and glucose level [13]. Cardiovascular disease affects not only human health but also the economies of countries [14]. Several data mining and machine learning algorithms are currently being developed and researched for predicting different types of diseases [28]. Similarly, many research articles show that numerous data mining, machine learning, and hybrid algorithms are being studied, developed, and investigated to help detect and predict heart disease at an early stage [22-26]. Heart disease diagnosis is the process of detecting or predicting heart disease from a patient's records. Doctors may not be able to diagnose a patient properly in a short time, especially when the patient suffers from more than one disease [10]. The authors in [18] surveyed numerous research papers from different years on the prediction of heart diseases and concluded that data mining techniques are better at predicting heart diseases.

Classification techniques are widely used in healthcare because of their ability to process very large datasets. The techniques commonly used in healthcare are Naïve Bayes, support vector machine, nearest neighbor, decision tree, fuzzy logic, fuzzy-based neural network, artificial neural network, and genetic algorithms [1].

II. RELATED WORK
Several researchers have studied, experimented with, and analyzed numerous techniques for heart disease prediction, including techniques for classification and feature selection.
The authors of [1] proposed the hybrid HRFLM approach, which combines the characteristics of the Linear Method (LM) and Random Forest (RF), and obtained a prediction accuracy of 88.4% [1].
In a 2019 study, the authors of [2] tried mainly to increase prediction accuracy by using various feature selection techniques. Different data mining techniques, i.e., Decision Tree, Logistic Regression, Logistic Regression SVM, Naïve Bayes, and Random Forest, were applied individually in RapidMiner on a UCI heart disease dataset, and the results were compared with past research. They concluded that Logistic Regression, which obtained an accuracy of 84.85%, was the best feature selection technique for predicting heart disease [2].
In 2018, the researchers in [3] built prediction models using different combinations of features and seven classification techniques: k-NN, DT, NB, LR, SVM, NN, and VOTE (a hybrid technique combining Naïve Bayes and Logistic Regression). Their experimental results showed that the best-performing technique, VOTE with NB and LR, achieved an accuracy of 87.4% in heart disease prediction. The 10-fold cross-validation technique was used to validate the performance of the models [3].
In 2019, the authors of [4] developed an automated diagnostic system based on a χ² statistical model and a deep neural network (the χ²-DNN model) for improved diagnosis of heart disease. Their method targeted two main problems, underfitting and overfitting, and produced a diagnostic system that neither underfits nor overfits the training data; the proposed model gave a testing accuracy of 93.33% [4].
The authors in [5] proposed a hybrid system in which the decision tree technique, i.e., the C4.5 algorithm, was combined with ANN and named hybrid DT to produce the desired result. When this model was analyzed and compared with the C4.5 algorithm

IJERTV10IS020083 www.ijert.org 281


(This work is licensed under a Creative Commons Attribution 4.0 International License.)

and ANN on the same dataset, it proved to be more accurate, with an accuracy of 78.14% [5].
In 2019, the researchers in [6] implemented a hybrid approach combining various techniques that used the Fast Correlation-Based Feature Selection (FCBF) method to filter out redundant features and improve the quality of heart disease classification. This method achieved an accuracy of more than 90% [6].
The authors of [7] used an ensemble of classifiers, employing the ensemble algorithms bagging, boosting, stacking, and majority voting in their experiments. Proper selection of feature sets helped improve the accuracy of the ensemble algorithms, and the highest accuracy was obtained with majority voting on feature set FS2 [7].
In 2020, the authors of [8] used seven different intelligent techniques to predict coronary heart disease on the Statlog and Cleveland heart disease datasets. In their comparative study, the deep neural network performed best, obtaining an accuracy of 98.15% on the Statlog dataset, while on the Cleveland dataset SVM achieved an accuracy of 97.36% [8].
In a 2019 study on the diagnosis of heart disease, the authors of [9] used the heart disease dataset from the UCI machine learning repository and proposed a Multi-Layer Pi-Sigma Neuron Model (MLPSNM) based on the Pi-Sigma model, whose architecture and computation are, according to the authors, less complex than those of previously proposed models. The network was trained with the backpropagation (BP) algorithm using a bipolar sigmoid activation function, and PCA and LDA preprocessing were used to reduce the dimensionality of the dataset. In the SVM-LDA method, the attributes closer to the hyperplane are selected. The network was validated with the k-fold validation method and converges after 50 iterations. The proposed model achieves 94.53% classification accuracy for heart disease diagnosis when PCA is used [9].
The authors in [11] compared several supervised machine learning (ML) algorithms for predicting clinical events in terms of their internal validity and accuracy. The results, obtained on two statistical software platforms, R-Studio and RapidMiner, were compared and showed that the decision tree algorithm gave better results.
The authors in [15] performed a comparative study of heart disease diagnosis using the top ten data mining classification algorithms [27]: C4.5, SVM, AdaBoost, KNN, Naïve Bayes, CART, Random Forest, Bagging, Logistic Regression, and Multilayer Perceptron (MLP). In terms of accuracy, the top three algorithms in their experiments were Random Forest (78.0%), kNN (71.6%), and MLP (63.8%); in terms of speed, the top three were AdaBoost, kNN, and Naïve Bayes.
The authors in [16] implemented prediction algorithms and concluded that the accuracy of machine learning algorithms depends on the dataset used for training and testing [16].
In 2011, the researchers in [19] used classification algorithms such as RIPPER (Repeated Incremental Pruning to Produce Error Reduction), proposed by William W. Cohen, as well as Decision Tree, ANN, and Support Vector Machine; their experimental results showed that the Support Vector Machine achieved the highest prediction accuracy [19]. Sellappan and Palaniappan [20] proposed an advanced Intelligent Heart Disease Prediction System (IHDPS) using three data mining techniques: Naïve Bayes, decision tree, and neural network.
K. Srinivas, B. Kavita Rani, and A. Govardhan presented the use of numerous data mining techniques, such as Decision Tree, Naïve Bayes, and ANN, to predict heart attacks [21]. Data mining tools such as TANAGRA were used for the statistical learning algorithms.

III. PROPOSED METHODOLOGY
Based on our literature review, we concluded that the three techniques listed below are better and more efficient at classification and prediction in terms of accuracy. We therefore experimented with these three techniques:
1. Neural Network
2. Decision Tree
3. Naïve Bayes

IV. EXPERIMENTATION AND PERFORMANCE ANALYSIS
a) DATASET
We used the heart disease dataset from the UCI repository, available at https://archive.ics.uci.edu/ml/datasets/Heart+Disease. We also consulted a local doctor, who helped us add more data to our database.
Our dataset consisted of 14 attributes with 668 records, the details of which are given in Table 1, below.
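Several of the studies reviewed above validate their models by partitioning the records into folds ([3] uses 10-fold cross-validation, [9] k-fold validation), and this study likewise varies how the records are divided between training and testing. As a minimal sketch of that partitioning, assuming records are addressed by integer index and that a plain shuffled (not class-stratified) split is acceptable, the hypothetical helper `k_fold_indices` below builds the k train/test pairs. The function name and the fixed seed are illustrative choices, not taken from the paper.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering records 0..n_records-1.

    Each index lands in exactly one test fold, and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_records:
        raise ValueError("k must satisfy 2 <= k <= n_records")
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    base, extra = divmod(n_records, k)    # the first `extra` folds get one more record
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        # Training set = every fold except the i-th one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 668 records, the dataset size reported above, split into 10 folds.
splits = k_fold_indices(668, 10)
print([len(test) for _, test in splits])  # → [67, 67, 67, 67, 67, 67, 67, 67, 66, 66]
```

Each model would be trained on `train` and scored on `test` once per fold, and the k accuracies averaged. Stratifying the folds by class label is a common refinement that this sketch omits.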


Table 1. Attributes of Heart Disease Dataset

Sr. no. | Attribute | Description | Values
1 | age | Age in years | Continuous
2 | sex | Male or female | 1 = male; 0 = female
3 | cp | Chest pain type | 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic
4 | trestbps | Resting blood pressure | Continuous value in mm Hg
5 | chol | Serum cholesterol | Continuous value in mg/dl
6 | restecg | Resting electrocardiographic results | 0 = normal; 1 = having ST-T wave abnormality; 2 = left ventricular hypertrophy
7 | fbs | Fasting blood sugar | 1 = > 120 mg/dl; 0 = ≤ 120 mg/dl
8 | thalach | Maximum heart rate achieved | Continuous value
9 | exang | Exercise-induced angina | 0 = no; 1 = yes
10 | oldpeak | ST depression induced by exercise relative to rest | Continuous value
11 | slope | Slope of the peak exercise ST segment | 1 = upsloping; 2 = flat; 3 = downsloping
12 | ca | Number of major vessels colored by fluoroscopy | 0-3
13 | thal | Defect type | 3 = normal; 6 = fixed defect; 7 = reversible defect

After manipulating the dataset, i.e., increasing and decreasing the proportions of training and testing data, we obtained the results shown in the tables and graphs below for the three data mining techniques.

b) DECISION TREE

Table 2 and Figure 1 show the Decision Tree tested with different amounts of testing data, and how its accuracy can be improved by removing some of the attributes and testing again.

Table 2. Decision Tree Data (Snapshot with few configuration changes)

Figure 1. Attribute vs. accuracy graph for Table 2
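The paper does not state which decision-tree algorithm or tool produced Table 2. As a hedged illustration of growing a tree and then re-growing it with an attribute removed, here is a minimal CART-style sketch that chooses binary threshold splits by Gini impurity. The function names (`build_tree`, `best_split`, and so on) are hypothetical, and the four toy records are made up for illustration; they are not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini,
    or None if no split improves on the parent node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send records both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, features, max_depth=3):
    """Grow a binary tree. Internal nodes are dicts; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    go_left = [r[f] <= t for r in rows]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, g in zip(rows, go_left) if g],
                           [y for y, g in zip(labels, go_left) if g],
                           features, max_depth - 1),
        "right": build_tree([r for r, g in zip(rows, go_left) if not g],
                            [y for y, g in zip(labels, go_left) if not g],
                            features, max_depth - 1),
    }


def predict(tree, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def accuracy(tree, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Toy records (made up for illustration; 1 = disease, 0 = no disease).
train_rows = [
    {"cp": 4, "thalach": 108, "sex": 1},
    {"cp": 4, "thalach": 125, "sex": 0},
    {"cp": 1, "thalach": 170, "sex": 1},
    {"cp": 2, "thalach": 160, "sex": 0},
]
train_labels = [1, 1, 0, 0]

full_tree = build_tree(train_rows, train_labels, ["cp", "thalach", "sex"])
reduced_tree = build_tree(train_rows, train_labels, ["thalach", "sex"])  # "cp" removed
print(full_tree)
print(reduced_tree)
print(accuracy(full_tree, train_rows, train_labels))
```

Passing a shorter `features` list (here, without `cp`) re-grows the tree from the remaining attributes, which mirrors the attribute-removal experiments described above. A real run would score the tree on held-out test records rather than on the training records.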


c) NAÏVE BAYES

Table 3 below shows Naïve Bayes tested with different amounts of testing data, and how its accuracy can be improved by removing some of the attributes and testing again.

Table 3: Naïve Bayes Data (Snapshot with few configurations)

Figure 2. Attribute vs. accuracy graph for Table 3

d) NEURAL NETWORK

Table 4 and Figure 3 show the accuracy obtained for neural networks tested with different numbers of hidden layers, different numbers of epochs, increased and decreased learning rates and folds, and different activation functions. To improve the accuracy, we also removed some of the attributes.

Table 4: Neural Network Data (Snapshot with few configuration changes)

Figure 3. Attribute vs. accuracy graph for Table 4
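The Naïve Bayes experiments above rest on the classifier's defining assumption: attributes are treated as conditionally independent given the class, so each class is scored by its prior times a product of per-attribute likelihoods. A minimal Gaussian Naïve Bayes sketch of that scoring follows. Modelling every attribute with a normal distribution (including categorical ones such as `cp` or `sex`) is a simplifying assumption, and the function names and toy values are hypothetical, not the authors' implementation or data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, features):
    """Estimate each class prior and a per-class mean/variance for every feature."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = {}
        for f in features:
            values = [r[f] for r in members]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            stats[f] = (mean, var + 1e-9)  # tiny floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def log_scores(model, row):
    """Unnormalised log P(class | row) under the conditional-independence assumption:
    log prior plus one Gaussian log-likelihood term per feature."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for f, (mean, var) in stats.items():
            score += -0.5 * math.log(2 * math.pi * var) - (row[f] - mean) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict_nb(model, row):
    """Return the class with the highest log score."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


# Toy records (made up for illustration; 1 = disease, 0 = no disease).
rows = [
    {"thalach": 108, "oldpeak": 1.5},
    {"thalach": 129, "oldpeak": 2.6},
    {"thalach": 172, "oldpeak": 0.0},
    {"thalach": 178, "oldpeak": 0.8},
]
labels = [1, 1, 0, 0]

model = fit_gaussian_nb(rows, labels, ["thalach", "oldpeak"])
print(predict_nb(model, {"thalach": 120, "oldpeak": 2.0}))
# Dropping an attribute only removes its term from each class score.
model_reduced = fit_gaussian_nb(rows, labels, ["thalach"])
print(predict_nb(model_reduced, {"thalach": 175, "oldpeak": 0.5}))
```

Because each attribute contributes a single additive term to a class's log score, removing an attribute deletes only that term. This is one plausible reason the authors saw little change when removing attributes from Naïve Bayes, though it does not guarantee that accuracy stays unchanged.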


e) MODEL COMPARISON

Figure 5 shows the accuracy of the three data mining techniques.

Fig 5: Accuracy levels for three data mining techniques

V. CONCLUSION AND FUTURE SCOPE

From the graphs obtained from our implementation, we conclude that increasing the number of hidden layers made the result less accurate and also consumed more time, i.e., it was not efficient. Decreasing the learning rate also decreased the accuracy. For the neural network, we obtained the highest accuracy, 81.08%, when we used a smaller number of hidden layers with an increased learning rate and a larger training dataset.

When we changed the attributes, the results also changed. Removing the chest pain and cholesterol attributes decreased the accuracy of the decision tree, since both are important attributes for heart disease prediction. After removing the sex attribute, however, the accuracy remained unchanged, which led us to conclude that this attribute does not play an important role in disease prediction.
We also checked the accuracy of Naïve Bayes after removing some attributes, but the results did not change much, because the Naïve Bayes algorithm treats each attribute as independent of the others given the class.
For the neural network, we tried different numbers of hidden layers, different learning rates, and different attribute sets to find better accuracy. When we increased the number of hidden layers, it gave better accuracy but its computation time increased, which was not good for prediction; when we reduced the number of hidden layers, it gave us better results with a much shorter calculation time, which was reliable. After analyzing the above graphs, we concluded that the decision tree gave the most accurate results, 98.54%, compared with the other methods, which gave 85.01% (Naïve Bayes) and 81.83% (neural network), as shown in Fig 5.
We can make this system more efficient and reliable by using more training data and evaluating the datasets. We can also try to add features such as junk food consumption, exercise, and tobacco use to make the predictions more precise. There is also scope to improve this system by integrating these approaches into a hybrid model that can deliver better outcomes than the individual methods.

ACKNOWLEDGEMENTS
The authors of this study wish to express their gratitude to their team members, namely Amogh Power, Varsha Pawar, Seema Shilvant and Visheh Parab, for their support in completing the implementation of this study.

REFERENCES
[1] Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using hybrid machine learning techniques. IEEE Access, 7, 81542-81554.
[2] Bashir, S., Khan, Z. S., Khan, F. H., Anjum, A., & Bashir, K. (2019, January). Improving heart disease prediction using feature selection approaches. In 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST) (pp. 619-623). IEEE.
[3] Amin, M. S., Chiam, Y. K., & Varathan, K. D. (2019). Identification of significant features and data mining techniques in predicting heart disease. Telematics and Informatics, 36, 82-93.
[4] Ali, L., Rahman, A., Khan, A., Zhou, M., Javeed, A., & Khan, J. A. (2019). An automated diagnostic system for heart disease prediction based on χ² statistical model and optimally configured deep neural network. IEEE Access, 7, 34938-34945.
[5] Maji, S., & Arora, S. (2019). Decision tree algorithms for prediction of heart disease. In Information and Communication Technology for Competitive Strategies (pp. 447-454). Springer, Singapore.
[6] Khourdifi, Y., & Bahaj, M. (2019). Heart disease prediction and classification using machine learning algorithms optimized by particle swarm optimization and ant colony optimization. Int. J. Intell. Eng. Syst., 12(1), 242-252.
[7] Latha, C. B. C., & Jeeva, S. C. (2019). Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informatics in Medicine Unlocked, 16, 100203.
[8] Ayon, S. I., Islam, M. M., & Hossain, M. R. (2020). Coronary artery heart disease prediction: a comparative study of computational intelligence techniques. IETE Journal of Research, 1-20.
[9] Burse, K., Kirar, V. P. S., Burse, A., & Burse, R. (2019). Various preprocessing methods for neural network-based heart disease prediction. In Smart Innovations in Communication and Computational Sciences (pp. 55-65). Springer, Singapore.
[10] Tarawneh, M., & Embarak, O. (2019, February). Hybrid approach for heart disease prediction using data mining techniques. In International Conference on Emerging Internetworking, Data & Web Technologies (pp. 447-454). Springer, Cham.
[11] Beunza, J. J., Puertas, E., García-Ovejero, E., Villalba, G., Condes, E., Koleva, G., ... & Landecho, M. F. (2019). Comparison of machine learning algorithms for clinical event prediction (risk of coronary heart disease). Journal of Biomedical Informatics, 97, 103257.
[12] Gonsalves, A. H., Thabtah, F., Mohammad, R. M. A., & Singh, G. (2019, July). Prediction of coronary heart disease using machine learning: an experimental analysis. In Proceedings of the 2019 3rd International Conference on Deep Learning Technologies (pp. 51-56).
[13] Nalluri, S., Saraswathi, R. V., Ramasubbareddy, S., Govinda, K., & Swetha, E. (2020). Chronic heart disease prediction using data mining techniques. In Data Engineering and Communication Technology (pp. 903-912). Springer, Singapore.
[14] Gokulnath, C. B., & Shantharajah, S. P. (2019). An optimized feature selection based on genetic approach and support vector machine for heart disease. Cluster Computing, 22(6), 14777-14787.
[15] Enriko, I. K. A. (2019, June). Comparative study of heart disease diagnosis using top ten data mining classification algorithms. In Proceedings of the 5th International Conference on Frontiers of Educational Technologies (pp. 159-164).
[16] Singh, A., & Kumar, R. (2020, February). Heart disease prediction using machine learning algorithms. In 2020 International Conference on Electrical and Electronics Engineering (ICE3) (pp. 452-457). IEEE.


[17] Barik, S., Mohanty, S., Rout, D., Mohanty, S., Patra, A. K., & Mishra, A. K. (2020). Heart disease prediction using machine learning techniques. In Advances in Electrical Control and Signal Systems (pp. 879-888). Springer, Singapore.
[18] Powar, A., Shilvant, S., Pawar, V., Parab, V., Shetgaonkar, P., & Aswale, S. (2019). Data mining & artificial intelligence techniques for prediction of heart disorders: A survey. In 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN) (pp. 1-7). Vellore, India. doi:10.1109/ViTECoN.2019.8899547
[19] Kumari, M., & Godara, S. (2011). Comparative study of data mining classification methods in cardiovascular disease prediction.
[20] Dangare, C. S., & Apte, S. S. (2012). Improved study of heart disease prediction system using data mining classification techniques. International Journal of Computer Applications, 47(10), 44-48.
[21] Ashish, C., Lakhan, A., Sahil, A., & Sharma, Y. K. (2016). Heart disease prediction using data mining techniques. International Journal of Research in Advent Technology.
[22] Das, R., Turkoglu, I., et al. (2009). Effective diagnosis of heart disease through neural networks ensembles. Expert Systems with Applications, 36, 7675-7680.
[23] Das, R., Turkoglu, I., et al. (2009). Diagnosis of valvular heart disease through neural networks ensembles. Computer Methods and Programs in Biomedicine, 93, 185-191.
[24] Gokulnath, C., Priyan, M. K., Balan, E. V., Prabha, K. R., & Jeyanthi, R. (2015). Preservation of privacy in data mining by using PCA based perturbation technique. In 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM) (pp. 202-206). IEEE.
[25] Babaoglu et al. (2009). Assessment of exercise stress testing with artificial neural network in determining coronary artery disease and predicting lesion localization. Expert Systems with Applications, 36, 2562-2566.
[26] Rajeswari, K., et al. (2012). Feature selection in ischemic heart disease identification using feed forward neural networks. Int. Symp. Robot. Intell. Sens., 41, 1818-1823.
[27] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., & Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14(1), 1-37.
[28] Tilve, A., Nayak, S., Vernekar, S., Turi, D., Shetgaonkar, P. R., & Aswale, S. (2020, February). Pneumonia detection using deep learning approaches. In 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (pp. 1-8). IEEE.
