Research Article
A Method for Improving Prediction of Human Heart Disease Using
Machine Learning Algorithms
Received 8 December 2021; Revised 1 February 2022; Accepted 16 February 2022; Published 9 March 2022
Copyright © 2022 Abdul Saboor et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Advances in computing capabilities and techniques have brought great diversity to the field of medical sciences, especially in the identification of human heart disease. Heart disease is now one of the world's most dangerous diseases and has very serious effects on human life. Accurate and timely identification of human heart disease can be very helpful in preventing heart failure in its early stages and will improve the patient's survival. Manual approaches for the identification of heart disease are biased and prone to interexaminer variability. In this regard, machine learning algorithms are efficient and reliable tools to detect and categorize persons suffering from heart disease and those who are healthy. In the proposed study, we identified and predicted human heart disease using a variety of machine learning algorithms and used the heart disease dataset to evaluate their performance with different evaluation metrics, such as sensitivity, specificity, F-measure, and classification accuracy. For this purpose, we applied nine machine learning classifiers (AB, LR, ET, MNB, CART, SVM, LDA, RF, and XGB) to the final dataset before and after hyperparameter tuning. Furthermore, we checked their accuracy on the standard heart disease dataset after preprocessing, standardization of the dataset, and hyperparameter tuning. Additionally, to train and validate the machine learning algorithms, we deployed the standard K-fold cross-validation technique. Finally, the experimental results indicated that data standardization and hyperparameter tuning improved the accuracy of the prediction classifiers and achieved notable results.
is both costly and computationally challenging to examine [7]. Thus, we build a noninvasive prediction system to handle these issues using machine learning classifiers. Heart diseases are efficiently diagnosed using an expert decision system relying on machine learning classifiers and artificial fuzzy logic. As a consequence, the death ratio declines [8, 9]. Numerous researchers used the Cleveland heart disease dataset. For training and testing, the predictive models of machine learning require appropriate data. When a refined/standardized dataset is used for training and testing, the accuracy of machine learning classifiers can be improved. Furthermore, by incorporating relevant and related data features, the predictive model capabilities can be enhanced. Therefore, data standardization and feature selection are important for machine learning classifiers' accuracy. Numerous researchers have used different predictive techniques in the literature; however, these approaches do not predict heart diseases effectively. Data standardization is necessary to enhance the machine learning classifiers' accuracy. There are different standardization techniques, such as standard scaler (SS), min-max scaler, and others that are used to remove the missing feature value instances from the dataset.

Multiple tests are required for heart disease prediction. Timely identification is difficult. Cardiovascular disease prediction is complicated, especially in emerging nations, where there is a shortage of skilled medical personnel, testing equipment, and other resources needed for the identification and treatment of individuals with cardiac problems [10]. When trained using appropriate data, computational classifiers can be useful in diagnosing diseases [11]. Numerous machine learning-based methods have been proposed for predicting the risk of CVD. Most of these methods use publicly available datasets for model training and evaluation. The availability of these datasets has improved the performance of machine learning-based predictive models and opened up new research avenues for researchers to develop cutting-edge algorithms for predicting CVD risk. These datasets provide information about different risk factors and the patient's disease status (whether the patient has a disease). Preprocessing is required for designing predictive models for CVD because the clinical datasets available are inconsistent and duplicated [12]. Furthermore, information about different risk factors (features) is available, and the selection of an appropriate set of features is based on certain criteria, such as having a high prevalence in most populations, having a significant impact on heart disease on their own, and being able to be controlled or treated to lower the risks [13]. Various risk factors or features have been employed by different studies when modeling CVD predictors. Machine learning algorithms are most effective when they are trained on appropriate datasets [12, 14]. Limited medical datasets, feature selection, ML algorithm implementations, and a lack of in-depth analysis are all obstacles that may preclude the effective prediction of heart diseases. Our research intends to fill some of these knowledge gaps to construct a better CVD prediction model. Apart from that, the datasets used in existing studies also have some limitations. These datasets do not include sufficient risk factors or attributes from the detailed clinical data. This difference in clinical severity may affect the prediction accuracy. These limitations have not been sufficiently considered in previous studies. In the state-of-the-art research, dataset standardization and algorithm tuning were not performed.

For enhanced cardiac disease prediction, researchers have developed a variety of machine learning models, such as SVM, KNN, RF, DT, LR, NB, and so on. Heart disease prediction accuracy, on the other hand, remains a challenge. It is critical to develop a novel and cost-effective tool for predicting the risk of heart disease with high accuracy. The overall level of complexity of NB, BN, RF, and MLP has not been defined, and the age risk factor is also excluded from the dataset for NB, BN, RF, and MLP [15]. The system was studied using StatLog datasets. For the Cleveland dataset, important risk factors, such as age, RestECG, ST Depression (Slope), and so on are removed from the model [16]. For the standardization of the proposed approach, no significance tests are performed, and the StatLog dataset [17] and Z-Alizadeh Sani dataset are used. The dataset has a smaller size. The obtained result was not compared to other datasets for standardization, and the Cleveland dataset was used [18].

In this research work, we propose a machine learning approach that includes random forest (RF), XGBoost (XGB), decision trees (CART), support vector machine (SVM), multinomial Naïve Bayes (MNB), logistic regression (LR), linear discriminant analysis (LDA), AdaBoost classifier (AB), and extra trees classifier (ET) for heart disease prediction. Standardization and hyperparameter tuning are performed, and the GridSearchCV method is used to select the best hyperparameter values for each machine learning classifier. Apart from that, various performance evaluation parameters, such as accuracy, precision, sensitivity, recall, and F-measure, are used to assess the machine learning classifiers' performance. The proposed method has been tested on the Cleveland HD dataset. Moreover, the accuracy of the proposed machine learning classifiers has been compared to existing state-of-the-art methods in the literature, such as SVM, LR [19], and RF [20]. The proposed work has the following main contributions:

(1) Firstly, the authors address the issue of datasets and then refine and standardize the datasets. Then, the datasets are used to train and test classifiers and determine which classifiers provide the best accuracy results.
(2) Secondly, the authors use the GridSearchCV method to identify the best hyperparameter values.
(3) Thirdly, the machine learning classifiers are applied with the best hyperparameter values found by hyperparameter tuning to achieve the highest accuracy.
(4) Finally, the proposed classifier (SVM) gives state-of-the-art accuracy.

The rest of the paper is organized as follows: in Section 2, a literature review of the existing machine learning techniques is discussed. Section 3 describes research
9071, 2022, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2022/1410169, Wiley Online Library on [20/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Mobile Information Systems 3
goals, and Section 4 describes a proposed methodology to be followed during the study. Section 5 describes data collection, Section 6 discusses the experimental results, and Section 7 concludes the paper and gives future work.

2. Section II: Literature Review

The primary method used by physicians to distinguish between normal and abnormal cardiac sounds was auscultation [21]. Physicians identified heart disease by listening to these heart sounds with stethoscopes [20]. The auscultation technique used by professional doctors to diagnose heart disease has some drawbacks. The interpretation and classification of distinct heart sounds depend on the abilities and practice of the doctors, which are gained after lengthy examinations [22].

Apart from the manual method, various machine learning methods have been proposed for CVD detection. Amin et al. [19] conducted research to identify the most relevant attributes for heart disease prediction. Seven classification algorithms were used: NB, KNN, LR, DT, NN, SVM, and Vote. The Cleveland dataset was obtained from the UCI machine learning repository and consists of 303 records and 76 attributes. The 10-fold cross-validation method is used for model training and testing. We used 10-fold cross-validation because the dataset has few training examples, and a data split, such as a train-test split, would underestimate the model's predictive performance because the training set would contain even fewer examples. With 10-fold validation, however, the model has 90% of the data to learn from. The Vote classifier achieved the highest accuracy, 87.4%. Ketut Agung Enriko et al. [23] used a KNN classifier with minimal parameters for heart disease prediction and obtained an accuracy rate of 81.85%. With KNN, the performance drops as the number of parameters increases, and it uses 90% of the input for training, which is computationally expensive. Subhadra et al. [24] used a multilayer perceptron neural network (MLP-NN) trained with backpropagation for heart disease prediction. To evaluate the system's performance, recall, accuracy, precision, and F-measure are employed, and model training and testing are carried out using the Cleveland dataset from the UCI machine learning repository, which consists of 303 instances and has 76 attributes. Through preprocessing, the six records with missing values were removed from the data, and the 14 most relevant attributes of heart disease were used. The results of the experiment showed that MLP-NN obtained the highest accuracy, 93.39%, with a running time of 3.86 seconds. Khan et al. [25] carried out a comprehensive analysis of heart disease prediction using some of the most popular machine learning classifiers. For training and testing, only 14 features are employed from the Cleveland (UCI) dataset, which consists of 303 records. Data preprocessing was carried out, resulting in a dataset of 296 records. The SVM classifier achieved the highest accuracy, 90.00%. Tarawneh et al. [26] conducted a study using hybrid approaches of data mining classifiers to predict heart disease. The dataset was obtained from the UCI machine learning repository and consists of 303 records and 76 attributes. Model training and testing were performed on 14 attributes. The data was preprocessed to reduce the features from 14 to 12. KNN, NN, SVM, GA, J48, RF, and NB are the classification algorithms used to assess the precision, recall, and accuracy of cardiac disease prediction. The accuracy obtained by SVM and NB was 89.2%, and they made better predictions of heart disease. Anitha et al. [27] conducted a study using learning vector quantization algorithms for the prediction of cardiac disease. The accuracy achieved by this algorithm is 85.55%. The dataset was taken from the University of California, Irvine (UCI) machine learning repository and consists of 303 records and 76 attributes. The data were preprocessed because of missing values, resulting in a sample of 302 records, with only 14 features used for heart disease. The dataset is divided into two parts: 70% for model training and 30% for model testing. Jagtap et al. [28] developed a web-based application for heart disease prediction using machine learning techniques. LR, NB, and SVM are used as classification algorithms for model training and testing. The Cleveland dataset from the UCI machine learning repository was divided into 75 percent and 25 percent for training and testing, respectively. The data were preprocessed to eliminate discrepancies and missing values, and SVM achieved the highest accuracy, 64.4%. The study's limitation was its inability to detect the risk factors of human heart disease patients at an early stage. Dulhare et al. [29] combined particle swarm optimization (PSO), a common feature selection algorithm, with the Naïve Bayes algorithm for an efficient prediction of heart disease. Model training and testing were conducted using the VA Long Beach dataset from the UCI machine learning repository, which consists of 270 records and 14 attributes; however, only 7 of the 14 attributes were used for prediction. When NB is combined with PSO, its accuracy increases to 87.91%, an improvement of 8.79% over the accuracy of NB alone. Kim et al. [30] used machine learning algorithms to predict heart disease. The dataset was collected from the machine learning repository at the University of California, Irvine (UCI), and consists of 303 records with 14 attributes. For training and testing, the 10-fold cross-validation approach is utilized. The DT algorithm performs best, with an accuracy of 93.19% in the prediction of heart disease. Siontis et al. [31] describe the present and future condition of the AI-enhanced electrocardiogram (ECG) in the diagnosis of heart disease in at-risk communities, summarize its consequences for healthcare decisions in patients with cardiovascular disease, and assess its potential drawbacks. Linda et al. [32] proposed a unique health information system for prescribing exercise to heart disease patients. According to their early findings, clinicians are confused about how to establish an exercise prescription
for patients with numerous CVD risk factors. For patients, the supplied system is an easy-to-use, guided, and time-saving evidence-based method. Ali et al. [33] provided a three-phase PB-FARM approach for the assessment of disease-related risk factors. It was also used to analyze the factors that influence the incidence of this disease using the Z-Alizadeh Sani dataset. The findings revealed a clear link between the risk of coronary artery disease (CAD), elderly age, and normal chest pain. Rubini et al. [34] proposed a prediction model for heart disease prediction. Different classifiers, such as logistic regression, Naïve Bayes, and SVM, were compared to the proposed algorithm. In that article, random forest achieved the highest accuracy of 84.81%. Devansh Shah et al. [35] utilized a dataset of 303 examples and 76 attributes, 14 of which were used in supervised learning algorithms, such as decision tree, Naïve Bayes, K-NN, and random forest. The results show that K-NN attained the maximum level of accuracy. Archana Singh et al. [36] developed a heart disease prediction model using machine learning classifiers. They used 14 attributes of the UCI Cleveland dataset to train and test their models to achieve maximum accuracy. The results achieved by the classifiers were as follows: linear regression 78%, decision trees 79%, support vector machines 83%, and K-NN 87%. The results revealed that K-NN had the highest accuracy. Asif Khan et al. [37] use SVM, logistic regression, artificial neural networks, KNN, Naïve Bayes, and decision tree as classification techniques. When compared to previous models, the new model achieved an accuracy of 92.37%. The fundamental goal of Mohan et al. [38] was to uncover suitable features using machine learning techniques, such as decision trees, language models, SVM, random forests, Naïve Bayes, neural networks, and KNN. The proposed hybrid HRFLM method was applied to merge the characteristics of random forests and linear techniques. This model's accuracy was 88.4%. Kumar et al. [39] utilized various machine learning algorithms to predict cardiovascular disease. When compared to other classifier techniques, the proposed model revealed that random forests had the greatest accuracy of 85.71%.

3. Section III: Research Goals and Objectives

The main goal of this research is to develop a heart disease prediction model with improved and enhanced accuracy. The specific objectives are to quickly identify new patients, reduce diagnostic time, reduce heart attacks, and save lives.

4. Section IV: Methodology

In this section, we describe the proposed method, which is defined by the following steps, as shown in Figure 1.

(1) The first step is to select the dataset from the online machine learning repositories. Many datasets are available online, such as the Cleveland heart disease dataset, Z-Alizadeh Sani dataset, StatLog Heart, Hungarian, Long Beach VA, and Kaggle Framingham dataset.

(2) In the second step, we refined and standardized the collected datasets. These datasets were not gathered in a controlled environment and had erroneous values. Hence, data preprocessing is an essential step for studying data and machine learning. Data normalization is needed when the risk factors of a dataset are measured on different scales. For example, Celsius and Fahrenheit are different measuring units of temperature. The standardization of data means scaling the risk factors and assigning each value its distance from the mean value in units of standard deviation. It rescales the risk factor values to a mean (μ) of 0 and a standard deviation (σ) of 1 to improve the performance of machine learning classifiers. The mathematical form of standardization is given by (1).

$$\text{Standardization of } X = \frac{X - \text{Mean of } X}{\text{Standard Deviation of } X}. \tag{1}$$

(3) In this step, hyperparameter tuning is performed to select the best values for the hyperparameters and obtain high accuracy. For this purpose, we used the GridSearchCV method. Before applying machine learning classifiers, we adjust the hyperparameter values of the classifiers to increase their performance. The fit method of the Scikit-learn GridSearchCV class tunes a classification algorithm over a grid of hyperparameter values. It allows each machine learning algorithm to be trained and its corresponding hyperparameters to be adjusted in a single consistent environment. Once adequate hyperparameter values have been found, the entire training dataset is used to fit the final model. The 10-fold CV is used to identify the optimum values for the adjustable hyperparameters based on the training dataset. During the CV process, the candidate hyperparameter values are evaluated to find those that achieve the overall best classification accuracy.

(4) The fourth step is to apply the machine learning algorithms (i.e., AdaBoost, logistic regression, extra tree, multinomial Naïve Bayes, support vector machine, linear discriminant analysis, classification and regression tree, random forest, and XGBoost) to the dataset obtained from step 2.

(5) In this step, the prediction model's performance is evaluated using different parameters, such as accuracy, precision, recall, and F-measure. The model that gives the highest prediction accuracy, precision, recall, and F-measure is selected. The accuracy metric assesses the overall correctness of a machine learning or classifier model's predictions. Mathematically, it is given by equation (2), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FN} + \text{FP}}. \tag{2}$$
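
Steps (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification` as a stand-in for the heart disease records), the candidate grid for the SVM is hypothetical, and the scaler is placed inside a `Pipeline`, which is a design choice of this sketch rather than something the paper states.

```python
# Illustrative sketch only: synthetic data and a hypothetical grid,
# not the dataset or the search space used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed stand-in for a heart disease table: 303 rows, 13 features, binary label.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)

# Step (2): equation (1) applied column by column, by hand and with StandardScaler.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
X_scaled = StandardScaler().fit_transform(X)

# Step (3): grid search with 10-fold cross-validation.
# Keeping the scaler in the pipeline means each fold is scaled using
# statistics from its own training part only.
pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
param_grid = {
    "svm__kernel": ["linear", "rbf", "sigmoid"],  # hypothetical candidates
    "svm__C": [0.1, 0.5, 1.0, 10.0],              # hypothetical candidates
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)  # by default, refits the best setting on all supplied data

print("best hyperparameters:", search.best_params_)
print("mean 10-fold accuracy:", round(search.best_score_, 4))
```

After fitting, `search.best_estimator_` holds the pipeline refitted with the selected values, which corresponds to training the final model on the entire training dataset once the hyperparameters are chosen.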
[Figure 1: Steps of the proposed method. Recoverable block labels: Machine Learning Algorithms; Performance Evaluation.]
Precision measures the proportion of predicted positive instances that are true/real positives. Mathematically, it is given by (3).

$$\text{Precision} = \frac{\text{TP (true positives)}}{\text{TP (true positives)} + \text{FP (false positives)}}. \tag{3}$$

Recall measures the proportion of true/real positive instances that are correctly identified; it decreases as the number of false negative instances grows. Mathematically, it is given by (4).

$$\text{Recall} = \frac{\text{TP (true positives)}}{\text{TP (true positives)} + \text{FN (false negatives)}}. \tag{4}$$

The F-measure is the harmonic mean of precision and recall. It balances precision and recall, and mathematically, it is given by (5).

$$F\text{-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{5}$$

5. Section V: Data Collection

The Cleveland heart disease dataset, available from the University of California, Irvine (UCI) online repository for machine learning, is the most prominent dataset used by researchers. There are 303 records, with 6 samples having missing values. The data has 76 features in its original form; however, published work typically refers to 13 of them, while one other feature describes the disease outcome. The Z-Alizadeh Sani dataset, which includes 303 patients' data with 55 input factors and a class label variable for each patient, is another popular dataset selected by researchers in the prediction process. The StatLog Heart, Hungarian, Long Beach VA, and Kaggle Framingham datasets are some of the additional datasets used by researchers in the prediction process. The StatLog dataset has 270 records, each with 13 Cleveland-like attributes. The other two datasets, the Hungarian and Long Beach VA datasets, are collected from the UCI repository and consist of 274 records with 14 features each, similar to the Cleveland dataset. Researchers used publicly available datasets, such as Cleveland, Hungary, Switzerland, etc. There are different datasets available for this study. The first source of data is the Cleveland Clinic Foundation [40]. The second source of data is the StatLog dataset, which is accessible at [41]. The third source of data is the Z-Alizadeh Sani dataset, which is accessible at [42]. It contains 303 data samples and 55 attributes. To allow comparison with the literature, we used a publicly available resource, the Z-Alizadeh Sani dataset. The description and source of the datasets are given in Table 1.

6. Section VI: Experimental Results and Discussion

In this section, we discuss our experimental results. We collected the dataset from an online machine learning repository and refined and standardized it. After standardization, we performed hyperparameter tuning and applied machine learning classifiers. All the classifiers are trained and tested using 10-fold cross-validation. The accuracy of the classifiers is also analyzed before and after standardizing the dataset. For evaluation purposes, the accuracy of the selected classifiers is plotted. Figure 2 shows the accuracy of the classifiers before and after data standardization. From Figure 2, it is clear that most of the machine learning techniques (RF, CART, LDA, AB, LR, ET, and XGB) improved their accuracy, while the accuracy of the MNB and SVM classifiers decreased on the standardized dataset. Some classifiers, such as CART, ET, and AB, showed significant accuracy improvements on the standardized dataset. From Figure 2, it is evident that the ET and AB classifiers achieve the highest prediction accuracy of 90.16%. MNB shows the overall lowest performance and has the lowest accuracy of 59.01%. We also compare the accuracy before and after the standardization of the dataset. An accuracy of 90.16% is achieved by the ET and AB classifiers, which shows the effect of the standardization of the dataset.

From the experimental results, it is clear that the accuracy of the classifiers increased with hyperparameter tuning. We tune the selected classifiers by adjusting hyperparameter values to achieve the best accuracy. A set of accuracy values for different hyperparameter combinations is obtained using 10-fold cross-validation. Since we have a small number of training examples, using test split is not a
Figure 2: Accuracy of classifiers before and after data standardization. (Bar chart; x-axis: MNB, SVM, LR, CART, LDA, AB, RF, ET, XGB; y-axis: accuracy, 0 to 100.)
Figure 3: Accuracy of classifiers before and after hyperparameter tuning. (Bar chart; x-axis: MNB, SVM, LR, CART, LDA, AB, RF, ET, XGB; y-axis: accuracy, 0 to 100.)
good option since we have fewer examples to train the model. Hence, we are using 10-fold cross-validation. The accuracy of the classifiers before and after hyperparameter optimization is presented in Figure 3. Most of the classifiers (MNB, RF, LR, LDA, AB, SVM, ET, and XGB) improved their accuracy with hyperparameter tuning, while the accuracy of CART alone did not change. Table 2 shows the best combinations of hyperparameters for some algorithms to improve their accuracy.

Table 3 presents recall, precision, F-measure, and accuracy for the classifiers. A maximum precision of 98% is achieved by SVM for the positive class, although the MNB and XGB classifiers have a maximum recall of 100%. However, SVM shows generally the best performance in recall, precision, F-measure, and accuracy of 98%, 98%, 98%, and 96.76%, respectively. Precision is above 80% for all classifiers, whereas recall is above 90% for all classifiers. A maximum precision of 100% is achieved by XGB and MNB for the negative class. LR presents a low precision of 78%, and CART shows the lowest recall, F-measure, and accuracy of 61%, 69%, and 83.66%, respectively, whereas a maximum recall of 94.00% is achieved by SVM and LDA. Therefore, the negative class presented a comparatively poor recall of 61% and an F-measure of 69%. SVM shows comparatively good performance for the negative class, with a recall, precision, and F-measure of 94%, and an accuracy of 96.72%.

From the analysis of the results, it is clear that the SVM classifier achieved the best accuracy during hyperparameter
Table 2: Best values for hyperparameter tuning to improve classifiers' accuracy.
Algorithm | Best hyperparameter values
SVM | kernel = "sigmoid", C = 0.5
CART | max_features = "auto", random_state = 123, min_samples_split = 20, min_samples_leaf = 11
AB | n_estimators = 50, learning_rate = 0.05
LDA | shrinkage = "auto", solver = "lsqr"
GBM | n_estimators = 250
RF | criterion = "gini", n_jobs = -1, min_samples_leaf = 2, min_samples_split = 5, n_estimators = 15, random_state = 123
ET | n_jobs = -1, min_samples_leaf = 1, n_estimators = 15, random_state = 123, criterion = "gini", min_samples_split = 6
XGBoost | objective = "reg:linear", colsample_bytree = 0.3, learning_rate = 0.1, max_depth = 15, alpha = 5, n_estimators = 123
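
As a rough illustration of how some of the Table 2 settings could be passed to Scikit-learn estimators and scored with the accuracy, precision, recall, and F-measure defined in this paper, consider the sketch below. It rests on assumptions: the data are synthetic, the helper `scores_from_counts` is a hypothetical name introduced here, only a subset of rows is shown to keep the example short, and some listed values (for example max_features = "auto") may not be accepted by recent library versions. The printed numbers will therefore not match the paper's results.

```python
# Illustrative sketch only: synthetic data, not the paper's dataset or results.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              RandomForestClassifier)
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Assumed stand-in for a heart disease table: 303 rows, 13 features, binary label.
X, y = make_classification(n_samples=303, n_features=13, n_informative=6,
                           random_state=0)

# A subset of the Table 2 rows expressed as estimator arguments.
models = {
    "SVM": SVC(kernel="sigmoid", C=0.5),
    "AB": AdaBoostClassifier(n_estimators=50, learning_rate=0.05),
    "LDA": LinearDiscriminantAnalysis(shrinkage="auto", solver="lsqr"),
    "RF": RandomForestClassifier(criterion="gini", n_jobs=-1, min_samples_leaf=2,
                                 min_samples_split=5, n_estimators=15,
                                 random_state=123),
    "ET": ExtraTreesClassifier(criterion="gini", n_jobs=-1, min_samples_leaf=1,
                               min_samples_split=6, n_estimators=15,
                               random_state=123),
}


def scores_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# 10-fold cross-validation: every record is predicted once by a model
# that did not see it during training.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=cv)
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    results[name] = scores_from_counts(tp, tn, fp, fn)

for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

Computing all four metrics from one confusion matrix keeps them mutually consistent, since each is a function of the same TP, TN, FP, and FN counts.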
tuning. By comparing results obtained before and after standardizing the datasets, it is determined that the standardization of datasets has a positive impact on the accuracy of most of the classifiers, and some classifiers show an accuracy improvement of up to 8.78%, which is a large performance improvement.

By comparing the classifiers' accuracy on the normal and standardized datasets, we observed an improvement in the accuracy of most of the classifiers. Therefore, the standardization of the dataset is a useful technique for accuracy improvement before applying machine learning classifiers. Similarly, we have observed a significant accuracy improvement after hyperparameter tuning of the classifiers. Therefore, algorithm tuning is also a useful technique for improving the accuracy of the algorithms. From the comparison of different classifiers, we conclude that the XGB and ET classifiers show overall good accuracy. However, SVM shows the best accuracy after tuning the hyperparameters and achieved an accuracy of 96.72%.

7. Section VII: Conclusions

The drawback of the previously proposed systems is that their performance is considerably reduced if the size of the dataset is increased. The main problem with machine learning is that a dataset cannot be classified efficiently, although classification can be enhanced if the attributes of the dataset are efficiently extracted [43]. Another flaw is that the classifier prediction accuracy improves with increasing dataset magnitude; however, after a certain point, increasing dataset magnitude has a negative impact on the classifier prediction accuracy. According to the proposed method, using machine learning techniques for heart disease prediction improves accuracy and minimizes the cost factor. We have used different machine learning classifiers for the prediction of heart disease, including SVM, which achieved an accuracy of 96.72%.

For future work, we plan to use XGBoost for heart disease prediction in children and compare whether better accuracy can be achieved. If features are properly managed, then there will be significant performance gains in the classification of heart disease. In future studies, the outcomes of our proposed methods can serve as standard performance results on heart disease.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] Cardiovascular Diseases (Cvds), "World health organization," https://www.who.int/news-room/fact%20sheets/detail/cardiovascular-diseases-(cvds).
[2] A. K. Dwivedi, S. A. Imtiaz, and E. R. Villegas, "Algorithms for automatic analysis and classification of heart sounds - a systematic review," IEEE Access, vol. 7, 2019.
[3] A Coronary, "Heart disease," Available from: https://www.aihw.gov.au/reports/australias-health/coronaryheart-disease, 2020.
[4] L. A. Allen, L. W. Stevenson, K. L. Grady et al., "Decision making in advanced heart failure: a scientific statement from the American heart association," Circulation, vol. 125, no. 15, pp. 1928–1952, 2012.
[5] S. Ghwanmeh, A. Mohammad, and A. Al-Ibrahim, "Innovative artificial neural networks-based decision support system for heart diseases diagnosis," Journal of Intelligent Learning Systems and Applications, vol. 5, no. 3, Article ID 35396, 2013.
[6] Q. K. Al-Shayea, “Artificial neural networks in medical diagnosis,” Int. J. Comput. Sci. Issues, vol. 8, no. 2, pp. 150–154, 2011.
[7] A. Tsanas, M. A. Little, P. E. McSharry, and L. O. Ramig, “Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity,” Journal of The Royal Society Interface, vol. 8, no. 59, pp. 842–855, 2011.
[8] S. I. Ansarullah and P. Kumar, “A systematic literature review on cardiovascular disorder identification using knowledge mining and machine learning method,” International Journal of Recent Technology and Engineering, vol. 7, no. 6S, pp. 1009–1015, 2019.
[9] S. Nazir, S. Shahzad, S. Mahfooz, and M. Nazir, “Fuzzy logic based decision support system for component security evaluation,” The International Arab Journal of Information Technology, vol. 15, no. 2, pp. 224–231, 2018.
[10] S. Ghwanmeh, A. Mohammad, and A. Al-Ibrahim, “Innovative artificial neural networks-based decision support system for heart diseases diagnosis,” Journal of Intelligent Learning Systems and Applications, vol. 5, no. 3, pp. 176–183, 2013.
[11] F. M. J. M. Shamrat, M. A. Raihan, A. K. M. S. Rahman, I. Mahmud, and R. Akter, “An analysis on breast disease prediction using machine learning approaches,” Int. J. Sci. Technol. Res., vol. 9, no. 2, pp. 2450–2455, 2020.
[12] M. S. Amin, Y. K. Chiam, and K. D. Varathan, “Identification of significant features and data mining techniques in predicting heart disease,” Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[13] J. Mackay and G. A. Mensah, “The atlas of heart disease and stroke,” Technical Report, World Health Org., Geneva, Switzerland, 2004.
[14] I. D. Mienye, Y. Sun, and Z. Wang, “An improved ensemble learning approach for the prediction of heart disease risk,” Informat. Med. Unlocked, vol. 20, Article ID 100402, 2020.
[15] C. B. C. Latha, S. C. Jeeva, and S. Carolin Jeeva, “Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques,” Informatics in Medicine Unlocked, vol. 16, Article ID 100203, 2019.
[16] M. Manur, A. Kumar Pani, and P. Kumar, “A prediction technique for heart disease based on long short term memory recurrent neural network,” International Journal of Intelligent Engineering and Systems, vol. 13, no. 2, pp. 31–33, 2020.
[17] M. Abdar, W. Książek, U. R. Acharya, R.-S. Tan, V. Makarenkov, and P. Pławiak, “A new machine learning technique for an accurate diagnosis of coronary artery disease,” Computer Methods and Programs in Biomedicine, vol. 179, Article ID 104992, 2019.
[18] I. Yekkala, S. Dixit, and M. A. Jabbar, “Prediction of heart disease using ensemble learning and Particle Swarm Optimization,” in Proceedings of the 2017 International Conference on Smart Technologies for Smart Nation (SmartTechCon), Bengaluru, India, August 2017.
[19] M. S. Amin, Y. K. Chiam, and K. D. Varathan, “Identification of significant features and data mining techniques in predicting heart disease,” Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[20] G. E. Guraksin, U. Ergun, and O. Deperlioglu, “Classification of the heart sounds via artificial neural network,” International Journal of Reasoning-Based Intelligent Systems, vol. 2, no. 3-4, pp. 272–278, 2010.
[21] R. K. Sinha, Y. Aggarwal, and B. N. Das, “Backpropagation artificial neural network classifier to detect changes in heart sound due to mitral valve regurgitation,” Journal of Medical Systems, vol. 31, no. 3, pp. 205–209, 2007.
[22] A. Kandaswamy, C. S. Kumar, R. P. Ramanathan, S. Jayaraman, and N. Malmurugan, “Neural classification of lung sounds using wavelet coefficients,” Computers in Biology and Medicine, vol. 34, no. 6, pp. 523–537, 2004.
[23] I. Ketut Agung Enriko, M. Suryanegara, and D. Agnes Gunawan, Heart Disease Prediction System Using K-Nearest Neighbor Algorithm with Simplified Patient’s Health Parameters, Springer, Berlin, Germany, 2016.
[24] K. Subhadra and B. Vikas, “Neural network based intelligent system for predicting heart disease,” International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 5, pp. 484–487, 2019.
[25] S. N. Khan, N. M. Nawi, A. Shahzad, A. Ullah, and M. F. Mushtaq, “Comparative analysis for heart disease prediction,” International Journal on Informatics Visualization, vol. 1, no. 4-2, pp. 227–231, 2019.
[26] M. Tarawneh and O. Embarak, “Hybrid approach for heart disease prediction using data mining techniques,” Acta Scientific Nutritional Health, vol. 3, no. 7, pp. 147–151, 2019.
[27] S. Anitha and N. Sridevi, “Heart disease prediction using data mining techniques,” Journal of Analysis and Computation, vol. 8, no. 2, pp. 48–55, 2019.
[28] A. Jagtap, P. Malewadkar, O. Baswat, and H. Rambade, “Heart disease prediction using machine learning,” International Journal of Research in Engineering, Science and Management, vol. 2, no. 2, pp. 352–355, 2019.
[29] U. N. Dulhare, “Prediction system for heart disease using naïve bayes and particle swarm optimization,” Biomedical Research, vol. 29, no. 12, pp. 2646–2649, 2018.
[30] J. K. Kim and S. Kang, “Neural network-based coronary heart disease risk prediction using feature correlation analysis,” Journal of Healthcare Engineering, Article ID 2780501, 2017.
[31] K. C. Siontis, P. A. Noseworthy, Z. I. Attia, and A. Paul, “Artificial intelligence-enhanced electrocardiography in cardiovascular disease management,” Nature Reviews Cardiology, vol. 18, 2021.
[32] P. S. Linda, W. Yin, P. A. Gregory, Z. Amanda, and G. Margaux, “Development of a novel clinical decision support system for exercise prescription among patients with multiple cardiovascular disease risk factors,” Mayo Clinic Proceedings: Innovations, Quality & Outcomes, vol. 5, no. 1, pp. 193–203, 2021.
[33] Y. Ali, R. Amir, and A.-M. Fardin, “Profile-based assessment of diseases affective factors using fuzzy association rule mining approach: a case study in heart diseases,” Journal of Biomedical Informatics, vol. 116, Article ID 103695, 2021.
[34] P. E. Rubini, C. A. Subasini, A. V. Katharine, V. Kumaresan, S. G. Kumar, and T. M. Nithya, “A cardiovascular disease prediction using machine learning algorithms,” Annual Romanian Society Cell Biology, vol. 25, no. 2, pp. 904–912, 2021.
[35] D. Shah, S. Patel, and S. Kumar Bharti, Heart Disease Prediction Using Machine Learning Techniques, Springer Nature Singapore Pte Ltd, Berlin, Germany, 2020.
[36] A. Singh and R. Kumar, “Heart disease prediction using machine learning algorithms,” International Conference on Electrical & Electronics Engineering, pp. 452–457, 2020.
[37] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan, and A. Saboor, “Heart disease identification method using machine learning classification in E-healthcare,” IEEE Access, vol. 8, pp. 107562–107582, 2020.