Deep Learning Hybrid With Binary Dragonfly
Abstract—Breast cancer is the world's top cancer affecting women, and its risk factors vary with place, lifestyle, and diet. Treatment procedures after discovering a confirmed cancer case can reduce the risk of the disease. Unfortunately, breast cancers that arise in low- and middle-income countries are diagnosed at a very late stage, at which the chances of survival are impeded and reduced. Early detection is therefore required, not only to improve the accuracy of discovering breast cancer but also to increase the chances of making the right decision on a successful treatment plan. Several studies have built software models utilizing machine learning and soft computing techniques for cancer detection. This research aims to build a model scheme that facilitates the detection of breast cancer and provides an exact diagnosis; improving the accuracy of the proposed model has therefore been one of its key goals. The model is based on deep learning and develops a framework to accurately separate benign and malignant breast tumors. The study optimizes the learning algorithm by applying the Dragonfly algorithm to select the best features and the best parameter values of the deep learning model. Moreover, it compares the deep learning results against those of the support vector machine (SVM), random forest (RF), and k-nearest neighbor (KNN) classifiers, chosen because they are among the most reliable algorithms, with a solid fingerprint in the field of clinical data classification. The hybrid model of deep learning combined with the binary dragonfly algorithm accurately classified benign and malignant breast tumors with fewer features, while the deep learning model alone achieved better accuracy in classifying the Wisconsin Breast Cancer Database using all available features.

Keywords—Breast cancer; Wisconsin data set; classifiers; deep learning; feature selection; dragonfly

I. INTRODUCTION

Breast cancer is the most common cancer in women and, overall, the second leading cause of cancer death. In 2019, an estimated 268,600 new cases of invasive breast cancer were diagnosed in women, and approximately 2,670 cases were diagnosed in men [1]. An accurate diagnosis of the various sorts of cancer plays a great role in assisting doctors in determining and choosing the proper treatment. Lately, the application of various artificial intelligence (AI) classification methods has proven able to aid doctors in their decision-making process [2]. The use of AI classification techniques in the medical field generally, and in cancer detection particularly, has grabbed researchers' attention, as AI is beneficial in reducing medical human errors that might occur due to unskilled doctors [3].

Much research has been done on breast cancer diagnosis using the Wisconsin Breast Cancer Database (WBCD) [4]. Many methods have been developed to achieve accurate and efficient diagnosis results, and several experiments have been performed on the WBCD using multiple classifiers and feature selection techniques. Many of them show good classification accuracy. For example, in [5] supervised learning classifiers such as Naïve Bayes (NB), the support vector machine with an RBF kernel (SVM-RBF), and neural networks (NN) are compared on the WBCD, and SVM-RBF gives the best outcome, achieving 96.84%. A least squares support vector machine (SVM) obtained a classification accuracy of 98.53% [6]. In [7], linear regression achieved an average training accuracy of 96.093%, a multilayer perceptron (MLP) reached 99.038%, and softmax regression had an average training accuracy of 97.366573%, while in [8] the accuracy obtained by SVM (97.13%) is better than that obtained by KNN. The prediction accuracy of the SVM (linear kernel) in [9] reaches 97.14% for breast cancer detection, with 95.71% using the RBF kernel and 97.14% using an RF classifier. The system in [10], which combines rough set theory with a backpropagation neural network, achieves 98.6% on the breast cancer dataset: its first stage handles missing values to obtain a smooth data set and selects appropriate attributes from the clinical data by the indiscernibility relation method, and its second stage performs classification using a backpropagation neural network. The KNN algorithm is used in [11] with several different distance types and classification rules for the diagnosis and classification of cancer, with experiments conducted on the WBCD. The results advocate the use of KNN with both the Euclidean and Manhattan distances, which give the best results (98.70% for Euclidean and 98.48% for Manhattan with k = 1); these values are not significantly affected even when k is increased from 1 to 50. SVM and KNN used individually in [12] achieved accuracies of 98.57% and 97.14%, respectively. This work aims to automatically design and tune the parameters of a deep learning model hybridized with the Dragonfly algorithm for breast cancer diagnosis.
II. MATERIALS AND METHODS

A. Machine Intelligence Library
The software developed for this study was written using Spyder, an interactive development environment for the Python programming language (version 3.7 was used), capable of advanced editing, interactive testing, debugging, and introspection. The Keras [13] neural network API was used for deep learning in the developed method; it is a high-level, highly modular, minimalist, and extensible neural network API for Python that supports rapid experimentation. Keras with the Google TensorFlow backend is used to implement the deep learning algorithms in this study, with the aid of other scientific computing libraries: matplotlib [14], a comprehensive library for creating interactive and animated visualizations in Python; NumPy [15], a Python library adding support for large multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays; and scikit-learn [16], a free machine learning library for Python that provides classification, regression, and clustering algorithms including support vector machines, k-means, and random forests.

B. Dataset Description
The WBCD [4] was downloaded from the UCI machine learning repository for breast cancer classification [17]. This dataset is widely used by researchers who apply machine learning methods to breast cancer classification. It is composed of numerical attributes assessed by fine needle aspiration (FNA) of human breast tissue. The WBCD has 699 instances and 10 attributes including the class attribute. Each instance belongs to one of two possible classes: malignant (M) or benign (B). Every attribute is represented as an integer between 1 and 10. The attributes are: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.

C. Data Preprocessing
Preparing data for use in a machine learning (ML) framework is significant: data preparation requires at least 80 percent of the total time needed to create an ML system. Data preparation has three main phases, each with several steps: cleaning, normalizing and encoding, and splitting. Equation (1) is used to normalize the dataset attributes:

Z = (X − µ) / σ    (1)

where X represents a dataset attribute, µ represents the mean value of that attribute, and σ represents the corresponding standard deviation. This normalization technique was implemented using the StandardScaler of scikit-learn.
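As an illustration, the normalization in Eq. (1) can be reproduced in a few lines of scikit-learn; this is a minimal sketch with toy values, not the study's exact script:

```python
# Minimal sketch of the z-score normalization in Eq. (1) using scikit-learn's
# StandardScaler; each column of X stands for one WBCD attribute.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[5.0, 1.0, 1.0],
              [3.0, 2.0, 1.0],
              [8.0, 10.0, 10.0]])    # toy attribute values in the 1-10 range

scaler = StandardScaler()            # estimates mu and sigma per attribute
Z = scaler.fit_transform(X)          # applies Z = (X - mu) / sigma column-wise

print(Z.mean(axis=0))                # approximately 0 for each attribute
print(Z.std(axis=0))                 # approximately 1 for each attribute
```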
D. Principal Component Analysis
Principal Component Analysis (PCA) [18] is a dimension reduction method that merges related features. Dimensionality reduction [19] is a process used in data mining whereby the number of random variables under consideration is reduced; it is an essential step in the efficient analysis of large high-dimensional data sets. PCA performs dimensionality reduction while maintaining as much of the variation in the high-dimensional space as feasible. It is probably the oldest and certainly the most popular technique for computing lower-dimensional representations of multivariate data. The technique is linear in the sense that the components are linear combinations of the original variables (features), but non-linearity in the data can still be preserved for effective visualization. PCA transforms the initial set of variables into a new set of linear combinations, referred to as the principal components (PC), with specific variance properties; this condenses the dimensionality of the system while retaining information about the relationships between variables. The analysis is carried out by computing the covariance matrix of the data set and its eigenvalues, along with their respective eigenvectors, organized in descending order.
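The covariance-eigendecomposition procedure described above can be sketched directly in NumPy; the data below is a random stand-in, and the number of retained components is arbitrary:

```python
# Illustrative PCA via eigendecomposition of the covariance matrix, with
# eigenvalues/eigenvectors ordered descending as described in the text.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))            # stand-in for the 9 WBCD attributes

cov = np.cov(X, rowvar=False)            # covariance matrix of the attributes
eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (symmetric matrix)

order = np.argsort(eigvals)[::-1]        # organize in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                    # number of principal components kept
X_pca = (X - X.mean(axis=0)) @ eigvecs[:, :k]
print(X_pca.shape)                       # (100, 2)
```

In practice, scikit-learn's PCA class performs the equivalent computation.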
E. Classification Techniques
Classification aims to develop a set of models that can correctly determine the class of different objects. There are three types of inputs to such models: (a) a set of objects described as training data, (b) the dependent variables, and (c) classes, which may be a group of variables describing various characteristics of the objects. Once a classification model is built, it can be utilized to predict the class of objects whose class information is unknown [20]. Numerous sorts of classifiers have been utilized for cancer diagnosis, among them NN, SVM, KNN, NB, and RF. They are used to classify cancer datasets into malignant and benign tumors.

1) Support vector machine: The support vector machine (SVM) classifier is a type of supervised machine learning classification algorithm. It is applied in classifying cancer because it is a non-probabilistic, binary, and nonlinear statistical tool that works by separating the space into two regions by a straight line, or a hyperplane in higher dimensions. It examines the data, recognizes the pattern, and classifies the data based on common attributes by using kernel tricks. A kernel is a numerical function used in SVM to take data as input and convert it into the required form. Various kinds of kernel functions are used by the SVM algorithm; for example, linear, nonlinear, polynomial, radial basis (RBF), and sigmoid functions.

2) Naïve Bayes: Naïve Bayes (NB) is a probabilistic classifier based on the Bayes theorem. Rather than predictions, it produces probability estimates: for each class value, it estimates the probability that a given instance belongs to that class. An advantage of the NB classifier is that it requires only a small amount of training data to estimate the parameters necessary for classification.

3) Artificial Neural Network: An Artificial Neural Network (ANN) is a numerical model based on biological neural networks. It comprises an interconnected group of artificial
neurons, and it processes information employing a connectionist approach to computation. In most cases, an ANN is a robust framework that changes its structure based on external or internal data flowing through the network during the learning phase. One of the fundamental advantages of ANN over conventional methods is its ability to capture the complex and nonlinear interaction between prognostic markers and the outcome to be anticipated.
4) Random forest: The random forest (RF) algorithm is a supervised classification algorithm that creates a forest with several trees. It is a flexible, easy-to-use machine learning algorithm that mostly produces great results, and due to its simplicity it is one of the most used algorithms. In general, the more trees in the forest, the more robust the forest appears; likewise, in the random forest classifier, a higher number of trees in the forest yields higher accuracy results.
5) K-nearest Neighbors: K-nearest Neighbors (KNN) is one of the most used algorithms in machine learning. It is an instance-based learning method that does not require a training phase: the model is the training sample itself, combined with a distance function and a choice function that assigns the class based on the classes of the nearest neighbors. Before classifying a new element, it must be compared with the other stored elements; the method is insensitive to large variations.
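For reference, the four classifiers above can be assembled in a few lines of scikit-learn; the hyperparameters are illustrative rather than the paper's exact settings, and the built-in diagnostic dataset is used only as a stand-in for the WBCD:

```python
# Hedged sketch of the traditional classifiers (SVM, NB, RF, KNN) compared in
# this study, each scored by 10-fold cross-validated accuracy.
from sklearn.datasets import load_breast_cancer   # stand-in for the WBCD
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

classifiers = {
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
    "Random forest (10 trees)": RandomForestClassifier(n_estimators=10),
    "KNN (k = 3)": KNeighborsClassifier(n_neighbors=3),
}

for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: {acc:.4f}")        # mean 10-fold accuracy
```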
The classification issue is an important component in the field of deep learning, since it is focused on judging which predefined category a new sample belongs to, according to a training set containing a certain number of known samples. The classification problem is also called supervised classification, since all samples in the training set are labeled and all categories are predefined.

The output of a single artificial neuron is defined by the formula in (2):

Y = f(Σ_j w_j x_j + b)    (2)

where w_j are the network weights, b is a bias term, and f is a specified activation function. As shown in Figure 3, a natural extension of this simple model is attained by combining multiple neurons to form a so-called hidden layer.

[Figure: nested scope of artificial intelligence, machine learning, deep learning, and convolutional neural networks.]
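A worked example of Eq. (2) and of its hidden-layer extension, with arbitrary toy weights, may help fix the notation:

```python
# Worked example of Eq. (2), Y = f(sum_j w_j * x_j + b), for a single neuron
# and a small hidden layer; all weights and inputs are arbitrary toy values.
import numpy as np

def sigmoid(z):                      # one possible activation function f
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])       # inputs x_j
w = np.array([0.4, 0.1, -0.7])       # network weights w_j
b = 0.2                              # bias term

print(sigmoid(w @ x + b))            # single-neuron output Y

W = np.array([[0.4, 0.1, -0.7],      # one weight row per hidden neuron
              [0.3, -0.2, 0.5]])
print(sigmoid(W @ x + b))            # activations of a 2-neuron hidden layer
```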
σ(x) = 1 / (1 + exp(−x))    (11)

3) Softmax activation function: The Softmax is a function that turns a vector of K real values into a vector of K real values that sum to 1. The input values can be positive, negative, zero, or greater than one, but the Softmax transforms them into values in the range 0 to 1, as shown in Figure 9, so that they can be interpreted as probabilities. Large multi-layer neural networks end in a penultimate layer that outputs real-valued scores that are not conveniently scaled, which makes working with them complicated. In the current study, the Softmax is very helpful as it turns these scores into a normalized probability distribution. Consequently, it is common to append a Softmax function as the final layer of the neural network.
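A short numerical illustration, with arbitrarily chosen scores:

```python
# The Softmax maps K arbitrary real scores to K values in (0, 1) summing to 1.
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)   # shift by the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, -1.0, 0.0, 5.0])  # positive, negative, and zero inputs
probs = softmax(scores)
print(probs)                              # ~[0.047, 0.002, 0.006, 0.944]
print(probs.sum())                        # 1.0
```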
Fig. 7. ReLU Activation Function.

Fig. 8. Sigmoid Activation Function.

H. Dropout
Dropout is one of the methods utilized to prevent memorization (overfitting). In each iteration, it randomly removes some neurons from a layer at a specified rate; the process is shown in Figure 10, where the crossed units have been dropped out of the network.

Fig. 10. Dropout Neural Network Model. (a) A Standard Neural Network. (b) The Same Network After Dropout is Applied; Dotted Lines Indicate a Dropped Node.
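In Keras, dropout is expressed as a layer; the sketch below assumes an illustrative 9-input network rather than the study's exact architecture:

```python
# Sketch of dropout in Keras: the Dropout layer randomly zeroes units of the
# preceding layer at the given rate during each training iteration.
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(9, activation="sigmoid", input_shape=(9,)),
    Dropout(0.5),                     # drop 50% of the hidden units per iteration
    Dense(2, activation="softmax"),
])
model.summary()
```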
I. Optimization
Optimization is a basic issue in the learning process of deep learning applications. Optimization techniques are utilized to find the optimum values when solving non-linear problems; common optimizers include RMSprop, Adagrad, Adadelta, Adam, and Adamax, which differ from one another in performance and speed. In this study, the Adaptive Moment Estimation (Adam) optimization algorithm was applied.

J. Loss Function
The loss function measures both the error rate and the performance of a designed model. In DL, the loss function is defined at the last layer of the NN; it calculates the dissimilarity between the estimate of the designed model and the required real value. If a model with good estimation capability is designed, the difference between the real and estimated values will be low, whereas a high loss value indicates that the designed model contains defects. In the literature, there are various loss functions such as mean squared error, mean absolute percentage error, mean squared logarithmic error, hinge, logcosh, sparse categorical cross-entropy, binary cross-entropy, Kullback-Leibler divergence, Poisson, and many others. In this study, the sparse categorical cross-entropy loss function was utilized.
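Wiring the optimizer of subsection I and the loss of subsection J together is a one-line compile step in Keras; `model` here refers to the network sketched in the dropout example above:

```python
# Attach the Adam optimizer and the sparse categorical cross-entropy loss to
# the model; accuracy is tracked as the evaluation metric.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```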
In models trained iteratively on data, learning must be terminated at the right time. If training is not stopped, all of the samples in the training set will be memorized by the system, which decreases its capability to estimate unknown samples. Conversely, if training is terminated too early, the performance of the system will decline because it could not fully analyze the data; the same outcome also arises in the case of over-training. To guard against the possibility of overfitting, an early-stopping parameter was defined: when it triggers, training is stopped regardless of the number of remaining iterations.
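A minimal sketch of such an early-stopping parameter in Keras, with an assumed patience value:

```python
# Early stopping: halt training when the validation loss stops improving,
# regardless of how many iterations remain; the patience value is illustrative.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",
                           patience=10,              # epochs with no improvement
                           restore_best_weights=True)

# Passed to training via, e.g.:
# model.fit(X, y, validation_split=0.2, epochs=2000,
#           batch_size=16, callbacks=[early_stop])
```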
VI. RESULTS AND DISCUSSION

In this section, performance is evaluated using accuracy, the percentage of correct predictions. Table 1 shows a comparative study of the different classifiers: RF, KNN, SVM, and NB. Some experiments handled missing data with the mean-imputation technique and others with the missing-data-ignoring technique. The table shows that RF gives a better result when using 10 trees, and that KNN performs well with 3 neighbors, which reduces the complexity of the model and consumes less processing time. PCA+SVM with an RBF kernel, using the missing-data-ignoring technique, is considered the best classifier compared to the others, achieving 99%.

TABLE I. RESULTS OF THE TRADITIONAL CLASSIFIERS

Classifier        Missing values   PCA   Accuracy (%)
RF (100 trees)    Mean                   97
RF (10 trees)     Mean                   95
RF (10 trees)     Mean             1     98
RF (100 trees)    Mean             1     98
RF (10 trees)     Remove                 98
RF (100 trees)    Remove                 95
KNN (k = 10)      Mean             1     97
KNN (k = 3)       Mean             1     98
KNN (k = 3)       Remove                 96
NB                Mean             1     97
SVM (RBF)         Remove           1     99
SVM (RBF)         Mean                   96
SVM (RBF)         Remove                 97
SVM (RBF)         Mean             1     98
SVM (linear)      Mean             1     97
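The best-performing pipeline in Table 1 (missing data ignored, PCA, SVM with an RBF kernel) can be reconstructed roughly as follows; the UCI file layout is as documented for the WBCD, while the number of PCA components and the cross-validation protocol are assumptions, not the paper's exact setup:

```python
# Hedged reconstruction of the PCA+SVM(RBF) experiment with the
# missing-data-ignoring technique applied to the WBCD.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
df = pd.read_csv(url, header=None, na_values="?")  # '?' marks missing entries

df = df.dropna()                        # missing-data-ignoring technique
X = df.iloc[:, 1:10].to_numpy()         # the 9 cytological attributes
y = df.iloc[:, 10].to_numpy()           # class: 2 = benign, 4 = malignant

pipe = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
print(cross_val_score(pipe, X, y, cv=10).mean())   # mean 10-fold accuracy
```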
A. Deep Learning Usage with the Dataset

The proposed model starts with two hidden layers and then experiments with more layers; it has been observed that the convergence time is larger for deeper networks. Many parameters control the deep learning model. One of them is the number of hidden layers: if the data is less complex and has few features, neural networks with 1 to 2 hidden layers will work, but if the data has many features, 3 to 5 hidden layers can be used to reach an optimal solution. It should be noted that increasing the number of hidden layers also increases the complexity of the model, which may sometimes lead to overfitting. Another parameter is the number of hidden neurons: it should be between the size of the input layer and the size of the output layer; it may be 2/3 the size of the input layer plus the size of the output layer, and it should be less than twice the size of the input layer [29]. The experiments are based on a batch size of 16 and 9 neurons in each layer; the results are shown in Table 2.
TABLE II. DEEP LEARNING RESULTS

Layers   Epochs   Activation functions                  Dropout   Accuracy (%)
2        250      Sigmoid, Softmax                      0.5       99
2        100      Sigmoid, Softmax                      0.3       98.54
2        100      Sigmoid, Softmax                      0.5       97.85
2        2000     ReLU, Sigmoid                                   99.3
3        150      Sigmoid, Sigmoid, Softmax             0.3       98
3        150      ReLU, ReLU, Softmax                   0.3       97
3        250      ReLU, ReLU, Softmax                   0.3       97.08
3        1000     ReLU, ReLU, Softmax                   0.3       97.08
3        100      ReLU, Sigmoid, Softmax                0.5       98.54
3        100      ReLU, Sigmoid, Softmax                0.3       97.8
4        250      Sigmoid, Sigmoid, Sigmoid, Softmax    0.5       99
4        1000     Sigmoid, Sigmoid, Sigmoid, Softmax    0.3       98.5
4        100      Sigmoid, Sigmoid, Sigmoid, Softmax    0.3       99.3
4        150      Sigmoid, Sigmoid, Sigmoid, Softmax    0.3       97.08
4        150      Sigmoid, Softmax, Softmax, Softmax    0.3       98.54
5        15       Sigmoid, …, Softmax                             93.3
5        15       Softmax, …, Softmax                             94.3
5        20       Softmax, …, Softmax                             95.2
5        250      Sigmoid, …, Softmax                   0.25      96

As shown in Table 2, the best accuracy achieved is 99.3% with 2 hidden layers and 2000 epochs, while the accuracy drops to 99% with only 250 epochs. More epochs mean more iterations and a higher consumption of time and resources; however, the difference in accuracy is not large enough to justify the extra time. The same accuracy of 99.3% is also attained using 4 hidden layers and only 100 epochs. Plots characterizing the 4-hidden-layer model are shown in Figure 11: in graph (a), the training accuracy visibly increases over time until it reaches nearly 95%, while the validation accuracy plateaus in the range 98–99.3% after 21 epochs. The validation loss, presented in graph (b), reaches its minimum after 50 epochs and then levels off, while the training loss keeps decreasing exponentially until it drops to nearly 0.
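One plausible reading of the 99.3% four-layer row of Table 2, under the stated setup of 9 neurons per layer and batch size 16, is sketched below; the placement of the dropout layer, the 0/1 label encoding, and the validation split are assumptions rather than the paper's exact code:

```python
# Hedged sketch of a Table 2 configuration: four sigmoid/softmax layers,
# dropout 0.3, 100 epochs, 9 neurons per hidden layer, batch size 16.
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(9, activation="sigmoid", input_shape=(9,)),  # hidden layer 1
    Dense(9, activation="sigmoid"),                    # hidden layer 2
    Dense(9, activation="sigmoid"),                    # hidden layer 3
    Dropout(0.3),                                      # assumed dropout position
    Dense(2, activation="softmax"),                    # benign vs malignant
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X, y: preprocessed WBCD features and 0/1-encoded labels from Section II.
# history = model.fit(X, y, validation_split=0.2,
#                     epochs=100, batch_size=16)
```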
Fig. 11. (a) Training vs Validation Loss, (b) Training vs Validation Accuracy.

[Table of the DL–DA hybrid configurations; recovered column headers: number of population, iterations, activation function, features, epochs, folds, accuracy.]
Fig. 12. Two Hidden Layers Model with 2000 Epochs. (a) Training vs Validation Loss, (b) Training vs Validation Accuracy.

VII. CONCLUSION

Breast cancer prediction is very significant in the areas of medical care and biomedicine. This study aims to enhance the accuracy of breast cancer diagnosis with the deep learning method. Analysis of the WBCD with traditional classifiers such as NB, SVM, KNN, and RF achieved high accuracy. A model that predicts breast cancer was proposed, based on a deep investigation of the performance of different deep networks on this dataset. It was implemented in Python and proved effective in classifying the diagnostic data set into the two classes, which matters given the seriousness of cancer; the accuracy of the proposed model ranges between 93.5% and 99.3%. For the two-hidden-layer model, the highest outcomes with 250 and 2000 epochs are 99% and 99.3%, respectively, and the same result can be obtained with a four-hidden-layer model and 100 epochs. The DL hybrid with DA as a feature selection model achieved an accuracy of 97.907%. Such a comparative analysis of breast cancer classification provides insights into efficient approaches for the detection of cancer.
VIII. FUTURE WORK

The proposed model is applied to numerical data only. It would be interesting to see its behavior when applied to the other types of data available in the medical field, such as mammograms. In the future, the research may be extended to the screening of features for diagnosing breast cancer tumors.

REFERENCES
[1] S. Chopra and E. L. Davies, "Breast cancer," Medicine (United Kingdom), vol. 48, no. 2, pp. 113–118, 2020, doi: 10.1016/j.mpmed.2019.11.009.
[2] B. Sahu, S. Mohanty, and S. Rout, "A Hybrid Approach for Breast Cancer Classification and Diagnosis," ICST Trans. Scalable Inf. Syst., vol. 0, no. 0, p. 156086, 2018, doi: 10.4108/eai.19-12-2018.156086.
[3] M. Paredes, "Can Artificial Intelligence help reduce human medical errors? Two examples from ICUs in the US and Peru," pp. 1–12, 2018. [Online]. Available: https://techpolicyinstitute.org/wp-content/uploads/2018/02/Paredes-Can-Artificial-Intelligence-help-reduce-human-medical-errors-DRAFT.pdf.
[4] W. H. Wolberg, "UCI Machine Learning Repository: Breast Cancer Wisconsin (Original) Data Set." [Online]. Available: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29 (accessed Dec. 16, 2020).
[5] S. Aruna, S. P. Rajagopalan, and L. V. Nandakishore, "Knowledge Based Analysis of Various Statistical Tools in Detecting Breast Cancer," Comput. Sci. Inf. Technol., vol. 2, pp. 37–45, 2011, doi: 10.5121/csit.2011.1205.
[6] K. Polat and S. Güneş, "Breast cancer diagnosis using least square support vector machine," Digit. Signal Process., vol. 17, no. 4, pp. 694–701, Jul. 2007, doi: 10.1016/j.dsp.2006.10.008.
[7] A. F. M. Agarap, "On breast cancer detection: An application of machine learning algorithms on the Wisconsin diagnostic dataset," ACM Int. Conf. Proceeding Ser., pp. 5–9, 2018, doi: 10.1145/3184066.3184080.
[8] H. Asri, H. Mousannif, H. Al Moatassime, and T. Noel, "Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis," Procedia Comput. Sci., vol. 83, pp. 1064–1069, 2016, doi: 10.1016/j.procs.2016.04.224.
[9] P. S. Kohli, in 2020 IEEE 5th International Conference on Computing, Communication and Automation (ICCCA), 2020, pp. 1–4.
[10] K. B. Nahato, K. N. Harichandran, and K. Arputharaj, "Knowledge mining from clinical datasets using rough sets and backpropagation neural network," Comput. Math. Methods Med., vol. 2015, 2015, doi: 10.1155/2015/460189.
[11] S. Ahmed Medjahed, T. Ait Saadi, and A. Benyettou, "Breast Cancer Diagnosis by using k-Nearest Neighbor with Different Distances and Classification Rules," Int. J. Comput. Appl., vol. 62, no. 1, pp. 1–5, 2013, doi: 10.5120/10041-4635.
[12] M. M. Islam, H. Iqbal, M. R. Haque, and M. K. Hasan, "Prediction of breast cancer using support vector machine and K-Nearest neighbors," in 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), 2017, pp. 226–229, doi: 10.1109/R10-HTC.2017.8288944.
[13] H. Singh, Practical Machine Learning with AWS, 2021.
[14] J. D. Hunter, "Matplotlib: A 2D graphics environment," Comput. Sci. Eng., vol. 9, no. 3, pp. 90–95, 2007, doi: 10.1109/MCSE.2007.55.
[15] S. van der Walt, S. C. Colbert, and G. Varoquaux, "The NumPy array: A structure for efficient numerical computation," Comput. Sci. Eng., vol. 13, no. 2, pp. 22–30, 2011, doi: 10.1109/MCSE.2011.37.
[16] H. Li and D. Phung, "Journal of Machine Learning Research: Preface," J. Mach. Learn. Res., vol. 39, pp. i–ii, 2014.
[17] L. Vig, "Comparative Analysis of Different Classifiers for the Wisconsin Breast Cancer Dataset," OALib, vol. 1, no. 6, pp. 1–7, 2014, doi: 10.4236/oalib.1100660.
[18] Y. Qu, G. Ostrouchov, N. Samatova, and A. Geist, "Principal Component Analysis for Dimension Reduction in Massive Distributed Data Sets," in Workshop on High Performance Data Mining, Second SIAM International Conference on Data Mining, 2002, pp. 4–9.
[19] N. Varghese, "A Survey of Dimensionality Reduction and Classification Methods," Int. J. Comput. Sci. Eng. Surv., vol. 3, no. 3, pp. 45–54, 2012, doi: 10.5121/ijcses.2012.3304.
[20] V. Saravanan and R. Mallika, "An effective classification model for cancer diagnosis using micro array gene expression data," in Proc. 2009 Int. Conf. on Computer Engineering and Technology (ICCET), vol. 1, 2009, pp. 137–141, doi: 10.1109/ICCET.2009.38.
[21] İ. Yıldız and A. T. Karadeniz, "Enhancement of Breast Cancer Diagnosis Accuracy with Deep Learning," Eur. J. Sci. Technol., pp. 452–462, 2019, doi: 10.31590/ejosat.638428.
[22] Y. Bengio, Learning Deep Architectures for AI, vol. 2, no. 1, 2009.
[23] M. M. Mafarja, D. Eleyan, I. Jaber, A. Hammouri, and S. Mirjalili, "Binary Dragonfly Algorithm for Feature Selection," in Proc. 2017 Int. Conf. on New Trends in Computing Sciences (ICTCS), 2017, pp. 12–17, doi: 10.1109/ICTCS.2017.43.
[24] H. Liu, H. Motoda, R. Setiono, and Z. Zhao, "Feature Selection: An Ever Evolving Frontier in Data Mining," in JMLR Workshop and Conference Proceedings 10: Fourth Workshop on Feature Selection in Data Mining, 2010, pp. 4–13.
[25] C. S. Yang, L. Y. Chuang, Y. J. Chen, and C. H. Yang, "Feature selection using memetic algorithms," in Proc. 3rd Int. Conf. on Convergence and Hybrid Information Technology (ICCIT), vol. 1, 2008, pp. 416–423, doi: 10.1109/ICCIT.2008.81.
[26] M. Mafarja, A. A. Heidari, H. Faris, S. Mirjalili, and I. Aljarah, "Dragonfly Algorithm: Theory, Literature Review, and Application in Feature Selection," vol. 811, Springer International Publishing, 2020.
[27] Q. Song and M. Shepperd, "Missing data imputation techniques," Int. J. Bus. Intell. Data Min., vol. 2, no. 3, pp. 261–291, 2007, doi: 10.1504/IJBIDM.2007.015485.
[28] D. Berrar, "Cross-validation," in Encyclopedia of Bioinformatics and Computational Biology, vol. 1–3, 2018, pp. 542–545, doi: 10.1016/B978-0-12-809633-8.20349-X.
[29] F. S. Panchal and M. Panchal, "Review on Methods of Selecting Number of Hidden Nodes in Artificial Neural Network," Int. J. Comput. Sci. Mob. Comput., vol. 3, no. 11, pp. 455–464, 2014. [Online]. Available: www.ijcsmc.com.