An Optimal Deep Learning Approach For Breast Cancer
To cite this article: Meena L. C & Joe Prathap P. M (27 Nov 2024): An optimal deep learning
approach for breast cancer detection and classification with pre-trained CNN-based
feature learning mechanism, Journal of Biomolecular Structure and Dynamics, DOI:
10.1080/07391102.2024.2430454
CONTACT Meena L. C [email protected] Department of Computer Science and Engineering, R.M.D. Engineering College, Tiruvallur, India
© 2024 Informa UK Limited, trading as Taylor & Francis Group
2 M. L. C AND J. P. P. M
performance by integrating the following advantages: automated feature learning, dealing with complex and large data, generalization, scalability and enhanced performance (Chugh et al., 2021). Also, the models generate higher classification outcomes when trained on large amounts of highly annotated data (Çayır et al., 2022; Lotter et al., 2021). The widely used DL models in prediction-related tasks include the convolutional neural network (CNN), recurrent neural network (RNN), deep neural network (DNN), long short-term memory (LSTM), etc.

Among these, CNN is the most accurate when cancer is detected from imaging modalities. It automatically learns hierarchical features from the images by applying convolution kernels of different sizes. The convolved features are given to pooling layers to reduce the dimensions of the extracted feature sets. Finally, the reduced feature sets are input into a fully connected layer for classification. However, CNN requires a large amount of data to achieve the target accuracy, which leads to a longer training time for classifying the tumors. Transfer learning, on the other hand, improves classification performance while requiring only a minimal amount of data to attain the target accuracy. The pre-trained models have already been trained on large datasets containing millions of images (such as ImageNet). Instead of training a CNN from scratch for a specific dataset, the pre-trained weights can reduce the required amount of labelled data for training and the training time for the specific task. Also, the pre-trained CNN models learn deep hierarchical features from diverse datasets. The knowledge is transferred to the specific task by tuning these models on the specific dataset, which results in better generalization performance, mainly when limited data is available. Some commonly used pre-trained CNN models are UNet, DenseNet, ResNet, AlexNet, GoogleNet, etc. In our proposed system, we use two pre-trained models, UNet and DenseNet, for two different purposes: segmentation and feature learning. Like feature learning, segmentation of breast lesions is essential to achieve remarkable performance in tumor detection (Vigil et al., 2022). The improved version of UNet-based image segmentation avoids the need for handcrafted features or intermediate processing steps used in traditional segmentation models, making the model less prone to errors and more efficient. Also, the UNet supports pixel-level segmentation, which results in precise and detailed segmentation, whereas the conventional segmentation techniques perform region- or boundary-based segmentation (Rezaei, 2021).

Also, our study chose an improved version of DenseNet to perform feature learning, which learns high-level and abstract features from the segmented cancer lesions compared to shallower connections. Also, the densely connected layers of the model encourage feature reuse and propagation, resulting in reduced overfitting, particularly when the dataset has limited training data. So, in our study, we use these two variants of pre-trained mechanisms for segmentation and feature learning. After feature learning, the learned features are given to the classifier to detect the cancer level of the patients. In our study, we chose the optimal LSTM network for classification, which allows the system to learn temporal dependencies or sequential information from the input data, because the traditional classification systems fail to learn sequential information from the input data. The optimal version of LSTM helps improve classification performance for BC detection by minimizing the loss function. Additionally, our model uses an optimal feature selection system to select essential features from the extracted feature set, which diminishes the network training time and improves the prediction performance of the classifier by eliminating irrelevant features for classification. The significant contributions of the current research work are listed as follows:

• The system uses DCUNet to segment the cancer-affected regions from the BUI. Combined with the advantages of DC, more image feature information can be extracted, and segmentation accuracy can be improved.
• We use the SCADN-121 to perform feature learning, which helps to attain higher classification performance.
• The proposed system uses the ECSO algorithm to obtain the most optimal features, which helps the classifier make more accurate predictions and removes irrelevant data from the extracted feature sets.
• The ECSO-LSTM plays a pivotal role in our system, classifying the image into three categories: benign, malignant, and normal. The network parameters, including weights and bias, are finely tuned using the ECSO algorithm, ensuring optimal prediction outcomes. The utilization of comprehensive multi-phase techniques, such as preprocessing, segmentation, feature learning, selection, and classification, helps to attain robust performance in BC classification.

The rest of the manuscript is organized as follows: Section 2 surveys recent works regarding BC segmentation and classification. Section 3 presents a detailed explanation of the proposed system. Section 4 compares the outcomes of the proposed and existing works and discusses the proposed system's superiority over existing works. Finally, in Section 5, the conclusion of the proposed system is given, along with future research challenges.

2. Literature survey

This section surveys the recent methodologies of various authors for BC segmentation and classification using several machine and DL frameworks. It addresses the limitations of the surveyed models and discusses the solutions offered by the proposed system to overcome those challenges.

Ali et al. (2023) presented a CNN-based BC classification system with the help of a meta learner. The system collected the data from the BUSI dataset and then preprocessed the collected data samples by carrying out noise removal and contrast enhancement, which improved the quality of the data for classification. The preprocessed data was given to the CNN and meta learner for detecting and classifying the BCs as benign, malignant and normal. The model achieved 90% accuracy, which was better than previous schemes. Balaha et al. (2022) recommended a
BC detection framework using hybrid DL and a genetic algorithm. The system collected data from BUSI, and then a CNN was utilized for learning the features from the collected data, while the parameters of the CNN were optimized using the genetic algorithm. Finally, the BC was classified using the transfer learning model, and the system attained a maximum area under the curve (AUC) of 0.89% and an accuracy of 89.52% for the tested datasets.

Pathan et al. (2022) suggested a multi-headed CNN for BC classification. Initially, the system used the WDBC dataset to collect the input breast images, and preprocessing was performed on the collected data to invert and reshape the input images. The preprocessed data was given to the CNN for final classification. The method attained an accuracy of 78.97% when using the raw images directly and 81.02% when using the masked images of the collected dataset. Arooj et al. (2022) presented a transfer learning model based on AlexNet for BC detection and classification. The trained model was stored in the cloud if it met the learning conditions; otherwise, it was retrained. The experiments were carried out on two different datasets (one contains ultrasound images and the other contains histopathology images), and the system achieved 99% as the maximum accuracy in BC classification.

Podda et al. (2022) developed a segmentation and classification scheme for BUI using a fully-automated DL approach. Initially, the segmentation of the tumor lesions from the collected ultrasound images was carried out using the benign-malignant and lesion-normal ensemble methods. Finally, CNN was utilized to classify the segmented tumor lesions into normal, benign, and malignant cancers. The system attained 91% accuracy in classification and an 82% dice score in segmentation on the tested image dataset, which was competitive with the existing schemes. Cruz-Ramos et al. (2023) presented a DL model for BC detection and classification from mammogram and ultrasound images. Initially, tumor lesions were segmented from the mammography and ultrasound images using a manual segmentation procedure. Then deep and hand-crafted features were extracted from the segmented images using DenseNet and the breast imaging reporting and data system. The extracted features were fed into classifiers such as multilayer perceptron, XGBoost, and AdaBoost for classification. The system attained a precision, recall, f-score, and accuracy of 98%, 98%, 98% and 97.6% for the screening mammography (DDSM) and BUSI datasets.

Ayana et al. (2022) introduced an ultrasound BC classification system using a transfer learning approach. Initially, transfer learning was applied to the cancer cell line microscopic images, which learned the features similar to the ultrasound images to change the natural domain into a microscopic domain. Finally, CNN was utilized to perform the task of classification on the extracted feature sets. The system was tested on the MT-small-dataset obtained from the BUSI and achieved an accuracy of 98.7%, which was higher than the existing methods. Preetha and Jinny (2021) proffered a BC classification system using an adaptive neuro-fuzzy inference system (ANFIS). The data was collected from the Wisconsin diagnosis-BC (WDBC) dataset, and noise removal of the collected images was done to improve the data quality. Then the feature extraction process was performed manually, and the principal component analysis and linear discriminant analysis (PCA-LDA) model was utilized to perform the feature reduction. The reduced features were given to the ANFIS for cancer classification, and the system achieved 98.6% accuracy for the tested dataset.

Hirra et al. (2021) suggested a patch-based deep belief network (DBN) model for BC classification in histopathological images. The data was collected from the whole slide histopathology image dataset, and then preprocessing was applied to the collected dataset to remove the noise and extra backgrounds in the input data. Next, the system performed a feature extraction process, and the learned features were input to the backpropagation neural network for cancer detection and classification. The system attained a maximum of 86% accuracy, which was satisfactory compared with the previous approaches. Alhussan et al. (2023) presented a CNN by combining GoogleNet and dynamic dipper-throated optimization for BC classification. The system initially performed preprocessing on the BUSI dataset, and then the features were extracted using the GoogleNet model. The network was fine-tuned using the dipper-throated optimization to get optimal feature extraction outcomes. The most relevant features were selected from the extracted feature sets using the probabilistic method. Finally, the system used CNN for tumor classification, and it attained 98.1% accuracy for cancer classification.

2.1. Problem statement

The above-mentioned surveys provide satisfactory results, but they have some limitations.

• The traditional image segmentation approaches used for the classification of BC result in a high level of complexity, minimal robustness, and a lack of adaptability because of the use of handcrafted features or intermediate processing in their steps (Cruz-Ramos et al., 2023). Recently, pre-trained CNN models like UNet have been widely used in medical image segmentation tasks to produce more accurate segmentation outcomes than the conventional segmentation approaches regarding end-to-end learning capability, adaptability, and accuracy. Also, the pre-trained model performs pixel-level segmentation, which helps to attain detailed and accurate segmentation, whereas the traditional algorithms perform segmentation at the region or boundary level.
• Only a few researchers focus on the ML algorithm for BC classification (Cruz-Ramos et al., 2023; Preetha & Jinny, 2021). The ML system accurately categorizes BC, but it suffers from computational overhead because an enormous quantity of data is required for training. Additionally, it employs hand-engineered features, i.e. subject-matter specialists are required for feature engineering, leading to performance variability and a lack of consistency (Hirra et al., 2021; Preetha & Jinny, 2021).
It enhances the intensity level of the images by separating the dominant intensity levels from the images. Let $\hat{A}'''_n(x, y)$ and $\ddot{H}''$ denote the denoised image and the normalized histogram of $\hat{A}'''_n(x, y)$, where the intensity of the pixel values ranges from 0 to $\hat{L} - 1$. Here $\hat{L}$ indicates the number of possible intensity values, which is frequently 256. The computation of $\ddot{H}''$ is mathematically given as follows:

$$\ddot{H}''_c = \frac{NP_{wi}}{TN_p} \qquad (2)$$

where $c = 0, 1, \ldots, \hat{L} - 1$, $NP_{wi}$ indicates the number of pixels with intensity $c$, and $TN_p$ refers to the total number of pixels. Thus, the histogram-equalized image $CE_{IM}$ is computed by

$$CE_{IM}(x, y) = \mathrm{floor}\left( (\hat{L} - 1) \sum_{c=0}^{\hat{A}'''_n(x, y)} \ddot{H}''_c \right) \qquad (3)$$

where floor() rounds down to the nearest integer. The filtered and enhanced image quality improves the diagnosis accuracy by offering richer visualization of functional structures and abnormalities, diminishing noise and artefacts, and enhancing the performance of numerical analysis techniques. It also influences image segmentation by allowing more detailed boundary detection, improved feature extraction, minimal variability, and better incorporation with diagnostic procedures.

3.2. Image segmentation

The segmentation of cancerous cells from medical images is a necessary process, as it isolates the objects from the background and partitions the image into non-overlapping regions. Our study develops DCUNet to segment breast tumor portions from the preprocessed data. The UNet is a two-dimensional CNN architecture consisting of three blocks: the encoder (downsampling), decoder (up sampling), and
skip-connection. The encoder block diminishes the possibilities of overfitting, expands the receptive field, and speeds up the computation by enhancing the model's resistance. The decoder performs the re-decoding of the abstract features to the original size of the image. Skip connections, conversely, improve segmentation accuracy by connecting every layer in a feed-forward manner. Each layer receives the feature maps of the previous layers. This information is passed to the subsequent layers to perform feature reuse, which avoids the problem of vanishing gradients by offering multiple paths for gradient flow in training. However, the max pooling layer is used in the encoder block to reduce the spatial dimensions of the extracted feature sets after the convolution process, sometimes leading to spatial information loss of the input data. This limitation of the UNet model is avoided with the help of skip connections that combine the encoder blocks' high-level features with the decoder block's low-level features. So, the network preserves the spatial information of the target masks. However, the UNet convolutions have a small receptive field, and encoder down-sampling may cause the pixels' correlation to deteriorate. Our study includes dilated convolution (DC) in UNet to obtain more expansive receptive fields without lowering the resolution. Thus, the system is named DCUNet. The structural design of the DCUNet is shown in Figure 2.

Step 1: Encoder
Initially, the preprocessed image is passed into the convolutional layer in the encoder part. The encoder is a convolutional block containing two 3 × 3 convolutional kernels, which move across the input image's receptive field to retrieve the image features. The kernel moves throughout the entire image iteratively to generate the feature maps. After that, the feature maps obtained from the convolution block are passed to the DC layer, which performs image sampling with an interval of the dilation rate minus one ($\tilde{R}_{DR} - 1$) and incorporates the expansion coefficient into the ordinary convolution kernel. A filter $k$ is applied according to Equation (4) for every location $d$ on the output $\hat{F}''_{DI}$ and each kernel position $ks$ when DC is applied to a 2-dimensional feature map $\hat{F}''_{CL}$:

$$\hat{F}''_{DI}[d] = \sum_{ks} \hat{F}''_{CL}\left[d + \tilde{R}_{DR} \cdot ks\right] k[ks] \qquad (4)$$

where the dilation rate $\tilde{R}_{DR}$ is equal to the stride at which the input image is sampled. This process is similar to convolving the input $\hat{F}''_{CL}$ with the up-sampled filters obtained by inserting $(\tilde{R}_{DR} - 1)$ zeroes between two consecutive filter values along each spatial dimension. Each DC is followed by a ReLU activation function $\xi_{AF}$ that introduces non-linearity into the network for image generalization in training.

$$\xi_{AF}(PR_{IM}) = \max(0, PR_{IM}) \qquad (5)$$

where $PR_{IM}$ refers to the preprocessed image. After the DC process, the network uses a 2 × 2 max pooling operation to reduce the feature maps' spatial dimensions (height and breadth). After each max pooling operation, the number of convolution filters is doubled, starting from an initial value of 32. This process is repeated four times to extract the features from the imaging regions.

Step 2: Decoder
The decoder up-samples the feature maps with the help of a 2 × 2 transposed convolution and two 3 × 3 convolutional operations, which are repeated four times. The decoder uses skip connections to link the layers of the decoder blocks with the preceding outputs to prevent data loss from the preceding levels. The skip connections also improve results and model convergence. The output of the final decoder block is passed through a 1 × 1 convolution to obtain the segmented lesions. To reduce the impact of interclass similarity while encoding breast images, the loss is computed using the dice coefficient, which measures the overlap between two masks using the equation below (Kumar et al., 2022).
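As a minimal NumPy sketch of two computations described in this subsection, the block below implements the dilated convolution of Equation (4) for a single-channel feature map and the Dice overlap between two binary masks. The function names, the "valid" border handling, the smoothing constant `eps`, and the standard Dice form 2|A ∩ B| / (|A| + |B|) are assumptions made for illustration; they are not taken from the DCUNet implementation.

```python
import numpy as np


def dilated_conv2d(feature_map, kernel, rate=1):
    """Dilated 2-D convolution in cross-correlation form, no padding.

    Implements out[d] = sum_ks feature_map[d + rate * ks] * kernel[ks],
    i.e. the kernel taps are spaced `rate` pixels apart.
    """
    feature_map = np.asarray(feature_map, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    # Effective footprint of the kernel once (rate - 1) gaps are inserted.
    span_h = rate * (kh - 1) + 1
    span_w = rate * (kw - 1) + 1
    out_h = feature_map.shape[0] - span_h + 1
    out_w = feature_map.shape[1] - span_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            # Shifted view of the input for kernel tap (i, j).
            window = feature_map[i * rate:i * rate + out_h,
                                 j * rate:j * rate + out_w]
            out += kernel[i, j] * window
    return out


def relu(x):
    """ReLU activation of Equation (5): max(0, x) element-wise."""
    return np.maximum(0, x)


def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice overlap between two binary masks (assumed standard definition)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    # eps avoids division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)
```

With a 3 × 3 kernel, `rate=1` covers a 3 × 3 neighbourhood while `rate=2` covers a 5 × 5 neighbourhood with the same nine weights, which is how the receptive field grows without extra parameters. A Dice-based loss is commonly taken as one minus this coefficient.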
Table 2. Parameter setting of SCADN-121 network.
Layers | Output size | Settings
Convolution | 112 × 112 | 3 × 3 conv, stride 2
Pooling | 56 × 56 | 3 × 3 max pool, stride 2
Dense Block 1 | 56 × 56 | [1 × 1 conv, 3 × 3 conv] × 6
Transition layer 1 with SCA | 56 × 56 | 1 × 1 conv, 3 × 3 conv, 3 × 3 conv, 3 × 3 conv, 1 × 1 conv, 2 × 2 average pool, stride 2
Dense Block 2 | 28 × 28 | [1 × 1 conv, 3 × 3 conv] × 12
Transition layer 2 with SCA | 28 × 28 | 1 × 1 conv, 3 × 3 conv, 3 × 3 conv, 3 × 3 conv, 1 × 1 conv, 2 × 2 average pool, stride 2
Dense Block 3 | 14 × 14 | [1 × 1 conv, 3 × 3 conv] × 24
Transition layer 3 with SCA | 14 × 14 | 1 × 1 conv, 3 × 3 conv, 3 × 3 conv, 3 × 3 conv, 1 × 1 conv, 2 × 2 average pool, stride 2
Dense Block 4 | 7 × 7 | [1 × 1 conv, 3 × 3 conv] × 32
Prediction Layer | 15 × 1 | ReLU activation, global average pooling 2D, SoftMax layer

3.4. Feature selection

The system uses ECSO to choose the optimal set of features from the extracted feature maps, avoiding overfitting in BC classification by eliminating the irrelevant features extracted from SCADN-121. The CSO is a novel population-based metaheuristic search method that mimics cuckoo birds' reproductive behaviour and uses random search, evaluation, and replacement iteratively to enhance solutions to optimization problems. The algorithm provides numerous benefits over other optimization systems, which makes it a valuable model for parameter optimization in LSTM. One of the significant benefits of using CSO is its simple implementation process compared to complex optimization techniques. Also, the method requires fewer resources to fine-tune the network parameters, reducing computation overhead. The following are the central principles of CSO for performing optimization.

• Each cuckoo only lays one egg at a time, and the egg is dropped into another bird's nest.
• The nest, which has high-quality eggs, is utilized for the next generation.
• The number of available host nests is fixed, and there is a probability that the host bird will find a cuckoo egg.

Apart from its advantages, the model suffers from premature convergence and local optimal solutions. So, our study uses a tent chaotic map (TCM) based initialization strategy to improve the diversity of the algorithm. Small changes in the initial conditions of the algorithm using TCM lead to meaningfully different outcomes, which is helpful for exploration in CSO because it aids in covering a wide range of the solution space. In addition, boundary mutation (BOM) is included in CSO to keep the algorithm from becoming trapped in local optimal solutions by expanding solutions, promoting exploration of the solution space, preventing early convergence, and balancing exploration and exploitation. This eventually increases the probability of discovering better solutions to optimization problems. So, the CSO with the TCM and BOM improvements is termed ECSO. The algorithmic procedures for feature selection are explained as follows:

Step 1: Population initialization
Initially, the population is generated using TCM for n hosts, which enhances population diversity and decreases the impact of the initial population distribution. It is mathematically expressed as follows:

$$\dot{\dddot{Z}}_{l+1} = \begin{cases} 2\,\dot{\dddot{Z}}_{l}, & 0 \le \dot{\dddot{Z}}_{l} \le 0.5 \\ 2\left(1 - \dot{\dddot{Z}}_{l}\right), & 0.5 < \dot{\dddot{Z}}_{l} \le 1 \end{cases} \qquad (12)$$

where $\dot{\dddot{Z}}_{l+1}$ refers to the initial population of the individual using TCM at iteration $l$.

Step 2: Fitness calculation
Compute each individual's fitness in the population by considering the classifier's accuracy, computed using Equation (13). The individual attaining higher accuracy (i.e. minimal loss) is the best in the current iteration.

$$Fitness = \mathrm{Max}\left(Y''_{acc}\right) \qquad (13)$$
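Steps 1 and 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the random starting value for each host, the re-seeding guard, the 0.5 threshold that turns a chaotic value into a keep/drop decision for a feature, and the user-supplied `accuracy_fn` are all hypothetical and are not specified in the text.

```python
import numpy as np


def tent_map(z):
    """One step of the tent chaotic map in Equation (12)."""
    return 2.0 * z if z <= 0.5 else 2.0 * (1.0 - z)


def init_population(n_hosts, n_features, seed=0):
    """Generate an (n_hosts, n_features) population with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    population = np.empty((n_hosts, n_features))
    for p in range(n_hosts):
        z = rng.uniform(0.0, 1.0)  # assumed: random starting point per host
        for j in range(n_features):
            z = tent_map(z)
            # Assumption: with slope 2 the map degenerates to 0 in binary
            # floating point after a few dozen steps, so re-seed when the
            # value reaches an end point of the interval.
            if z <= 0.0 or z >= 1.0:
                z = rng.uniform(0.0, 1.0)
            population[p, j] = z
    return population


def to_feature_mask(individual, threshold=0.5):
    """Assumed encoding: a feature is kept when its value exceeds 0.5."""
    return individual > threshold


def best_individual(population, accuracy_fn):
    """Step 2: score every host by classifier accuracy and keep the best."""
    scores = np.array([accuracy_fn(to_feature_mask(ind)) for ind in population])
    best = int(np.argmax(scores))
    return best, float(scores[best])
```

In practice `accuracy_fn` would train or evaluate the classifier on the features selected by the mask and return its accuracy; any callable that maps a boolean mask to a number in [0, 1] fits this sketch.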
Here, $Y''_{acc}$ in Equation (13) indicates the accuracy, which is computed by dividing the number of accurate forecasts by the total number of forecasts:

$$Y''_{acc} = \frac{VR^{+} + VR^{-}}{T_{pn}} \qquad (14)$$

where $T_{pn}$ indicates the total count of samples, and $VR^{+}$ and $VR^{-}$ denote the numbers of true positives and true negatives of the classifier, respectively.

Table 3. Dataset description.
Number of samples | Normal | Benign | Malignant | Total
Training | 92 | 311 | 135 | 538
Testing | 40 | 118 | 75 | 233
Total | 132 | 429 | 210 | 771

Step 3: Position updating
The position updating of CSO is generally done using Levy flight, which results in individuals' poor searching ability due to the utilization of a unified boundary limitation to control boundary violations of constraints. To solve this problem, the proposed system uses BOM to update the position $\dot{\dddot{Z}}^{p}_{l+1}$ of the cuckoo $p$, which is expressed as

$$\dot{\dddot{Z}}^{p}_{l+1} = \begin{cases} \dot{\dddot{Z}}^{p}_{l} + \dot{\dddot{Z}}_{\max,j} - v \cdot \mathrm{Rand}(0, 1), & \text{if } \dot{\dddot{Z}}^{p}_{l} > \dot{\dddot{Z}}_{\max,j} \\ \dot{\dddot{Z}}^{p}_{l} + \dot{\dddot{Z}}_{\min,j} - v \cdot \mathrm{Rand}(0, 1), & \text{if } \dot{\dddot{Z}}^{p}_{l} < \dot{\dddot{Z}}_{\min,j} \end{cases} \qquad (15)$$

where $\dot{\dddot{Z}}_{\max,j}$ indicates the decision variable's upper bound in dimension $j$, $\dot{\dddot{Z}}_{\min,j}$ refers to the decision variable's lower bound in dimension $j$, and $v$ indicates the control parameter. The above steps are continued until the optimum solutions are found (i.e. optimal features). The pseudocode of the ECSO is given in Figure 4.

3.5. Classification

Finally, the optimally selected features are fed into the ECSO-LSTM to classify the cancers into three class levels: normal, benign, and malignant. The feature maps from the pre-trained CNN model are changed into a sequential format by considering each feature map as a time step in a sequence that represents the temporal evolution of features across the ultrasound images. LSTM processes this sequence and learns the temporal dependencies and patterns within the feature maps. After processing all feature maps, the network outputs
a sequence of hidden states or the final hidden state, which encodes the temporal information of the feature maps. The network finally includes a fully connected layer to perform the final classification task to obtain benign, malignant and normal tumor. However, initializing the network's parameters (weights and bias) is crucial since randomly initialized parameters affect the classifier's performance and speed of convergence during network training. The random initialization of network parameters also lengthens the training time. So, our study uses the ECSO algorithm to optimally select the parameters (weight and bias) of the LSTM model that improve the network performance by minimizing the classification loss. Thus, the optimal parameter selection in the classical LSTM using ECSO is named ECSO-LSTM. The structure of the LSTM model is shown in Figure 5.

The LSTM includes four gates: input, memory, forget, and output gates. Let $\dot{C}\dot{G}_k$ denote the LSTM's memory unit, which memorizes information of all sequences up to time $k$; $\widetilde{\dot{C}\dot{G}}_k$ refers to the update value of the memory cell, $OF^{s}_k$ signifies the input selected feature set at time $k$, and $\hat{H}'_k$ indicates the hidden layer output. The input gate $\dot{I}\dot{G}_k$ regulates the stored information, the output gate $\dot{O}\dot{G}_k$ regulates the output information, and the forget gate $\dot{F}\dot{G}_k$ regulates the forgetting of previous information. The mathematical formulations of the LSTM information flow are given by:

$$\dot{F}\dot{G}_k = \xi_{AF}\left(\overleftrightarrow{W}''_{\dot{F}\dot{G}} \cdot \left[\hat{H}'_{k-1}, OF^{s}_k\right] + \overleftrightarrow{B}''_{\dot{F}\dot{G}}\right) \qquad (16)$$

$$\dot{I}\dot{G}_k = \xi_{AF}\left(\overleftrightarrow{W}''_{\dot{I}\dot{G}} \cdot \left[\hat{H}'_{k-1}, OF^{s}_k\right] + \overleftrightarrow{B}''_{\dot{I}\dot{G}}\right) \qquad (17)$$

$$\widetilde{\dot{C}\dot{G}}_k = \vartheta_{AF}\left(\overleftrightarrow{W}''_{\dot{C}\dot{G}} \cdot \left[\hat{H}'_{k-1}, OF^{s}_k\right] + \overleftrightarrow{B}''_{\dot{C}\dot{G}}\right) \qquad (18)$$

$$\dot{C}\dot{G}_k = \dot{F}\dot{G}_k \cdot \dot{C}\dot{G}_{k-1} + \widetilde{\dot{C}\dot{G}}_k \cdot \dot{I}\dot{G}_k \qquad (19)$$

$$\dot{O}\dot{G}_k = \xi_{AF}\left(\overleftrightarrow{W}''_{\dot{O}\dot{G}} \cdot \left[\hat{H}'_{k-1}, OF^{s}_k\right] + \overleftrightarrow{B}''_{\dot{O}\dot{G}}\right) \qquad (20)$$

$$\hat{H}'_k = \dot{O}\dot{G}_k \cdot \vartheta_{AF}\left(\dot{C}\dot{G}_k\right) \qquad (21)$$

where $\overleftrightarrow{W}''_{\dot{F}\dot{G}}$, $\overleftrightarrow{W}''_{\dot{I}\dot{G}}$, $\overleftrightarrow{W}''_{\dot{O}\dot{G}}$, $\overleftrightarrow{W}''_{\dot{C}\dot{G}}$ and $\overleftrightarrow{B}''_{\dot{F}\dot{G}}$, $\overleftrightarrow{B}''_{\dot{I}\dot{G}}$, $\overleftrightarrow{B}''_{\dot{O}\dot{G}}$, $\overleftrightarrow{B}''_{\dot{C}\dot{G}}$ indicate the optimal weights and biases for the input matrices of the forget gate, input gate, output gate, and cell state, respectively, selected by the ECSO algorithm, and $\xi_{AF}$ and $\vartheta_{AF}$ refer to the sigmoid and tanh activation functions, in which sigmoid controls the flow of information via the gates and tanh regulates the input modulation and output generation.

Finally, weighted focal loss (Equation (22)) is used to find the prediction loss, which addresses the class imbalance problem of the BUSI dataset by combining the ideas of class weighting and focal loss. Using this function, higher weights are assigned to minority classes, and focal loss is applied to emphasize hard-to-classify examples within each class. It makes the network learn better representations for all classes, leading to enhanced performance on imbalanced datasets.

$$WFL = -\overleftrightarrow{W}''\, PC_{class}^{\,k} \left(1 - AC_{class}\right) \log\left(1 - PC_{class}\right) - \overleftrightarrow{B}'' \left(1 - PC_{class}\right)^{k} AC_{class} \log\left(PC_{class}\right) \qquad (22)$$

where $PC_{class}$ refers to the predicted class, $AC_{class}$ indicates the actual class, and $k$ signifies the focusing parameter.

4. Results and discussion

This section discusses and compares the outcomes of the proposed and existing models for BC segmentation and classification when tested on the BUSI dataset. The suggested work is implemented in the Python programming language on a 64-bit Windows 10 OS with an Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz (2 processors) and 128 GB RAM. The proposed system uses the BUSI dataset to train and test the efficacy of the proposed work, which is publicly available and accessed via https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset. The breast ultrasound images were obtained from 600 female patients aged between 25 and 75, and the data was collected in 2018. The description of the dataset with the training and testing ratio is given in Table 3. The sample dataset images are shown in Figure 6, and the outputs attained by the proposed system in preprocessing and segmentation for both benign (a) and malignant cancers (b) are shown in Figure 7.

4.1. Performance analysis

Here, the outcomes of the proposed ECSO-LSTM are investigated against the existing classifiers such as LSTM, deep neural network (DNN), recurrent neural network (RNN), and random forest (RF) approaches regarding the classification metrics, namely accuracy, precision, recall, f-measure, false
positive rate (FPR), false negative rate (FNR), area under curve (AUC), receiver operating characteristic curve (ROC) and classification time. These metrics are computed based on true positive, true negative, false positive and false negative rate of the classifier obtained for predicting the BC from the dataset images. The time complexity and convergence
performance of our proposed system is also discussed at the end. The confusion matrix of the proposed system for three class labels in the dataset is given in Figure 8, which shows that the model effectively predicts cancers with a higher prediction rate.

Table 4 shows the outcomes attained by the proposed and existing models for BC classification regarding accuracy, precision, recall, and f-measure. The existing LSTM offers 97.98% accuracy, 97.28% precision, 98.04% recall, and 97.79% f-measure; the existing DNN provides 95.57% accuracy, 95.17% precision, 95.67% recall, and 95.36% f-measure; the existing RNN proffers 93.78% accuracy, 93.26% precision, 93.82% recall, and 93.68% f-measure; and the existing RF provides 91.67% accuracy, 91.18% precision, 91.78% recall, and 91.58% f-measure. The existing LSTM, DNN, RNN, and RF provide lower prediction outcomes than the proposed one, because the proposed one achieves 99.86% accuracy, 99.57% precision, 99.96% recall, and 99.79% f-measure for BC classification. Thus, the overall outcomes show that the proposed method achieves superior performance compared to the existing methods. The diagrammatic representation of Table 4 is given in Figure 9.

Next, the outcomes of the proposed work with existing methods in terms of FPR and FNR are shown in Figure 10. FPR is the proportion of negative cases that are incorrectly labelled as positive. FNR is the ratio of false negatives to the sum of false negatives and true positives. Concerning the FPR metric, the existing LSTM, DNN, RNN, and RF have 0.085, 0.108, 0.642, and 0.903, which are higher than the proposed one. A system is considered better if it shows lower FPR and FNR values. Herein, the proposed system has a lower FPR value of 0.021.
Likewise, concerning the FNR metric, the proposed one has 0.065 FNR, which is lower than the existing methods because the existing methods have an FNR of 0.326, 0.261, 0.299, and 0.342. Thus, Figure 10 shows that the proposed one performs better than the existing models in detecting and classifying breast tumors from BUI. Next, the outcomes are compared based on AUC and classification time, shown in Figure 11.
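For reference, the confusion-matrix-based metrics used in this comparison can be computed as in the following sketch. It treats one class as positive and the other two as negative (one-vs-rest), which is an assumption about how the three-class results are reduced to true/false positives and negatives; the function name and the example matrix are made up for illustration and are not the results of the proposed system.

```python
import numpy as np


def one_vs_rest_metrics(confusion, class_index):
    """Metrics for one class of a multi-class confusion matrix.

    confusion[i, j] is the number of samples whose true class is i and
    whose predicted class is j. Zero denominators are not handled here.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = confusion[class_index, class_index]
    fn = confusion[class_index, :].sum() - tp   # missed samples of the class
    fp = confusion[:, class_index].sum() - tp   # other classes predicted as it
    tn = confusion.sum() - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / confusion.sum(),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # negatives wrongly labelled positive
        "fnr": fn / (fn + tp),  # positives wrongly labelled negative
    }


# Made-up 3-class example (rows: true normal, benign, malignant).
example = [[8, 1, 1],
           [2, 16, 2],
           [0, 3, 17]]
benign_metrics = one_vs_rest_metrics(example, class_index=1)
```

Averaging the per-class values over the three classes gives a single figure per metric (macro averaging); whether the reported numbers are macro-averaged is not stated in the text.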
Figure 8. Confusion matrix.

Table 4. Analysis of the proposed and existing methods.
Techniques/Metrics (%) | Accuracy | Precision | Recall | F-measure
Proposed ECSO-LSTM | 99.86 | 99.57 | 99.96 | 99.79
LSTM | 97.98 | 97.28 | 98.04 | 97.79
DNN | 95.57 | 95.17 | 95.67 | 95.36
RNN | 93.78 | 93.26 | 93.82 | 93.68
RF | 91.67 | 91.18 | 91.78 | 91.58

A higher AUC indicates that the system is better at differentiating between the positive and negative classes. When the classifier receives an AUC score of 1, it can perfectly differentiate between the positive and the negative class points. The proposed one achieves a high AUC of 0.995%, which is higher than the existing methods, because the existing LSTM, DNN, RNN, and RF attained AUC of 0.976%, 0.943%, 0.918%, and 0.894%, which are lower than that of the proposed method. Next, considering the classification time metric, the existing LSTM, DNN, RNN, and RF take 1.01, 1.95, 2.28, and 2.99 m to classify the classes as normal, benign, and malignant, which is higher than the proposed one, because
the proposed one takes just 0.85 m to complete the classifi existing ML technique (Cruz-Ramos et al., 2023) proffers 97.6%
cation process. Figure 12 shows the ROC analysis of the pro accuracy and also the existing (Ayana et al., 2022; Balaha
posed method for 5-fold cross-validations, which shows that et al., 2022; Podda et al., 2022), and (Alhussan et al., 2023)
the method attains higher true positive rates in predicting used CNN-based DL approaches that provide 89.52%, 91%,
BCs. Thus, the overall performance analysis shows that the 98.7%, and 98.1% accuracy for cancer detection, which are
proposed method achieves outstanding outcomes compared lower when compared to the proposed method, because the
to the existing methods. proposed one achieves maximum accuracy of 99.86%. Thus,
the overall experimental results showed that the proposed
one achieves more high-level outcomes than the existing
4.2. Comparative analysis
methods. The reason is that the proposed one initially per
Here, the proposed work is compared with related works in forms the preprocessing on the collected images rather than
the literature which used the BUSI dataset. The models are using raw images directly for segmentation or classification.
compared based on the accuracy they achieved for cancer Next, the proposed system used DCUNet for segmentation
classification, which is shown in Table 5. purposes, which accurately segments the lesions from the
The proposed one provides more promising results for BC preprocessed data using DC and pixel-level segmentation
classification than the existing frameworks. For example, the maps. In addition, the system avoids manual feature learning
for classification. Instead, it uses the SCADN-121 network to extract the features from the segmented image. Moreover, it uses the ECSO approach for feature selection, which results in higher accuracy with minimal training time and overfitting in classification. Finally, the optimal LSTM helps to produce optimal results for the classification of BCs by learning the long-term dependencies in the input data and modelling the complex sequential data in the images, which results in improved prediction performance. The time complexity analysis of the proposed system is shown in Table 6.

Like time complexity, convergence is an essential factor in analysing performance. The proposed system's convergence rate is influenced by the convergence behaviour of each process, including U-Net for segmentation, DenseNet for feature extraction, ECSO for feature selection, and optimal LSTM for classification. The proposed method can achieve rapid and accurate detection of BC from BUSI by guaranteeing effective convergence of the individual components and integrating them effectively. U-Net naturally converges quickly in training because of its symmetric architecture and skip connections. Likewise, DenseNet's error rate usually decreases effectively because of its dense connectivity, which enables feature reuse and propagation. The features selected using ECSO contribute to quicker convergence and enhanced classification performance when fed into the optimal LSTM. In classification, monitoring convergence by means of measures such as training loss, validation accuracy, and early stopping criteria helps guarantee that the model converges to an optimal solution without overfitting or underfitting. So, integrating these practical components in all phases of the proposed system ensures faster convergence of our model for BC detection.

5. Conclusion

This paper proposes an optimal DL-based BC detection system with efficient segmentation and feature learning mechanisms. The system uses the BUSI dataset to train and test the proposed work, and the performance of the proposed ECSO-LSTM is weighed against the existing LSTM, DNN, RNN, and RF approaches regarding accuracy, precision, recall, f-measure, FPR, FNR, AUC, and classification time. Herein, the proposed system achieves 99.86% accuracy, 99.57% precision,
99.96% recall, 99.79% f-measure, 0.021 FPR, 0.065 FNR, 0.995 AUC, and 0.85 m classification time, which are better outcomes than the existing methods. Similarly, the proposed work is compared against the existing methods in the literature review; herein, the proposed system achieves better outcomes than the existing methods. Our model is a comprehensive approach that combines improved versions of pre-trained models, namely DCUNet and SCADN-121, for segmentation and feature learning, ECSO for feature selection, and optimal LSTM for classification; it learns spatial and temporal information from the input data effectively and leads to higher accuracy and utility in BC diagnosis tasks. Our model offers substantial potential for enhancing BC detection accuracy, but it also poses challenges associated with interpretability, complexity, and parameter sensitivity. These drawbacks will be addressed in the future by considering a resource-efficient approach for BC detection. Also, this work will be extended to other imaging modalities, such as MRI and mammography, using other advanced DL models, such as transformers or graph-based neural networks, to further improve the system performance.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

Alhussan, A. A., Eid, M. M., Towfek, S. K., & Khafaga, D. S. (2023). Breast cancer classification depends on the dynamic dipper throated optimization algorithm. Biomimetics, 8(2), 163. https://doi.org/10.3390/biomimetics8020163
Ali, M. D., Saleem, A., Elahi, H., Khan, M. A., Khan, M. I., Yaqoob, M. M., Farooq Khattak, U., & Al-Rasheed, A. (2023). Breast cancer classification through meta-learning ensemble technique using convolution neural networks. Diagnostics, 13(13), 2242. https://doi.org/10.3390/diagnostics13132242
Arooj, S., Atta-Ur-Rahman, Zubair, M., Khan, M. F., Alissa, K., Khan, M. A., & Mosavi, A. (2022). Breast cancer detection and classification empowered with transfer learning. Frontiers in Public Health, 10, 924432. https://doi.org/10.3389/fpubh.2022.924432
Ayana, G., Park, J., Jeong, J. W., & Choe, S. W. (2022). A novel multistage transfer learning for ultrasound breast cancer image classification. Diagnostics, 12(1), 135. https://doi.org/10.3390/diagnostics12010135
Balaha, H. M., Saif, M., Tamer, A., & Abdelhay, E. H. (2022). Hybrid deep learning and genetic algorithms approach (HMB-DLGAHA) for the early ultrasound diagnoses of breast cancer. Neural Computing and Applications, 34(11), 8671–8695. https://doi.org/10.1007/s00521-021-06851-5
Botlagunta, M., Botlagunta, M. D., Myneni, M. B., Lakshmi, D., Nayyar, A., Gullapalli, J. S., & Shah, M. A. (2023). Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Scientific Reports, 13(1), 485. https://doi.org/10.1038/s41598-023-27548-w
Çayır, S., Solmaz, G., Kusetogullari, H., Tokat, F., Bozaba, E., Karakaya, S., Iheme, L. O., Tekin, E., Yazıcı, Ç., Özsoy, G., Ayaltı, S., Kayhan, C. K., İnce, Ü., Uzel, B., & Kılıç, O. (2022). MITNET: A novel dataset and a two-stage deep learning approach for mitosis recognition in whole slide images of breast cancer tissue. Neural Computing and Applications, 34(20), 17837–17851. https://doi.org/10.1007/s00521-022-07441-9
Chugh, G., Kumar, S., & Singh, N. (2021). Survey on machine learning and deep learning applications in breast cancer diagnosis. Cognitive Computation, 13(6), 1451–1470. https://doi.org/10.1007/s12559-020-09813-6
Cruz-Ramos, C., García-Avila, O., Almaraz-Damian, J. A., Ponomaryov, V., Reyes-Reyes, R., & Sadovnychiy, S. (2023). Benign and malignant breast tumor classification in ultrasound and mammography images via fusion of deep learning and handcraft features. Entropy, 25(7), 991. https://doi.org/10.3390/e25070991
Das, A., Mohanty, M. N., Mallick, P. K., Tiwari, P., Muhammad, K., & Zhu, H. (2021). Breast cancer detection using an ensemble deep learning method. Biomedical Signal Processing and Control, 70, 103009. https://doi.org/10.1016/j.bspc.2021.103009
Hirra, I., Ahmad, M., Hussain, A., Ashraf, M. U., Saeed, I. A., Qadri, S. F., Alghamdi, A. M., & Alfakeeh, A. S. (2021). Breast cancer classification from histopathological images using patch-based deep learning modeling. IEEE Access, 9, 24273–24287. https://doi.org/10.1109/ACCESS.2021.3056516
Khan, S. I., Shahrior, A., Karim, R., Hasan, M., & Rahman, A. (2022). MultiNet: A deep neural network approach for detecting breast cancer through multi-scale feature fusion. Journal of King Saud University - Computer and Information Sciences, 34(8), 6217–6228. https://doi.org/10.1016/j.jksuci.2021.08.004
Kumar, M., Singhal, S., Shekhar, S., Sharma, B., & Srivastava, G. (2022). Optimized stacking ensemble learning model for breast cancer detection and classification using machine learning. Sustainability, 14(21), 13998. https://doi.org/10.3390/su142113998
Liu, H., Cui, G., Luo, Y., Guo, Y., Zhao, L., Wang, Y., Subasi, A., Dogan, S., & Tuncer, T. (2022). Artificial intelligence-based breast cancer diagnosis using ultrasound images and grid-based deep feature generator. International Journal of General Medicine, 15, 2271–2282. https://doi.org/10.2147/IJGM.S347491
Lotter, W., Diab, A. R., Haslam, B., Kim, J. G., Grisot, G., Wu, E., Wu, K., Onieva, J. O., Boyer, Y., Boxerman, J. L., Wang, M., Bandler, M., Vijayaraghavan, G. R., & Gregory Sorensen, A. (2021). Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nature Medicine, 27(2), 244–249. https://doi.org/10.1038/s41591-020-01174-9
Madani, M., Behzadi, M. M., & Nabavi, S. (2022). The role of deep learning in advancing breast cancer detection using different imaging modalities: A systematic review. Cancers, 14(21), 5334. https://doi.org/10.3390/cancers14215334
Masud, M., Hossain, M. S., Alhumyani, H., Alshamrani, S. S., Cheikhrouhou, O., Ibrahim, S., Muhammad, G., Rashed, A. E. E., & Gupta, B. B. (2021). Pre-trained convolutional neural networks for breast cancer detection using ultrasound images. ACM Transactions on Internet Technology, 21(4), 1–17. https://doi.org/10.1145/3418355
Michael, E., Ma, H., Li, H., Kulwa, F., & Li, J. (2021). Breast cancer segmentation methods: Current status and future potentials. BioMed Research International, 2021, 9962109–9962129. https://doi.org/10.1155/2021/9962109
Nassif, A. B., Talib, M. A., Nasir, Q., Afadar, Y., & Elgendy, O. (2022). Breast cancer detection using artificial intelligence techniques: A systematic literature review. Artificial Intelligence in Medicine, 127, 102276. https://doi.org/10.1016/j.artmed.2022.102276
Pathan, R. K., Alam, F. I., Yasmin, S., Hamd, Z. Y., Aljuaid, H., Khandaker, M. U., & Lau, S. L. (2022). Breast cancer classification by using multi-headed convolutional neural network modeling. Healthcare, 10(12), 2367. https://doi.org/10.3390/healthcare10122367
Podda, A. S., Balia, R., Barra, S., Carta, S., Fenu, G., & Piano, L. (2022). Fully-automated deep learning pipeline for segmentation and classification of breast ultrasound images. Journal of Computational Science, 63, 101816. https://doi.org/10.1016/j.jocs.2022.101816
Preetha, R., & Jinny, S. V. (2021). Early diagnose breast cancer with PCA-LDA based FER and neuro-fuzzy classification system. Journal of Ambient Intelligence and Humanized Computing, 12(7), 7195–7204. https://doi.org/10.1007/s12652-020-02395-z
Priyanka, K. S. (2021). A review paper on breast cancer detection using deep learning. In IOP conference series: Materials science and engineering (Vol. 1022, No. 1, p. 012071). IOP Publishing. https://doi.org/10.1088/1757-899X/1022/1/012071
Ragab, M., Albukhari, A., Alyami, J., & Mansour, R. F. (2022). Ensemble deep-learning-enabled clinical decision support system for breast cancer diagnosis and classification on ultrasound images. Biology, 11(3), 439. https://doi.org/10.3390/biology11030439
Rezaei, Z. (2021). A review on image-based approaches for breast cancer detection, segmentation, and classification. Expert Systems with Applications, 182, 115204. https://doi.org/10.1016/j.eswa.2021.115204
Saber, A., Sakr, M., Abo-Seida, O. M., Keshk, A., & Chen, H. (2021). A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique. IEEE Access, 9, 71194–71209. https://doi.org/10.1109/ACCESS.2021.3079204
Sharma, A., & Mishra, P. K. (2022). Performance analysis of machine learning based optimized feature selection approaches for breast cancer diagnosis. International Journal of Information Technology, 14(4), 1949–1960. https://doi.org/10.1007/s41870-021-00671-5
Tong, Y., Liu, Y., Zhao, M., Meng, L., & Zhang, J. (2021). Improved U-net MALF model for lesion segmentation in breast ultrasound images. Biomedical Signal Processing and Control, 68, 102721. https://doi.org/10.1016/j.bspc.2021.102721
Vigil, N., Barry, M., Amini, A., Akhloufi, M., Maldague, X. P. V., Ma, L., Ren, L., & Yousefi, B. (2022). Dual-intended deep learning model for breast cancer diagnosis in ultrasound imaging. Cancers, 14(11), 2663. https://doi.org/10.3390/cancers14112663
Zewdie, E. T., Tessema, A. W., & Simegn, G. L. (2021). Classification of breast cancer types, sub-types and grade from histopathological images using deep learning technique. Health and Technology, 11(6), 1277–1290. https://doi.org/10.1007/s12553-021-00592-0