Ensemble Learning Based Features Extraction For Brain MR Images Classification With Machine Learning Classifiers
Abstract
Diagnosing brain tumors is a complex and time-consuming process that relies heavily on radiologists’ expertise and
interpretive skills. However, the advent of deep learning methodologies has revolutionized the field, offering more
accurate and efficient assessments. Attention-based models have emerged as promising tools, focusing on salient
features within complex medical imaging data. However, the precise impact of different attention mechanisms,
such as channel-wise, spatial, or combined attention within the Channel-wise Attention Mechanism (CWAM), for brain
tumor classification remains relatively unexplored. This study aims to address this gap by leveraging the power
of ResNet101 coupled with CWAM (ResNet101-CWAM) for brain tumor classification. The results show that
ResNet101-CWAM surpassed conventional deep learning classification methods like ConvNet, achieving exceptional
performance metrics of 99.83% accuracy, 99.21% recall, 99.01% precision, 99.27% F1-score and 99.16% AUC on
the same dataset. This enhanced capability holds significant implications for clinical decision-making, as accurate
and efficient brain tumor classification is crucial for guiding treatment strategies and improving patient outcomes.
Integrating ResNet101-CWAM into existing brain classification software platforms is a crucial step towards
enhancing diagnostic accuracy and streamlining clinical workflows for physicians.
Keywords Brain tumor, Deep learning, ResNet101, CWAM, Attention mechanism
Introduction
*Correspondence: Mohd Asif Shah, [email protected]
1 Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai 600062, India
2 Department of Computer Science and Engineering, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Ramapuram, Chennai, India
3 School of Electrical and Electronics Engineering, VIT Bhopal University, Bhopal-Indore Highway, Kothrikalan, Sehore, Madhya Pradesh 466114, India
4 School of Computer Science and Engineering, Galgotias University, Greater Noida 203201, India
5 Department of Economics, Kardan University, Parwan-e-Du, Kabul 1001, Afghanistan
6 Division of Research and Development, Lovely Professional University, Phagwara, Punjab 144001, India

The brain, which serves as the central command centre of the body, controls bodily functions and plays a vital role in maintaining general health. Brain tumours and other anomalies can present substantial hazards. Malignant tumours, which are characterised by the rapid and aggressive proliferation of cells, pose significant management challenges due to their fast growth. Conversely, benign tumours, although less menacing, can nonetheless lead to difficulties [1]. Accurate diagnosis and treatment planning require a thorough understanding of the distinction between malignant and benign tumours. Progress in medical technology and research is constantly enhancing the effectiveness of therapies for
A.G et al. BMC Medical Imaging (2024) 24:147 Page 2 of 17
brain tumours, leading to better results for patients [2]. The World Health Organisation (WHO) has devised a classification system for brain tumours, categorising them into four groups. Tumours classified as Grade I and II are considered lower-grade and have a more favourable prognosis. Tumours classified as Grade III and IV are characterised by a more severe nature, displaying aggressive behaviour and resulting in poorer outcomes [3]. Comprehending these grades is essential for clinicians to customise treatment methods and offer precise prognosis information. This grading scheme enables healthcare practitioners to categorise individuals according to the severity of their tumours, thereby improving the effectiveness of treatment and the outcomes for patients. Brain tumours present a substantial risk to life, and precise diagnosis is essential for successful treatment. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scans, in conjunction with biopsy procedures and pathological examinations, are employed to validate diagnoses [4]. MRI is favoured since it is non-invasive. Nevertheless, manual examination poses difficulties and inaccuracies. Computer-Aided Diagnosis (CAD) approaches have transformed the discipline by employing artificial intelligence and machine learning. These algorithms aid neuro-oncologists in the identification, classification, and grading of tumours, improving diagnostic precision and optimising workflows [5]. This approach enhances patient outcomes in the intricate realm of brain tumour identification and therapy.

The application of deep learning techniques has greatly enhanced computer-assisted medical diagnosis, specifically in the detection and classification of brain tumours. Transfer learning, a branch of artificial intelligence, has demonstrated promise in tasks such as visual categorization, object identification, and image classification [6]. Neuro-oncology researchers have employed pre-trained networks to extract characteristics from brain MRI scans, resulting in a remarkable accuracy rate of 98.58%. Convolutional neural network architectures such as AlexNet and ShuffleNet have been assessed for their ability to extract features and classify data [7]. Convolutional neural networks (CNNs) are crucial in the prediction of brain tumours, as they extract diverse features using convolution and pooling layers. Nevertheless, there is a limited availability of attention-based models for the categorization of brain tumours. The predominant approach in current models is the utilisation of CNNs and transfer learning [8]. Several studies have employed 3D-CNNs with innovative network structures for the categorization of multi-channel data, resulting in an accuracy rate of 89.9%. Prior research has concentrated on segmenting brain tumours in MRI imaging by utilising fully convolutional neural networks [9]. Recent advancements have combined traditional architectural elements with CNN principles, such as correlation learning mechanisms (CLM) for deep neural network architectures in CT brain tumor detection, achieving an accuracy rate of 96% [10]. Research in brain tumor image classification has also explored the effectiveness of architectures like AlexNet, GoogLeNet, and ResNet50. One study presents two deep learning models for brain tumor classification, ResNet50 and VGG16; ResNet50 has the highest accuracy rate at 85.71%, indicating its potential for brain tumor classification [11]. The models were trained on comprehensive datasets of 3,064 and 152 MRI images, sourced from publicly available datasets. The VGG16 architecture achieved classification accuracies of approximately 97.8% and 100% for binary and multiclass brain tumor detection, respectively [12].

Nevertheless, additional enhancements are required. The objective of this work is to incorporate an attention mechanism into the brain tumour classification task, since attention has been demonstrated to improve the detection of important characteristics in intricate datasets. This integration has the potential to enhance accuracy rates and minimise misclassifications, resulting in more precise diagnoses and better patient outcomes [13]. The work offers a potential path for improving and perfecting algorithms used to classify brain tumours. Prior work employed the recurrent attention mechanism (RAM) model and the channel attention mechanism to enhance the classification accuracy of biomedical images. According to [14], the RAM model demonstrated superior performance compared to typical CNNs when dealing with difficulties in imaging data.

The channel attention mechanism, which focuses on brain tissue spatial distribution, was also integrated into the classification process. This approach improved the accuracy of identifying and categorizing brain tumors based on their spatial characteristics. These techniques offer promising avenues for medical image analysis, leading to more accurate diagnoses and improved patient outcomes [15]. This proposed study presents a novel approach to brain tumor classification by combining deep learning techniques with channel-wise attention mechanisms. The study focuses on enhancing the accuracy and efficiency of brain tumor classification, crucial for effective diagnosis and treatment planning. Through the fusion of deep learning models and attention mechanisms, the proposed method aims to improve feature extraction and classification accuracy. The paper outlines the methodology and experimental results, and discusses the implications of the findings for future research and clinical applications. Overall, the study contributes to advancing the field of medical image analysis and underscores the importance of integrating innovative techniques for improved brain tumor classification. The research contributions of this study are as follows:
• Utilization of the Channel-wise Attention Mechanism: The proposed approach leverages the Channel-wise Attention mechanism to accurately classify different types of brain MRI images, covering the glioma, meningioma, no-tumor, and pituitary classes. This mechanism allows the model to focus on relevant features within the images, thereby improving classification accuracy.
• Effective Data Preprocessing: The study emphasizes the importance of effective data preprocessing techniques, which contributed to the high accuracy of the proposed method. Proper preprocessing helps ensure that the input data is clean, standardized, and well-suited for training deep learning models.
• Integration into Clinical Decision-Making: Given the impressive performance of the proposed method, the authors advocate for its integration into software platforms used by physicians. This integration has the potential to enhance clinical decision-making and ultimately improve patient care by providing more accurate and efficient diagnosis of brain tumors.
• Future Research Directions: The study outlines future research directions, including the utilization of additional brain tumor datasets and the exploration of different deep learning techniques to further enhance brain tumor diagnosis. This highlights the researchers' commitment to ongoing improvement and innovation in the field.
• Identification of Computational Complexity: The study also identifies the computational complexity associated with the proposed model, particularly due to the addition of CWAM attention modules to the ResNet101 architecture. Understanding and acknowledging these limitations is essential for guiding future research efforts and optimizing model development processes.

The structure of this paper is as follows: Section 2 discusses recent state-of-the-art methods and their outcomes. Section 3 provides details about the dataset utilized in this study and outlines the complete structure of the proposed classification algorithm. Section 4 presents the experimental results obtained through the methodology. Section 5 discusses the conclusions drawn from the study and outlines avenues for future research concerning the proposed model.

Related work
Palash Ghosal et al. (2019) [16] note that brain tumors pose a significant threat to life and carry socio-economic consequences, and that accurate diagnosis using MRI data is crucial for radiologists and for minimizing risks. This research introduces an automated tool for classifying brain tumors using a Squeeze and Excitation ResNet model based on ConvNet. Preprocessing techniques like zero-centering and intensity normalization are used, resulting in an accuracy rate of 93.83%. This approach shows promising advancements in sensitivity and specificity compared to current methods. Wenna Chen et al. (2024) [17] observe that brain tumor classification is crucial for physicians to develop tailored treatment plans and save lives. An innovative approach called deep feature fusion uses convolutional neural networks to enhance accuracy and reliability. Pre-trained models are standardized, fine-tuned, and combined to classify tumors. Experimental results show that combining ResNet101 and DenseNet121 features achieves classification accuracies of 99.18% and 97.24% on the Figshare and Kaggle datasets, respectively. Muhannad Faleh Alanazi et al. (2022) [18] present a transfer learning model for early identification of brain tumors using magnetic resonance imaging (MRI). The model uses convolutional neural network (CNN) models to assess their efficacy with MRI images. A 22-layer binary classification CNN model is then fine-tuned using transfer learning to categorize brain MRI images into tumor subclasses. The model achieves an impressive accuracy of 95.75% when tested on the same imaging machine. It also shows a high accuracy of 96.89% on an unseen brain MRI dataset, indicating its potential for real-time clinical use.

Hanan Aljuaid et al. (2022) [19] note that breast cancer is a global issue, with increasing frequency due to insufficient awareness and delayed diagnoses. Convolutional neural networks can expedite cancer detection and classification, aiding less experienced medical practitioners. The proposed methodology achieves top-tier accuracy rates in binary and multi-class classification, with ResNet, InceptionV3Net, and ShuffleNet achieving 99.7%, 97.66%, and 96.94%, respectively. Nazik Alturki et al. (2023) [20] observe that brain tumors are among the top ten deadliest illnesses, and early detection is crucial for successful treatment. The study uses a voting classifier combining logistic regression and stochastic gradient descent to distinguish between cases with tumors and those without. Deep convolutional features from primary and secondary tumor attributes enhance precision. The voting classifier achieves an accuracy of 99.9%, outperforming cutting-edge methods.

Ginni Arora et al. (2022) [21] focus on evaluating the effectiveness of deep learning networks in categorizing skin lesion images. The research uses a dataset of approximately 10,154 images from ISIC 2018, and the results show that DenseNet201 achieves the highest accuracy of 0.825, improving skin lesion classification across multiple diseases. The study contributes to the development of an efficient automated classification model for multiple skin lesions by presenting various
parameters and their accuracy. Jun Cheng et al. (2015) [22] focus on classifying three types of brain tumors in T1-weighted contrast-enhanced MRI (CE-MRI) images using Spatial Pyramid Matching (SPM). The method uses an augmented tumor region generated through image dilation as the ROI, which is then partitioned into fine ring-form subregions. The efficacy of the approach is evaluated using three feature extraction methods: intensity histogram, gray level co-occurrence matrix (GLCM), and the bag-of-words (BoW) model. The results show substantial improvements in accuracy compared to the plain tumor region, with ring-form partitioning further enhancing accuracy. These results highlight the feasibility and effectiveness of the proposed method for classifying brain tumors in T1-weighted CE-MRI scans. Deepak et al. (2021) [23] note that automated tumor characterization is crucial for computer-aided diagnosis (CAD) systems, especially in identifying brain tumors using MRI scans. However, the limited availability of large-scale medical image databases limits the training data for deep neural networks. A proposed solution combines convolutional neural network (CNN) features with a support vector machine (SVM) for medical image classification. The fully automated system, evaluated using the Figshare open dataset, achieved an overall classification accuracy of 95.82%, surpassing state-of-the-art methods. Experiments on additional brain MRI datasets validated the enhanced performance, with the SVM classifier showing superior performance in scenarios with limited training data. Fatih Demir et al. (2022) [24] observe that brain tumors pose a global threat and that Magnetic Resonance Imaging (MRI) is a widely used diagnostic tool. This study presents an innovative deep learning approach for automated brain tumor detection using MRI images. Deep features are extracted through convolutional layers, and a new multi-level feature selection algorithm called L1NSR is applied. Superior classification performance is achieved using the Support Vector Machine (SVM) algorithm with a Gaussian kernel. The methodology achieves 98.8% and 96.6% classification accuracies, respectively. Navid Ghassemi et al. (2020) [25] present a deep learning method for classifying tumors in MR images. The method starts with pre-training a deep neural network using diverse datasets. The network is then fine-tuned to distinguish between three tumor classes using six layers and 1.7 million weight parameters. Techniques like data augmentation and dropout are used to mitigate overfitting. The method outperforms state-of-the-art techniques in 5-fold cross-validation. Shahriar Hossain et al. (2023) [26] focus on multiclass classification of brain tumors using deep learning architectures like VGG16, InceptionV3, VGG19, ResNet50, InceptionResNetV2, and Xception. The study proposes a transfer learning-based model, IVX16, which combines insights from the top three models. Experimentation yields peak accuracies of 95.11%, 93.88%, 94.19%, 93.88%, 93.58%, 94.5%, and 96.94% for VGG16, InceptionV3, VGG19, ResNet50, InceptionResNetV2, Xception, and IVX16, respectively. Explainable AI is used to assess model performance and reliability. Lokesh Kumar et al. (2021) [27] argue that the increasing number of brain tumor cases necessitates the development of automated detection and diagnosis methods. Deep neural networks are being explored for multi-tumor brain image classification. However, these networks face challenges like vanishing gradient problems and overfitting. A deep network model using ResNet-50 and global average pooling is proposed, which outperforms existing models in classification accuracy, with mean accuracies of 97.08% and 97.48%, respectively. Nirmalapriya et al. (2023) [28] note that brain tumors pose a significant health risk and that manual classification of MRI data is complicated. An innovative optimization-driven model is proposed for classifying brain tumors using a hybrid segmentation approach. This model merges the U-Net and Channel-wise Feature Pyramid Network for Medicine (CFPNet-M) models using Tanimoto similarity. The model accurately segments and classifies both benign and malignant tumor samples. The SqueezeNet model is trained on four grades, and the model weights are optimized using Fractional Aquila Spider Monkey Optimization (FASMO). The model achieves 92.2% testing accuracy, 94.3% sensitivity, 90.8% specificity, and 0.089 prediction error.

The proposed ResNet101 coupled with CWAM (Channel-wise Attention Mechanism) aims to address the demerits and research gaps identified in previous studies regarding brain tumor classification using MRI data. These include challenges such as limited classification accuracy, overfitting, and the need for more effective feature extraction methods. ResNet101, known for its strong performance in image classification tasks, serves as the backbone network to extract high-level features from MRI images with greater accuracy, thus improving classification performance. Additionally, the CWAM technique helps mitigate overfitting by selectively attending to informative channels in the feature maps, reducing noise and enhancing the model's ability to generalize to new data. By focusing on relevant channels in the feature maps, CWAM enhances the feature extraction process, enabling the model to capture more meaningful information from MRI images and leading to improved classification accuracy. Table 1 illustrates how the various limitations of the state-of-the-art methods are addressed.

Materials and methods
Deep learning models play a vital role in classifying brain scans, detecting intricate patterns for accurate diagnosis. Integrating the ResNet101-CWAM fusion technique further enhances diagnostic precision by capturing nuanced
the ResNet101-CWAM fusion technique is integrated, focusing on capturing the nuances of brain images and their contextual relationships. This fusion methodology enriches the model's understanding of various brain conditions, enhancing its ability to accurately detect and classify them. The process involves meticulous data gathering, preprocessing, model selection, and rigorous training and testing. Data is assembled to ensure representative samples, and preprocessing refines and standardizes the collected data for training. Model selection involves careful consideration of various architectures and techniques, and the model undergoes rigorous testing to ensure optimal functionality and reliability in real-world scenarios. Good contrast is essential for clear and impactful visual content, making it easier to understand messages. Techniques like FDHE help improve contrast by adjusting overly bright or dark images, making details stand out more. The study focused on fixing brightness issues and making visual details clearer, improving the viewing experience. The transformation of the dataset classes before and after FDHE is demonstrated in Fig. 2, illustrating the efficacy of the technique in revitalizing visual content. To ensure optimal performance, preprocessing steps were taken, including resizing, normalization, and histogram equalization. The model was trained using a curated training set and underwent iterative refinement. After training, the model was tested using dedicated testing sets to evaluate its efficacy in accurately interpreting and analyzing the visual data. This systematic approach showcases the transformative power of contrast enhancement techniques and underscores their pivotal role in unlocking the true potential of visual content, enabling it to be scrutinized and interpreted with precision and clarity.

The procedure involves breaking down a low-contrast image into sub-histograms based on its median value, using a histogram-based methodology. This involves meticulous examination of every pixel within the image and delineation of clusters based on prominent peaks. This process persists until no additional clusters appear, indicating completion. Histogram-based equalization has an inherent advantage as it requires only a single pass over each individual pixel. Dynamic Histogram Equalization (DHE) starts by smoothing each histogram, then identifies local maxima by comparing histogram values with those of neighboring bins. The algorithm calculates the histogram's length, ensuring a balanced enhancement distance. The novelty of the approach lies in the integration of the Channel-wise Attention Mechanism (CWAM) with the ResNet101 architecture for the classification of MRI brain images, which represents a significant innovation in the field of medical image analysis. This combination enhances the model's ability to focus on pertinent features within the images, thereby improving classification accuracy for various brain tumor types, including the glioma, meningioma, no-tumor, and pituitary classes. Furthermore, the study's meticulous data preprocessing techniques ensure high-quality input for training the deep learning model, contributing to its impressive performance. By proposing this advanced method and advocating for its integration into clinical decision-making software, the research not only demonstrates immediate practical applicability but also sets the stage for future advancements through the identification of computational complexities and suggestions for further research.
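The DHE-style preprocessing described above (histogram smoothing, a split at the median intensity, and per-segment equalization) can be sketched in a few lines of NumPy. This is our own minimal illustration, not the authors' FDHE implementation: it splits into just two sub-histograms at the median rather than recursing over all local maxima, and the helper names (`smooth_histogram`, `dhe_equalize`) are ours.

```python
import numpy as np

def smooth_histogram(hist, sigma=3.0):
    """Smooth a 256-bin histogram with a Gaussian kernel to suppress noisy peaks."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(hist, kernel, mode="same")

def dhe_equalize(img):
    """Split the intensity range at the median and equalize each sub-range."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    smooth = smooth_histogram(hist)
    median = int(np.median(img))
    out = np.empty_like(img)
    for lo, hi in ((0, median), (median + 1, 255)):
        mask = (img >= lo) & (img <= hi)
        if not mask.any():
            continue
        if hi == lo:
            out[mask] = lo
            continue
        cdf = np.cumsum(smooth[lo:hi + 1])
        if cdf[-1] == 0:
            continue
        cdf /= cdf[-1]
        # Map each sub-range onto its own output interval, preserving pixel order.
        out[mask] = (lo + np.round(cdf[img[mask] - lo] * (hi - lo))).astype(img.dtype)
    return out
```

Because each sub-range is mapped onto its own output interval, pixel ordering is preserved: regions darker than the median stay darker than those above it, while each half of the histogram gains contrast independently.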
Smoothing
Noise infiltrates the high-frequency elements of an image, introducing jagged artifacts that can disrupt the viewing experience and obscure important details. To counteract these effects, a smoothing technique is employed that adjusts the intensity levels of individual pixels, preserving crucial details while reducing the prominence of noise-induced artifacts. The Gaussian function is central to this process, dynamically altering the intensity of pixels to achieve a more uniform and visually appealing result [31]. Each pixel undergoes a transformation targeting the removal of blur, a common consequence of noise interference. This transformation adheres to the principles of the normal distribution, ensuring adjustments are statistically coherent and consistent with natural visual perception. Applying this transformation to every pixel enhances the overall clarity and fidelity of the image, resulting in a more visually pleasing and informative representation.

X(a, b) = (1 / (2πσ²)) · e^(−(a² + b²) / (2σ²))   (1)

Here, 'a' represents the distance from the origin along the horizontal axis, 'b' denotes the distance from the origin along the vertical axis, and 'σ' signifies the standard deviation. Consequently, the smoothed image gains flexibility for Contrast Enhancement (CE). This function effectively eliminates redundant, minimal, and maximal noisy peaks, thereby enhancing the image's quality. Following this smoothing process, the maximum points on the Receiver Operating Characteristic (ROC) curve are identified, facilitating the separation of the darkest and brightest points within the region.

Finding local maxima
Local maxima in a histogram are points where the intensity value peaks above its neighboring values, indicating significant features in the image. They serve as reference points for identifying the darkest and brightest areas [32]. To locate these local maxima and minima, the histogram of the smoothed image is analyzed, tracing the highest and lowest intensity values. Intensity 0 represents the lowest value, and 255 the highest. Partitioning the image based on these extreme values divides it into segments. This segmentation relies on histograms to define boundaries between regions, using a histogram-based method for accuracy. In this context, the median is determined from the image histogram. The median is computed by

K_median = I_m + ((N/2 − E_(m−1)) / e_m) · B   (2)
where I_m is the lower bound of the median class, N is the number of observations, E_(m−1) is the cumulative frequency below the median class, e_m is the frequency of the median class, and B is the class width. The image is divided into segments using this median value. The ranges between successive local maxima are termed intervals. Partitioning is necessary to group related pixel values together, facilitating ease of analysis.

Proposed ResNet101-CWAM approach
In this study, we utilized ResNet101 as our primary model architecture, leveraging pre-trained weights from the ImageNet dataset. This allowed the extraction of intricate features from our meticulously pre-processed images, establishing a strong foundation for subsequent analysis. To maintain model stability, we froze the weights of the convolutional and max-pooling layers during training, ensuring the preservation of valuable knowledge [33]. ResNet was chosen for its exceptional performance across various computer vision tasks and its ability to address the vanishing gradient problem. By harnessing ResNet's strengths and pre-trained weights from ImageNet, we aimed to equip our model with the capabilities necessary for effective task handling, ultimately striving for optimal performance and insightful outcomes. Features from ResNet101 were extracted and input into CWAM, a framework integrating spatial and channel-wise attention mechanisms [34]. Channel attention evaluates individual channel importance by adjusting weights, enhancing the model's focus on significant features. Spatial attention directs focus to specific spatial locations, enabling detailed analysis. Despite their distinct roles, these mechanisms synergize, maximizing the model's ability to extract relevant information from data. CWAM's collaborative approach ensures nuanced pattern recognition, leading to accurate insights. Figure 3 depicts the detailed architecture of the brain tumor classification model.

The feature extraction process uses the ResNet101 architecture's layers to generate a feature map of dimensions C × H × W, where C represents the number of channels and H and W represent the spatial dimensions. This map provides a comprehensive understanding of the spatial structure and content encoded within the extracted features, highlighting the richness of information captured within each channel. The Channel-wise Attention Module (CWAM) integrates spatial and channel-wise attention mechanisms to enhance feature refinement. The input feature map undergoes transformations such as max pooling and average pooling to condense spatial dimensionality. The global average pooling layer computes the mean value for each channel across the spatial dimensions, while the global max pooling layer identifies the maximum value per channel. The Channel Attention Map (CAM) is computed through dense layers to reveal the significance of each channel, facilitating channel-specific refinement. The CAM is then multiplied element-wise with the original feature map F, resulting in a refined feature map denoted as R. Each element in R is weighted according to its channel's importance, enhancing the discriminative power of the features for subsequent stages of analysis. Table 3 shows the building blocks of the proposed ResNet101 model.

The model employs a meticulously crafted feature map to delve into the essence of crucial features residing within each channel. At the heart of this pursuit lies the spatial attention module, which orchestrates the compression of the channel-refined feature map into representations, each providing insights into the spatial intricacies ingrained within the data. Within this framework, the attention map serves as a conduit between spatial and channel-wise dimensions. Integrated seamlessly with the channel-refined feature map R, this amalgamation provides a nuanced understanding of both spatial context and channel-specific significance, enriching the model's comprehension of the data landscape. As the journey progresses, the CWAM module emerges as a cohesive force, merging spatial and channel-wise attention to refine features comprehensively. This amalgamated output encapsulates the core of feature refinement, ready to reveal hidden truths within the data. Through global average pooling, the model engages in a collective contemplation of the statistical attributes of the feature space, delving deeper into the essence of the data. Finally, as the fully connected layer activates with SoftMax, the model's insights are refined and ready for action, enabling it to navigate the intricate data terrain with confidence, extracting valuable insights and informing strategic decisions.

Table 3 Building blocks of the proposed ResNet101 architecture
Layer     Output size   101-layer configuration
conv1     112 × 112     7 × 7, 64, stride 2
conv2_x   56 × 56       3 × 3 max pool, stride 2; [1 × 1, 64 | 3 × 3, 64 | 1 × 1, 256] × 3
conv3_x   28 × 28       [1 × 1, 128 | 3 × 3, 128 | 1 × 1, 512] × 4
conv4_x   14 × 14       [1 × 1, 256 | 3 × 3, 256 | 1 × 1, 1024] × 23
conv5_x   7 × 7         [1 × 1, 512 | 3 × 3, 512 | 1 × 1, 2048] × 3
          1 × 1         average pool, 1000-d fully connected

Table 4 Hyperparameters in the ResNet101-CWAM model
Parameters          Model-I   Model-II
Rate of learning    0.001     0.001
Size of batches     32        16
Optimizing method   Adam      SGD
No. of epochs       25        25

Performance metric parameters
The evaluation of the performance of the suggested model has been comprehensive, taking into account a wide range of important characteristics to determine how successful it is. These parameters include: accuracy (Acc), the proportion of instances that have been correctly classified out of the total number of instances; precision (Pr), which evaluates the accuracy of positive predictions; F1-score, the harmonic mean of precision and recall that provides a balanced assessment of the model's performance; and recall, which evaluates the proportion of true positive instances that were correctly identified by the model. By taking these characteristics into account, we obtain a full picture of the capabilities and limits of the model in relation to various elements of classification accuracy and prediction performance.

Acc = (T.positive + T.negative) / (T.positive + T.negative + F.positive + F.negative)
Acc =
T.positive + T.negative + F.positive + F.negative
(3)
through operations such as maximum and average
pooling. This transformation results in two distinct 2D
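These definitions can be made concrete with a small sketch. The function below is our own illustration (names are not from the paper): it computes all four metrics from confusion-matrix counts, with Acc matching Eq. (3).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Acc, Pr, recall, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)               # Eq. (3)
    precision = tp / (tp + fp)                          # accuracy of positive predictions
    recall = tp / (tp + fn)                             # true positives correctly identified
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of Pr and recall
    return acc, precision, recall, f1

# Hypothetical counts: 99 TP, 880 TN, 1 FP, 20 FN
acc, pr, rc, f1 = classification_metrics(99, 880, 1, 20)
```

Note that accuracy alone can look high on imbalanced data, which is why the paper also reports precision, recall, and F1.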
Model-I achieved an impressive F1-score of 99.27%, recall of 99.21%, accuracy of 99.83%, precision of 99.06%, and AUC of 99.33% on the training dataset. During cross-validation, the model sustained this high performance, averaging an F1-score of 98.82%, recall of 98.83%, accuracy of 99.41%, precision of 99.02%, and AUC of 99.12%, with minimal standard deviation across these metrics. Conversely, Model-II demonstrated slightly lower performance on the training dataset, with an F1-score of 97.08%, recall of 97.11%, accuracy of 98.77%, precision of 98.05%, and AUC of 98.13%. Throughout cross-validation, Model-II remained consistent, averaging an F1-score of 97.88%, recall of 97.12%, accuracy of 98.98%, precision of 98.06%, and AUC of 97.95%, with a marginally higher standard deviation than Model-I. Figure 4 depicts the performance metric comparison of the two models.

The patterns in the models’ accuracy and loss graphs correspond to the well-established characteristics of the Adam (Model-I) and SGD (Model-II) optimisation techniques. Adam employs adaptive learning rates to navigate complex loss landscapes effectively and is well regarded for reaching early convergence quickly; however, because its optimisation process is dynamic, fluctuations may sometimes disrupt this rapid convergence in the early training stages. SGD, on the other hand, often exhibits a more gradual convergence trajectory, marked by modest, steady progress towards optimal solutions. Despite these differences, the performance metrics remained stable and consistent for both optimizers, indicating resilient and robust models: the small standard deviations in these metrics show that the models deliver consistent performance regardless of the optimisation method, reinforcing confidence in their reliability and efficacy for real-world applications. Figure 5 illustrates the training and testing accuracy and loss curves for the two models.

Fig. 5 The train and test accuracy of (a) model-I, (b) model-II

The receiver operating characteristic (ROC) curves, shown in Fig. 6, offer insight into how well the models perform over a range of classification thresholds. Each curve illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 - specificity), enabling a thorough evaluation of the models’ discriminatory capacity. The accompanying area under the curve (AUC) score for each class quantifies classification performance and details the models’ capacity to discriminate between the various classes. This visualisation aids decisions about how well the models suit particular classification tasks and improves the interpretability and usefulness of the assessment findings.

We also visualised the feature maps, shown in Fig. 7 (a)-(c), to assess the models’ ability to comprehend the primary visual attributes of the images and the contextual relationships among them. Feature maps are shown for three levels of the model: the initial, intermediate, and final layers. A thorough analysis of the feature maps from the first layers shows that they accurately capture fundamental characteristics such as edges, textures, and basic shapes, emphasising the crucial role these layers play in identifying underlying patterns in the incoming data and creating a foundation for further hierarchical processing within the network. Figures 7 (b) and (c) show that the feature maps become more abstract as the model goes deeper, indicating an ability to capture more intricate features within brain MRIs. Figure 7 (b) is particularly informative because it shows how the CWAM module highlights specific regions and channels in the feature maps, making the salient areas clearer, which should make predictions more accurate; at the same time, less significant aspects of the data are suppressed. This prioritization concentrates the model on the details essential for classifying and analyzing the data effectively.

A.G et al. BMC Medical Imaging (2024) 24:147 Page 13 of 17
A.G et al. BMC Medical Imaging (2024) 24:147 Page 13 of 17
Fig. 7 Feature maps of (a) first layer, (b) middle layer, (c) final layer
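For reference, a per-class AUC like those reported for Fig. 6 can be computed directly from classifier scores via the rank-based (Mann-Whitney) formulation. The sketch below is our own illustration, assuming binary (one-vs-rest) labels per class; it is not code from the paper.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    # Count pairwise wins of positives over negatives; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why scores near 0.99 indicate strong class discrimination.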
Our method was meticulously compared with top-performing techniques in the field, all evaluated on the same dataset; this comparative analysis was prompted by the exceptional performance of our approach, and it confirmed that our ResNet101-CWAM model outperformed the others. The details of this comparison are shown in Table 6, which clarifies how well the different methods work. Importantly, we used the same training and testing protocols as the previous studies when evaluating our ResNet101-CWAM model, as explained in Table 6; this ensures a fair comparison and makes our results more credible and trustworthy.

Table 6 Performance metric comparison of proposed and other state-of-the-art methods

Authors         | Models                   | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%)
Remzan et al.   | EfficientNetB1, ResNet50 | 95.98        | 95.98         | 96.03      | 95.98
Tahiry K et al. | Hybrid CNN               | 95.65        | 95.65         | 95.67      | 95.65
Zhang Z et al.  | VGG16                    | 94.55        | 96.5          | 96.01      | 96.02
Dewan JH et al. | VGG19                    | 97.02        | 96.10         | 97.01      | 97.11
Sheng M et al.  | CNN                      | 98.40        | 97.17         | 96.75      | 96.75
Proposed model  | ResNet101 + CWAM         | 99.83        | 99.06         | 99.21      | 99.27

Evaluating the models in Table 6: Remzan et al. employed EfficientNetB1 and ResNet50, achieving an accuracy of 95.98%, with precision, recall, and F1-score at a similarly high level. Tahiry K et al. introduced a hybrid CNN with commendable performance across all metrics, including an accuracy of 95.65% and consistent precision, recall, and F1-score values. Zhang Z et al. explored VGG16, achieving a slightly lower accuracy of 94.55% but a higher precision of 96.5%. Dewan JH et al. presented a VGG19 model with an accuracy of 97.02% and a notably high F1-score of 97.11%. Sheng M et al. introduced a CNN with an impressive accuracy of 98.40% and precision of 97.17%. Lastly, the proposed ResNet101 + CWAM model exhibited exceptional performance, achieving the highest accuracy of 99.83% and F1-score of 99.27%, indicating its robustness in classification tasks. Figure 8 presents the performance metric outcomes of the proposed and state-of-the-art methods.

Fig. 8 Performance metric outcome comparison of proposed and other existing models

Ablation study
We also carried out an ablation study of the model under fixed configuration settings, splitting the data into 70% for training and 30% for testing; the findings are summarised in Table 7.

Table 7 Proposed model for brain tumor classification ablation study

Models         | Accuracy (%) | Precision (%) | F1-score (%) | Recall (%)
ResNet101      | 98.91        | 98.12         | 97.98        | 98.02
ResNet101 + CA | 99.29        | 98.88         | 98.64        | 98.54
ResNet101 + SA | 98.96        | 97.91         | 97.85        | 97.71
Proposed model | 99.83        | 99.06         | 99.27        | 99.21
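The CA and SA variants ablated in Table 7 can be sketched as standalone tensor operations. The NumPy sketch below is a simplified, CBAM-style illustration under our own assumptions (toy shapes, a tiny shared MLP, and scalar mixing in place of the usual 7 × 7 convolution); it is not the paper’s implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Channel attention: global avg/max pooling -> shared MLP -> sigmoid weights."""
    avg = F.mean(axis=(0, 1))  # (C,) global average pooling
    mx = F.max(axis=(0, 1))    # (C,) global max pooling
    cam = sigmoid(W2 @ np.maximum(W1 @ avg, 0) + W2 @ np.maximum(W1 @ mx, 0))
    return F * cam             # broadcast over H, W: channel-refined map R

def spatial_attention(F, k):
    """Spatial attention: channel-wise avg/max -> mixing -> sigmoid spatial map."""
    avg = F.mean(axis=2, keepdims=True)    # (H, W, 1)
    mx = F.max(axis=2, keepdims=True)      # (H, W, 1)
    sam = sigmoid(k[0] * avg + k[1] * mx)  # scalar mixing stands in for a 7x7 conv
    return F * sam

rng = np.random.default_rng(0)
F = rng.standard_normal((7, 7, 8))      # toy feature map, H = W = 7, C = 8
W1 = rng.standard_normal((2, 8)) * 0.1  # squeeze to C/4 = 2
W2 = rng.standard_normal((8, 2)) * 0.1  # excite back to C = 8
R = spatial_attention(channel_attention(F, W1, W2), k=(0.5, 0.5))
print(R.shape)  # prints (7, 7, 8)
```

Stacking channel attention before spatial attention mirrors the combined CWAM refinement, while applying only one of the two calls corresponds to the ResNet101 + CA and ResNet101 + SA ablation rows.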
In the preprocessing stage, several crucial steps optimized the model’s performance. Resizing the images to 256 × 256 pixels ensured uniformity and compatibility of the inputs; min-max normalization scaled the pixel values, helping to prevent overfitting; and dynamic histogram equalization (DHE) further enhanced medical image quality while preserving diagnostic details. Together, these techniques bolstered the model’s performance, enabling better generalization and more reliable diagnostic outcomes. Removing any single component of the model degraded the brain tumor predictions, whereas the full configuration outperformed every alternative; this highlights how essential each component is to accurate brain tumor prediction.

Based on our research, Model-I performed better than Model-II both on the training data and during cross-validation, suggesting that it learned more effectively. One likely reason is the Adam optimizer used with Model-I: Adam adapts the learning rate for different parameters of the model, which is useful for complex tasks, whereas Model-II used SGD, which applies a single learning rate to all parameters. Brain tumor classification involves many factors, some of which require more careful attention, and Adam’s per-parameter adaptation helps the model attend to these different aspects during training. To improve performance further, it might be worth exploring approaches such as simplifying what the model must learn at once or adopting a different training strategy.

The ablation study provided valuable insights into the model’s functionality. It highlighted the effectiveness of the model’s attention mechanisms in emphasising important features while minimizing irrelevant noise, which greatly contributes to its high performance. Particularly intriguing is the comparison between the two attention mechanisms, channel attention (CA) and spatial attention (SA): ResNet101 with channel attention outperformed ResNet101 with spatial attention. This suggests that, for brain tumor classification, focusing on which features are informative may be more beneficial than attending to their spatial arrangement, underscoring the importance of carefully selecting and fine-tuning attention mechanisms based on the unique characteristics of the problem at hand. Notably, although plain ResNet101 did not achieve the highest performance in our experiments, it still outperformed some of the methods discussed in Table 6. This study applied the ResNet101-CWAM model to multiclass classification of brain tumors in MR images, and our experiments show that the approach performs better than the current best ConvNet models in terms of accuracy. Additionally, MRI images have unique features and are captured using various techniques, which can make it challenging for