Computer Methods and Programs in Biomedicine 247 (2024) 108101

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine


journal homepage: www.elsevier.com/locate/cmpb

A multi-task fusion model based on a residual–Multi-layer perceptron network for mammographic breast cancer screening
Yutong Zhong (a), Yan Piao (a, *), Baolin Tan (b), Jingxin Liu (c)
(a) School of Electronic Information Engineering, Changchun University of Science and Technology, Changchun 130022, PR China
(b) Technology Co. LTD, Shenzhen 518000, PR China
(c) Department of Radiology, China-Japan Union Hospital, Jilin University, Changchun 130033, PR China

ARTICLE INFO
Keywords: Deep learning; Mammography; Density classification; Mass segmentation; Lesion classification; Multi-task fusion

ABSTRACT
Background and objective: Deep learning approaches are increasingly applied to medical computer-aided diagnosis (CAD). However, these methods generally target only specific image-processing tasks, such as lesion segmentation or benignity prediction. For breast cancer screening, single feature extraction models are generally used, which directly extract from the input mammogram only those potential features that are relevant to the target task. This can lead to the neglect of other important morphological features of the lesion, as well as auxiliary information from the internal breast tissue. To obtain more comprehensive and objective diagnostic results, in this study, we developed a multi-task fusion model that combines multiple specific tasks for CAD of mammograms.
Methods: We first trained a set of separate, task-specific models, including a density classification model, a mass segmentation model, and a lesion benignity–malignancy classification model, and then developed a multi-task fusion model that incorporates the mammographic features from these different tasks to yield comprehensive and refined prediction results for breast cancer diagnosis.
Results: The experimental results showed that our proposed multi-task fusion model outperformed other related state-of-the-art models on the breast cancer screening task in both publicly available datasets, CBIS-DDSM and INbreast, achieving competitive screening performance with area-under-the-curve scores of 0.92 and 0.95, respectively.
Conclusions: Our model not only allows an overall assessment of lesion types in mammography but also provides intermediate results related to radiological features and potential cancer risk factors, indicating its potential to offer comprehensive workflow support to radiologists.

1. Introduction

Breast cancer is the leading cause of cancer-related death in women [1]. Over the past 30 years, many countries around the world have established mammography for breast cancer screening, substantially improving breast cancer prognosis and reducing breast cancer-related mortality in women [2,3].

A mammogram is a two-dimensional projection of the attenuation properties of the three-dimensional breast tissue along the radiography path [4]. Diagnosis from mammograms is performed by radiologists using visual assessment. However, breast tissue density varies greatly between mammograms [5], so visual assessment can lead to variation in radiologists' interpretations. This variation has drawbacks such as false recalls and unnecessary biopsies [6]. Computer-aided detection/diagnosis (CAD) systems can be of great help to radiologists in detecting and classifying breast lesions [7], but developing reliable CAD systems is challenging. Traditional approaches to CAD of mammograms usually describe the original image using manually extracted features (e.g., texture, colour, and shape). Such manual feature engineering is costly, because it is time consuming and requires additional experiments to verify the applicability of those features [8].

This work was supported by the National Natural Science Foundation of China (NSFC) (60977011) and the Jilin Province Science and Technology Plan Project (20220201062GX).
* Corresponding author.
E-mail address: [email protected] (Y. Piao).

https://doi.org/10.1016/j.cmpb.2024.108101
Received 20 April 2023; Received in revised form 13 January 2024; Accepted 23 February 2024
Available online 24 February 2024
0169-2607/© 2024 Elsevier B.V. All rights reserved.

In the past 5 years, the artificial intelligence (AI) revolution driven by deep learning and convolutional neural networks (CNNs) has penetrated the field of automated breast cancer detection by digital mammography and has demonstrated the potential of these new techniques [9]. Examples include the use of deep learning algorithms for breast mass segmentation [10,11], breast density classification [12,13], and cancer risk prediction [14,15]. Given the increasing volume of imaging data, these automated methods can accelerate the reading workflow of radiologists and can even be used as decision support tools for medical image interpretation and diagnosis [16]. However, these single-task models make limited use of the information in mammograms. Despite being trained on a large number of samples, they do not take into account other available, clinically important information.

Recent studies in cancer screening and diagnosis have shown that AI models can provide more detailed intermediate results related to radiological features than single-task models do (e.g., benignity/malignancy assessment models and mass segmentation models). These intermediate results can help clinicians perform more comprehensive and accurate assessments [17,18]. Specifically, during visual assessment of mammograms, radiologists first need to localise any suspicious mass in the breast. They then estimate its probability of malignancy from the observed imaging indicators (e.g., shape and margin features of the mass). In general, breast cancer classification results depend heavily on segmentation results, and the more irregular the shape of the mass, the higher the probability of malignancy [19].

Radiologists assign the breast tissue in each mammogram to one of four standardised density categories of the American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) [20]:
- BIRADS-1 for entirely fatty breasts;
- BIRADS-2 for mostly fatty breasts with some fibroglandular tissue;
- BIRADS-3 for heterogeneously dense breasts;
- BIRADS-4 for extremely dense breasts.

As shown in the mammograms in Fig. 1, the fatty breast clearly shows the presence of a mass and the morphology of the mass edges. In the dense breast, however, the lesion morphology is less clear. This impairs the radiologist's ability to diagnose the condition and in turn increases the risk of late diagnosis of breast cancer. Although BI-RADS stratification cannot be used as a definitive diagnostic criterion, it is still necessary to refer to the BI-RADS score when observing common features.

Fig. 1. Images of benign and malignant masses at different densities. Groups (a) and (b) are fatty breasts, where group (a) is a benign mass and group (b) is a malignant mass; groups (c) and (d) are dense breasts, where group (c) is a benign mass and group (d) is a malignant mass. In each group, the mammographic image is shown on the left and the ground-truth mass mask on the right.

In mammogram analysis, breast cancer detection cannot rely solely on the results of one particular task. To reduce false-positive diagnoses, radiologists usually combine the results of multiple tasks to decide whether the patient needs a pathological biopsy. Therefore, following the clinical reasoning process of radiologists, we developed a deep learning model for multi-task fusion from a novel perspective, with the aim of building a more powerful predictor than single-task models. The model effectively fuses the features of mammograms from different tasks and provides a comprehensive prediction of benign or malignant breast masses with improved interpretability. To address the above-mentioned challenges, the main contributions of this study are as follows:

(1) We developed a residual–multi-layer perceptron (MLP)-based backbone network for diagnosing benign and malignant masses on mammograms.
(2) We developed two auxiliary networks for specific mammography-assisted analysis tasks: (i) a density classification (DC) model, which is a CNN-based model for breast density classification, and (ii) a mass segmentation (MS) model, which is a lightweight self-attention-based model for breast mass segmentation.

(3) We investigated the efficacy of a multi-task fusion model, in which segmentation information and density information were introduced into the mass classification network. We assessed whether it improves the accuracy of benign and malignant mass diagnoses and helps radiologists make accurate diagnostic decisions based on comprehensive results.

The multi-task fusion model was evaluated on two publicly available datasets, namely CBIS-DDSM and INBreast. Our proposed method achieves accuracies of 93 % and 91 % on the CBIS-DDSM and INBreast datasets, respectively, with area under the curve (AUC) values of 0.92 and 0.95. The experimental results revealed the superior performance of the developed model on specific tasks compared with the single-task models, and its potential to improve overall breast cancer diagnostic performance in clinical practice.

2. Related work

2.1. Single-task models for benign–malignant classification of breast cancer

With the great success of deep neural networks in computer vision, several deep learning approaches have been explored to facilitate the automatic classification of lesions in mammography [21–23]. Arora et al. [24] used a deep learning framework that integrates AlexNet, VGG, ResNet, GoogLeNet, and InceptionNet for benign–malignant classification on the CBIS-DDSM dataset. Shu et al. [25] constructed a neural network classifier based on a region pooling structure to address the problem of large image resolution and small lesions in medical images. The classifier first divides the image into multiple regions and then selects a few masses with a high probability of malignancy as a representation of the whole mammogram. This model was evaluated on two datasets, CBIS-DDSM and INBreast. Shen et al. [26] proposed the GMIC network, in which a global module is first applied over the entire image, with ResNet-22 as the baseline. It generates a heat map that provides a coarse localisation of possible benign/malignant lesions. Regions of interest (ROIs) are then identified using a greedy algorithm, and a local module extracts fine-grained visual details from the ROIs. Finally, information from the global context and local details is combined to predict the presence of benign and malignant lesions in the breast.

2.2. Multi-task fusion model for benign–malignant classification of breast cancer

Masses, calcifications, and microcalcification clusters in the breast are important risk factors for and indicators of breast cancer. Therefore, some researchers have applied multi-task fusion models to achieve a more comprehensive analysis of mammography images than single-task models afford. Li et al. [27] proposed a dual-path conditional residual network (DUALCORENET) based on a dual-path CNN model for mammogram analysis. One path, called the local preservation learner, is dedicated to hierarchically extracting and exploiting the intrinsic features of the input. The other path, called the conditional graph learner, generates geometric features by modelling pixel-level image-to-mask correlations. The two learning paths complement each other, so both cancer semantics and cancer representation are well learned, and segmentation and classification of mammograms can be performed simultaneously. Tsochatzidis et al. [28] proposed a method that integrates mammogram-based mass segmentation information into a CNN to improve the diagnosis of breast cancer in mammography. They trained the model with cross loss and spatial perception loss terms and evaluated it on the DDSM-400 and CBIS-DDSM datasets. Xing et al. [29] combined the guidance information provided by BI-RADS with a powerful deep learning (DL) framework for breast mass classification in ultrasound images. The framework includes a BI-RADS attention branch that merges BI-RADS stratification into network training, allowing it to learn BI-RADS-related features and the correlation between semantic features. Wimmer et al. [30] proposed a multi-branch deep learning model built on three main task-specific models, namely a breast density classification model, a lesion localization model, and an outcome classifier. It effectively fuses features from different tasks and mammograms to obtain comprehensive predictions. An overview of these methods is presented in Table 1.

3. Methods

We developed a multi-task fusion model for mass classification in mammograms, as shown in Fig. 2. Unlike other studies, we did not only develop single-task models for the benign–malignant classification of masses. We also used the actual interpretation practice of radiologists as a guide to examine the boundary features of the mass, the shape features of the lesion, and density information within the whole receptive field of the breast. The intermediate results of breast density assessment and lesion segmentation output by the task-specific models were then fused to obtain the final mass classification results (benign/malignant) for the mammograms.

3.1. Mass classification (MC) model

In the MC-model, the feature extraction component is the Tumor Classification Learner (MCL). The core component within MCL is the RSMLP, which consists of a cascade of Res-Conv blocks for extracting local features and Res-Mixing blocks for establishing global dependencies. The specific structure is illustrated in Fig. 3.

The input to the network is the entire mammographic image, which is first partitioned into multiple small patches by the patch partition module. These patches are then fed into the Learning Embedding layer, which maps the patch-based local features to a lower-dimensional space, yielding more expressive and discriminative features. Next comes the RSMLP block, which consists of two parts: the Res-Conv module for extracting local features and the Res-Mixing module for establishing global dependencies. After the Res-Mixing module, the feature map is recombined by the patch merging module [31]. This process merges all adjacent patches into a smaller feature map and adjusts the channel dimension of the merged feature map using a 1 × 1 convolution.

The specific structure of RSMLP is illustrated in Fig. 4. The Res-Conv module consists of a bottleneck residual block [32], while the Res-Mixing module is composed of Residual Token-mixing and Residual Channel-mixing blocks. These blocks combine spatial and channel information to extract higher-level features and semantic information. The Residual Token-mixing block employs a sparse MLP [33] structure with multiple paths. Subsequently, the Patch Merging module recombines the features into larger blocks to capture more global contextual information. After the final set of RSMLP blocks, global average pooling is applied to the feature map, and the pooled features are mapped to the final output through a linear layer.
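The Residual Token-mixing block mentioned above relies on a sparse MLP in which each token interacts only with tokens in the same row or column (Section 3.1.1, Eq. (1)). The plain-Python sketch below illustrates that data flow for one block; it is not the authors' implementation. The names `token_mixing`, `w_w`, `w_h` and `fuse` are hypothetical, batch normalization is omitted, and the 1 × 1 fusion convolution ("FC" in Eq. (1)) is assumed to be a per-pixel C × 3C weight matrix.

```python
# Sketch of sparse-MLP token mixing, Eq. (1): Y = X + FC(Concat(X_H, X_W, X)).
# Assumptions (not from the paper): X is an H x W x C nested list, BN is omitted,
# and the 1 x 1 fusion convolution is a per-pixel C x 3C matrix called `fuse`.


def mix_along_width(x, w_w):
    """Horizontal path: mix the W tokens of every (row, channel) line with W_W (W x W)."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_w[j][k] * x[h][k][c] for k in range(W)) for c in range(C)]
             for j in range(W)] for h in range(H)]


def mix_along_height(x, w_h):
    """Vertical path: mix the H tokens of every (column, channel) line with W_H (H x H)."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_h[i][k] * x[k][w][c] for k in range(H)) for c in range(C)]
             for w in range(W)] for i in range(H)]


def token_mixing(x, w_w, w_h, fuse):
    """Residual sparse-MLP token mixing: each token only sees its own row and column."""
    x_w = mix_along_width(x, w_w)
    x_h = mix_along_height(x, w_h)
    H, W, C = len(x), len(x[0]), len(x[0][0])
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            cat = x_h[i][j] + x_w[i][j] + x[i][j]  # channel concat -> 3C values
            row.append([x[i][j][c] + sum(f * v for f, v in zip(fuse[c], cat))
                        for c in range(C)])        # skip connection + 1 x 1 fusion
        out.append(row)
    return out


if __name__ == "__main__":
    eye = lambda n: [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]]   # H=2, W=3, C=1
    row_mean = [[1 / 3] * 3 for _ in range(3)]           # W_W that averages a row
    # Keep only the horizontal path: fuse selects the X_W channel of the concat.
    print(token_mixing(x, row_mean, eye(2), fuse=[[0.0, 1.0, 0.0]]))
```

Each position mixes only W tokens along its row and H tokens along its column, instead of all H × W tokens. The mixing weights are therefore W × W and H × H rather than (HW) × (HW), which is the parameter and computation saving the sparse MLP is meant to provide.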


Table 1
Overview of related works (CL = classification, SEG = segmentation, DET = detection; MG = mammography, BU = breast ultrasound).

| Author | Type | Task(s) | Image | Dataset | Method |
|---|---|---|---|---|---|
| Carneiro et al. [21] | Single-task model | CL | MG | InBreast, DDSM | Multi-view + pre-trained AlexNet. |
| Arevalo et al. [22] | Single-task model | CL | MG | BCDR | CNN + SVM. |
| Dhungel et al. [23] | Single-task model | DET | MG | DDSM-BCRP, INbreast | Multi-scale deep belief network combined with a Gaussian mixture model (GMM) classifier for candidate ROIs; LeNet + AlexNet models for true positive regions; random forest classifier for classification. |
| Arora et al. [24] | Single-task model | CL | MG | CBIS-DDSM | Concatenation of AlexNet + VGG + ResNet + GoogLeNet + InceptionNet. |
| Shu et al. [25] | Single-task model | CL | MG | CBIS-DDSM, INBreast | The mammogram is divided into multiple regions, and each region is classified; based on the classification results, the most suspicious lesion is selected as the final feature. |
| Shen et al. [26] | Single-task model | CL | MG | NYUBCS, CBIS-DDSM | ResNet + heat map for lesion localization, greedy algorithm for identifying ROIs; fusion of local and global features for classification. |
| Li et al. [27] | Multi-task fusion model | SEG + CL | MG | DDSM, INbreast | Extracted multi-scale ROIs are used as inputs to a dual-path network, which outputs segmentation masks and diagnostic labels. |
| Tsochatzidis et al. [28] | Multi-task fusion model | SEG + CL | MG | DDSM-400, CBIS-DDSM | Breast tumor segmentation information is integrated into the classification network, and a new loss function directs the network's attention towards tumor regions. |
| Xing et al. [29] | Multi-task fusion model | BI-RADS + CL | BU | BUS dataset | The network combines binary benign/malignant tumor classification with BI-RADS classification information. |
| Wimmer et al. [30] | Multi-task fusion model | Density + DET + CL | MG | OPTIMAM, DDSM | A multi-branch deep learning model integrates features from different tasks to achieve comprehensive patient-level predictions. |

Fig. 2. Overview of the breast cancer screening framework with three task-specific models. MC-model: performs the single task of mass classification; DC-model: an auxiliary model that performs the separate task of density classification; MS-model: an auxiliary model that performs the separate task of mass segmentation; MC-MS: a multi-path prediction model that incorporates the mass segmentation mask.
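Fig. 2 combines the task-specific models. Section 3.4.3 formalises the combination in two ways: a density-gated fusion (Eqs. (2)–(11)) and a hard routing rule (Algorithm 1). The plain-Python sketch below mirrors both. It is illustrative only and is not the authors' code. Several pieces are assumptions:

- every model is a hypothetical callable;
- Gate is taken as sigmoid(Weight · x) with one learnable scalar per feature;
- a hypothetical linear `head` maps the fused features F to two logits before the softmax;
- `p_dc` is read as the predicted probability that the breast is fatty.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def sigmoid(y: float) -> float:
    """Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-y))


def gated(features: Vector, weight: Vector) -> Vector:
    """Eqs. (2), (6), (7): X* = Gate(X) * X with Gate(x) = sigmoid(Weight * x).
    Assumption: Weight holds one learnable scalar per feature."""
    return [sigmoid(w * f) * f for w, f in zip(weight, features)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_ms_prediction(x_mc: Vector, x_ms: Vector, w_mc: Vector, w_ms: Vector,
                     head: List[Vector]) -> Vector:
    """Eqs. (8)-(9): Z_F from the concatenated gated features.
    Assumption: `head` is a hypothetical 2 x len(F) linear layer giving
    (benign, malignant) logits before the softmax."""
    f = gated(x_mc, w_mc) + gated(x_ms, w_ms)  # F = Concat(X*_MC, X*_MS)
    return softmax([sum(a * b for a, b in zip(row, f)) for row in head])


def soft_fusion(z_f: Vector, z_d: Vector, p_dc: float) -> Vector:
    """Eq. (11): Z = P_DC * Z_F + (1 - P_DC) * Z_D.
    Assumption: p_dc is the predicted probability that the breast is fatty."""
    return [p_dc * a + (1.0 - p_dc) * b for a, b in zip(z_f, z_d)]


def algorithm_1(x, dc_model: Callable, mc_model: Callable, ms_model: Callable,
                mc_ms_model: Callable,
                crop_roi: Callable) -> Tuple[object, str, Optional[object]]:
    """Hard routing of Algorithm 1; every model argument is a hypothetical callable.
    Returns (diagnosis, density type, mass mask or None)."""
    density = dc_model(x)  # "D" (dense) or "F" (fatty)
    if density == "D":
        return mc_model(x), density, None
    if density == "F":
        mask = ms_model(crop_roi(x))
        return mc_ms_model(x, mask), density, mask
    raise ValueError(f"unknown density type: {density!r}")
```

`algorithm_1` uses the density label as a hard switch, whereas `soft_fusion` blends the two branch outputs by the predicted density probability. With `p_dc` equal to 1 or 0, the soft rule reduces to the corresponding branch of Algorithm 1.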

Fig. 3. MC-Model structure diagram.
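Fig. 3 and the Section 3.1 overview describe how the MC-model recombines adjacent patches with a patch merging module followed by a 1 × 1 convolution. The sketch below shows that reshaping on plain nested lists, assuming 2 × 2 neighbourhoods are concatenated along the channel axis. The function names are hypothetical, and the output channel count in the example (2C) is an arbitrary choice, not a value from the paper.

```python
# Sketch of 2 x 2 patch merging followed by a 1 x 1 convolution.
# Assumptions: feature maps are H x W x C nested lists; merged neighbourhoods are
# concatenated in the order (i, j), (i+1, j), (i, j+1), (i+1, j+1).


def patch_merge(x):
    """H x W x C -> (H/2) x (W/2) x 4C; requires even H and W."""
    H, W = len(x), len(x[0])
    if H % 2 or W % 2:
        raise ValueError("H and W must be even for 2 x 2 patch merging")
    return [[x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1]  # list concat
             for j in range(0, W, 2)]
            for i in range(0, H, 2)]


def conv1x1(x, weight):
    """1 x 1 convolution as a per-pixel linear map; weight is C_out x C_in."""
    return [[[sum(w * v for w, v in zip(row, pixel)) for row in weight]
             for pixel in line]
            for line in x]


def shape(x):
    return len(x), len(x[0]), len(x[0][0])


if __name__ == "__main__":
    fmap = [[[0.0] * 8 for _ in range(6)] for _ in range(4)]   # H=4, W=6, C=8
    merged = patch_merge(fmap)                                  # 2 x 3 x 32
    reduced = conv1x1(merged, [[0.0] * 32 for _ in range(16)])  # 2 x 3 x 16 (2C)
    print(shape(merged), shape(reduced))                        # (2, 3, 32) (2, 3, 16)
```

Merging halves each spatial side and multiplies the channel count by four. The following 1 × 1 convolution then sets the channel dimension freely, which is why the merged map can be made smaller without discarding any values first.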

3.1.1. Residual token-mixing block

The Residual Token-mixing block consists of a batch normalization (BN) layer, a dual-path MLP layer, and a 1 × 1 convolutional layer. The MLP module is used to establish contextual relationships. A standard MLP handles a large number of parameters and has high computational complexity. We therefore used the sparse MLP block proposed by Tang et al. [33], which has three paths. Two of them are spatial token-mixing paths, one along the horizontal direction and one along the vertical direction. These paths ensure that a token interacts directly only with tokens in the same row or column, instead of with all tokens. This structure avoids an excessive number of computational parameters and reduces over-fitting.

Consider an input feature $X \in \mathbb{R}^{H \times W \times C}$ with height $H$, width $W$, and channel $C$, where $\mathbb{R}$ denotes the real number field. In the horizontal mixing path, the data tensor is reshaped to $HC \times W$, and a linear layer with weights $W_W \in \mathbb{R}^{W \times W}$ is applied to each of the $HC$ rows to mix the information. A similar operation is applied in the vertical mixing path, where the linear layer has weights $W_H \in \mathbb{R}^{H \times H}$. The third branch is an identity (residual) connection. Finally, the outputs of the three branches are concatenated and fused by a 1 × 1 convolution, and the result is added to the input of the token-mixing block through a skip connection:

$$Y = X + \mathrm{FC}\left(\mathrm{Concat}\left(X_H, X_W, X\right)\right) \qquad (1)$$

Fig. 4. RSMLP structure. The Res-conv module consists of 1 × 1 and 3 × 3 convolutional blocks that form a bottleneck; each convolutional block contains cascaded BN, activation, and convolutional layers. The Res-mixing module is composed of two parts, the spatial token-mixing block and the channel-mixing block, and the token-mixing block adopts the sMLP block structure.

3.1.2. Residual channel-mixing block

The Residual Channel-mixing module is composed of a LayerNorm layer and an MLP block. The output of the Residual Token-mixing block first passes through the LayerNorm layer, which normalizes the feature map along the channel dimension. The MLP block consists of two fully connected linear layers and a GELU activation layer. The first linear layer expands the dimensionality from $D$ to $\alpha D$, and the second layer reduces it from $\alpha D$ back to $D$, where $\alpha$ is a hyperparameter.

3.2. Density classification models

Our proposed DC model uses Res2Net101 [34] as a feature extractor to perform density classification of mammograms. As shown in Fig. 5(a), Res2Net is a multiscale network. It splits the feature maps within the residual block of ResNet into multiple channel groups and adds residual-like connections between the groups, which improves the multi-scale representation at a finer granularity level. Specifically, in the Res2Net block, the input features are divided into $s$ groups, denoted as $x_i$, where $i \in \{1, 2, \ldots, s\}$. Each group's feature map has $1/s$ of the channels of the input feature map. Every group except $x_1$ undergoes a 3 × 3 convolution. In addition, for every group except $x_1$ and $x_2$, the feature map of the $i$-th group is first added to the output of the previous group before the convolution. Finally, the outputs of the $s$ groups are concatenated along the channel dimension and passed through a 1 × 1 convolutional layer. The structure of the Res2Net block for $s = 4$ is illustrated in Fig. 5(b).
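The split-and-add rule of the Res2Net block described above can be traced on plain lists. In this sketch the spatial dimensions are dropped, and `conv3x3` is a hypothetical stand-in for the per-group 3 × 3 convolution K_i, receiving a 0-based group index. The final 1 × 1 convolution is omitted. The helper `group_receptive_fields` shows why the hierarchy gives each successive group a larger receptive field.

```python
# Sketch of the hierarchical residual connections inside a Res2Net block (Section 3.2),
# reduced to the channel dimension. `conv3x3(i, v)` is a hypothetical stand-in for the
# 3 x 3 convolution of group i (0-based); spatial dims and the final 1 x 1 conv are omitted.


def res2net_groups(x, s, conv3x3):
    """Split x into s groups: y1 = x1, y2 = K2(x2), yi = Ki(xi + y(i-1)) for i > 2."""
    if len(x) % s:
        raise ValueError("channel count must be divisible by s")
    width = len(x) // s
    groups = [x[g * width:(g + 1) * width] for g in range(s)]
    outputs = [groups[0]]                      # x1 passes through unchanged
    prev = None
    for i in range(1, s):
        inp = groups[i] if i == 1 else [a + b for a, b in zip(groups[i], prev)]
        prev = conv3x3(i, inp)
        outputs.append(prev)
    return [v for y in outputs for v in y]     # channel concat before the 1 x 1 conv


def group_receptive_fields(s, k=3):
    """Largest receptive field reached by each group's output (stride-1 k x k convs)."""
    return [1] + [1 + (k - 1) * i for i in range(1, s)]


if __name__ == "__main__":
    double = lambda i, v: [2 * a for a in v]   # toy "convolution"
    print(res2net_groups([1, 1, 1, 1], 4, double))  # [1, 2, 6, 14]
    print(group_receptive_fields(4))                # [1, 3, 5, 7]
```

Because each group receives the previous group's output before its own convolution, a single block already mixes receptive fields of 1, 3, 5 and 7 pixels when s = 4. This is the finer-grained multi-scale representation the DC model relies on.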

3.3. Mass segmentation model

Our proposed MS-model uses an encoder–decoder structure to perform the mass mask segmentation task. The encoder models long-range contextual relationships with the self-attention mechanism; the MS-model structure is shown in Fig. 6. The self-attention module uses MedT, proposed by Valanarasu et al. [35], an axial-attention-based vision transformer (ViT) model. Traditional ViT models have a prohibitive computational cost. Axial attention reduces this cost significantly by decomposing 2D self-attention into two 1D self-attention operations. The decoder of the MS-model uses a ConvNet to upsample the extracted deep features to the input resolution for pixel-level semantic prediction. High-resolution encoder features at different scales are fused through skip connections to reduce the loss of spatial information caused by downsampling. As a result, the model achieves accurate segmentation of breast masses with fewer parameters.

Fig. 5. DC-Model structure diagram. (a) Res2Net network structure diagram; (b) Res2Net block structure diagram.

Fig. 6. MS-model structure diagram.

3.4. Multi-task fusion classification network

3.4.1. MC-MS fusion model

We first developed a mass classification–mass segmentation (MC-MS) fusion network, as shown in Fig. 1. The network consists of an MC module, an MS module, and a feature fusion module. It takes a dual input: the mammogram $X_i$ and the segmentation mask image $X_i^{mask}$ output by the MS-model. The model was inspired by how radiologists examine mammograms. They first analyse the whole image to identify suspicious masses in the breast, and then use the shape and boundary of the masses as key features to determine breast cancer. Following this screening approach, the MC-MS model classifies breast cancer as benign or malignant within the whole receptive field of the mammogram. It also analyses the segmentation results and integrates them with the density classification results to improve the accuracy of mass classification. The MC-MS model structure is shown in Fig. 7.

Fig. 7. MC-MS model structure diagram.

3.4.2. Mask feature learner

The MC-MS model uses a CNN similar to VGGNet as the mask feature learner (MFL). It consists mainly of five convolutional blocks (ConvBlocks) and five pooling layers. Each ConvBlock consists of a convolutional layer with a 3 × 3 kernel and a padding of 1, a batch normalisation layer, and a ReLU activation layer. A max pooling layer with a kernel size of 2 × 2 and a stride of 2 is applied after every two blocks to gradually reduce the spatial dimension. The numbers of channels in the convolution blocks are 32, 64, 128, 256, and 512. The output is a 7 × 7 × 512 feature map, and the high-level feature vector of the mass mask is obtained by average pooling and fully connected operations. The final diagnosis (benign/malignant) of the MC-MS fusion model is obtained by fusing these features with the MCL output at the feature level, followed by classification.

3.4.3. DC-MS-MC-Fusion model

We constructed a multi-task hybrid model to obtain comprehensive assessment results while retaining the predictions of the separate single-task models related to radiological features and risk factors. We followed the principle that, given a mammogram $V_i$ from patient $x$, the
extent to which the MFL contributes to the MC model is determined by the results of the density assessment (visually, the denser the breast, the more blurred the edges of the mass mask). If the density assessment shows the breast to be fatty, the mass features learned by the MFL are more reliable, and their contribution to the MC-MS fusion network is correspondingly larger. Thus, we first defined a gating function as follows:

$$\mathrm{Gate}(x) = \sigma(\mathrm{Weight} \cdot x) \qquad (2)$$

$$\sigma(y) = \frac{1}{1 + e^{-y}} \qquad (3)$$

where $\sigma(\cdot)$ denotes the sigmoid function and $\mathrm{Weight}$ denotes a learnable parameter.

Next, the following gates are defined to learn the fusion logic; the gates can learn all possible cases.

$$X_{MC} = \mathrm{MCL}(X_{in}) \qquad (4)$$

$$X_{MS} = \mathrm{MSL}(X_{mask}) \qquad (5)$$

$$X_{MC}^{*} = \mathrm{Gate}(X_{MC}) \ast X_{MC} \qquad (6)$$

$$X_{MS}^{*} = \mathrm{Gate}(X_{MS}) \ast X_{MS} \qquad (7)$$

Finally, guided by the results of the density evaluation, an enhanced feature representation $F$ is defined by fusing the features from the MCL and the features from the MFL:

$$F = \mathrm{Concat}\left(X_{MC}^{*}, X_{MS}^{*}\right) \qquad (8)$$

The diagnostic result is therefore given by Eq. (9):

$$Z_F = \mathrm{MC\_MS\_Model}(X_{in}, X_{mask}) = \mathrm{Softmax}(F) \qquad (9)$$

Accordingly, if the density assessment shows that the breast is dense, the diagnosis is given by:

$$Z_D = \mathrm{MC\_Model}(X_{in}) \qquad (10)$$

The prediction of the multi-task fusion model is then:

$$Z = P_{DC} Z_F + (1 - P_{DC}) Z_D \qquad (11)$$

where $P_{DC}$ is the predicted density label.

The flow of the prediction phase of the multi-task fusion model is summarised by the pseudo-code in Algorithm 1.

4. Performance evaluation

4.1. Dataset

The CBIS-DDSM is a curated breast imaging subset of the Digital Database for Screening Mammography (DDSM) [36]. It includes a total of 3071 digitized mammograms stored in DICOM format, each provided with a ground-truth mask annotated by human experts at the pixel level. The lesions in CBIS-DDSM include both masses and calcifications; given the scope of this study, we considered only the 1596 images related to masses.

The INbreast dataset [37] is a full-field digital mammography (FFDM) dataset acquired at a Breast Centre in Hospital de São João, Porto, Portugal. It contains 410 mammograms with lesions including masses and calcifications. Here, we considered only the 107 images that contain ground-truth mass masks annotated by human experts at the pixel level.

Both datasets were preprocessed with data augmentation to reduce overfitting:
- random horizontal flipping;
- image contrast and saturation adjustment;
- rotation at 15-degree intervals.

Finally, the mammograms were resized to a standard size of 224 × 224.

4.2. Evaluation metrics

We evaluated the performance of the different models using widely used metrics. For the classification task, these are accuracy (ACC), precision, recall, and F1 score. For the segmentation task, they are the Dice similarity coefficient (DSC), precision, and intersection over union (IoU). The mathematical definitions of these metrics are given in Eqs. (12)–(17):

$$ACC = \frac{TP + TN}{TP + FP + TN + FN} \qquad (12)$$

$$\mathrm{precision} = \frac{TP}{TP + FP} \qquad (13)$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \qquad (14)$$

$$F1 = \frac{2TP}{2TP + FP + FN} \qquad (15)$$

$$DSC = \frac{2|A \cap B|}{|A| + |B|} \qquad (16)$$

$$IoU = \frac{|A \cap B|}{|A \cup B|} \qquad (17)$$
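The metrics in Eqs. (12)–(17) can be computed directly from confusion-matrix counts and binary masks. The sketch below also adds a rank-based ROC AUC, the threshold-free score reported alongside them. It is a minimal illustration under stated assumptions, not the evaluation code used in the study: malignant is the positive class, masks are sets of foreground pixel coordinates, and undefined ratios (0/0) are reported as 0.0.

```python
# Evaluation metrics of Eqs. (12)-(17), plus a rank-based ROC AUC.
# Assumptions: malignant = positive class (label 1), masks are sets of foreground
# pixel coordinates, and undefined ratios (0/0) are reported as 0.0.


def _ratio(num, den):
    return num / den if den else 0.0


def classification_metrics(tp, tn, fp, fn):
    return {
        "acc": _ratio(tp + tn, tp + fp + tn + fn),   # Eq. (12)
        "precision": _ratio(tp, tp + fp),             # Eq. (13)
        "recall": _ratio(tp, tp + fn),                # Eq. (14)
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),       # Eq. (15)
    }


def segmentation_metrics(truth, pred):
    """truth = A (ground truth), pred = B (prediction), both sets of pixels."""
    inter = len(truth & pred)
    return {
        "dsc": _ratio(2 * inter, len(truth) + len(pred)),  # Eq. (16)
        "iou": _ratio(inter, len(truth | pred)),           # Eq. (17)
        "precision": _ratio(inter, len(pred)),             # pixel-level TP / (TP + FP)
    }


def roc_auc(scores, labels):
    """Probability that a random malignant case outscores a random benign one
    (ties count 1/2); this equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    print(classification_metrics(tp=8, tn=5, fp=2, fn=1))
    a = {(0, 0), (0, 1), (1, 0), (1, 1)}
    b = {(0, 1), (1, 1), (2, 1)}
    print(segmentation_metrics(a, b))                 # DSC = 4/7, IoU = 2/5
    print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

For a single pair of masks, DSC = 2 · IoU / (1 + IoU), so the two segmentation scores carry the same ranking information per image. In the example above, 2 × 0.4 / 1.4 = 4/7.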
Algorithm 1 Where TP (True Positive) indicates the number of correctly classified
Multi-task fusion for breast mass diagnosis. malignant masses, TN (true negative) indicates the number of correctly
# DC-Model: Density Classification Model classified benign masses, FP (false positive) indicates the number of
# MC-Model: Mass Classification Model benign masses that were incorrectly classified as malignant masses, and
# MS-Model: Mass Segmentation Model FN (false negative) indicates the number of malignant masses that were
# MC-MS-Model: Mass Classification Model with Mask Input from MS-Model incorrectly classified as benign masses. Where A and B are the seg­
# Density- Type: D: Dense; F: Fatty
mentation ground truth and prediction, respectively.
# Crop-ROI: Extract mass ROI from Input
Input: Mammogram image X In addition, we calculated the area under (AUC) the receiver oper­
1. Density-Type = DC-Model(X) ating characteristic (ROC), which shows the true-positive rate versus the
2. if Density- Type = "D": false-positive rate.
Diagnosis-Result = MC-Model(X)
else if Density-Type = "F":
X-ROI = Crop-ROI(X) 4.3. Implementation details
Mass-Mask = MS-Model(X-ROI)
Diagnosis-Result = MC-MS-Model(X, Mass-Mask) In this paper, we construct and train the proposed model using
end if Pytorch, and all experiments are done on a hardware platform with a 10-
Output: Diagnosis-Result, Reference Result (Density-Type, Mass-Mask)
core 5.20 GHz Intel Core I9–10,900 CPU and NVIDIA 3070 Geforce RTX

GPU. We used the SGD optimizer for gradient-descent optimisation to update the model during training; the logistic regression layer was initialised with a learning rate of 10⁻⁴, which was decayed every 10 epochs with a decay rate of 0.98. The parameters were updated in mini-batches of 16, and training was stopped at the 200th epoch. The number of layers in each module of the MC-model is set to match the sMLP method [33]; specifically, N1, N2, N3, and N4 are configured as 2, 8, 14, and 2, respectively. In the residual channel fusion block of the MC-model, the hyperparameter α is set to 3. The DC-Model employs Res2Net101, so the numbers of layers in its four groups of Res2Net blocks (M1, M2, M3, M4) are 3, 4, 23, and 3, respectively.

5. Experimental results

5.1. Experimental results of the density classification model

To evaluate the validity of the density classification task, we analysed the intermediate results of the corresponding task-specific model (i.e., the DC model), which divides mammograms into two classes: dense and non-dense breasts. We quantitatively compared the DC model with several other state-of-the-art (SOTA) methods on two publicly available datasets, CBIS-DDSM and INBreast.

The comparison with the baseline models is shown in Table 2. We compared our density classification model (the DC model, based on Res2Net101) with four baseline methods: ResNet50, ResNet101, Res2Net50, and MobileNet [38]. On both the CBIS-DDSM and INBreast datasets, the DC model matched or exceeded every baseline on every metric in Table 2. In particular, its ACC was 2.1 % and 3 % higher than that of the baseline models on the CBIS-DDSM and INBreast datasets, respectively, and its recall was 8 % higher on the INBreast dataset.

Fig. 8 shows the ROC curves of the DC model and the baseline models on the CBIS-DDSM and INBreast datasets. The DC model reached AUC scores of 0.911 and 0.992, respectively, which are 1.6 % and 0.4 % higher than those of the ResNet101 model.

The comparison with SOTA methods is shown in Table 3. Our DC model obtained the best ACC among the methods evaluated on INBreast (0.96), and on CBIS-DDSM it reached an ACC of 0.86, compared with the 0.85 reported by Wimmer et al. [30] on DDSM.

Table 2
Quantitative evaluation of different methods of mammographic density classification on two datasets.
Dataset     Method      Precision  Recall  ACC
CBIS-DDSM   ResNet50    0.81       0.88    0.80
            ResNet101   0.82       0.91    0.83
            MobileNet   0.84       0.85    0.82
            Res2Net50   0.85       0.88    0.83
            DC model    0.85       0.91    0.86
INBreast    ResNet50    0.85       0.83    0.88
            ResNet101   0.82       0.92    0.91
            MobileNet   0.96       0.92    0.91
            Res2Net50   0.89       0.89    0.93
            DC model    0.96       1.00    0.96

Fig. 8. ROC curves for each baseline model of density classification. (a) ROC curves of the different networks on the CBIS-DDSM dataset. (b) ROC curves of the different networks on the INBreast dataset.

Table 3
Quantitative evaluation of SOTA methods of mammographic density classification on two datasets.
Method               Dataset     ACC
Mohamed et al. [39]  INBreast    0.80
Li et al. [40]       INBreast    0.89
Wu et al. [41]       INBreast    0.91
Nguyen et al. [42]   INBreast    0.73
Li et al. [43]       INBreast    0.84
DC-Model             INBreast    0.96
Wimmer et al. [30]   DDSM        0.85
DC-Model             CBIS-DDSM   0.86

5.2. Experimental results of the mass segmentation model

To evaluate the effectiveness of the mass segmentation task, we compared the intermediate results of the corresponding task-specific model (i.e., the MS model) on the CBIS-DDSM and INBreast datasets with those of several baseline methods based on the UNet network: UNet [44], R2UNet [45], and UNet++ [46].

Table 4 shows that the MS model outperformed the UNet-based CNN methods on both the CBIS-DDSM and INBreast datasets for all performance metrics, achieving improvements of at least 0.84 %, 8.8 %, and 2.93 % in DSC, precision, and IoU, respectively, on the CBIS-DDSM dataset and of at least 3.44 %, 10.31 %, and 1.06 %, respectively, on the INBreast dataset.

To further highlight the advantages of our MS model, we compared its complexity with that of the baseline methods. Our MS model has only 16 M parameters, substantially fewer than the UNet-based segmentation methods (34 M to 101 M), which helps save computational resources.

Fig. 9 shows the visualisation results on the CBIS-DDSM dataset; the proposed MS model produced visibly better mass segmentations than the other models.

We also compared the proposed MS model with current SOTA methods, including MaX-DeepLab [47], Axial-DeepLab [48], MedT [35], AUNet [49], and FCN–CRF-Net [50]. The comparative results are presented in Table 5. Our MS model achieved the highest or tied-highest DSC and precision on both datasets.

Furthermore, we conducted a statistical analysis of the DSC of each model using the Wilcoxon signed-rank test, with the significance level set at α = 0.05. The results are presented in Tables 4 and 5. Each p-value tests whether the DSC of the proposed method differs from that of the other model; a p-value below 0.05 indicates a significant difference between the two models. The differences between the proposed model and both the UNet-based baselines and the current SOTA models are significant in every case except AUNet on CBIS-DDSM (p = 0.1567).

5.3. Experimental results of the mass classification model

5.3.1. Comparison with the baseline models

We evaluated classical CNNs (ResNet50, ResNet101, Res2Net50, Res2Net101, and MobileNet); hybrid models combining CNNs with an attention mechanism (ResNet50+channel attention and ResNet101+channel attention); an MLP-based model without attention (sMLP); and our CNN+MLP MC model on both datasets. The quantitative evaluation results are shown in Tables 6 and 7.

On the CBIS-DDSM dataset, our MC model improved ACC by at least 0.3 % compared with the other three types of approaches, while ResNet101 achieved the best precision. The best recall was achieved by ResNet101+CA (0.99), followed by ResNet50+CA (0.86). In F1, the MC model tied with ResNet101, Res2Net50, and ResNet101+CA at 0.76. Overall, however, the MC model offered the best balance across the metrics.

On the INBreast dataset, our MC model improved ACC by at least 0.7 % compared with the other three types of methods. ResNet101+CA and Res2Net101 achieved the best precision and recall, respectively, while in F1 our MC model was evenly matched with Res2Net101 and MobileNet at 0.83. Overall, the MC model was the most robust.

As can be seen from Fig. 10, the MC model had the highest AUC, 0.3 % higher than the AUC scores of the CNN-based and CNN+attention models on the CBIS-DDSM dataset and 1.9 % higher than that of the CNN+attention model on the INBreast dataset.

5.3.2. Ablation studies

We conducted ablation experiments to compare different diagnostic models. As can be seen from Table 8, on the CBIS-DDSM dataset the single-task mass diagnostic model (MC-model) achieves an ACC of 0.89 and an AUC of 0.82. The MC-model takes only mammographic images as input, performing image-level feature extraction and directly outputting diagnostic categories (benign/malignant). However, in clinical practice, radiologists typically do not make benign/malignant diagnoses directly from the mammogram alone. Instead, they comprehensively analyse various features in the mammogram, such as breast density, potential lesion obscuration, tumour size, location, and contour characteristics. We therefore incorporated the contour features obtained from mass segmentation into the mass diagnostic network to further improve the accuracy of mass diagnosis.

The MC-MS-Model is a multi-task fusion diagnostic network that takes mammographic images and mass masks as dual inputs. It improved the AUC to 0.95 and the precision to 0.92, but recall increased by only 0.01 (from 0.84 to 0.85).

We further analysed the characteristics of the input image. When the input image X_in corresponds to a fatty breast, the morphological features of lesions in the image are more distinct and the segmented lesion mask is relatively accurate, so the edge information carried by the mask offers valuable auxiliary information to the MC-MS model. For highly dense breasts, however, the extracted edge contours may be inaccurate and act as interference, limiting the performance gain of the model. We therefore introduced density information to control the contribution of mass edge information to the diagnostic network. The density-guided multi-task fusion diagnostic network ultimately achieved an ACC of 0.91, an AUC of 0.95, a precision of 0.99, and a recall of 0.87 on the CBIS-DDSM test set. Compared with the single-task models, our multi-task fusion model improved ACC by 15 %, AUC by 12.85 %, precision by 17 %, and recall by 2.5 % on the CBIS-DDSM dataset, and ACC by 5.4 %, AUC by 3.3 %, precision by 17 %, and recall by 10 % on the INBreast dataset.

For both dense and non-dense breasts, our multi-task fusion model can selectively use the edge features of the mass mask according to the density features. The experimental results demonstrated that the fusion model yields more objective, accurate, and interpretable assessment results than the single-task models.

5.3.3. Comparison with SOTA models

We compared the multi-task fusion model, which incorporates the mass segmentation mask and density information, with several SOTA methods for breast mass diagnosis. The performance of the SOTA methods was taken from the original papers and is listed in Table 9. Compared with the SOTA methods, our multi-task fusion models improved ACC by at least 1.8 % and AUC by at least 2.5 % on the CBIS-DDSM dataset, and ACC by at least 2.9 % and AUC by at least 7.1 % on the INBreast dataset.

6. Discussion

The experimental results demonstrate that our multi-task fusion model for assessing mammograms for breast cancer diagnosis has better diagnostic performance and interpretability than SOTA models. The fusion model outputs breast density and breast mass morphology results for cancer prediction and radiological assessment, in a manner similar to the assessment practice followed by radiologists. These intermediate results can provide clinicians with more interpretable predictions and estimates of multiple reference indicators in the pre-diagnostic process, thus improving human–machine collaboration in mammography and potentially reducing the workload of radiologists.

Table 4
Quantitative evaluation of different methods of mammographic mass segmentation on two datasets. P-values are from the Wilcoxon signed-rank test on DSC against the MS-model (the reference model, marked –).
                           CBIS-DDSM                                  INBreast
Method        Parameters   DSC    Precision  IoU   P-value           DSC   Precision  IoU   P-value
UNet [44]     34M          0.88   0.808      0.78  0.0086 (<0.01)    0.88  0.84       0.79  0.0031 (<0.01)
R2UNet [45]   101M         0.848  0.848      0.78  0.0029 (<0.01)    0.85  0.85       0.79  0.0024 (<0.01)
UNet++ [46]   36M          0.888  0.85       0.78  0.0049 (<0.01)    0.87  0.83       0.79  0.0001 (<0.01)
MS-model      16M          0.898  0.93       0.81  –                 0.91  0.96       0.81  –
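
The DSC and IoU columns of Table 4 are the overlap measures of Eqs. (16) and (17). As an illustration only, the plain-Python sketch below computes both for a pair of binary masks; representing masks as nested lists of 0/1 and treating two empty masks as perfect agreement are assumptions made here, not details given in the paper.

```python
from typing import Sequence


def dice_and_iou(gt: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> tuple[float, float]:
    """DSC (Eq. 16) and IoU (Eq. 17) for two binary masks given as nested lists of 0/1."""
    a = [bool(v) for row in gt for v in row]    # ground-truth pixels A
    b = [bool(v) for row in pred for v in row]  # predicted pixels B
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(x and y for x, y in zip(a, b))  # |A ∩ B|
    size_a, size_b = sum(a), sum(b)             # |A|, |B|
    union = size_a + size_b - inter             # |A ∪ B|
    if union == 0:
        # Both masks empty: treated here as perfect agreement (an assumption).
        return 1.0, 1.0
    return 2 * inter / (size_a + size_b), inter / union
```

For a 3 × 3 ground truth with four foreground pixels and a prediction that shares three of them (plus one false pixel), DSC = 6/8 = 0.75 and IoU = 3/5 = 0.6. The two scores always satisfy DSC = 2·IoU/(1 + IoU), so they rank models identically on a single image.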


Fig. 9. Comparison of the visualization of model ROI segmentation results of different methods on the CBIS-DDSM dataset: (a) Image; (b) Ground truth; (c) UNet; (d) R2UNet; (e) UNet++; (f) MS model.
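
The classification scores in Tables 2, 3 and 6–9 follow Eqs. (12)–(15), together with the ROC AUC. The sketch below is a minimal plain-Python illustration, not the authors' evaluation code: it assumes malignant = 1 is the positive class (matching the TP/FN definitions), reports undefined ratios as 0.0, and computes the AUC through its rank interpretation (the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one) instead of integrating the ROC curve.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), taking malignant = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """ACC, precision, recall and F1 as in Eqs. (12)-(15); undefined ratios are reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "acc": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random malignant case > score of a random benign case); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, labels [1, 0, 1, 1, 0, 0] and hard predictions [1, 0, 0, 1, 1, 0] give two TPs, two TNs, one FP and one FN, so ACC, precision, recall and F1 all equal 2/3; with malignancy scores [0.9, 0.2, 0.4, 0.8, 0.6, 0.1], eight of the nine malignant/benign pairs are ordered correctly, giving an AUC of 8/9. The pairwise loop is quadratic, which is fine for test sets of a few hundred images.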

Table 5
Quantitative evaluation of SOTA methods of mammographic mass segmentation on two datasets. P-values are from the Wilcoxon signed-rank test on DSC against the MS-model (the reference model, marked –).
                      CBIS-DDSM                         INBreast
Method                DSC   Precision  P-value          DSC   Precision  P-value
MaX-DeepLab [47]      0.87  0.92       0.0000 (<0.01)   0.82  0.88       0.0000 (<0.01)
Axial-DeepLab [48]    0.88  0.92       0.0015 (<0.01)   0.90  0.91       0.0150 (<0.05)
MedT [35]             0.88  0.92       0.0394 (<0.05)   0.90  0.96       0.0055 (<0.01)
AUNet [49]            0.89  –          0.1567 (>0.05)   0.88  –          0.0012 (<0.01)
FCN–CRF-Net [50]      –     –          –                0.89  –          –
MS-model              0.89  0.93       –                0.91  0.96       –
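
The p-values in Tables 4 and 5 come from a Wilcoxon signed-rank test on DSC, presumably on paired per-image scores of two models. The sketch below shows the mechanics of a two-sided test in plain Python using the large-sample normal approximation with a tie correction and no continuity correction; this variant is an assumption, since the text does not say which one was used. For small test sets a library routine such as scipy.stats.wilcoxon, which can use the exact null distribution, would normally be preferred.

```python
import math
from typing import Sequence


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Normal approximation with tie correction, no continuity correction;
    zero differences are discarded. Returns (W+, p-value).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| in ascending order; tied values share the average of their ranks.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over tie groups, used in the variance correction
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)  # sum of ranks of positive differences
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of N(0, 1)
    return w_plus, p
```

With six toy pairs (10, 8), (12, 11), (15, 10), (9, 9), (14, 10), (13, 12), the tied pair is dropped, all five remaining differences favour the first sample, and the sketch returns W+ = 15 with p ≈ 0.042, just below α = 0.05.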

Table 6
Quantitative evaluation of different methods of mammographic mass classification on the CBIS-DDSM dataset.
Type            Method          ACC   Precision  Recall  F1
CNN-based       ResNet50        0.79  0.68       0.83    0.74
                ResNet101       0.81  0.78       0.74    0.76
                Res2Net50       0.87  0.72       0.81    0.76
                Res2Net101      0.82  0.70       0.81    0.75
                MobileNet       0.83  0.70       0.81    0.75
MLP-based       sMLP            0.85  0.70       0.73    0.71
CNN+attention   ResNet50+CA     0.85  0.63       0.86    0.72
                ResNet101+CA    0.87  0.61       0.99    0.76
CNN+MLP         MC-model        0.87  0.69       0.84    0.76

Table 7
Quantitative evaluation of different methods of mammographic mass classification on the INBreast dataset.
Type            Method          ACC   Precision  Recall  F1
CNN-based       ResNet50        0.79  0.74       0.74    0.74
                ResNet101       0.82  0.83       0.79    0.81
                Res2Net50       0.78  0.77       0.74    0.75
                Res2Net101      0.87  0.81       0.86    0.83
                MobileNet       0.85  0.71       0.77    0.83
MLP-based       sMLP            0.80  0.75       0.83    0.79
CNN+attention   ResNet50+CA     0.78  0.78       0.78    0.78
                ResNet101+CA    0.86  0.88       0.68    0.70
CNN+MLP         MC-model        0.87  0.83       0.83    0.83

(1) Density classification task: We used Res2Net to build the DC model to improve the multiscale characterisation ability of the CNN-based model at a finer level of granularity. Res2Net was used because (i) it constructs hierarchical residual-like connections within a single residual block, (ii) it replaces the single group of 3 × 3 filters with n channels by a set of smaller filter groups, (iii) it uses features at different scales to improve the multiscale feature-extraction capability of the network, and (iv) it enlarges the receptive field of each network layer.

(2) Mass segmentation task: The prevailing research approach is to use CNN-based deep-learning techniques for feature extraction, and many studies have demonstrated the excellent representational capability of CNNs. For the mass segmentation task, the MS model adopted a structure similar to U-Net, using a symmetric encoder–decoder framework with skip connections. However, the design of its feature-extraction blocks accounts for an inherent limitation of traditional convolutional blocks: they cannot effectively model explicit long-range relationships. This matters especially in medical images, which usually have a large background region and small target lesions. Moreover, target structures differ greatly in texture, shape, and size between patients, which makes it necessary, but challenging, to establish global contextual relationships. Thus, inspired by the attention mechanism of natural language processing, the feature-extraction block of the MS model uses an attention mechanism, which has better long-range modelling capability. As seen in Fig. 7, our MS model distinguished the foreground mass mask from the background region more accurately. Moreover, as the data in Table 4 show, the MS model has far fewer parameters than the other CNN-based models. The MS model is therefore more efficient than the other CNN-based models, using far fewer computational resources while maintaining a performance advantage.

(3) Mass classification task: Although several studies have demonstrated the strong long-range modelling capability of attention-based mechanisms, the MLP-based modelling approach remains an effective way to establish purely global relationships. For the mass classification task, our MC model combined the local feature-extraction capability of a CNN with the long-range modelling capability of a feedforward neural network to achieve a higher area under the curve. However, as shown in Tables 6 and 7, it did not show a clear advantage on every evaluation metric. Nevertheless, when we introduced the multi-task fusion model containing the density evaluation results and MS results, its performance in the benign–malignant classification of masses improved substantially. Even in high-noise scenarios (e.g., dense breast tissue), where masses may be hidden, malignant masses were accurately identified via multi-task fusion prediction.

Fig. 10. ROC curves for each baseline model of mass classification. (a) ROC curves of the different networks on the CBIS-DDSM dataset. (b) ROC curves of the different networks on the INBreast dataset.

Table 8
Quantitative evaluation results of the multi-task fusion model.
Dataset     Model                                          Input                         ACC   AUC   Precision  Recall
CBIS-DDSM   MC-Model                                       Image                         0.89  0.82  0.82       0.84
            MC-MS-Model (no density type)                  Image + mass mask             0.90  0.95  0.92       0.85
            MC-MS-Model + MC-Model (introducing density)   Image + mass mask + density   0.91  0.95  0.99       0.87
INBreast    MC-Model                                       Image                         0.87  0.88  0.83       0.83
            MC-MS-Model (no density type)                  Image + mass mask             0.89  0.88  0.92       0.87
            MC-MS-Model + MC-Model (introducing density)   Image + mass mask + density   0.93  0.92  1.00       0.93

Table 9
Quantitative evaluation results of the multi-task fusion model compared with the SOTA models.
Dataset     Method               AUC   ACC
CBIS-DDSM   Dhungel et al. [51]  0.80  –
            Dhungel et al. [52]  0.76  0.91
            Zhu et al. [53]      0.89  0.90
            MTF-models           0.92  0.93
INBreast    Tulder et al. [54]   0.80  –
            Zhu et al. [53]      0.79  0.74
            Arora et al. [24]    0.88  0.88
            MTF-models           0.95  0.91

In summary, the logic of our multi-task fusion diagnostic model in the practical prediction process is as follows. Upon acquiring a set of mammographic images for a group of patients, the model does not perform image-level malignancy classification directly. Instead, it first assesses the density type of the input images. Then, based on the density assessment, the diagnostic model determines whether to consider the contour information of tumours. For low-density breasts, mass contour features are introduced into the mass diagnostic model for multi-task information fusion classification, aiming at a more accurate and interpretable prediction. For highly dense breasts, where masses are difficult to identify, mass contour features have limited diagnostic value, so it is more suitable to perform image-level malignancy classification on the input images directly. The output for clinicians includes not only the final breast cancer prediction but also the density classification and mass segmentation results as reference results.

Further research into the DC and MS models could further improve the performance and accuracy of the multi-task fusion model that incorporates them. Furthermore, by taking into account physical features observed in clinical practice, such as density asymmetry between the left and right breasts, which may be an indicator of cancer, the multi-task fusion model could provide clinicians with more refined assessment results in multi-view scenarios.
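
The prediction logic summarised in the Discussion (Algorithm 1 together with Eqs. (9)–(11)) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the DC, MC, MS and MC-MS models and the Crop-ROI step are passed in as plain callables, and any stand-ins used with it are hypothetical placeholders for the trained networks. soft_fusion implements Eq. (11) under the assumption that P_DC is the predicted probability of a fatty (non-dense) breast, so that P_DC ∈ {0, 1} reproduces the hard routing of Algorithm 1.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class FusionOutput:
    diagnosis: Sequence[float]  # class scores, e.g. (benign, malignant)
    density: str                # reference result: "D" (dense) or "F" (fatty)
    mass_mask: Optional[Any]    # reference result: only produced on the fatty branch


def multitask_predict(
    x: Any,
    dc_model: Callable[[Any], str],
    mc_model: Callable[[Any], Sequence[float]],
    ms_model: Callable[[Any], Any],
    mc_ms_model: Callable[[Any, Any], Sequence[float]],
    crop_roi: Callable[[Any], Any],
) -> FusionOutput:
    """Hard routing of Algorithm 1: assess density first, then pick the diagnostic branch."""
    density = dc_model(x)
    if density == "D":
        # Dense breast: mass contours are unreliable, so classify the whole image (Z_D, Eq. 10).
        return FusionOutput(mc_model(x), density, None)
    if density == "F":
        # Fatty breast: segment the cropped ROI, then classify image + mask (Z_F, Eq. 9).
        mask = ms_model(crop_roi(x))
        return FusionOutput(mc_ms_model(x, mask), density, mask)
    raise ValueError(f"unknown density type: {density!r}")


def soft_fusion(p_dc: float, z_f: Sequence[float], z_d: Sequence[float]) -> list[float]:
    """Eq. (11): Z = P_DC * Z_F + (1 - P_DC) * Z_D, assuming P_DC is the predicted P(fatty)."""
    if not 0.0 <= p_dc <= 1.0:
        raise ValueError("p_dc must lie in [0, 1]")
    return [p_dc * f + (1.0 - p_dc) * d for f, d in zip(z_f, z_d)]
```

Calling multitask_predict with a stand-in density model that returns "F" runs the segmentation branch and returns both the fused diagnosis and the mask as reference results, while "D" skips segmentation entirely. Keeping the hard routing separate from soft_fusion makes the trade-off explicit: routing saves the cost of segmentation on dense breasts, whereas the soft form of Eq. (11) would need both branches to be evaluated.
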

7. Conclusion

In this study, we developed a multi-task fusion model that combines different task-specific models to achieve comprehensive breast cancer screening from mammograms. We trained and evaluated our fusion model on multiple task objectives related to mammogram analysis. First, we developed a CNN+MLP-based backbone network, the MC model, for X-ray mammography-based benign–malignant diagnosis of lesions. Second, we developed two auxiliary networks for specific X-ray mammography analysis tasks: (i) the Res2Net-based breast DC model and (ii) the ViT-based lightweight breast MS model. Finally, we investigated how effectively a multi-task fusion model, in which segmentation information and density information are introduced into the mass classification network, improves the benign–malignant classification of masses. The experimental results demonstrate that our fusion model outperforms other SOTA models, suggesting its potential to improve overall breast cancer diagnostic performance in clinical practice. This fusion model can provide comprehensive diagnostic predictions to assist radiologists in diagnostic decision-making.

Data availability

The DDSM dataset is available online at http://marathon.csee.usf.edu/Mammography/Database.html.

CRediT authorship contribution statement

Yutong Zhong: Methodology, Software, Writing – original draft. Yan Piao: Conceptualization, Funding acquisition, Project administration, Supervision. Baolin Tan: Validation. Jingxin Liu: Data curation, Validation.

Declaration of competing interest

No other competing interests are declared by any of the authors.

References

[1] H. Sung, J. Ferlay, R.L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, F. Bray, Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J. Clin. 71 (3) (2021) 209–249.
[2] N. Vállez, G. Bueno, O. Déniz, J. Dorado, J.A. Seoane, A. Pazos, C. Pastor, Breast density classification to reduce false positives in CADe systems, Comput. Method. Program. Biomed. 113 (2) (2014) 569–584.
[3] E. Paci, M. Broeders, S. Hofvind, D. Puliti, S.W. Duffy, E.W. Group, European breast cancer service screening outcomes: a first balance sheet of the benefits and harms, Cancer Epidemiol. Biomark. Prevent. 23 (7) (2014) 1159–1163.
[4] I. Muhimmah, R. Zwiggelaar, Mammographic density classification using multiresolution histogram information, in: Proceedings of the International Special Topic Conference on Information Technology in Biomedicine, ITAB, Citeseer, 2006, pp. 26–28.
[5] K. Bovis, S. Singh, Classification of mammographic breast density using a combined classifier paradigm, in: 4th International Workshop on Digital Mammography, Citeseer, 2002, pp. 177–180.
[6] S. Khan, M. Hussain, H. Aboalsamh, G. Bebis, A comparison of different Gabor feature extraction approaches for mass classification in mammography, Multimed. Tool. Appl. 76 (1) (2017) 33–57.
[7] R.L. Birdwell, The preponderance of evidence supports computer-aided detection for screening mammography, Radiology 253 (1) (2009) 9–16.
[8] M. Muštra, M. Grgić, K. Delač, Breast density classification using multiple feature selection, Automatika 53 (4) (2012) 362–372.
[9] I. Sechopoulos, J. Teuwen, R. Mann, Artificial intelligence for breast cancer detection in mammography and digital breast tomosynthesis: state of the art, Semin. Cancer Biol. 72 (2021) 214–225.
[10] M.U. Dalmış, G. Litjens, K. Holland, A. Setio, R. Mann, N. Karssemeijer, A. Gubern-Mérida, Using deep learning to segment breast and fibroglandular tissue in MRI volumes, Med. Phys. 44 (2) (2017) 533–546.
[11] Y. Liu, H. Azizpour, F. Strand, K. Smith, Decoupling inherent risk and early cancer signs in image-based breast cancer risk models, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2020, pp. 230–240.
[12] A.A. Mohamed, W.A. Berg, H. Peng, Y. Luo, R.C. Jankowitz, S. Wu, A deep learning method for classifying mammographic breast density categories, Med. Phys. 45 (1) (2018) 314–321.
[13] S. Seyyedi, M.J. Wong, D.M. Ikeda, C.P. Langlotz, Screenet: a multi-view deep convolutional neural network for classification of high-resolution synthetic mammographic screening scans, arXiv preprint arXiv:2009.08563, 2020.
[14] S.M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G.S. Corrado, A. Darzi, et al., International evaluation of an AI system for breast cancer screening, Nature 577 (7788) (2020) 89–94.
[15] A. Yala, C. Lehman, T. Schuster, T. Portnoi, R. Barzilay, A deep learning mammography-based model for improved breast cancer risk prediction, Radiology 292 (1) (2019) 60–66.
[16] A.J. Barnett, F.R. Schwartz, C. Tao, C. Chen, Y. Ren, J.Y. Lo, C. Rudin, arXiv preprint, 2021.
[17] P. Tschandl, C. Rinner, Z. Apalla, G. Argenziano, N. Codella, A. Halpern, M. Janda, A. Lallas, C. Longo, J. Malvehy, et al., Human–computer collaboration for skin cancer recognition, Nat. Med. 26 (8) (2020) 1229–1234.
[18] C.J. Cai, S. Winter, D. Steiner, L. Wilcox, M. Terry, "Hello AI": uncovering the onboarding needs of medical practitioners for human-AI collaborative decision-making, Proceed. ACM on Human-Comput. Interact. 3 (CSCW) (2019) 1–24.
[19] N. Dhungel, G. Carneiro, A.P. Bradley, Deep learning and structured prediction for the segmentation of mass in mammograms, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 605–612.
[20] B. Rigaud, O.O. Weaver, J.B. Dennison, M. Awais, B.M. Anderson, T.Y.D. Chiang, W.T. Yang, J.W. Leung, S.M. Hanash, K.K. Brock, Deep learning models for automated assessment of breast density using multiple mammographic image types, Cancers (Basel) 14 (20) (2022) 5003.
[21] G. Carneiro, J. Nascimento, A.P. Bradley, Unregistered multiview mammogram analysis with pre-trained deep learning models, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 652–660.
[22] J. Arevalo, F.A. González, R. Ramos-Pollán, J.L. Oliveira, M.A.G. Lopez, Representation learning for mammography mass lesion classification with convolutional neural networks, Comput. Method. Program. Biomed. 127 (2016) 248–257.
[23] N. Dhungel, G. Carneiro, A.P. Bradley, Automated mass detection in mammograms using cascaded deep learning and random forests, in: 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2015, pp. 1–8.
[24] R. Arora, P.K. Rai, B. Raman, Deep feature–based automatic classification of mammograms, Med. Biol. Eng. Comput. 58 (6) (2020) 1199–1211.
[25] X. Shu, L. Zhang, Z. Wang, Q. Lv, Z. Yi, Deep neural networks with region-based pooling structures for mammographic image classification, IEEE Trans. Med. Imaging 39 (6) (2020) 2246–2255.
[26] Y. Shen, N. Wu, J. Phang, J. Park, K. Liu, S. Tyagi, L. Heacock, S.G. Kim, L. Moy, K. Cho, et al., An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization, Med. Image Anal. 68 (2021) 101908.
[27] H. Li, D. Chen, W.H. Nailon, M.E. Davies, D.I. Laurenson, Dual convolutional neural networks for breast mass segmentation and diagnosis in mammography, IEEE Trans. Med. Imaging 41 (1) (2021) 3–13.
[28] L. Tsochatzidis, P. Koutla, L. Costaridou, I. Pratikakis, Integrating segmentation information into CNN for breast cancer diagnosis of mammographic masses, Comput. Method. Program. Biomed. 200 (2021) 105913.
[29] J. Xing, C. Chen, Q. Lu, X. Cai, A. Yu, Y. Xu, X. Xia, Y. Sun, J. Xiao, L. Huang, Using BI-RADS stratifications as auxiliary information for breast masses classification in ultrasound images, IEEE J. Biomed. Health Inform. 25 (6) (2020) 2058–2070.
[30] M. Wimmer, G. Sluiter, D. Major, D. Lenis, A. Berg, T. Neubauer, K. Bühler, Multi-task fusion for improving mammography screening data classification, IEEE Trans. Med. Imaging 41 (4) (2021) 937–950.
[31] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022.
[32] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[33] C. Tang, Y. Zhao, G. Wang, C. Luo, W. Xie, W. Zeng, Sparse MLP for image recognition: is self-attention really necessary? Proceed. AAAI Conferen. Artifi. Intell. 36 (2) (2022) 2344–2351.
[34] S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, P. Torr, Res2Net: a new multi-scale backbone architecture, IEEE Trans. Pattern Anal. Mach. Intell. 43 (2) (2019) 652–662.
[35] J.M.J. Valanarasu, P. Oza, I. Hacihaliloglu, V.M. Patel, Medical transformer: gated axial-attention for medical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 36–46.
[36] R.S. Lee, F. Gimenez, A. Hoogi, K.K. Miyake, M. Gorovoy, D.L. Rubin, A curated mammography data set for use in computer-aided detection and diagnosis research, Sci Data 4 (1) (2017) 1–9.
[37] I.C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M.J. Cardoso, J.S. Cardoso, INbreast: toward a full-field digital mammographic database, Acad. Radiol. 19 (2) (2012) 236–248.


[38] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
[39] A. Mohamed, Y. Luo, H. Peng, R. Jankowitz, S. Wu, J. Digit Imag. 31 (2018) 387–392.
[40] C. Li, J. Xu, Q. Liu, Y. Zhou, L. Mou, Z. Pu, Y. Xia, H. Zheng, S. Wang, Multi-view mammographic density classification by dilated and attention-guided residual learning, IEEE/ACM Transact. Comput. Biol. Bioinform. 18 (3) (2020) 1003–1013.
[41] N. Wu, K.J. Geras, Y. Shen, J. Su, S.G. Kim, E. Kim, S. Wolfson, L. Moy, K. Cho, Breast density classification with deep convolutional neural networks, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 6682–6686.
[42] H. Nguyen, S. Tran, D. Nguyen, H. Pham, H. Nguyen, A novel multi-view deep learning approach for BI-RADS and density assessment of mammograms, in: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), IEEE, 2022, pp. 2144–2148.
[43] Z. Li, Z. Cui, L. Zhang, S. Wang, C. Lei, X. Ouyang, D. Chen, X. Zhao, Y. Gu, Z. Liu, C. Liu, D. Shen, J. Cheng, Domain generalization for mammographic image analysis via contrastive learning, arXiv preprint arXiv:2304.10226, 2023.
[44] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[45] M.Z. Alom, M. Hasan, C. Yakopcic, T.M. Taha, V.K. Asari, Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation, arXiv preprint arXiv:1802.06955, 2018.
[46] Z. Zhou, M.M.R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: redesigning skip connections to exploit multiscale features in image segmentation, IEEE Trans. Med. Imaging 39 (6) (2019) 1856–1867.
[47] H. Wang, Y. Zhu, H. Adam, A. Yuille, L. Chen, MaX-DeepLab: end-to-end panoptic segmentation with mask transformers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5463–5474.
[48] H. Wang, Y. Zhu, B. Green, H. Adam, A. Yuille, L. Chen, Axial-DeepLab: stand-alone axial-attention for panoptic segmentation, in: European Conference on Computer Vision, Springer International Publishing, Cham, 2020, pp. 108–126.
[49] H. Sun, C. Li, B. Liu, Z. Liu, M. Wang, H. Zheng, D. Feng, S. Wang, AUNet: attention-guided dense-upsampling networks for breast mass segmentation in whole mammograms, Phys. Med. Biol. 65 (5) (2020) 055005.
[50] W. Zhu, X. Xiang, T. Tran, G. Hager, X. Xie, Adversarial deep structured nets for mass segmentation from mammograms, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018.
[51] N. Dhungel, G. Carneiro, A.P. Bradley, Fully automated classification of mammograms using deep residual neural networks, in: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), IEEE, 2017, pp. 310–314.
[52] N. Dhungel, G. Carneiro, A.P. Bradley, A deep learning approach for the analysis of masses in mammograms with minimal user intervention, Med. Image Anal. 37 (2017) 114–128.
[53] W. Zhu, Q. Lou, Y.S. Vang, X. Xie, Deep multi-instance networks with sparse label assignment for whole mammogram classification, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2017, pp. 603–611.
[54] G.v. Tulder, Y. Tong, E. Marchiori, Multi-view analysis of unregistered medical images using cross-view transformers, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 104–113.
