A Deep Learning Classifier for Digital Breast Tomosynthesis
Physica Medica
journal homepage: www.elsevier.com/locate/ejmp
ARTICLE INFO
Keywords:
Digital Breast Tomosynthesis
Breast Tumor
Machine Learning
Convolutional neural network
Computer-Aided Diagnosis
Deep Learning

ABSTRACT
Purpose: To develop a computerized detection system for the automatic classification of the presence/absence of mass lesions in digital breast tomosynthesis (DBT) annotated exams, based on a deep convolutional neural network (DCNN).
Materials and Methods: Three DCNN architectures working at image level (DBT slice) were compared: two state-of-the-art pre-trained DCNN architectures (AlexNet and VGG19) customized through transfer learning, and one developed from scratch (DBT-DCNN). To evaluate these DCNN-based architectures, we analysed their classification performance on two different datasets provided by two hospital radiology departments. DBT slice images were processed following normalization, background correction and data augmentation procedures. The accuracy, sensitivity, and area-under-the-curve (AUC) values were evaluated on both datasets, using receiver operating characteristic curves. A Grad-CAM technique was also implemented, providing an indication of the lesion position in the DBT slice.
Results: Accuracy, sensitivity and AUC for the investigated DCNNs are in line with the best performance reported in the field. The DBT-DCNN network developed in this work showed an accuracy and a sensitivity of (90% ± 4%) and (96% ± 3%), respectively, with an AUC of 0.89 ± 0.04. A k-fold cross validation test (with k = 4) showed an accuracy of 94.0% ± 0.2%, and an F1-score of 0.93 ± 0.03. Grad-CAM maps show high activation in correspondence with pixels within the tumour regions.
Conclusions: We developed a deep learning-based framework (DBT-DCNN) to classify DBT images from clinical exams. We also investigated a possible application of the Grad-CAM technique to identify the lesion position.
* Corresponding author at: Università di Napoli Federico II, Dipartimento di Fisica “Ettore Pancini”, I-80126 Napoli, Italy.
E-mail address: [email protected] (G. Mettivier).
1 These authors contributed equally.
2 ORCID: 0000-0001-6606-4304.
3 ORCID: 0000-0001-7656-8370.
4 ORCID: 0000-0002-3034-7166.
5 ORCID: 0000-0001-9409-0008.
https://doi.org/10.1016/j.ejmp.2021.03.021
Received 1 December 2020; Received in revised form 4 February 2021; Accepted 13 March 2021
Available online 31 March 2021
1120-1797/© 2021 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
R. Ricciardi et al. Physica Medica 83 (2021) 184–193

1. Introduction

Breast cancer is the most common malignancy in women [1]. Breast screening with digital mammography (DM) is considered the most effective method of detecting early-stage breast cancer and reducing related mortality. However, mammography does not perform ideally as a diagnostic exam in terms of sensitivity and specificity, particularly for dense breasts [2]. With the aim of improving its diagnostic performance, and hence also helping to reduce the recall rate, in recent decades research efforts have been directed towards new pseudo-3D and 3D X-ray breast imaging technologies, such as digital breast tomosynthesis (DBT) [3–5] and breast computed tomography [6], respectively. These techniques make it possible to overcome, in part or totally, respectively, the overlap of normal and pathological tissues in the direction of the incident beam, which can decrease the visibility of malignant abnormalities or simulate the appearance of a lesion [7–9]. DBT acquires many two-dimensional projections from various angular positions of the X-ray tube around the compressed breast, at a radiation dose comparable to that of DM [5]. This makes it possible to reconstruct, approximately, the radiodensity map in different planes transverse to the direction of the beam (typically with a vertical separation of about 1 mm), thus obtaining a (pseudo) three-dimensional representation of the anatomy of the mammary tissues and a clearer localization of possible lesions (masses and microcalcifications) [10], combined with a synthetic representation of a mammography view. The interpretation of a DBT exam requires the visualization and analysis of tens of image slices in a large dataset for each exam, in the craniocaudal or in the mediolateral oblique view [11–13], hence adding complexity and increasing reading time in the radiological clinical workflow with respect to a conventional DM exam.

In this context, robust computer-aided detection (CAD) systems, capable of managing the complexity of the DBT lesion search space in the diagnostic interpretation task, may represent a crucial tool, also reducing inter-observer and intra-observer variability in the exam reading process. In line with research in past decades on CAD systems for mammography exams, particularly using deep learning (DL) techniques [14–17], we developed a CAD system dedicated to the classification of DBT exams, to improve radiologists’ overall performance in DBT exam analysis and potentially improve diagnostic accuracy. A specific goal was an acceptable trade-off between the computational cost of the automatic analysis and the classification performance, in terms of increased sensitivity and reduced false positive (FP) rate.

Preliminary studies [18–35] on CAD systems dedicated to DBT were based on the following sequence of processes: detection of mass candidates; their segmentation from the background; and exam classification via the extraction of relevant features. Some studies adopted hand-crafted features developed for general image processing applications, or conventional mass features [18–27]. Such an approach relies on expert knowledge. Model predictions in the automated lesion detection task may be improved by exploiting recent advances in machine learning, especially the use of deep convolutional neural networks (DCNNs), as well as advances in graphics processing unit (GPU) technologies, the availability of large datasets of annotated images, and novel optimization methods [28–34]. Unlike feature-based methods, DCNN-based methods produce a decision (i.e., classify data) directly from raw input images. They do not require the segmentation and hand-crafted feature extraction steps that are necessary for traditional feature-based classifiers, such as artificial neural networks (ANNs) or support vector machines (SVMs) [23–25]. Indeed, a DCNN can automatically extract descriptors from an image, thus avoiding the development of specific image processing algorithms. On the other hand, learning the complex patterns of masses requires a large set of diverse training samples, as well as a suitable architecture and regularization method. DCNNs might be less influenced by lesion-specific features than feature-based methods, resulting in a better chance of recognizing a mass in at least one of the DBT views and a significantly better detection performance.

In this paper, we developed a computerized detection system for the automatic classification of the presence/absence of mass lesions in slices of annotated DBT exams, aiming at improving over existing deep learning-based techniques in terms of sensitivity and specificity. We implemented an ad hoc DCNN architecture (termed DBT-DCNN) performing a binary classification of individual slices belonging to the same DBT exam dataset. We then compared the DBT-DCNN performance to MATLAB implementations [36,37] of two popular architectures in this field (AlexNet and VGG19, respectively). The classification performance of the three neural networks was comparatively assessed on different datasets provided by two hospitals and acquired on two clinical DBT systems from two manufacturers. This allowed us to evaluate the robustness of the architecture against the influence of different hardware or acquisition protocols. Additionally, we implemented a technique (Grad-CAM) to highlight the pixels, in all DBT slices of a given exam, that are most relevant for the final classification performed by the network. This allowed us to explore the possibility of providing an indication of the position of the mass inside the slice(s) classified as abnormal, as well as of showing the location of possible network activation in zones less relevant for the diagnostic task.

2. Materials and methods

This section first describes our datasets, then briefly reviews the state-of-the-art techniques that we used as benchmarks (AlexNet and VGG19 networks) for comparison with our DCNN technique. Finally, it introduces the proposed DBT-DCNN architecture.

2.1. Dataset

The two DBT datasets used in this study were made available by two hospitals⁶ (here indicated as H1 and H2) upon approval by their institutional review boards. For this research, digital images were used in disaggregated form, not attributable in any way to any specific patient, managed in complete anonymity and in compliance with current EU legislation on the processing of sensitive data. Both sets of DBT exams were reconstructed with 1 mm slice spacing and an in-plane resolution of 90 × 90 μm² for reconstructed planes, using iterative and FBP reconstruction techniques for H1 and H2, respectively [38]. The reconstructed slices had a bit depth of 16 bit/pixel.

⁶ Azienda Ospedaliera Cardarelli, Napoli, Italy (Hospital 1) and Azienda Ospedaliera Universitaria San Giovanni di Dio Ruggi d’Aragona, Salerno, Italy (Hospital 2).

To reduce white noise, we processed all images with a denoising algorithm developed using ImageJ software. The algorithm evaluates the noise inside a selected region of interest (ROI) of 300 × 300 pixels in the image, and the measured noise value is subtracted from the whole image. The size of the ROI is a critical choice, since applying the algorithm to too large a matrix yields a final image that is too blurry to extract information relevant to network learning. Fig. 1 shows an example of the application of this algorithm: a significant noise reduction and better edge sharpness can be obtained.

As a last step, the slices were downsized from their initial size (1072 × 2356 pixels for the H1 dataset and 1996 × 2457 pixels for the H2 dataset) to 300 × 300 pixels using a resizing algorithm that averages adjacent pixels. Binning the images was necessary because of a hardware limitation of the available GPU architecture. The maximum size of 300 × 300 pixels was established by trial and error, limiting the search to the maximum value supported by our hardware (details are provided in Sec. 3.1).

2.1.1. Hospital 1 dataset

Anonymized DBT images of 100 patients at the H1 site were acquired with a Giotto Class 40,000 system. Each image had a matrix size of 1072 × 2356 pixels with a 100 × 100 μm² pixel size. The Giotto Class 40,000 system uses an amorphous selenium (a-Se) flat-panel digital detector. The DBT cases were acquired in craniocaudal (CC) view using a total tomographic angular range of 30° with 2.7° increments and 11 projections. The dataset consisted of 4692 slices with 137 masses confirmed after biopsy on all patients. Of these, 104 masses were malignant and 33 were benign. Details are reported in Table 1. No data augmentation technique was applied to this dataset. As indicated in Fig. 7, this dataset was used to train and validate the proposed DCNN architecture. Seventy percent of these data were used for network training, while the remaining part was used for testing (see Table 1).

2.1.2. Hospital 2 dataset

The dataset from the H2 site is composed of DBT images of 9 patients
Fig. 1. Comparison between two tomosynthesis images belonging to the same case a) before and b) after preprocessing for feeding the deep CNN. In the post-
processed image (b) we can see a reduction of quantum noise and the accentuation of the edges of the structures inside the breast.
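The two preprocessing steps of Sec. 2.1 (subtraction of a noise level measured in a 300 × 300 pixel ROI, then downsizing to 300 × 300 pixels by averaging adjacent pixels) can be sketched in Python/NumPy as below. This is not the authors' ImageJ/MATLAB code: the noise estimator (here the ROI standard deviation), the ROI position and the clipping at zero are assumptions, and the area-averaging resampler is one possible reading of "averages adjacent pixels" that also handles the non-integer reduction factors of the H1 and H2 slices.

```python
import numpy as np


def subtract_roi_noise(img, roi_origin, roi_size=300, estimator=np.std):
    """Measure a scalar noise level in a square ROI and subtract it from the whole slice.

    ASSUMPTION: the text only says that a noise value measured in a 300 x 300 ROI
    is subtracted; using the ROI standard deviation (estimator=np.std) is a guess.
    """
    r, c = roi_origin
    roi = img[r:r + roi_size, c:c + roi_size].astype(np.float64)
    noise = float(estimator(roi))
    # Clip at zero so the corrected slice remains a non-negative intensity image.
    return np.clip(img.astype(np.float64) - noise, 0.0, None)


def _area_weights(n_in, n_out):
    """(n_out, n_in) matrix: row i holds the fractions of input pixels covered by output bin i."""
    scale = n_in / n_out
    edges = np.arange(n_out + 1) * scale              # output-bin edges in input-pixel units
    j = np.arange(n_in)
    lo = np.maximum(edges[:-1, None], j[None, :])     # start of overlap with input pixel j
    hi = np.minimum(edges[1:, None], j[None, :] + 1)  # end of overlap with input pixel j
    w = np.clip(hi - lo, 0.0, None)
    return w / w.sum(axis=1, keepdims=True)           # each output pixel is a weighted mean


def area_resize(img, out_shape=(300, 300)):
    """Downsize by averaging adjacent pixels (area resampling); works for non-integer factors."""
    wr = _area_weights(img.shape[0], out_shape[0])
    wc = _area_weights(img.shape[1], out_shape[1])
    return wr @ img.astype(np.float64) @ wc.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_h1 = rng.integers(0, 2**16, size=(1072, 2356)).astype(np.uint16)  # synthetic 16-bit slice
    x = area_resize(subtract_roi_noise(slice_h1, roi_origin=(400, 1000)))
    print(x.shape)  # (300, 300)
```

The ROI origin `(400, 1000)` is purely illustrative. Because every output pixel is an area-weighted mean of the input pixels it covers, a uniform slice stays uniform after resizing, and for integer factors the result reduces to plain block averaging.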
Fig. 2. TL-AlexNet architecture, composed of 24 levels: 1 input level, 5 convolutional levels, 3 fully connected classification levels and finally 2 dropout levels, immediately followed by an output level. The convolution kernel size (3 × 3, 5 × 5 or 11 × 11) is indicated at each step.
of AlexNet, we adopted a transfer learning technique to train the VGG19 network (termed in the following TL-VGG19).

2.3. DCNN architecture

DCNNs are a type of artificial neural network composed of convolutional layers and fully connected layers within a deep architecture. During training, a DCNN learns patterns through the kernels in each convolutional layer. The feature maps in one layer are generated by convolving the kernels with the feature maps of the previous layer and combining them with weights. Each feature map is connected to an activation function. In this work, we developed a DBT-DCNN and compared its performance with those of the two benchmark DCNN architectures described above.
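As an illustration of how one convolutional layer turns the feature maps of the previous layer into new ones, here is a minimal NumPy sketch (not the MATLAB implementation used in this work): each output map is the sum, over the input maps, of their convolution with a kernel, plus a bias, followed by a ReLU. A 2 × 2 max-pooling step is included as a typical sub-sampling choice; the pooling size is an assumption, since it is not stated in the text. As in most deep learning frameworks, the "convolution" is computed as a cross-correlation (no kernel flip).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, kernels, bias):
    """One convolutional layer, stride 1, zero padding that keeps the H x W size.

    x: (C_in, H, W) input feature maps; kernels: (C_out, C_in, k, k), k odd; bias: (C_out,).
    Output map o = bias[o] + sum_i (x[i] cross-correlated with kernels[o, i]).
    """
    k = kernels.shape[-1]
    p = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))                 # zero padding
    win = sliding_window_view(xp, (k, k), axis=(1, 2))        # (C_in, H, W, k, k) views
    return np.einsum("ihwab,oiab->ohw", win, kernels) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)                                  # f(x) = max(0, x)


def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max-pooling of (C, H, W) maps (ASSUMED pool size); odd edges dropped."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    slice_ = rng.random((1, 64, 64))                          # toy single-channel input
    maps = max_pool_2x2(relu(conv2d_same(slice_, rng.normal(size=(8, 1, 5, 5)), np.zeros(8))))
    print(maps.shape)  # (8, 32, 32)
```

A first layer shaped like that of the DBT-DCNN would use `kernels` of shape `(96, 1, 11, 11)` on a `(1, 300, 300)` slice, producing 96 feature maps of 300 × 300 pixels before any pooling.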
Fig. 4. Deep Convolutional Neural Network architecture developed for this work. It is made up of 24 levels: 1 input level, 5 convolutional levels, 2 fully connected classification levels and finally 1 softmax level, immediately followed by an output level.
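To make the layer list of Sec. 2.3.1 concrete, the following pure-Python sketch tabulates, for each of the five convolutional layers, the spatial output size and the number of learnable weights plus biases. It assumes a single-channel 300 × 300 input, stride 1, zero padding of (k − 1)/2 (which preserves the image size, as stated in the text) and ungrouped convolutions. Normalization, pooling and fully connected layers are left out because their settings are not given.

```python
# Hypothetical helper, not the authors' code: spatial size and learnable parameters of the
# DBT-DCNN convolutional stack listed in Sec. 2.3.1.
# ASSUMPTIONS: single-channel 300 x 300 input, stride 1, padding (k - 1) / 2, no grouped
# convolutions; normalization, pooling and fully connected layers are not counted.

CONV_LAYERS = [(96, 11), (128, 5), (384, 3), (192, 3), (128, 3)]  # (filters, kernel size)


def conv_output_size(n, k, stride=1, pad=None):
    """Standard output-size formula floor((n + 2p - k) / s) + 1."""
    if pad is None:
        pad = (k - 1) // 2            # 'same' padding for odd k at stride 1
    return (n + 2 * pad - k) // stride + 1


def conv_stack_summary(size=300, in_channels=1, layers=CONV_LAYERS):
    """Return (filters, k, output size, parameters) for each convolutional layer."""
    rows, c_in = [], in_channels
    for filters, k in layers:
        size = conv_output_size(size, k)
        params = k * k * c_in * filters + filters   # k*k*C_in weights per filter + 1 bias
        rows.append((filters, k, size, params))
        c_in = filters
    return rows


if __name__ == "__main__":
    total = 0
    for filters, k, size, params in conv_stack_summary():
        total += params
        print(f"{filters:3d} filters {k:2d}x{k:<2d} -> {size}x{size}, {params:,} parameters")
    print(f"convolutional parameters: {total:,}")  # 1,646,848
```

Under these assumptions the five convolutional layers hold 11,712 + 307,328 + 442,752 + 663,744 + 221,312 = 1,646,848 parameters, the largest contributions coming from the two 3 × 3 layers with 384 and 192 filters.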
2.3.1. DBT-DCNN architecture

The proposed DBT-DCNN architecture was developed from scratch. The images were accessed in raw format (DICOM) to avoid dependence on the manufacturers’ processing methods; images from the same case were assigned to the same subset, to keep the training and test subsets independent of each other.

The network was developed in MATLAB 2020b using the MATLAB “Machine and Deep Learning Toolbox”. The network architecture is shown in Fig. 4. It consisted of 24 levels: 1 input level, 5 convolutional levels, 2 fully connected classification levels and finally 1 softmax level, immediately followed by an output level. All the convolutional layers used rectified linear units (ReLU) as activation function, f(x) = max(0, x); to achieve approximate translational and rotational invariance to the input patterns, the feature maps were sub-sampled through max-pooling.

We analyzed the operation and the characteristics of each layer, starting from the input one. In this level, the network loads all the preprocessed images present in the database with the corresponding labels (“sick” or “healthy”), which act as the ground truth at validation time. Then, the network starts a sequential image analysis, thus initiating the network training. Initially, each pixel of the image is associated with a neuron whose weights and bias take random values. This layer is followed by the 5 convolutional layers with 96, 128, 384, 192 and 128 filters, with convolution kernels of size 11 × 11 and 5 × 5 pixels for the first two layers and 3 × 3 pixels for the last three layers. The convolution stride of each level is 1. All convolutions were padded so as to preserve the input and output dimensions of the image. After each convolutional level, for each filter applied to the image, the ReLU layers (together with normalization levels) generate the feature maps of the network by connecting neurons together, the weights and biases being updated at each iteration. In the second part of the network, the feature maps are input to the two fully connected layers which, after flattening, classify them into the two possible classes. For each classification, the probability of belonging to each class is computed in the softmax layer (with the cross-entropy function as loss), which provides a probability value between 0 and 1 and stores it within that layer. The last layer is the output layer, which displays the result of the operations.

The analysis of the network structure (performed to check the correct functioning of each part) was carried out by means of the MATLAB function “AnalyzeNetwork”. For training, we used the “ADAM” algorithm (adaptive moment estimation) as solver [39], with an initial learning rate of 0.0004, a gradient decay factor of 0.9 and a squared gradient decay factor of 0.999. Out of the 4692 slices of the H1 dataset (Table 1), 70% (3286 slices) were used to train the DCNN and the remaining 30% (1406 slices) for the testing phase. The network was trained for 15 epochs of 29 iterations each, for a maximum of 435 iterations. To follow the progress of training and avoid network overfitting, we monitored a “live” validation point during training every 20 iterations. Fig. 5 shows the training and loss curves for the DBT-DCNN network. Both trends follow the expected behaviour: the training curve has a globally increasing trend as the number of epochs increases, while the loss curve follows a globally decreasing trend. Training, running in parallel mode on two NVIDIA Titan X GPU cards, took about 130 min. Finally, the network was tested on the remaining 1406 images of the dataset, held out for the test and the statistical analysis of network performance, by building the confusion matrix.

2.4. Network performance evaluation

The effectiveness of the DCNN classification was evaluated in terms of correct/incorrect classifications, using performance metrics with the usual definitions:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives, respectively, obtained from the automatic MATLAB evaluation procedure in which the probability
Fig. 5. Scheme of training curves (a, in blue) and loss (b, in orange) with the corresponding validation curves (dashed in black) implemented for the DBT-DCNN
network. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
threshold was selected to optimize sensitivity and specificity. Furthermore, we computed the area under the ROC curve (AUC) to measure the degree of separability between the classes achieved by the classifier. The higher the AUC value, the better the model distinguishes slices with and without the disease.

2.5. Grad-CAM technique

Grad-CAM is a generalization of the Class Activation Mapping (CAM) technique, used to assess the network functioning graphically. Conceived by Selvaraju and co-authors [36], Grad-CAM uses the gradient of the classification score with respect to the final convolutional feature map to show which parts of the image are most important for classification [40]. The purpose of this type of algorithm is to display sensitivity/saliency maps of the gradients as heat maps superimposed on the input image. This technique makes it possible to visualize, on a color scale, the pixels most relevant for the final classification. Here we calculated these maps and compared them with the mass position marked by the radiologist with a 2D bounding box on each planar DBT slice (ground truth), to evaluate the possibility of using this algorithm in our classification task to assist the radiologist in the mass localization process.

3. Results

3.1. Effect of input image size

TL-AlexNet and TL-VGG19 adopted input image sizes of 227 × 227 × 3 pixels and 224 × 224 × 3 pixels, respectively. When loading the image dataset for processing by the DBT-DCNN, each input image of the network (1072 × 2356 pixels for the H1 dataset and 1996 × 2457 pixels for the H2 dataset) was initially binned to 100 × 100 × 1, 200 × 200 × 1 or 300 × 300 × 1 pixels. This last value represents the maximum image size we were able to test given the computing power of our hardware. Fig. 6 shows the values of TP, TN, FP, FN, accuracy and sensitivity of the DBT-DCNN network evaluated on the H1 validation dataset as a function of the input image size, showing that all the values improve as the image size increases. A size of 300 × 300 × 1 pixels was therefore considered adequate for reaching high values of accuracy and sensitivity in our DBT-DCNN network, and all subsequent processing steps were carried out with 300 × 300 × 1 pixel images as input to the DBT-DCNN.

3.2. DCNN architecture evaluation

Following the scheme reported in Fig. 7, we trained the three DCNN architectures presented (DBT-DCNN, TL-AlexNet and TL-VGG19) using the two datasets (H1 and H2), to assess which architecture produced the best performance in terms of the selected evaluation metrics. The evaluation was repeated five times, and the results are reported in Tables 2 and 3. The best values were obtained by DBT-DCNN on the H1 dataset. This architecture showed an accuracy and a sensitivity of (90% ± 4%; 96% ± 3%), respectively (Table 2, third row), compared to (84% ± 1%; 99% ± 1%) (Table 2, first row) and (74% ± 1%; 88% ± 1%) (Table 2, second row) provided by TL-AlexNet and TL-VGG19, respectively. However, these values were determined using grayscale images instead of the RGB images on which TL-AlexNet was originally pre-trained. On the other hand, the TL-VGG19 network was unaffected by this bias: although it is very similar to TL-AlexNet, it provides lower values than both TL-AlexNet and the network specifically developed for this work. To validate the results of the DBT-DCNN network, we performed a k-fold cross validation (with k = 4): the F1-score was 0.93 ± 0.03 (on a 0–1 scale). We note the significantly lower number of FPs obtained with the DBT-DCNN network (FP = 108 ± 28) with respect to TL-AlexNet (FP = 206 ± 8) and TL-VGG19 (FP = 235 ± 11). For a fixed specificity of 80%, the corresponding sensitivities of DBT-DCNN, TL-AlexNet and TL-VGG19 were 85% ± 3%, 67% ± 1% and 58% ± 2%, respectively. A similar trend was observed with the second dataset (Table 3), where we obtained (89% ± 1%, 81% ± 5%), (81% ± 3%, 72% ± 3%) and (78% ± 3%, 68% ± 8%) for the (accuracy, sensitivity) of DBT-DCNN, TL-AlexNet and TL-VGG19, respectively.

3.3. DCNN performance evaluation: AUC

The ROC curves for all the investigated networks are shown in Fig. 8. Each curve is shown with the confidence interval obtained by repeating the measurement 5 times. The graph was derived from the test data of the H1 database. Each image was classified by the trained network, and the predicted label was compared with the real label defined as ground truth. The algorithm then calculates the sensitivity as the true positive rate (TPR) and the specificity, from which it derives the false positive rate (FPR) as 1 − specificity. The corresponding AUC values are reported in Table 2. The highest value, 0.89, was obtained for the DBT-DCNN network.

3.4. Gradient map

Fig. 9a shows a DBT slice in which the DBT-DCNN indicated the presence of a mass. A Grad-CAM algorithm was applied to this image (Fig. 9b), highlighting (in red/yellow) the regions of input pixels that cause greater activation of the network. The two mass lesions in Fig. 9a were marked by a radiologist (of Hospital 1) with a red circle. As can be observed in Fig. 9b, the network correctly identified both lesions, hence classifying the image as pathological, perfectly identifying the retro-areolar lesion and also following its local branches, which are typical of this type of carcinoma. Namely, pixels corresponding to the central region of the mass had a greater impact in the classification of the image as
Fig. 6. a) Number of TP, TN, FP, and FN occurrences, and b) accuracy and sensitivity of the DBT-DCNN network as a function of the linear size of the square
input image.
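A minimal NumPy sketch of the Grad-CAM map described in Sec. 2.5 and shown in Fig. 9, assuming that the activations A^k of the last convolutional layer and the gradients ∂y^c/∂A^k of the class score y^c have already been extracted from the trained network (that framework-specific step is not shown). Following Selvaraju et al., each map is weighted by its spatially averaged gradient, α_k^c = (1/Z) Σ_ij ∂y^c/∂A^k_ij, and the heat map is ReLU(Σ_k α_k^c A^k). Nearest-neighbour upsampling and the `peak_in_box` comparison with the radiologist's bounding box are simplifications introduced here for illustration, not the authors' procedure.

```python
import numpy as np


def grad_cam(feature_maps, gradients, out_shape):
    """Grad-CAM heat map from precomputed last-layer activations and gradients.

    feature_maps: (K, h, w) activations A^k of the final convolutional layer.
    gradients:    (K, h, w) gradients dy^c/dA^k of the class score y^c (e.g. 'mass present').
    out_shape:    (H, W) of the input slice.
    """
    alphas = gradients.mean(axis=(1, 2))                                # alpha_k^c: averaged gradients
    cam = np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)   # ReLU(sum_k alpha_k^c A^k)
    if cam.max() > 0:
        cam = cam / cam.max()                                           # scale to [0, 1] for display
    H, W = out_shape
    h, w = cam.shape
    rows = np.arange(H) * h // H       # nearest-neighbour upsampling indices
    cols = np.arange(W) * w // W       # (a simplification; bilinear interpolation is common)
    return cam[np.ix_(rows, cols)]


def peak_in_box(cam, box):
    """HYPOTHETICAL check: is the hottest pixel inside the radiologist's box (r0, c0, r1, c1)?
    End coordinates are exclusive."""
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    r0, c0, r1, c1 = box
    return bool(r0 <= r < r1 and c0 <= c < c1)
```

In practice the 300 × 300 heat map returned by `grad_cam(A, dA, (300, 300))` would be overlaid on the slice with a colour map, as in Fig. 9b.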
Fig. 7. Scheme of the training, validation and test processes implemented for all the DCNN architectures, using the two different datasets.
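The training stage summarized in Fig. 7 used the ADAM solver with the settings of Sec. 2.3.1 (initial learning rate 0.0004, gradient decay factor β1 = 0.9, squared gradient decay factor β2 = 0.999; these names appear to correspond to MATLAB's `GradientDecayFactor` and `SquaredGradientDecayFactor` training options). The NumPy sketch below shows the standard Adam update with these values. The ε = 1e-8 constant, the absence of a learning-rate schedule and the toy quadratic objective standing in for the network loss are assumptions for illustration; the 435 iterations reproduce the 15 epochs × 29 iterations of the text.

```python
import numpy as np


class Adam:
    """Minimal Adam optimizer (Kingma & Ba) with the Sec. 2.3.1 hyperparameters as defaults.
    beta1 = gradient decay factor, beta2 = squared gradient decay factor;
    eps = 1e-8 is an ASSUMED value (not stated in the text)."""

    def __init__(self, lr=4e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, theta, grad):
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad        # 1st-moment estimate
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2   # 2nd-moment estimate
        m_hat = self.m / (1 - self.beta1 ** self.t)                   # bias corrections
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


EPOCHS, ITERS_PER_EPOCH = 15, 29   # 15 x 29 = 435 iterations, as in the text


def train_toy(target, iterations=EPOCHS * ITERS_PER_EPOCH):
    """Toy stand-in for network training: minimise ||theta - target||^2 with Adam."""
    theta, opt = np.zeros_like(target), Adam()
    losses = []
    for _ in range(iterations):
        grad = 2 * (theta - target)
        theta = opt.step(theta, grad)
        losses.append(float(np.sum((theta - target) ** 2)))
    return theta, losses
```

Because of the bias correction, the very first Adam step moves each parameter by about the learning rate (0.0004) in the direction opposite to its gradient, whatever the gradient's magnitude.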
Table 2
Evaluation in terms of classification absolute numbers, accuracy, sensitivity, specificity, precision and AUC, of the DCNN architectures using dataset H1.
H1 dataset (N = 1406 slices)
Network | TP (#) | TN (#) | FP (#) | FN (#) | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%) | AUC
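The 4-fold cross-validation of Sec. 3.2 and the rule that all slices of one case go to the same subset (Sec. 2.3.1) can be combined into a patient-grouped k-fold split, sketched below together with the F1 score, F1 = 2TP/(2TP + FP + FN). How the authors actually formed their folds is not stated: the shuffled round-robin assignment of patients and the `fit_predict` callback (standing in for training and testing the DBT-DCNN) are hypothetical.

```python
import numpy as np


def grouped_kfold(patient_ids, k=4, seed=0):
    """Yield (train_idx, test_idx) over slices so that all slices of a patient share a fold.
    ASSUMPTION: patients are shuffled and dealt round-robin into k folds."""
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    np.random.default_rng(seed).shuffle(patients)
    fold_of = {p: i % k for i, p in enumerate(patients)}
    folds = np.array([fold_of[p] for p in patient_ids])
    for f in range(k):
        yield np.flatnonzero(folds != f), np.flatnonzero(folds == f)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels (1 = mass present)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0


def cross_validated_f1(patient_ids, labels, fit_predict, k=4, seed=0):
    """fit_predict(train_idx, test_idx) -> predicted labels for test_idx (HYPOTHETICAL callback
    that would train the network on train_idx). Returns mean and std of the per-fold F1."""
    labels = np.asarray(labels)
    scores = [f1_score(labels[te], fit_predict(tr, te))
              for tr, te in grouped_kfold(patient_ids, k, seed)]
    return float(np.mean(scores)), float(np.std(scores))
```

Grouping by patient matters because adjacent slices of one DBT exam are highly correlated: letting them fall into both training and test folds would make the cross-validated score optimistic.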
Table 3
Evaluation in terms of classification absolute numbers, accuracy, sensitivity, specificity and precision of the DCNN architectures using dataset H2.
H2 dataset (N = 73 slices)
Network | TP (#) | TN (#) | FP (#) | FN (#) | Accuracy (%) | Sensitivity (%) | Specificity (%) | Precision (%)
TL-AlexNet | 28 ± 13 | 31 ± 14 | 3 ± 3 | 11 ± 5 | 81 ± 3 | 72 ± 3 | 92 ± 1 | 91 ± 9
TL-VGG19 | 25 ± 11 | 32 ± 14 | 4 ± 4 | 12 ± 5 | 78 ± 3 | 68 ± 8 | 88 ± 11 | 87 ± 12
DBT-DCNN | 32 ± 16 | 33 ± 16 | 1 ± 2 | 7 ± 4 | 89 ± 1 | 81 ± 5 | 94 ± 5 | 96 ± 6
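The percentages in Tables 2 and 3 follow from the confusion-matrix counts through the definitions of Sec. 2.4; the short Python function below is an illustrative sketch that computes them, with the F1 score added. Since the tables report means ± standard deviations over five repetitions, metrics recomputed from the mean counts need not coincide exactly with the tabulated mean percentages: for DBT-DCNN on H2, the mean counts (TP, TN, FP, FN) = (32, 33, 1, 7) give an accuracy of 65/73 ≈ 89% but a sensitivity of 32/39 ≈ 82%, against 81% ± 5% in Table 3.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sec. 2.4 metrics (plus F1) from confusion-matrix counts, as fractions in [0, 1].
    Assumes non-degenerate counts (no zero denominators)."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # = 2TP / (2TP + FP + FN)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Mean counts of DBT-DCNN on H2 (Table 3); see the caveat above about averaged runs.
    for name, value in classification_metrics(tp=32, tn=33, fp=1, fn=7).items():
        print(f"{name:12s} {100 * value:5.1f} %")
```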
4. Discussion
Table 4
Characteristics and performance of some DBT mass detection CAD systems reported in the literature from 2004 to 2019. The different classifiers are applied to different
input: to exam DBT (3D), to a single slice from DBT (slice), to 2D ROI and 3D VOI. For this work the performance values reported refer to the H1 dataset. Values
highlighted in bold represent best performance in this comparison.
Performance
Ref. Year Classifier Training method # Patients Input type AUC Sensitivity (%) Accuracy (%)
Feature-based classifiers
18 2005 Feature extraction 3D 0.91 85
22 2006 LDA 36 slice 90
21 2008 Mutual information 100 ROI 85
26 2008 Feature extraction 96 slice 88
19 2008 LDA 100 slice + 3D 80
20 2010 LDA 99 slice 0.93
23 2013 ANN 192 slice 80
25 2014 SVM 101 3D 90
24 2016 SVM 160 VOI 0.847
33 2019 SVM 24 ROI 0.798 83.87 72.54
RF 0.757 80.65 70.59
Naive Bayes 0.648 64.52 60.78
Multi-layer perceptron 0.754 77.42 70.59
LDA = Linear Discriminant Analysis, SVM = Support Vector Machine, RF = Random Forest, ANN = Artificial Neural Network.
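The AUC values compared in Table 4 (and reported for this work in Sec. 3.3) summarize a ROC curve obtained by sweeping the decision threshold on the classifier's per-slice probability. The NumPy sketch below builds the curve, integrates it with the trapezoidal rule and picks the threshold that maximizes Youden's index J = sensitivity + specificity − 1. Youden's index is only one common way to "optimize sensitivity and specificity" (Sec. 2.4) and is an assumption here, as is the convention that a slice is called "mass present" when its score is at least the threshold.

```python
import numpy as np


def roc_curve(y_true, scores):
    """ROC points (FPR, TPR, thresholds), sweeping the threshold over the distinct scores
    in descending order; a slice is called 'mass present' when score >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]
    pos, neg = y_true.sum(), (~y_true).sum()
    tpr = np.array([np.sum(y_true & (scores >= t)) / pos for t in thresholds])
    fpr = np.array([np.sum(~y_true & (scores >= t)) / neg for t in thresholds])
    # Prepend the (0, 0) point, i.e. a threshold above every score.
    return np.r_[0.0, fpr], np.r_[0.0, tpr], np.r_[np.inf, thresholds]


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def youden_threshold(fpr, tpr, thresholds):
    """Threshold maximising J = TPR - FPR (ASSUMED criterion); first one wins on ties."""
    return thresholds[np.argmax(tpr - fpr)]
```

Sweeping only the distinct scores handles ties correctly: slices with equal scores enter the positive set together, producing a diagonal ROC segment whose trapezoid counts such ties as half-correct.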
lesion is located, working on ROIs or VOIs. We decided to implement an image-level study of the identification of the presence or absence of a mass in a single slice. This classification is intended to help radiologists in the DBT interpretation task at the screening level, when determining whether a patient needs a further examination. The development of a classification network for microcalcification detection will be the goal of future work.

In this work, we compared the performance of three different DCNN networks: one developed ad hoc (DBT-DCNN), and two others, AlexNet and VGG19, belonging to the class of networks commonly implemented for the classification of common scene images. The three networks were compared on two different datasets from two hospitals with different acquisition devices. The DBT-DCNN shows favorable results for both datasets tested here with respect to the benchmark architectures, though the H2 dataset is small. The accuracy and AUC values obtained by our DBT-DCNN classifier on the H1 dataset (100 DBT clinical exams, for a total of 4692 DBT slices) were comparable to those reported in the literature (see Table 4). The goodness of the test was also confirmed by the F-score (or F1 score) value, which was 0.93 ± 0.03. As shown in Table 4, all the reported classifiers featured high AUC values, from 0.7 to 0.93, with an accuracy ranging from 69% to 93%. Correspondingly, the AUC value and accuracy of the DBT-DCNN
network evaluated on the H1 dataset were 0.91 and 94%, respectively, References
thus indicating a very high performance. Specifically, for the architec
tures working at image-level, we find that the AUC value obtained by [1] Marcom PK. Genomic and Precision Medicine: Primary Care, 3rd Edition; 2017. p.
181-94.
our DBT-DCNN network compares favorably with the value (0.93) re [2] Lehman CD, Arao RF, Sprague BL, et al. National performance benchmarks for
ported by Chan et al. [20] related to a feature-based network, at a modern screening digital mammography: update from the breast cancer
comparable level of patient cases (99 vs. 100). At the same time, as surveillance consortium. Radiology 2017;283(1):49–58.
[3] Sechopoulos I. A review of breast tomosynthesis. Part I. The image acquisition
regards classification accuracy, the value obtained by DBT-DCNN (94%) process. Med Phys 2013;40(1):014301.
is the highest reported in the comparison offered by Table 4, and only [4] Sechopoulos I. A review of breast tomosynthesis. Part II. Image reconstruction,
the network reported in [34] (based on the use of a VGG19 network) processing and analysis, and advanced applications. Med Phys 2013;40(1):014302.
[5] Maldera A, De Marco P, Colombo PE, Origgi D, Torresin A. Digital breast
reports a comparable value as high as 93%. An increase of the numer tomosynthesis: dose and image quality assessment. Phys Med 2016;33:56–67.
osity of the dataset including additional DBT data from a third hospital [6] Sarno A, Mettivier G, Russo P. Dedicated breast computed tomography: basic
site (foreseen before year 2022) might contribute to increase the aspects. Med Phys 2015;42(6):2786–804. https://doi.org/10.1118/1.4919441.
[7] Sage J, Fezzani KL, Fitton I, Hadid L, Moussier A, Pierrat N, et al. Experimental
robustness of the above predicted performance of DBT-DCNN.
evaluation of seven quality control phantoms for digital breast tomosynthesis. Phys
As an original addition to the classification task evaluation, we also Med 2019;57:137–44.
reported the use of the Grad-CAM algorithm on DBT images. For each [8] Hadjipanteli A, Elangovan P, Mackenzie A, Wells K, Dance KD, Young KC. The
slice, this technique produces the saliency map of the neural activation threshold detectable mass diameter for 2D-mammography and digital breast
tomosynthesis. Phys Med 2019;57:25–32.
gradients, showing values of maximum intensity in the image pixels [9] Petrov D, Marshall NW, Young KC, Bosmans H. Systematic approach to a
related to the position of the breast lesion. This approach might provide channelized Hotelling model observer implementation for a physical phantom
a possible indication to the DBT observer to help in the lesion diagnosis. containing mass-like lesions: Application to digital breast tomosynthesis. Phys Med
2019;58:8–20.
As a matter of example, in Fig. 8 we showed the saliency map as [10] Agasthya G, Rodriguez-Ruiz A, Sechopoulos I. Digital Breast Tomosynthesis. In:
generated by the Grad-CAM algorithm, suggesting that this technique Russo P, editor. Handbook of X-ray imaging physics and technology. CRC Press;
could correctly identify the position of the two infiltrating ductal 2018.
[11] Astley S, et al. A comparison of image interpretation times in full field digital
carcinomas. mammography and digital breast tomosynthesis. Proc. SPIE 2013;8673;S-1–S-8.
[12] Bernardi D, Ciatto S, Pellegrini M, Anesi V, Burlon S, Cauli E, et al. Application of

5. Conclusions

In this work, we showed that the proposed DBT-DCNN architecture outperforms state-of-the-art techniques on two benchmark clinical DBT image datasets provided by two different hospitals. The major result arising from this comparison is an increased sensitivity together with a significant reduction of false positives (FPs). This matters because fewer wrong evaluations can, in turn, reduce both the psychological stress of patients and the work stress of physicians, by avoiding further investigations.
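
The sensitivity and FP claims can be made concrete with the standard slice-level definitions computed from a confusion matrix. In this sketch the class name `SliceCounts` is hypothetical. The counts in the usage example are toy values chosen for illustration, not results of this study.

```python
from dataclasses import dataclass


@dataclass
class SliceCounts:
    """Slice-level confusion-matrix counts for a binary mass/no-mass classifier."""
    tp: int  # lesion slices classified positive
    fp: int  # lesion-free slices classified positive (false alarms)
    tn: int  # lesion-free slices classified negative
    fn: int  # lesion slices missed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Toy counts for illustration only.
counts = SliceCounts(tp=8, fp=1, tn=9, fn=2)
print(counts.sensitivity(), counts.specificity(), counts.accuracy())  # 0.8 0.9 0.85
```

The number of negative slices, TN + FP, is fixed by the dataset. So each FP turned into a TN raises specificity by exactly 1/(TN + FP) while leaving sensitivity unchanged.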

This work also demonstrated that the performance of the proposed DBT-DCNN network in terms of AUC and accuracy is comparable with that of other DCNN networks in the literature, albeit on a smaller set of data, owing to the recent introduction of DBT as a diagnostic exam in screening programs.
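
Since the comparison with the literature is expressed in terms of AUC, it helps to recall an identity. The area under the empirical ROC curve equals the Mann-Whitney statistic: the probability that a randomly chosen positive slice receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The pure-Python sketch below uses that identity. The function name is hypothetical, and its pairwise O(n*m) loop favours clarity over speed on large datasets.

```python
from itertools import product
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the empirical ROC curve via the Mann-Whitney statistic.

    Counts, over all (positive, negative) pairs, how often the positive slice gets
    the higher score; ties count as one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos_scores, neg_scores)
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores: 5 of the 6 positive/negative pairs are ranked correctly.
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3]))  # 0.8333333333333334
```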

An ongoing extension of this work aims to evaluate the network performance on a larger dataset of annotated images (a few hundred DBT exams). Additionally, we are considering a multiclass extraction of breast-specific diagnostic features, such as tumor density and geometry. We are also developing an algorithm, applied after the DBT-DCNN evaluation of each slice belonging to a single DBT exam, that uses the z-position of each slice to take into account the 3D spatial information of the exam. This algorithm will evaluate whether consecutive slices in the same DBT exam are classified as positive, thereby indicating the section with the highest probability of containing a mammary lesion.
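
As one possible reading of the planned z-aware step, the sketch below takes the per-slice DBT-DCNN decisions of a single exam, ordered by z-position. It returns the longest run of consecutive positive slices as the candidate lesion section. The following are assumptions for illustration, not the authors' algorithm:

- the function name;
- the binary-label input;
- the `min_run` parameter, the minimum run length required before a section is reported, which suppresses isolated positive slices.

```python
from typing import List, Optional, Tuple


def max_positive_run(slice_labels: List[int], min_run: int = 2) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive positive slices in one DBT exam.

    slice_labels[z] is the (hypothetical) binary DBT-DCNN decision for the slice at
    z-index z, with 1 = mass present. Returns (first_z, last_z) of the longest run
    of 1s, or None if no run is at least min_run slices long.
    """
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for z, label in enumerate(slice_labels + [0]):  # trailing 0 closes a final run
        if label == 1 and start is None:
            start = z
        elif label != 1 and start is not None:
            length = z - start
            if length >= min_run and (best is None or length > best[1] - best[0] + 1):
                best = (start, z - 1)
            start = None
    return best


# Slices 3-5 form the longest positive section; the isolated hit at z=1 is ignored.
print(max_positive_run([0, 1, 0, 1, 1, 1, 0, 1, 1]))  # (3, 5)
```

A natural variant would work on the per-slice probabilities instead of hard labels, for example by choosing the window of fixed length with the highest summed probability.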

Finally, from our first observations of the Grad-CAM outcomes, we also conclude that this technique could be a valuable tool for selecting the regions of interest of the slices to be used for training the DCNN, with the aim of enhancing the mass localization process. Hence, a subsequent extension of this work will address the localization of masses and microcalcifications within the tomosynthesis images, supported by Grad-CAM techniques.
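
A simple way to turn a Grad-CAM map into a training region of interest is to threshold the normalized map and take the bounding box of the pixels above the threshold. The sketch below assumes a map already scaled to [0, 1]. The function name and the default threshold of 0.5 are hypothetical choices, not values from this work.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def cam_roi(cam: Matrix, threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive bounding box (row_min, col_min, row_max, col_max) of all pixels
    whose normalized Grad-CAM value is >= threshold, or None if there are none."""
    hits = [(i, j) for i, row in enumerate(cam) for j, v in enumerate(row) if v >= threshold]
    if not hits:
        return None
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return min(rows), min(cols), max(rows), max(cols)


cam = [[0.0, 0.1, 0.0, 0.0],
       [0.0, 0.6, 0.8, 0.0],
       [0.0, 0.7, 0.9, 0.2],
       [0.0, 0.0, 0.1, 0.0]]
print(cam_roi(cam))  # (1, 1, 2, 2)
```

A single box merges every above-threshold pixel, so two separate lesions on one slice would end up inside one large box. Labelling connected components before boxing would keep them apart.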

Acknowledgements

This work was funded in part by INFN (Istituto Nazionale di Fisica Nucleare, Italy).
R. Ricciardi et al. Physica Medica 83 (2021) 184–193
stage transfer learning using deep neural nets. IEEE Trans Med Imaging 2019;38 [36] Website: https://it.mathworks.com/help/deeplearning/ref/alexnet.html. Accessed
(3):686–96. on 01/20/2021.
[31] Fotin SV, Yin Y, Haldankar H, Hoffmeister JW, Periaswamy S. Detection of soft [37] Website: https://it.mathworks.com/help/deeplearning/ref/vgg19.html. Accessed
tissue densities from digital breast tomosynthesis: comparison of conventional and on 01/20/2021.
deep learning approaches. In: Proceedings of the SPIE 9785, Medical Imaging; [38] Michell MJ, Batohi B. Role of tomosynthesis in breast imaging going forward. Clin.
2016, 97850X. Radiology 2018;73:358–71.
[32] Yousefi M, Krzyzak A, Suen CY. Mass detection in digital breast tomosynthesis data [39] Kingma DP, BaAdam JL. A Method for Stochastic Optimization. International
using convolutional neural networks and multiple instance learning. Comput Biol Conference on Learning Representations (ICLR) 2015.
Med 2018;96:283–93. [40] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Grad-CAM Batra D. Visual
[33] Sakai A, et al. A method for the automated classification of benign and malignant explanations from deep networks via gradient-based localization. IEEE
masses on digital breast tomosynthesis images using machine learning and International Conference on Computer Vision (ICCV), Venice 2017:618–26.
radiomic features. Radiol Phys Technol 2020;13:27–36. https://doi.org/10.1109/ICCV.2017.74.
[34] Bevilacqua V, Brunetti A, Guerriero A, Trotta GF, Telegrafo M, Moschetta M. [41] Sechopoulos I, Teuwen J, Mann R. Artificial intelligence for breast cancer detection
A performance comparison between shallow and deeper neural networks in mammography and digital breast tomosynthesis: State of the art. Semin Cancer
supervised classification of tomosynthesis breast lesions images. Cogn Syst Res Bio 2020. https://doi.org/10.1016/j.semcancer.2020.06.002.
2019;53:3–19. [42] Skaane P, Bandos AI, Gullien R, et al. Comparison of digital mammography alone
[35] Geras KJ, Mann RM, Moy L. Artificial intelligence for mammography and digital and digital mammography plus tomosynthesis in a population-based screening
breast tomosynthesis: current concepts and future perspectives. Radiology 2019; program. Radiology 2013;267(1):47–56.
293:246–59.
193