Deep Learning for Chest X-Ray Diagnosis
Abstract
Chest radiography is a widely used diagnostic imaging procedure in medical practice, and its workflow requires the prompt reporting of imaging examinations and the diagnosis of diseases visible in the images. In this study, a critical phase of the radiology workflow is automated using three convolutional neural network (CNN) models, viz. DenseNet121, ResNet50, and EfficientNetB1, for the fast and accurate detection of 14 class labels of thoracic pathology diseases from chest radiographs. These models were evaluated with AUROC scores for normal versus abnormal chest radiographs on the Chest X-ray14 dataset of 112,120 images spanning the various thoracic pathology class labels, predicting the probability of each individual disease and warning clinicians of potentially suspicious findings. With DenseNet121, the AUROC scores for hernia and emphysema were 0.9450 and 0.9120, respectively. Compared on the scores obtained for each class of the dataset, DenseNet121 outperformed the other two models. This article also aims to develop an automated server that captures the results for the fourteen thoracic pathology diseases, with model training carried out on a tensor processing unit (TPU). The results of this study demonstrate that this dataset can be used to train models with high diagnostic accuracy for predicting the likelihood of 14 different diseases in abnormal chest radiographs, enabling accurate and efficient discrimination between different types of chest radiographs. This has the potential to benefit various stakeholders and improve patient care.
* Mukesh Mann
  [Link]@[Link]
* Rakesh P. Badoni
  rakeshbadoni@[Link]
* Dong-Qing Wei
  dqwei@[Link]
Harsh Soni
  harsh.11912082@[Link]
Mohammed Al-Shehri
  mshehri@[Link]
Aman Chandra Kaushik
  amanbioinfo@[Link]
1 Department of Computer Science and Engineering, Indian Institute of Information Technology, Sonepat, Haryana 131029, India
2 Department of Mathematics, École Centrale School of Engineering, Mahindra University, Hyderabad 500043, India
3 Department of Information Technology, Indian Institute of Information Technology, Sonepat, Haryana 131029, India
4 Department of Biology, Faculty of Science, King Khalid University, Abha, Saudi Arabia
5 State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200030 Shanghai, China
6 School of Biomedical Informatics, University of Texas Health Science Centre at Houston, Houston, TX, USA
Interdisciplinary Sciences: Computational Life Sciences
Graphical Abstract
[Graphical abstract: general representation of a convolutional neural network; sample chest X-ray image for all classes in Chest X-ray14]
Keywords Chest X-ray diagnosis · Chest X-ray14 · Deep ConvNet diagnosis · TPU training
interpret chest X-rays with the same efficiency as a practicing radiologist [3, 4]. Over the last few years [5–9], the diagnosis of chest radiology images has received increased attention, and several algorithms have been developed for pulmonary tuberculosis classification [10–13] and pneumonia detection [3, 14, 15]. During the pandemic, deep learning has also found its usage in COVID-19 detection [16–21].

Our proposed study aims to identify multiple pathology diseases by re-implementing the CheXNet model and constructing several additional models with nearly identical hyperparameters. These models will be compared side-by-side. Additionally, we aim to train these models on a Tensor Processing Unit (TPU) to reduce training time. Deep learning has made significant advancements in the field of medicine due to the availability of vast datasets, enabling the development of models that surpass the performance of medical professionals. For instance, pneumonia detection [3, 14, 15], skin cancer classification [22–24], and lung cancer screening [25–27] have all benefited from deep learning. CheXNet [3], an algorithm that can detect pneumonia from chest X-rays, performs better than practicing radiologists. CNN models, such as CheXNeXt [4], can identify various pathology diseases with a performance similar to that of practicing board-certified radiologists using frontal-view chest X-rays.

In recent years, various life-threatening diseases have been detected and diagnosed using deep learning techniques by a number of researchers [28–32]. Baltruschat et al. [28] conducted a study comparing multiple deep-learning approaches to classify chest X-ray images with multiple labels. They analyzed various methods for using CNNs to classify X-ray images from the Chest X-ray14 dataset and found that fine-tuned networks using the ImageNet dataset produced satisfactory results. However, the most effective model was specifically trained using only X-ray images and incorporated non-image data. A systematic survey of deep learning techniques for the analysis of COVID-19 and their usability for detecting Omicron has been provided by [32]. The COVID-19 pandemic has caused a shift towards utilizing deep learning methods for analyzing and identifying infected areas in radiology images. These techniques can be divided into classification, segmentation, and multi-stage approaches used for COVID-19 diagnosis at both the image and region levels. Khan et al. [33] introduced a new method called deep hybrid learning (DHL) and deep boosted hybrid learning (DBHL) for accurately detecting COVID-19 in chest X-ray images. The DBHL technique involves using data augmentation, transfer learning-based fine-tuning, deep features boosting, and hybrid learning to improve the performance of the COVID-RENets models (COVID-RENets-1 and COVID-RENets-2). In their experiments, the DBHL framework outperformed other well-established CNN models.

Stirenko et al. [11] conducted a study on the use of deep learning-based computer-aided diagnosis (CADx) to predict the presence of tuberculosis by statistically analyzing 2D chest X-ray images. They demonstrated the effectiveness of deep CNNs for CADx of tuberculosis, particularly through techniques like lung segmentation and data augmentation, both lossless and lossy, on a small and unbalanced dataset. Rahman et al. [14] presented a study that aimed to develop an automated method for identifying bacterial and viral pneumonia by analyzing digital X-ray images. They gave a comprehensive overview of current approaches to detecting pneumonia and described the specific techniques used in their research. Alakus and Turkoglu [17] created clinical predictive models using deep learning and laboratory data to forecast which patients were likely to contract COVID-19. They evaluated the models' effectiveness using various performance metrics like precision, F1-score, recall, AUC, and accuracy with data from 600 patients and 18 laboratory findings and validated them through ten-fold cross-validation and train-test split methods. The results showed that the predictive models accurately identified patients with COVID-19. Dey et al. [15] designed a Deep-Learning System (DLS) for diagnosing lung abnormalities by using chest X-ray images. They tested the system with conventional and filtered chest radiographs and conducted an initial evaluation using a SoftMax classifier. The outcomes indicated that the VGG19 method provided higher classification accuracy than other methods.

Khan et al. [34] have proposed a new diagnostic system that employs deep CNNs to detect and analyze COVID-19 infections by identifying minor irregularities. This system
Fig. 2 Sample chest X-ray image for all classes in the chest X-ray14
comprises two phases. In the first phase, a new CNN named SB-STM-BRNet is utilized to identify COVID-19 infections in lung CT images. This is achieved using a Squeezed and Boosted (SB) channel and a Split-Transform-Merge (STM) block with dilated convolutions. In the second phase, the COVID-CB-RESeg CNN is employed to detect and analyze COVID-19-infected areas in the images. This CNN incorporates region-homogeneity and heterogeneity operations in each encoder-decoder block and auxiliary channels in the boosted decoder to learn about low illumination and the boundaries of the infected regions. The proposed diagnostic system has shown promising results in identifying COVID-19 infections. Additionally, Khan et al. [35] have introduced a new CNN architecture called STM-RENet, which utilizes a split-transform-merge approach to analyze X-ray images and identify radiographic patterns associated with COVID-19 infection. This block-based CNN includes a new convolutional block named STM, which separately and jointly performs region- and edge-based operations. By combining these operations with convolutional techniques, STM-RENet can analyze the homogeneity of regions, intensity inhomogeneity, and features that define boundaries. The authors have also presented an improved version of STM-RENet called CB-STM-RENet, which utilizes channel boosting and learns textural variations to enhance its performance. When evaluated on three datasets, CB-STM-RENet demonstrated significantly superior results compared to conventional CNNs.

A major limitation of previous research in the field of deep CNNs for chest X-ray diagnosis is that many studies have only examined their ability to perform binary classification tasks. These tasks involve detecting the presence or absence of a specific disease. There is a need for more research on the ability of these models to simultaneously detect and classify multiple diseases or conditions in a single image. Moreover, there are additional issues in previous studies. First, many studies have used small and potentially biased datasets, which negatively impacts the generalizability and accuracy of these models. Therefore, the need for more extensive and diverse datasets is essential. Second, there has been a lack of research on the ability of deep learning models to accurately diagnose rare or less common diseases in chest X-rays. Finally, previous studies have relied on simple accuracy metrics, which are inadequate for evaluating the performance of these models. More robust evaluation methods, such as sensitivity, specificity, and area under the curve, are required to better understand these models' performance.

2 Materials and Methods

2.1 Dataset

A significant amount of research has been done using the Chest X-ray14 dataset [3, 7–9, 36–38]. The dataset has been collected and made openly available by the National Institutes of Health. It consists of 112,120 frontal-view chest X-ray images of 30,805 unique patients. When loaded, these images are single-channel gray-scale images and need to be converted to 3-channel images to allow our pre-trained model to process them. Each image in the dataset is annotated with up to 14 different thoracic pathology labels. Figure 2 shows a sample for each of these diseases from the dataset itself. Table 1 lists all the diseases in Chest X-ray14.

Wang et al. [39] used NLP to text-mine disease classifications from the related radiological reports to label the images. The labels are expected to have an accuracy greater than 90% [40]. The dataset also consists of images labeled as No Finding, which simply indicates that the NLP system
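The single-channel-to-3-channel conversion mentioned in the dataset description can be sketched in a few lines of NumPy; the 224 × 224 shape matches the training input shape reported in Tables 3 and 4, but this snippet is illustrative and not taken from the authors' code:

```python
import numpy as np

def to_three_channels(gray):
    """Replicate a single-channel chest X-ray across 3 channels so
    that an ImageNet-pretrained backbone can process it."""
    if gray.ndim == 2:                   # (H, W) -> (H, W, 1)
        gray = gray[..., np.newaxis]
    return np.repeat(gray, 3, axis=-1)   # (H, W, 3)

# A 224x224 gray-scale image becomes a 224x224x3 image whose
# three channels are identical.
img = np.random.rand(224, 224).astype(np.float32)
rgb = to_three_channels(img)
```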
[Figure: experimental workflow: a test set of 5,389 images; model development with DenseNet121, ResNet50, and EfficientNetB1; model training on TPU hardware for 100 epochs with early stopping; evaluation with the AUROC metric]

[Figure: classification head: 7 × 7 × 1024 feature maps → GlobalAveragePooling2D (1024) → Dense (14 units) → sigmoid activation (14 outputs)]
Fig. 6 Model architecture on ImageNet for a DenseNet [46]; b ResNet [47]; and c EfficientNet-B0 baseline network [48]
3.3 ResNet50

The second model used as the backbone for the diagnosis task was the ResNet50 [47]. ResNets are an exciting class of models and have served as the state-of-the-art model for various tasks. A deep neural network architecture tends to give a more significant error as compared to a comparatively shallow neural network. He et al. [47] overcame this problem by introducing a deep residual learning framework and skip connections.

Fig. 7 Model architecture using ResNet50V2 as the backbone
[Figure: 1024-unit input → ReLU → Dropout → Dense (512 units) → ReLU → … → sigmoid activation (14 outputs)]

Table 3 Training parameters for ResNet50V2 backbone

Training parameter Value
Input shape (224, 224)
Output shape (14,)
Batch size 16 × 8 (16 batches per TPU core)
Callbacks Model checkpoint; Reduce LR on plateau; Early stopping

3.4 EfficientNetB1

The third and last model used as the backbone for this task was EfficientNetB1 [48]. EfficientNets are a class of efficiently designed models that optimize performance while having a considerably low number of trainable parameters. Tan and Le [48] came up with a better way of scaling the network, which they call compound scaling, in which an efficient scaling is selected jointly for width, depth, and image resolution. The baseline network architecture of EfficientNetB0 is shown in Fig. 6(c).

To use transfer learning on EfficientNetB1, the final Dense layer of the pre-trained model was replaced by a Dense layer with 14 units, and a sigmoid activation was applied to it. The resultant model architecture is shown in Fig. 8. Also, the training parameters for the EfficientNetB1 backbone are presented in Table 4.
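The appended head (global average pooling over the backbone's feature maps, then a 14-unit dense layer with a sigmoid for independent multi-label outputs) amounts to the following forward pass; this is a NumPy sketch with randomly initialized weights, not the trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def head_forward(features, W, b):
    """features: backbone output of shape (H, W, C).
    Global average pooling collapses the spatial grid to a C-vector,
    then a 14-unit dense layer with a sigmoid yields independent
    per-disease probabilities (multi-label, hence no softmax)."""
    pooled = features.mean(axis=(0, 1))   # (C,)
    return sigmoid(pooled @ W + b)        # (14,)

# EfficientNetB1 emits 7x7x1280 feature maps for a 224x224 input.
feats = np.random.rand(7, 7, 1280)
W = 0.01 * np.random.randn(1280, 14)     # illustrative random weights
b = np.zeros(14)
probs = head_forward(feats, W, b)
```

Because each output passes through its own sigmoid, every disease probability is in (0, 1) independently of the others, which is what allows one image to carry several of the 14 labels.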
The multi-label loss is the sum over the 14 classes of a class-balanced binary cross-entropy:

J = Σ_{c=1}^{14} L(y_c, ŷ_c)

L(y_c, ŷ_c) = (1/N) Σ_{i=1}^{N} [−w_pc × y_ci × log(ŷ_ci) − w_nc × (1 − y_ci) × log(1 − ŷ_ci)]

Here, c refers to one of the classes of labels, i refers to the ith example in the training set, y refers to the true label, and ŷ refers to the predicted label or probability. Similarly, w_pc and w_nc are defined as follows:

w_pc = (Total negative examples in class c) / (Total examples)

w_nc = (Total positive examples in class c) / (Total examples)

Fig. 8 Model architecture using EfficientNetB1 as the backbone: InputLayer (224 × 224 × 3) → EfficientNetB1 (7 × 7 × 1280) → GlobalAveragePooling2D (1280) → Dense (14 units) → sigmoid activation (14 outputs)

The model was trained on a TPU v3-8 on Kaggle for 100 epochs. The above-mentioned custom loss, binary accuracy, and AUROC score were monitored during training. The optimizer of [49] was used for training. The learning rate was reduced by a factor of 10 if no improvement in validation loss was seen for two consecutive epochs. Early stopping was used with a patience of 10 to prevent over-fitting of the model and wastage of computing time. The end-to-end open-source deep learning framework TensorFlow 2 was used to train and evaluate the models.

Table 4 Training parameters for EfficientNetB1 backbone

Training parameter Value
Input shape (224, 224)
Output shape (14,)
Batch size 8 × 8 (8 batches per TPU core)
Callbacks Model checkpoint; Reduce LR on plateau; Early stopping
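A minimal NumPy rendering of this class-balanced loss may help (shown with 2 classes for brevity; the paper uses 14). Note that the positive term is weighted by the fraction of negatives w_pc, which up-weights rare positive findings:

```python
import numpy as np

def class_weights(y_true):
    """w_p and w_n per class: fractions of negative and positive
    examples, as in the definitions above."""
    n = y_true.shape[0]
    pos = y_true.sum(axis=0)
    return (n - pos) / n, pos / n        # (w_p, w_n)

def weighted_bce(y_true, y_pred, eps=1e-7):
    """J: sum over classes of the class-balanced binary
    cross-entropy, averaged over the N examples of each class."""
    w_p, w_n = class_weights(y_true)
    y_pred = np.clip(y_pred, eps, 1 - eps)   # guard log(0)
    per_class = (-w_p * y_true * np.log(y_pred)
                 - w_n * (1 - y_true) * np.log(1 - y_pred)).mean(axis=0)
    return per_class.sum()

# Toy batch: 4 examples, 2 classes (multi-label 0/1 targets).
y_true = np.array([[1, 0], [0, 0], [1, 1], [0, 1]], dtype=float)
y_pred = np.array([[0.9, 0.2], [0.1, 0.1], [0.8, 0.7], [0.2, 0.6]])
loss = weighted_bce(y_true, y_pred)
```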
Fig. 9 Proposed CXD server, where a front view of the CXD, where boxes are given for sequences and a blue button represents the submission option; b uploading a sample chest X-ray; c predicted results of a given sample using deep learning approaches; and d previous history along with confidence scores generated from the embedded deep learning approaches
TPR = True positives / (True positives + False negatives)

The false positive rate, also given as 1 − specificity, is the fraction of total negative examples in the dataset that the model incorrectly predicted as positive, i.e.,

FPR = False positives / (False positives + True negatives)

Table 5 AUROC scores for different model variants

Pathology CheXNet DenseNet121 ResNet50 EfficientNetB1
Atelectasis 0.8094 0.8200 0.7630 0.7750
Cardiomegaly 0.9248 0.9120 0.7410 0.8840
Effusion 0.8638 0.8830 0.8600 0.8460
Infiltration 0.7345 0.7290 0.6660 0.6450
Mass 0.8676 0.8210 0.7480 0.7610
Nodule 0.7802 0.7200 0.5720 0.6240
Pneumonia 0.7680 0.7430 0.6940 0.7130
Pneumothorax 0.8887 0.8670 0.7790 0.8100
Consolidation 0.7901 0.8240 0.8030 0.7960
Edema 0.8878 0.8900 0.8710 0.8380
Emphysema 0.9371 0.9120 0.7600 0.8410
Fibrosis 0.8047 0.8080 0.6870 0.7310
Pleural Thickening 0.8062 0.7470 0.7150 0.7120
Hernia 0.9164 0.9450 0.6970 0.8490

The AUROC scores for the different classes for all three model variants are listed in Table 5, and the ROC curves are illustrated in Fig. 10. The experimental results demonstrate that the model constructed using DenseNet121 outperformed the other two models. Our experimental investigations are therefore in line with several studies [50–52] previously published in the literature. Some possible reasons for this superior performance could be:
Fig. 10 a AUROC curve for DenseNet121 backbone; b AUROC curve for ResNet50 backbone; and c AUROC curve for EfficientNetB1 backbone
• The network architecture of DenseNet121 is more complex, enabling the model to extract more features from the data and potentially improve its performance.
• The DenseNet121 model utilizes a more closely connected pattern between its layers, which can aid in decreasing the number of parameters within the model and avoiding overfitting.
• DenseNet121 incorporates batch normalization and skip connections to enhance convergence and performance.

Further, we employed a pre-trained DenseNet121 model and modified its fully connected layers as previously described. Subsequently, we assessed the model's
performance without any further training and found that the ROC values were approximately 0.5, indicating that its predictions were akin to random guessing. Figure 11(a) illustrates the results graphically. Similarly, we froze the DenseNet121 model and only trained the global average pooling (GAP) and final softmax layers, using the DenseNet121 model as the backbone, without modifying its parameters. The outcomes of this method are demonstrated in Fig. 11(b). Essentially, we maintained the original DenseNet121 model and only updated the final layers that were added to it.

Furthermore, DenseNet121 was utilized as an unalterable feature extractor, and a more intricate fully connected layer was trained on it. Only the fully connected layers were modified during the training process, and the rest of the DenseNet121 architecture remained unchanged. Figure 12 illustrates the new layers appended to the fully connected network. The ROC curve achieved from this approach is displayed in Fig. 13.

We have computed the lower and upper confidence intervals for ResNet50, EfficientNetB1, and DenseNet121 to further analyze these models. A confidence interval is a range of values that is likely to include the true value of a population parameter. The lower and upper confidence intervals for these models indicate the potential range of performance when applied to a specific task or dataset. For instance, if the lower confidence interval for a model's accuracy is 95%, there is a 95% chance that the true accuracy of the model will be at least that value. These intervals help to understand the uncertainty surrounding a model's performance and to compare the performance of different models. Based on the assumption of a normal distribution, we have provided tables below that show the minimum and maximum estimated prevalence of Atelectasis disease with a 95% confidence level for the ResNet50, EfficientNetB1, and DenseNet121 models. These tables, given as Tables 6, 7, 8, 9, 10, 11, report the estimated prevalence for the respective models. In these tables, TPR represents the true positive rate, and FPR represents the false positive rate.

Next, we have thoroughly analyzed the lower and upper bounds of the precision-recall (PR) curves for the three models being considered. Nevertheless, it is important to note that the ROC curve is also a suitable alternative for evaluating a classifier's performance, particularly for datasets that have imbalanced classes. This is because, unlike the PR curve, which only considers the true positive rate, the ROC curve considers both the true positive rate and the false positive rate. Therefore, we have included both the PR and ROC curves to provide a more comprehensive evaluation of these models, which are depicted in Figs. 14, 15, and 16.

The aforementioned tables, given as Tables 6, 7, 8, 9, 10, 11, also display the F1 score, also known as the F-measure or F-score. It is a metric that combines
Fig. 11 a ROC curve for the DenseNet121 model without training; and b ROC curve for the DenseNet121 model after training only the GAP and final softmax layers
precision and recall into a single score, commonly used in classification tasks. The score is calculated as the harmonic mean of precision and recall. Precision is the number of true positive predictions divided by the total number of positive predictions, and recall is the number of true positive predictions divided by the total number of actual positive samples. The F1 score is valuable for evaluating the performance of classification models because it provides a balance between precision and recall and allows for the comparison of models with different precision and recall values.

6 Conclusions and Future Scopes

Pneumonia is a major cause of human fatalities worldwide. According to the Centers for Disease Control and Prevention, over one million adults in the US are hospitalized due to pneumonia, and around 50,000 die from the disease each year. India has over 10 million cases of pneumonia each year. Although chest X-rays are the most effective means of diagnosing pneumonia [53], medical imaging has constraints in terms of access to expertise in some areas [54]. Additionally, chest radiographs may also be utilized to diagnose other illnesses.

In addition, even expert radiologists are limited by various human factors [38, 55–58]. Therefore, the creation of detection systems could greatly benefit humanity. As a result of the difficulty in training these three models on a CPU, this study considers using a TPU. The CXD server has an improved interface that is more efficient and has been developed using a large chest X-ray dataset up until January 2021. This extensive and accurate data is constantly utilized to enhance our proposed CXD server, ensuring the quality of our work.

The objective of this study was to automate an essential stage of the radiology process by utilizing three convolutional neural networks (CNNs), namely DenseNet121, ResNet50, and EfficientNetB1, to precisely detect 14 types of thoracic pathology diseases from chest radiography images. A total of 112,120 chest X-ray images containing
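The precision, recall, and F1 definitions above translate directly into code; a small self-contained sketch with made-up counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (the F1 score),
    computed from raw prediction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# With 80 true positives, 20 false positives, and 20 false
# negatives, precision = recall = 0.8, so F1 = 0.8 as well.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
```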
Table 6 The minimum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for ResNet50

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.805903354 0.015975 0.895843 0.972882 0.805903 0.88118
0.2 0.841285476 0.029508 0.895843 0.966523 0.841285 0.899306
0.3 0.865006785 0.047695 0.895843 0.959846 0.865007 0.909793
0.4 0.882387525 0.068971 0.895843 0.954492 0.882388 0.916916
0.5 0.899576808 0.101926 0.895843 0.949523 0.899577 0.923817
0.6 0.918142515 0.163129 0.895843 0.942756 0.918143 0.93027
0.7 0.943963684 0.3045 0.895843 0.928266 0.943964 0.936041
0.8 0.972310777 0.600504 0.895843 0.912073 0.972311 0.941086
0.9 0.999218124 0.990938 0.895843 0.896111 0.999218 0.943991
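Since the exact interval procedure is not spelled out beyond the normal-distribution assumption, the following is only an illustrative sketch of a 95% normal-approximation interval for a proportion-type metric (accuracy, TPR, and the like) evaluated on n test examples:

```python
import math

def normal_ci(p_hat, n, z=1.96):
    """95% confidence interval for a proportion-type metric under
    the normal approximation: p_hat +/- z * sqrt(p_hat(1-p_hat)/n).
    z = 1.96 is the two-sided 95% normal quantile."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# e.g. an observed accuracy of 0.90 on the 5,389-image test set
lo, hi = normal_ci(0.90, 5389)
```

The interval tightens as n grows, which is why evaluating on the full test split rather than a subsample gives narrower bounds.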
various thoracic pathology diseases were utilized to evaluate the performance of these models based on their ability to predict the likelihood of individual diseases and alert clinicians to potentially abnormal findings. The results indicated that the DenseNet121 model outperformed the other two models in terms of the score values achieved for each
Table 7 The maximum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for ResNet50

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.82011924 0.020878 0.906718 0.978497 0.820119 0.892725
0.2 0.854381302 0.035998 0.906718 0.972778 0.854381 0.910013
0.3 0.877223562 0.05577 0.906718 0.966704 0.877224 0.919967
0.4 0.893879821 0.078499 0.906718 0.961793 0.89388 0.926705
0.5 0.91027128 0.113223 0.906718 0.95721 0.910271 0.933211
0.6 0.927862328 0.176824 0.906718 0.950935 0.927862 0.939273
0.7 0.952058015 0.321407 0.906718 0.937392 0.952058 0.944676
0.8 0.97798659 0.618292 0.906718 0.922126 0.977987 0.949381
0.9 0.999957983 0.994081 0.906718 0.906973 0.999958 0.952083
Table 8 The minimum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for EfficientNetB1

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.69087922 0.00361946 0.89584263 0.98295658 0.69087922 0.81089795
0.2 0.82653939 0.02196173 0.89584263 0.97044309 0.82653939 0.89240883
0.3 0.88065577 0.06462379 0.89584263 0.95866499 0.88065577 0.91787436
0.4 0.91498338 0.14903047 0.89584263 0.94340845 0.91498338 0.92895791
0.5 0.93846433 0.27433506 0.89584263 0.93229964 0.93846433 0.9353707
0.6 0.95526986 0.44986595 0.89584263 0.92421408 0.95526986 0.93945243
0.7 0.97303584 0.65328083 0.89584263 0.91259259 0.97303584 0.94169824
0.8 0.98557233 0.84551181 0.89584263 0.90482787 0.98557233 0.94315702
0.9 0.99518978 0.97055598 0.89584263 0.89868163 0.99518978 0.94388506
Table 9 The maximum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for EfficientNetB1

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.70759942 0.00616315 0.90671814 0.98736494 0.70759942 0.8249679
0.2 0.84012727 0.02763145 0.90671814 0.97631232 0.84012727 0.90344688
0.3 0.8922237 0.07388037 0.90671814 0.96562332 0.8922237 0.92760923
0.4 0.92487865 0.16224773 0.90671814 0.95154207 0.92487865 0.93804209
0.5 0.94693812 0.29075071 0.90671814 0.94117529 0.94693812 0.94404906
0.6 0.96250887 0.46803446 0.90671814 0.93358409 0.96250887 0.94785958
0.7 0.97863467 0.67052869 0.90671814 0.92261785 0.97863467 0.94995146
0.8 0.98960864 0.85845931 0.90671814 0.9152598 0.98960864 0.95130836
0.9 0.99740433 0.97641372 0.90671814 0.90941992 0.99740433 0.95198497
Table 10 The minimum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for DenseNet121

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.84306717 0.03027407 0.89584263 0.96648739 0.84306717 0.90031113
0.2 0.87086545 0.05159917 0.89584263 0.96172846 0.87086545 0.91387505
0.3 0.88815469 0.07713721 0.89584263 0.95646389 0.88815469 0.92093865
0.4 0.90111604 0.10671467 0.89584263 0.95189192 0.90111604 0.92574521
0.5 0.91569139 0.15556943 0.89584263 0.94509502 0.91569139 0.93013811
0.6 0.9282404 0.21364821 0.89584263 0.93823539 0.9282404 0.93320833
0.7 0.93842546 0.28161416 0.89584263 0.93317418 0.93842546 0.93579162
0.8 0.94847708 0.37734796 0.89584263 0.92822659 0.94847708 0.93822927
0.9 0.96335002 0.56196969 0.89584263 0.9196265 0.96335002 0.94091021
Table 11 The maximum estimated prevalence of Atelectasis disease with a 95% confidence level based on the assumption of a normal distribution for DenseNet121

Threshold TPR FPR Accuracy Precision Recall F1 score
0.1 0.85610088 0.03684002 0.90671814 0.97274542 0.85610088 0.91096946
0.2 0.88284634 0.05996685 0.90671814 0.96842214 0.88284634 0.92383087
0.3 0.89938912 0.08714851 0.90671814 0.96360561 0.89938912 0.93049949
0.4 0.91173454 0.11823421 0.90671814 0.95939841 0.91173454 0.93502422
0.5 0.92554771 0.16901291 0.90671814 0.95310858 0.92554771 0.93914936
0.6 0.93736856 0.22878152 0.90671814 0.94672453 0.93736856 0.94202619
0.7 0.94690186 0.29815644 0.90671814 0.94199421 0.94690186 0.94444251
0.8 0.9562432 0.39509976 0.90671814 0.9373556 0.9562432 0.94671882
0.9 0.96989882 0.58001505 0.90671814 0.92926239 0.96989882 0.94921784
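The per-threshold rows of Tables 6–11 can be generated from raw scores and labels by a simple sweep; a pure-Python sketch on toy data (not the paper's predictions):

```python
def sweep(scores, labels, thresholds):
    """For each decision threshold, compute (TPR, FPR, precision, F1),
    mirroring the columns of Tables 6-11. scores are predicted
    probabilities; labels are 0/1 ground truth."""
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * tpr / (prec + tpr) if prec + tpr else 0.0
        rows.append((t, tpr, fpr, prec, f1))
    return rows

scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
rows = sweep(scores, labels, [0.1, 0.5, 0.9])
```

Sweeping the threshold from 0.1 to 0.9 traces the same trade-off visible in the tables: TPR and FPR both fall as the threshold rises, while precision climbs.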
Fig. 14 PR curve for a lower bound, and b upper bound at 95% confidence interval for ResNet50 model
Fig. 15 PR curve for a lower bound, and b upper bound at 95% confidence interval for EfficientNetB1 model
Fig. 16 PR curve for a lower bound, and b upper bound at 95% confidence interval for DenseNet121 model
class on the dataset. Furthermore, the performance of these learning. arXiv preprint arXiv:1711.05225. [Link]
models was compared to that of the ChexNet model. 48550/arXiv.1711.05225
4. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, Duan T,
Our future plans involve expanding our CNN training by Ding D, Bagul A, Langlotz CP et al (2018) Deep learning for chest
incorporating extra data and assessing different architectures radiograph diagnosis: a retrospective comparison of the chexnext
for diagnosing other thoracic pathology diseases. We are algorithm to practicing radiologists. PLoS Med 15(11):1002686.
convinced that a computer-aided diagnostic tool of this kind [Link]
5. Cicero M, Bilbily A, Colak E, Dowdell T, Gray B, Perampaladas
could enhance the effectiveness and precision of diagnosing K, Barfett J (2017) Training and validating a deep convolutional
thoracic pathology diseases, including pandemics like neural network for computer-aided detection and classification
COVID-19 and Swine Flu, significantly. This tool could prove of abnormalities on frontal chest radiographs. Investig Radiol
to be especially useful during a pandemic when the demand 52(5):281–287. [Link]
6. Bar Y, Diamant I, Wolf L, Lieberman S, Konen E, Greenspan H
for prevention and treatment often surpasses the available (2015) Chest pathology detection using deep learning with non-
resources. medical training. In: 2015 IEEE 12th International Symposium on
Biomedical Imaging, pp. 294–297. [Link]
2015.7163871. IEEE
7. Guendel S, Grbic S, Georgescu B, Liu S, Maier A, Comaniciu D
Appendix

The proposed server "CXD" can be accessed at: [Link] google.com/file/d/1gKJNFfJc2FQoDo4lGz10wdTenbcAhC73/view?usp=sharing.

The NIH dataset (tfrecords) can be accessed at: [Link] www.kaggle.com/harshsoni/nih-chest-xray-tfrecords.

The Kaggle TPU documentation can be accessed at: [Link]

The source code and sample data can be accessed at: [Link]

Supplementary Information The online version contains supplementary material available at [Link] doi.org/10.1007/s12539-023-00562-2.

Acknowledgements Dong-Qing Wei is funded by the National Science Foundation of China (Grant numbers 32070662, 61832019, 32030063), the Science and Technology Commission of Shanghai Municipality (Grant number 19430750600), the SJTU JiRLMDS Joint Research Fund, the Joint Research Funds for Medical and Engineering and Scientific Research at Shanghai Jiao Tong University (YG2021ZD02), and the PCL Major Key Project (PCL2021A13). The computations were partially performed at the Pengcheng Lab and the Center for High-Performance Computing at Shanghai Jiao Tong University.

Declarations

Conflict of interest The corresponding author states, on behalf of all authors, that there is no conflict of interest.