Article
An Improved VGG16 Model for Pneumonia Image Classification
Zhi-Peng Jiang 1,2, Yi-Yang Liu 1,3, Zhen-En Shao 1 and Ko-Wei Huang 1,*
Abstract: Image recognition has been applied to many fields, but it is relatively rarely applied to medical images. Recent significant deep learning progress for image recognition has raised strong research interest in medical image recognition. First, we found that the VGG16 model failed in its predictions on pneumonia X-ray images. Thus, this paper proposes IVGG13 (Improved Visual Geometry Group-13), a modified VGG16 model for the classification of pneumonia X-ray images. Open-source thoracic X-ray images acquired from the Kaggle platform were employed for pneumonia recognition, but only a few data were obtained, and the datasets were unbalanced after classification, either of which can result in extremely poor recognition from trained neural network models. Therefore, we applied augmentation pre-processing to compensate for the low data volume and poorly balanced datasets. The original datasets without data augmentation were trained using the proposed and some well-known convolutional neural networks, such as LeNet, AlexNet, GoogLeNet and VGG16. In the experimental results, the recognition rates and other evaluation criteria, such as precision, recall and F-measure, were evaluated for each model. This process was repeated for the augmented and balanced datasets, with greatly improved metrics such as precision, recall and F1-measure. The proposed IVGG13 model produced superior outcomes with the F1-measure compared with the current best practice convolutional neural networks for medical image recognition, confirming that data augmentation effectively improves model accuracy.

Keywords: thoracic X-ray; deep learning; data augmentation; convolutional neural network; LeNet; AlexNet; GoogLeNet; VGGNet; Keras
Convolutional neural networks (CNNs) learn features from large image datasets, combine them into a feature map and perform recognition by connecting neurons. The approach has been successfully applied to various areas, such as handwriting recognition [6–8], face recognition [9–11], automatic driving vehicles [12], video surveillance [13] and medical image recognition [14–17].
With the development of computer technology in the medical field, applications ranging from basic medical care and disease treatment to clinical trials and new drug therapies all involve data acquisition, management and analysis. Determining how modern medical information can be used to provide the required data is therefore an important key to modern medical research. Medical services mainly include telemedicine, information provided through internet applications and the digitization of medical information. In this way, a patient's physical condition can be confirmed more accurately and quickly and the best treatment determined, thereby improving the quality of medical care. Smart healthcare can help to establish an effective clinical decision support system that improves work efficiency and the quality of diagnosis and treatment. This is of particular importance for an aging society, which faces many medical problems. In addition, the COVID-19 [18–22] outbreak in 2020 greatly increased the demand for medical information processing. Combining medical information with quantitative medical research, and providing diagnostic assistance from an objective perspective, is therefore a current trend.
present. The development of big data systems has also enabled the systematic acquisition
and integration of medical images [23–27]. Furthermore, traditional image processing
algorithms have gradually been replaced by deep learning algorithms.
Recent deep learning and machine learning developments mean that traditional
image processing method performance for image recognition is no longer comparable
to that of neural network (NN)-based approaches. Consequently, many studies have
proposed optimized deep learning algorithms to improve image recognition accuracy for
various recognition scenarios. CNN is the most prominent approach for image recognition,
improving recognition accuracy by increasing hidden layer depths and effectively acquiring
more characteristic parameters. Successful image recognition applications include face,
object, and license plate recognition, but medical image recognition is less common due
to difficulties acquiring medical images and poor understanding regarding how diseases
appear in the various images. Therefore, physician assistance is usually essential for
medical image recognition to diagnose and label focal areas or lesions before proceeding to
model training. This study used open-source thoracic X-ray images from the Kaggle data
science community, which were already categorized and labeled by professional physicians.
Recognition systems were pre-trained using the LeNet [28], AlexNet [2], GoogLeNet [29] and VGG16 [30] models, but the trained VGG16 model exhibited poor image classification accuracy in the test results. Therefore, this paper proposes IVGG13 to solve
the problem of applying VGG16 to medical image recognition. Several other well-known
CNNs were also trained on the same datasets, and the outcomes were compared with
the proposed IVGG13 approach. The proposed IVGG13 model outperformed all other
CNN models considered. We also applied data augmentation to increase the raw dataset
and improve the data balance, hence improving the model recognition rate. It is essential
to consider hardware requirements for CNN training and deployment. The number of
network control parameters increases rapidly with increasing network layer depth, which
imposes higher requirements on hardware and increases overhead and computing costs.
Therefore, this paper investigated methods to reduce network depth and parameter count
without affecting recognition accuracy. The proposed IVGG13 incorporates these findings and has strong potential for practical medical image recognition systems.
The remainder of this paper is organized as follows. The related works are described
in Section 2. The research methodology is stated in Section 3. The performance evaluation is
outlined in Section 4. Finally, conclusions and suggestions for future research are provided
in Section 5.
Figure 1. LeNet network architecture [28].
2.1.2. AlexNet
AlexNet was the first CNN used in the ILSVRC competition, proposed by Krizhevsky et al. in 2012 [2], which won the competition with significantly improved accuracy compared with all previous models, including the one that took second place that year. AlexNet has three main features:
(1) Employs the ReLU non-linear activation function to solve the vanishing gradient problem more effectively than the sigmoid and tanh activation functions used in other NNs;
(2) Adds dropout and data augmentation in the network layer to prevent overfitting; and
(3) Employs multiple parallel GPUs to accelerate computational throughput during training.
As shown in Figure 2, the AlexNet architecture is similar to LeNet. The network architecture is divided into two layers since training is done on two GPUs due to memory restrictions, and data dropout or augmentation is added to prevent overfitting.
Figure 2. AlexNet network architecture [2].
2.1.3. VGGNet
VGGNet is a CNN jointly developed by the Visual Geometry Group at the University of Oxford and Google DeepMind [30]. As shown in Figure 3, the VGGNet architecture can be considered an extended AlexNet, characterized by 3 × 3 convolutional kernels and 2 × 2 pooling layers, and the network architecture can be deepened by using smaller convolutional layers to enhance feature learning. The two most common current VGGNet versions are VGGNet-16 and VGGNet-19 [30].
Figure 3. VGGNet network architecture.
2.1.4. GoogLeNet
The earliest GoogLeNet version, Inception V1, won the ILSVRC competition with higher accuracy than VGGNet in 2014 [29]. Figure 4 displays a typical GoogLeNet architecture. The Inception architecture was subsequently derived to deepen and widen the network by using receptive vision fields with different convolutional kernel sizes to improve network accuracy.

Figure 4. Typical GoogLeNet network architecture [29].
Figure 5 demonstrates that the inception architecture contains 1 × 1, 3 × 3 and 5 × 5 convolutional kernels with 3 × 3 maximum pooling stacking. Different convolutional kernel sizes are used for feature extraction and connection to increase network width and enhance adaptability to different sizes. The 3 × 3 and 5 × 5 convolutional kernels are preceded by 1 × 1 convolutional kernels for dimensionality and parameter size reduction, reducing computing volume and correcting nonlinear functions. Finally, a 1 × 1 convolution is added after the 3 × 3 maximum pooling.
Figure 5. Typical inception module architecture [29].
3. Research Methods
This section discusses the various study approaches. Section 3.1 introduces the source and classification of the image datasets used in this study; Section 3.2 describes data augmentation pre-processing to solve unbalanced or small dataset problems; and Section 3.3 presents the proposed IVGG13 model.
Figure 6 shows the flow chart of data pre-processing and Figure 7 exhibits the proposed CNN training process. The LeNet, AlexNet, GoogLeNet, VGG16 and IVGG13 models were trained with the original datasets without data augmentation and then evaluated and compared. The models were subsequently trained with augmented datasets and again evaluated and compared.
Figure 7. CNN network training.
3.1. Training Datasets
We used an open-source dataset provided by the Kaggle data science competition platform for training (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia, accessed on 25 March 2018) [31]. The dataset comprised thoracic cavity images from child patients (1 to 5 years old) from the Guangzhou Women and Children's Medical Center, China. These images were classified by two expert physicians and separated into training, test and validation sets. Figure 8 displays the dataset structure, with the training sets including 1341 normal and 3875 pneumonia images, the test sets 234 and 390 images, and the validation sets eight and eight images, respectively. Figures 9 and 10 show examples of normal and pneumonia thoracic cavity X-ray images, respectively.
Figure 9. Example normal thoracic cavity X-ray images from the study dataset [31].
Figure 10. Example pneumonia thoracic cavity X-ray images from the study dataset [31].
3.2. Data Augmentation
Augmentation operations, including adjusting the color temperature of the original images, were applied to compensate for the lack of data volume. Data augmentation increased the training set from 5216 to 22,146 images, and the test set from 624 to 1000 images. Furthermore, some images were transferred from the training set to the test set for data balance and to ensure that images in the test set were predominantly originals. Figures 11 and 12 show examples of original and augmented images, respectively.
Figure 11. Example original images [31].
3.3. IVGG13
This study proposes IVGG13—an improved VGG16 that reduces the VGGNet network
depth—as shown in both Table 1 and Figure 13. The proposed network architecture
reduces the number of parameters by reducing the network depth compared to the original
VGG16 to avoid both under- and overfitting problems during training. The original
VGG16 convolutional architecture was retained by performing feature extraction using
two consecutive small convolutional kernels rather than a single large one. This maintains
VGG16 perceptual effects while reducing the number of parameters, which not only reduces
the training time but also maintains the network layer depth.
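As a worked check of this design choice (a standard receptive-field argument, not a calculation taken from the paper): two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution while using fewer parameters. For $C$ input and output channels,

$$2 \times (3 \times 3 \times C \times C) = 18C^{2} < 25C^{2} = 5 \times 5 \times C \times C,$$

roughly a 28% reduction, with the extra nonlinearity between the two small layers preserving depth.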
Table 1. Proposed IVGG13 network model.
Figure 13 highlights the similarities and differences between the IVGG13 and VGG16
network architectures.
First, the input image size was changed to 128 × 128, and the hidden layer was divided
into five blocks, with each block containing two convolutional layers and a pooling layer.
Thirty-two 3 × 3 convolutional kernels were randomly generated in each convolutional layer for feature extraction, and the image size was reduced by the pooling layer. Convolutional kernels in blocks 3–5 were the same size (3 × 3), but with 64, 128 and 64 kernels per block, respectively. Reducing the number of convolutional kernels reduced the number of parameters required compared with VGG16. The image size was then reduced by the pooling layer, feature maps were converted to one dimension by the flatten layer and, finally, three fully connected layers concatenated the output features into two classifications.
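A minimal Keras sketch of this architecture, following the block and kernel counts stated above, is given below. The widths of the first two fully connected layers are assumptions; the exact values appear in Table 1 rather than in the text.

```python
# Sketch of IVGG13 as described in the text: 128 x 128 input, five blocks of
# two 3 x 3 convolutions plus pooling (32, 32, 64, 128 and 64 kernels per
# block), a flatten layer and three fully connected layers. The dense-layer
# widths are assumptions; Table 1 gives the exact configuration.
from tensorflow.keras import layers, models

def build_ivgg13(input_shape=(128, 128, 1), num_classes=2):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    for filters in (32, 32, 64, 128, 64):      # one entry per block
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))      # halve the spatial size per block
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))   # assumed width
    model.add(layers.Dense(512, activation="relu"))   # assumed width
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```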
Figure 13. (a) Proposed IVGG13 and (b) conventional VGG16 [30] network architectures.
4. Results
This section provides a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn. Section 4.1 introduces the experimental environment of this article; Section 4.2 introduces and compares the various CNNs; Sections 4.3 and 4.4 discuss model outcomes without and with data pre-processing, respectively; and Section 4.5 discusses the VGG16 problems highlighted by the experimental results.
4.2.2. AlexNet
Table 3 and Figure 15 exhibit the AlexNet architecture used to train thoracic X-ray images, which comprises five convolutional and three fully connected layers, with 227 × 227 input images. The first convolutional layer included 48 (11 × 11) convolutional kernels to produce 227 × 227 images, followed by local response normalization in the LRN layer to reduce images to 55 × 55 using 3 × 3 max pooling. The second convolutional layer was similar to the first, but included 128 (5 × 5) convolution kernels. Subsequent LRN and max pooling layers reduced the image size to 13 × 13, with convolution layers 3–5 employing 192, 192 and 128 (3 × 3) kernels, respectively, producing 13 × 13 images. Images were reduced to 6 × 6 using a max pooling layer, and features were converted to one dimension in the flattening layer, fully connected, and output as two categories.
Table 3. AlexNet network model [2].
Figure 15. AlexNet architecture employed [2].
4.2.3. VGG16
Considering the selected workstation performance, we only used the VGG16 CNN for this study. First, we used the VGG16 model and applied it to pneumonia X-ray data for training, but the prediction results failed. Table 4 and Figure 16 indicate that our VGG16 architecture contains 13 convolutional and 3 fully connected layers, with 3 × 3 kernels for the convolutional layers and 2 × 2 parameters for the pooling layers. VGG16 convolutional and pooling layers are divided into blocks 1–5, where each block contains multiple convolutional layers and a single pooling layer. The two convolutional layers in block 1 each use 16 kernels for feature extraction, with the image size subsequently reduced in the pooling layer. Subsequent blocks have a similar architecture, except that blocks 1 and 2 use two convolutional layers, whereas blocks 3–5 use three convolutional layers with different kernel numbers in each layer to deepen the network and improve accuracy. Finally, three fully connected layers concatenate and output features into two classifications.
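For reference, the stock VGG16 can be instantiated directly from Keras. The sketch below assumes training from scratch (weights=None) with the conventional 224 × 224 × 3 input; this is an assumption for illustration, not the reduced-kernel configuration of Table 4.

```python
# Minimal sketch: instantiating the stock Keras VGG16 for two-class training.
# weights=None (training from scratch) and the 224 x 224 x 3 input are
# assumptions; the paper's variant (Table 4) uses fewer kernels per block.
from tensorflow.keras.applications import VGG16

vgg16 = VGG16(weights=None, include_top=True,
              input_shape=(224, 224, 3), classes=2)
vgg16.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```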
Figure 7 compares the overall network performance calculated using Equations (1)–(4). A good prediction model requires not only high accuracy but also generalizability. Generally, the validation accuracy is initially high because the validation data are primarily used to select and modify the model; if suitable validation data are selected at the beginning, the validation accuracy will exceed the training-set accuracy. Conversely, if unsuitable data are selected, the parameters are corrected and updated.
Evaluation metrics are usually derived from the confusion matrix (Tables 6 and 7) to
evaluate classification results, where true positive (TP) means both actual and predicted
results are pneumonia; true negative (TN) means both actual and predicted results are
normal; false positive (FP) means actual results are normal but predicted to be pneumonia;
and false negative (FN) means actual results are pneumonia but predicted to be normal.
Table 6. General confusion matrix.

                                      Actual Category
                               Pneumonia              Normal
Predicted     Pneumonia        True positive (TP)     False positive (FP)
category      Normal           False negative (FN)    True negative (TN)
Table 8 compares the evaluation results for each model, where accuracy, precision, recall and F-measure were calculated as shown in Equations (1)–(4). Precision is the proportion of relevant instances among those retrieved, recall is the proportion of relevant instances that were retrieved and the F1-measure is a special case of the F-measure for β = 1, i.e., precision and recall are equally important. A larger F1 means improved model effectiveness.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$F_{\beta}\text{-Measure} = \left(1 + \beta^{2}\right) \frac{\text{Precision} \times \text{Recall}}{\left(\beta^{2} \times \text{Precision}\right) + \text{Recall}} \tag{4}$$
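These four metrics can be computed directly from the confusion-matrix counts, as in the sketch below; the example call uses the degenerate counts shown in the table that follows (TP = 500, FP = 500, TN = 0, FN = 0).

```python
# Computing the metrics of Equations (1)-(4) from confusion-matrix counts.
def evaluate_counts(tp, fp, tn, fn, beta=1.0):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_beta

# Example: counts of TP=500, FP=500, TN=0, FN=0 (everything predicted as
# pneumonia) give accuracy 0.5, precision 0.5, recall 1.0 and F1 ~ 0.667.
print(evaluate_counts(500, 500, 0, 0))
```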
Table 8. Prediction results for each network model.
           TP     FP     TN     FN
4000       500    500    0      0
5000       500    500    0      0
6000       500    500    0      0
7000       500    500    0      0
8000       500    500    0      0
9000       500    500    0      0
10,000     500    500    0      0
Author Contributions: Conceptualization and methodology, Z.-P.J., Y.-Y.L. and K.-W.H.; software, Z.-P.J. and Z.-E.S.; formal analysis, Y.-Y.L.; writing—original draft preparation, Z.-P.J., Y.-Y.L., Z.-E.S. and K.-W.H.; writing—review and editing, Z.-P.J., Y.-Y.L., Z.-E.S. and K.-W.H. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the Ministry of Science and Technology, Taiwan, R.O.C., under grant MOST 110-2222-E-992-006-.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al.
Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [CrossRef]
2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf.
Process. Syst. 2012, 25, 1097–1105. [CrossRef]
3. Paluru, N.; Dayal, A.; Jenssen, H.B.; Sakinis, T.; Cenkeramaddi, L.R.; Prakash, J.; Yalavarthy, P.K. Anam-Net: Anamorphic depth
embedding-based lightweight CNN for segmentation of anomalies in COVID-19 chest CT images. IEEE Trans. Neural Netw. Learn. Syst.
2021, 32, 932–946. [CrossRef] [PubMed]
4. Wu, Y.-H.; Gao, S.-H.; Mei, J.; Xu, J.; Fan, D.-P.; Zhang, R.-G.; Cheng, M.-M. Jcs: An explainable COVID-19 diagnosis system by
joint classification and segmentation. IEEE Trans. Image Process. 2021, 30, 3113–3126. [CrossRef]
5. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep learning techniques for medical image segmentation: Achievements and
challenges. J. Digit. Imaging 2019, 32, 582–596. [CrossRef] [PubMed]
6. Vashist, P.C.; Pandey, A.; Tripathi, A. A comparative study of handwriting recognition techniques. In Proceedings of the 2020
International Conference on Computation, Automation and Knowledge Management (ICCAKM), Dubai, United Arab Emirates,
9–10 January 2020; pp. 456–461. [CrossRef]
7. Ghanim, T.M.; Khalil, M.I.; Abbas, H.M. Comparative study on deep convolution neural networks DCNN-based offline Arabic
handwriting recognition. IEEE Access 2020, 8, 95465–95482. [CrossRef]
8. Aly, S.; Almotairi, S. Deep convolutional self-organizing map network for robust handwritten digit recognition. IEEE Access 2020,
8, 107035–107045. [CrossRef]
9. Prakash, R.M.; Thenmoezhi, N.; Gayathri, M. Face recognition with convolutional neural network and transfer learning.
In Proceedings of the 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India,
27–29 November 2019; pp. 861–864. [CrossRef]
10. Kakarla, S.; Gangula, P.; Rahul, M.S.; Singh, C.S.C.; Sarma, T.H. Smart Attendance Management System Based on Face Recognition
Using CNN. In Proceedings of the 2020 IEEE-HYDCON, Hyderabad, India, 11–12 September 2020; pp. 1–5. [CrossRef]
11. Zhou, Y.; Ni, H.; Ren, F.; Kang, X. Face and gender recognition system based on convolutional neural networks. In Proceedings of
the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), Tianjin, China, 4–7 August 2019; pp. 1091–1095.
[CrossRef]
12. Ouyang, Z.; Niu, J.; Liu, Y.; Guizani, M. Deep CNN-based real-time traffic light detector for self-driving vehicles. IEEE Trans. Mob. Comput.
2019, 19, 300–313. [CrossRef]
13. Muhammad, K.; Hussain, T.; Tanveer, M.; Sannino, G.; de Albuquerque, V.H.C. Cost-Effective Video Summarization Using Deep
CNN With Hierarchical Weighted Fusion for IoT Surveillance Networks. IEEE Internet Things J. 2019, 7, 4455–4463. [CrossRef]
14. Shibly, K.H.; Dey, S.K.; Islam, M.T.U.; Rahman, M.M. COVID faster R–CNN: A novel framework to Diagnose Novel Coronavirus
Disease (COVID-19) in X-ray images. Inform. Med. Unlocked 2020, 20, 100405. [CrossRef]
15. Polsinelli, M.; Cinque, L.; Placidi, G. A light CNN for detecting COVID-19 from CT scans of the chest. Pattern Recognit. Lett. 2020,
140, 95–100. [CrossRef]
16. Sethi, R.; Mehrotra, M.; Sethi, D. Deep learning based diagnosis recommendation for COVID-19 using chest X-rays images.
In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA),
Coimbatore, India, 15–17 July 2020; pp. 1–4. [CrossRef]
17. Tahir, H.; Iftikhar, A.; Mumraiz, M. Forecasting COVID-19 via Registration Slips of Patients using ResNet-101 and Performance
Analysis and Comparison of Prediction for COVID-19 using Faster R-CNN, Mask R-CNN, and ResNet-50. In Proceedings of the
2021 International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT),
Bhilai, India, 19–20 February 2021; pp. 1–6. [CrossRef]
18. Abbas, A.; Abdelsamea, M.M.; Gaber, M.M. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional
neural network. Appl. Intell. 2021, 51, 854–864. [CrossRef]
19. Jain, G.; Mittal, D.; Thakur, D.; Mittal, M.K. A deep learning approach to detect COVID-19 coronavirus with X-ray images.
Biocybern. Biomed. Eng. 2020, 40, 1391–1405. [CrossRef]
20. Sitaula, C.; Hossain, M.B. Attention-based VGG-16 model for COVID-19 chest X-ray image classification. Appl. Intell. 2021, 51,
2850–2863. [CrossRef]
21. Baltruschat, I.M.; Nickisch, H.; Grass, M.; Knopp, T.; Saalbach, A. Comparison of deep learning approaches for multi-label chest
X-ray classification. Sci. Rep. 2019, 9, 6381. [CrossRef] [PubMed]
22. Minaee, S.; Kafieh, R.; Sonka, M.; Yazdani, S.; Soufi, G.J. Deep-COVID: Predicting COVID-19 from chest X-ray images using deep
transfer learning. Med. Image Anal. 2020, 65, 101794. [CrossRef] [PubMed]
23. Loddo, A.; Pili, F.; Di Ruberto, C. Deep Learning for COVID-19 Diagnosis from CT Images. Appl. Sci. 2021, 11, 8227. [CrossRef]
24. Ibrahim, A.U.; Ozsoz, M.; Serte, S.; Al-Turjman, F.; Yakoi, P.S. Pneumonia classification using deep learning from chest X-ray
images during COVID-19. Cogn. Comput. 2021, 1–13. [CrossRef]
25. Gabruseva, T.; Poplavskiy, D.; Kalinin, A. Deep learning for automatic pneumonia detection. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 350–351. [CrossRef]
26. Liang, G.; Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Methods
Programs Biomed. 2020, 187, 104964. [CrossRef] [PubMed]
27. El Asnaoui, K.; Chawki, Y.; Idri, A. Automated methods for detection and classification pneumonia based on X-ray images using
deep learning. In Artificial Intelligence and Blockchain for Future Cybersecurity Applications; Springer: Cham, Switzerland, 2021;
pp. 257–284. [CrossRef]
28. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86,
2278–2324. [CrossRef]
29. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper
with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA,
7–12 June 2015; pp. 1–9. [CrossRef]
30. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
31. Kermany, D.; Zhang, K.; Goldbaum, M. Labeled optical coherence tomography (oct) and chest X-ray images for classification.
Mendeley Data 2018, 2. [CrossRef]
32. Ord, K. Data adjustments, overfitting and representativeness. Int. J. Forecast. 2020, 36, 195–196. [CrossRef]