Improving RepVGG Model With Variational Data Imputation in COVID-19 Classification
Corresponding Author:
An Hoang Nguyen
School of Electrical Engineering, International University
Quarter 6, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Vietnam
Email: [email protected]
1. INTRODUCTION
More than a year after its emergence, coronavirus disease 2019 (COVID-19) remains a worldwide pandemic and a public health emergency as declared by the World Health Organization (WHO). The rapid human-to-human transmission of this disease makes COVID-19 one of the major threats to humanity and one of the main factors behind the complexity of the current situation. According to the WHO, confirmed cases have exceeded 180 million and fatalities have surpassed 4 million [1]. Reverse transcription-polymerase chain reaction (RT-PCR), which detects the viral nucleic acid of SARS-CoV-2, is the main tool to diagnose COVID-19. Nevertheless, sampling errors caused by specimens with low viral load are one of the drawbacks of the RT-PCR test. Antigen testing gives rapid results, but its sensitivity for detecting the virus in patient samples is low. Moreover, both approaches require a large number of testing kits and health workers, a demand that grows with the number of infected patients. To relieve this demand, medical imaging methods such as X-ray and computerized tomography (CT) are considered an alternative solution for detecting the disease [2]. Compared with other imaging techniques, chest X-ray (CXR) is an inexpensive modality that can aid radiologists in rapid identification.
Recently, deep learning has emerged as a cutting-edge method in computer vision and pattern recognition, achieving outstanding results in image-based classification. The application of deep learning to medical imaging such as X-ray is therefore a growing trend in automated disease identification, and several studies have already been conducted in the field of COVID-19 detection. Wang et al. [3] proposed COVID-19 detection from CXR images using a deep convolutional neural network (DCNN); compared with other pre-trained networks, their COVID-Net achieves an outstanding average accuracy of over 93%. COVID-ResNet, proposed in [4], builds on the traditional ResNet-50 to process the CXR images of the COVIDx dataset; training on images at three different resolutions yields better generalization, so its accuracy reaches 96.23%, higher than that of COVID-Net. To increase the number of training images, Quan et al. [5] introduced the X-ray projected generative adversarial network (XPGAN), which synthesizes additional CXR images from the existing ones and thereby improves data augmentation and classification accuracy. Singh and Singh [6] introduced a convolutional neural network (CNN) based on spectral analysis: multiresolution analysis (MRA) by wavelet decomposition produces frequency sub-bands that are fed into the CNN for classification, Grad-CAM visualizes gradient information as a heatmap for diagnosis, and the final accuracy exceeds 95%. Oh et al. [7] employed a patch-based CNN to handle COVID-19 identification with a limited dataset: the lung area is extracted and its background removed by FC-DenseNet, and the processed images are split into many small patches for training and testing. Despite the improved sensitivity, the resulting classification accuracy of 91.9% is lower than that of COVID-Net. Transfer learning from pre-trained models is also applied in [8]–[10]; by fine-tuning, the network learns new features from the CXR images, which yields better accuracy and shorter training time than training from scratch.
In this paper, the combination of a RepVGG pre-trained model and variational data imputation is proposed. Data imputation with a variational autoencoder (VAE) is first conducted together with a U-net for segmentation training. The encoder part of the U-net and VAE is then preserved and treated as an additional feature extractor to combine with RepVGG. Through this connection, extra features are added to the pre-trained model, which benefits classification performance. The remainder of the paper is organized as follows: the proposed approach is described in section 2, the results and discussions are given in section 3, and the conclusion is given in section 4.
2. PROPOSED APPROACH
The proposed model for improving RepVGG in CXR-based COVID-19 disease identification is illustrated in Figure 1. First, the CXR images are prepared and re-scaled before being fed into the two following networks. A U-Net-type network is trained for lung segmentation with the support of variational data imputation. Its decoder part is then removed, and the output of the encoder part is flattened into a feature vector. Meanwhile, the final layer of the RepVGG model is adjusted to concatenate the features from the encoder part. Thanks to this connection, the classification layer operates on two types of features extracted from the two networks. While training the COVID-19 disease classifier, all layers of the U-net and a small number of the RepVGG layers are frozen, which improves learning by preserving the pre-trained weights.
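The data flow described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `repvgg_features` and `encoder_features` are stand-ins for the two frozen backbones, and all dimensions (512, 128, 3 classes) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def repvgg_features(x):
    """Stand-in for the (partially frozen) RepVGG backbone: maps a
    CXR image to a flattened feature vector (hypothetical 512-dim)."""
    return rng.standard_normal(512)

def encoder_features(x):
    """Stand-in for the frozen U-net/VAE encoder: flattened latent
    features of the lung-focused representation (hypothetical 128-dim)."""
    return rng.standard_normal(128)

def classify(x, w, b):
    """Classification head over the concatenated features:
    softmax over 3 classes (COVID-19, Normal, Viral Pneumonia)."""
    feats = np.concatenate([repvgg_features(x), encoder_features(x)])
    logits = feats @ w + b
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

x = rng.standard_normal((299, 299))            # one re-scaled CXR image
w = rng.standard_normal((640, 3)) * 0.01       # 512 + 128 features -> 3 classes
b = np.zeros(3)
probs = classify(x, w, b)
print(probs.shape, round(probs.sum(), 6))      # probability vector over 3 classes
```

The key design point is the concatenation: the classification layer sees both the general image features from RepVGG and the lung-focused latent features from the segmentation encoder.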
2.1. Dataset
The classification training in this work is conducted on the COVID-19 radiography dataset [11], [12]. The dataset was collected by international research teams in Qatar, Pakistan, and Malaysia from different sources and publications. All images have a resolution of 299×299 in PNG format. In this research, we employ three classes of the dataset, COVID-19, Normal, and Viral Pneumonia, as described in Table 1.
Improving RepVGG model with variational data imputation in COVID-19 classification (Kien Trang)
1280 ISSN: 2252-8938
The ELBO loss is the sum of two terms. The first term ℒ𝑟𝑒𝑐 is the reconstruction loss between the original data and the reconstructed data, measured by the expected negative log-likelihood as given in (2). The second term is the Kullback-Leibler divergence 𝒦ℒ𝑙𝑜𝑠𝑠, also called relative entropy, which computes the difference between a probability distribution 𝑄(𝓏|𝒳) and the reference prior distribution 𝑃(𝓏), as shown in (3). For multivariate normal distributions, the Kullback-Leibler divergence can be written in the closed form of (4), where 𝑘 is the dimensionality of the latent space.
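Equations (1)–(4) are not reproduced in this excerpt. In the standard VAE formulation of Kingma and Welling [14], to which the symbols in the text correspond, they take the following form (a reconstruction under the usual assumptions 𝑄 = 𝒩(𝜇, Σ) and 𝑃 = 𝒩(0, 𝐼); the paper's exact notation may differ slightly):

```latex
% (1) ELBO loss as the sum of the two terms
\mathcal{L}_{ELBO} = \mathcal{L}_{rec} + \mathcal{KL}_{loss}

% (2) reconstruction loss: expected negative log-likelihood
\mathcal{L}_{rec} = -\,\mathbb{E}_{z \sim Q(z\mid\mathcal{X})}\!\left[\log P(\mathcal{X}\mid z)\right]

% (3) Kullback-Leibler divergence between Q(z|X) and the prior P(z)
\mathcal{KL}_{loss} = D_{KL}\!\left(Q(z\mid\mathcal{X})\,\|\,P(z)\right)
  = \mathbb{E}_{z \sim Q(z\mid\mathcal{X})}\!\left[\log Q(z\mid\mathcal{X}) - \log P(z)\right]

% (4) closed form for multivariate normals, k = dimensionality of the latent space
\mathcal{KL}_{loss} = \tfrac{1}{2}\left(\operatorname{tr}(\Sigma) + \mu^{\top}\mu - k - \log\det\Sigma\right)
```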
In our research, the VAE is used for data imputation, contributing additional features to the segmentation process. This differs from the more common applications of VAE, in which researchers focus on data generation [15]–[17]. Besides, VAE-based denoising is proposed in [18], while Anh et al. [19] use a VAE as a feature extractor combined with a random forest for fraud detection.
This work adopts the combination of U-net and VAE from [20]; the block diagram of this model is illustrated in Figure 3. First, training for lung area segmentation is conducted. The encoder of the VAE is stacked onto the U-net through the latent variable 𝓏, which connects directly with the latent space of the U-Net representing a low-dimensional version of the input image. The combined features are fed into the decoder part for segmentation rather than for reconstruction as in the original VAE. Therefore, the first term in (1) is modified to compute the segmentation loss, while the second term keeps its role as a regularizer.
Figure 3. The block diagram of U-net with variational data imputation model
2.3. RepVGG
Deep learning is currently one of the advanced techniques able to handle enormous datasets in fields such as image processing and pattern recognition. A DCNN takes advantage of stacking more layers than a traditional neural network, which makes it applicable to complicated tasks. Many pre-trained networks have been introduced to deal with different datasets, such as AlexNet [21], VGG [22], ResNet [23], GoogleNet [24], and DenseNet [25]. As a result, transfer learning approaches are widely employed in many applications, transferring previously learned features to new tasks with impressive outcomes in terms of training time and accuracy. Early on, researchers tended to propose single-branch models such as AlexNet and VGG, which are easy to implement and gave impressive results in many competitions; consequently, many newer models add more layers to handle more complex datasets [26]. However, deeper networks suffer from training difficulties caused by gradient vanishing [27], which prevents the initial layers from being updated during backpropagation. To address this phenomenon, ResNet applies skip-connections as a solution, which encouraged the development of later multi-branch models such as GoogleNet and DenseNet. These models not only avoid the dependence on a single branch, preventing gradient vanishing, but also allow information from earlier layers to be transmitted to later ones through concatenated connections. Although the multi-branch architecture brings an improvement in accuracy, its complexity increases, giving rise to longer training and inference times, higher memory usage, and difficulty of deployment on some devices.
Inspired by these previous models, RepVGG [28] was designed to resolve these disadvantages. A multi-branch architecture is applied in RepVGG during training; however, the separation of the training and inference models increases performance while retaining the advantages of the multi-branch design. The RepVGG model consists of 5 main stages containing structurally similar blocks, whose main components are the convolutional layer (Conv), the ReLU function, and batch normalization (BN). The first block of each stage carries a 3×3 Conv with stride 2 and a 1×1 Conv with stride 2 for down-sampling. From the second block onwards, there are 3 branches: a 3×3 Conv with stride 1 followed by BN, a 1×1 Conv followed by BN, and an identity branch with BN. The branches are summed before being fed into the ReLU activation function. To overcome the disadvantages of the multi-branch architecture, RepVGG introduces a process called re-parameterization, which converts the model from multi-branch to single-branch before inference. First, the convolutional layer is fused with batch normalization. The function of the convolutional layer 𝐶𝑜𝑛𝑣 is described in (5), where 𝒲 and ℬ are the weight and bias. Normalization is then performed by subtracting the mean value 𝜇 and dividing by the standard deviation of the batch; in the batch normalization ℬ𝒩 of (6), 𝛾 and 𝛽 are the scaling and shifting factors. Substituting (5) into (6) gives (7), in which the first and second terms have the same form as the 𝐶𝑜𝑛𝑣 function. Therefore, letting 𝒲𝑓𝑢𝑠𝑒𝑑 and ℬ𝑓𝑢𝑠𝑒𝑑 be the first and second terms, the fusion can be rewritten as (8). Besides, a 1×1 Conv can be replaced by a 3×3 Conv through zero-padding around its value at the center of the kernel, and the identity branch can be expressed as a 3×3 Conv by using an identity matrix as the convolutional kernel. Thanks to this process, the fusion of the three branches is represented as one 3×3 Conv block, and the architecture is converted from multi-branch to single-branch for the inference process, as depicted in Figure 4.
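The fusion in (5)–(8) is pure linear algebra applied per output channel, so it can be checked numerically. The NumPy sketch below uses a linear map as a stand-in for a convolution (the per-channel arithmetic is identical) and verifies that the fused operator matches Conv followed by BN; all parameter values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "convolution" expressed as a linear map per output channel:
# y = W @ x + B, with W of shape (C_out, C_in).
C_in, C_out = 4, 3
W = rng.standard_normal((C_out, C_in))
B = rng.standard_normal(C_out)

# Batch-norm statistics and affine parameters (per output channel).
mu    = rng.standard_normal(C_out)        # mean
sigma = rng.uniform(0.5, 2.0, C_out)      # standard deviation
gamma = rng.standard_normal(C_out)        # scaling factor
beta  = rng.standard_normal(C_out)        # shifting factor

def conv_then_bn(x):
    y = W @ x + B                           # Eq. (5): Conv
    return gamma * (y - mu) / sigma + beta  # Eq. (6): BN

# Eqs. (7)-(8): fold BN into the convolution weights and bias.
W_fused = (gamma / sigma)[:, None] * W
B_fused = gamma * (B - mu) / sigma + beta

def fused_conv(x):
    return W_fused @ x + B_fused            # Eq. (8): single fused Conv

x = rng.standard_normal(C_in)
print(np.allclose(conv_then_bn(x), fused_conv(x)))  # True
```

The same channel-wise scaling applies to a real 3×3 kernel, which is why the three fused branches can finally be summed into one 3×3 Conv.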
In this work, the final layer of the last stage of RepVGG is replaced by a flattening layer for joining with the encoders, as shown in Figure 1. This allows the feature vectors of RepVGG and the encoders to be concatenated. Since the encoding-decoding process is focused on segmentation, the latent features are expected to concentrate on the lung area; this connection therefore provides additional useful information before classification. Finally, fully-connected and output layers are added and adjusted to fit the number of classes. Since the pre-trained model has already dealt with a large-scale image dataset, transfer learning is applied in the training process to preserve the previous weights by freezing the low-level layers. Thus, the first three stages and the encoder blocks are frozen while the remaining layers are re-trained.
Figure 5. The training progress of (a) only using RepVGG and (b) the proposed network
In addition, for a class-by-class analysis, the confusion matrices are depicted in Figure 6. Over the experiments, the mean accuracy of each class is measured as the number of correct predictions over the total number of images of that class. Mean accuracy is one of the vital factors for evaluating performance, as it indicates the ability to deal with new data. Overall, compared with the original RepVGG in Figure 6(a), the proposed network in Figure 6(b) gives better accuracy in all classification cases. There is a critical improvement for the COVID-19 class, from 83% to 90%, which plays an essential role because our main purpose is to find COVID-19 patients. Besides, compared to the original RepVGG model, our proposed network also provides better outcomes of 97% and 91% for the Normal and Viral Pneumonia classes, respectively.
The summary results of the experiments are given in Table 2. The metrics used for evaluating the two models are accuracy, precision, recall, F1 score, and processing time per image. The overall accuracy of the joined network is higher than that of the original network: 95.4% versus 91.8%, respectively. Precision and recall are powerful metrics for imbalanced data, which is especially relevant for the COVID class given the scarcity of its samples. High precision implies high confidence in detections of the corresponding class, while high recall demonstrates a low rate of missed detections of the true class. Our proposed model achieves improved results of 96.1% precision and 97.5% recall, with an F1 score of 96.7%. Despite attaching more layers, the per-image processing time of the joined network is only about one third longer than that of the original RepVGG.
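The per-class and overall figures above follow directly from a confusion matrix. A minimal NumPy sketch of how accuracy, precision, recall, and F1 are computed; the matrix entries are illustrative placeholders, not the paper's actual test counts:

```python
import numpy as np

# Hypothetical 3x3 confusion matrix (rows = true class, cols = predicted),
# classes ordered [COVID-19, Normal, Viral Pneumonia].
cm = np.array([[90,  6,  4],
               [ 2, 97,  1],
               [ 5,  4, 91]])

def per_class_metrics(cm, i):
    tp = cm[i, i]                      # correctly predicted class i
    fp = cm[:, i].sum() - tp           # predicted i, but another true class
    fn = cm[i, :].sum() - tp           # true i, predicted elsewhere
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = np.trace(cm) / cm.sum()     # correct predictions over all images
p, r, f1 = per_class_metrics(cm, 0)    # metrics for the COVID-19 class
print(f"accuracy={accuracy:.3f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: accuracy=0.927 precision=0.928 recall=0.900 f1=0.914
```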
Figure 6. The confusion matrices of (a) only using RepVGG and (b) the proposed network
4. CONCLUSION
In this paper, an improvement of the RepVGG model has been presented by integrating it with variational data imputation for the classification of COVID-19 based on CXR images. Although the original pre-trained RepVGG reaches a fairly high accuracy, improvement in image-based disease classification still needs to be pursued continuously. Inspired by lung segmentation, the proposed model makes good use of latent features, supporting RepVGG and increasing classification performance by concatenating RepVGG with the data-imputation encoder. As a result, the average accuracy of our proposed model reaches 95.4%, higher than that of the initial RepVGG, and the other evaluation metrics also improve over the original model. This indicates that deep learning for COVID-19 disease classification could become a reference method in medical practice to aid in the prevention of coronavirus spread.
REFERENCES
[1] “WHO coronavirus (COVID-19) dashboard.” World Health Organization. Accessed on: Jul. 28, 2021. [Online]. Available:
https://covid19.who.int/
[2] M. Hasan Jahid, M. Alom Shahin, and M. Ali Shikhar, “Deep learning based detection and segmentation of COVID-19
pneumonia on chest X-ray image,” in 2021 International Conference on Information and Communication Technology for
Sustainable Development, ICICT4SD 2021-Proceedings, Feb. 2021, pp. 210–214., doi: 10.1109/ICICT4SD50815.2021.9396878.
[3] L. Wang, Z. Q. Lin, and A. Wong, “COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19
cases from chest X-ray images,” Scientific Reports, vol. 10, no. 1, Dec. 2020, doi: 10.1038/s41598-020-76550-z.
[4] M. Farooq and A. Hafeez, “COVID-ResNet: a deep learning framework for screening of COVID19 from radiographs,” Mar.
2020, arXiv:2003.14395.
[5] T. M. Quan et al., “XPGAN: X-ray projected generative adversarial network for improving COVID-19 image classification,” in
2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), Apr. 2021, pp. 1509–1513.,
doi: 10.1109/ISBI48211.2021.9434159.
[6] K. K. Singh and A. Singh, “Diagnosis of COVID-19 from chest X-ray images using wavelets-based depthwise convolution
network,” Big Data Mining and Analytics, vol. 4, no. 2, pp. 84–93, Jun. 2021, doi: 10.26599/BDMA.2020.9020012.
[7] Y. Oh, S. Park, and J. C. Ye, “Deep learning COVID-19 features on CXR using limited training data sets,” IEEE Transactions on
Medical Imaging, vol. 39, no. 8, pp. 2688–2700, Aug. 2020, doi: 10.1109/TMI.2020.2993291.
[8] S. Degadwala, D. Vyas, and H. Dave, “Classification of COVID-19 cases using fine-tune convolution neural network (FT-
CNN),” in 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Mar. 2021, pp. 609–613.,
doi: 10.1109/ICAIS50930.2021.9395864.
[9] S. Asif, Y. Wenhui, H. Jin, and S. Jinhai, “Classification of COVID-19 from chest X-ray images using deep convolutional neural
network,” in 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Dec. 2020, pp. 426–433., doi:
10.1109/ICCC51575.2020.9344870.
[10] E. T. Hastuti, A. Bustamam, P. Anki, R. Amalia, and A. Salma, “Performance of true transfer learning using CNN DenseNet121
for COVID-19 detection from chest X-ray images,” in 2021 IEEE International Conference on Health, Instrumentation and
Measurement, and Natural Sciences (InHeNce), Jul. 2021, pp. 1–5., doi: 10.1109/InHeNce52833.2021.9537261.
[11] M. E. H. Chowdhury et al., “Can AI help in screening viral and COVID-19 pneumonia?,” IEEE Access, vol. 8,
pp. 132665–132676, 2020, doi: 10.1109/ACCESS.2020.3010287.
[12] T. Rahman et al., “Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images,”
Computers in Biology and Medicine, vol. 132, May 2021, doi: 10.1016/j.compbiomed.2021.104319.
[13] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9351,
Springer International Publishing, 2015, pp. 234–241., doi: 10.1007/978-3-319-24574-4_28.
[14] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” Dec. 2013, arXiv:1312.6114.
[15] D. Liu and G. Liu, “A transformer-based variational autoencoder for sentence generation,” in 2019 International Joint Conference
on Neural Networks (IJCNN), Jul. 2019, pp. 1–7., doi: 10.1109/IJCNN.2019.8852155.
[16] S. Semeniuta, A. Severyn, and E. Barth, “A hybrid convolutional variational autoencoder for text generation,” in Proceedings of
the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 627–637., doi: 10.18653/v1/D17-1066.
[17] A. Sagar, “Generate high resolution images with generative variational autoencoder,” Aug. 2020, arXiv:2008.10399.
[18] V. Zilvan, A. Ramdan, E. Suryawati, R. B. S. Kusumo, Di. Krisnandi, and H. F. Pardede, “Denoising convolutional variational
autoencoders-based feature learning for automatic detection of plant diseases,” 3rd International Conference on Informatics and
Computational Sciences: Accelerating Informatics and Computational Research for Smarter Society in The Era of Industry 4.0,
Proceedings, Oct. 2019., doi: 10.1109/ICICoS48119.2019.8982494.
[19] N. T. N. Anh, T. Q. Khanh, N. Q. Dat, E. Amouroux, and V. K. Solanki, “Fraud detection via deep neural variational autoencoder
oblique random forest,” in 2020 IEEE-HYDCON, Sep. 2020, pp. 1–6., doi: 10.1109/HYDCON48903.2020.9242753.
[20] R. Selvan et al., “Lung segmentation from chest x-rays using variational data imputation,” May 2020, arXiv:2005.10052.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Communications of the ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd International
Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, Sep. 2014.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770–778., doi: 10.1109/CVPR.2016.90.
[24] C. Szegedy et al., “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Jun. 2015, pp. 1–9., doi: 10.1109/CVPR.2015.7298594.
[25] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jul. 2017, pp. 2261–2269., doi: 10.1109/CVPR.2017.243.
[26] J. Kolbusz, P. Rozycki, and B. M. Wilamowski, “The study of architecture MLP with linear neurons in order to eliminate the
‘vanishing gradient’ problem,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence
and Lecture Notes in Bioinformatics), vol. 10245, Springer International Publishing, 2017, pp. 97–106., doi: 10.1007/978-3-319-
59063-9_9.
[27] K. J. M. Tarnate, M. Devaraj, and J. C. De Goma, “Overcoming the vanishing gradient problem of recurrent neural networks in
the ISO 9001 quality management audit reports classification,” International Journal of Scientific and Technology Research,
vol. 9, no. 3, pp. 6683–6686, 2020.
[28] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG: making VGG-style ConvNets great again,” in 2021
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2021, pp. 13728–13737.,
doi: 10.1109/CVPR46437.2021.01352.
[29] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wang, P.-X. Lu, and G. Thoma, “Two public chest X-ray datasets for computer-aided
screening of pulmonary diseases,” Quantitative Imaging in Medicine and Surgery, vol. 4, no. 6, pp. 475–477, 2014.
Improving RepVGG model with variational data imputation in COVID-19 classification (Kien Trang)
1286 ISSN: 2252-8938
BIOGRAPHIES OF AUTHORS
Long TonThat obtained his M.Sc. and Ph.D. degrees in 2008 and 2012, respectively, from The University of Manchester (UK). In 2014, he joined International University (IU), Vietnam National University Ho Chi Minh City (VNU-HCMC), as a Lecturer in the Department of Automation and Control Engineering. His research interests lie mainly in control theory, nonlinear observer design, biological systems, and computational intelligence. He has also been involved in papers on biomedical signal processing. He can be contacted at email: [email protected].
Bao Quoc Vuong received the B.Eng. and M.Eng. degrees in Electrical Engineering from the School of Electrical Engineering, International University, Vietnam National University-Ho Chi Minh City (IU-VNUHCMC), in 2014 and 2017, respectively. He is currently working toward the Ph.D. degree at the Information Science and Technology, Communication and Knowledge Laboratory (Lab-STICC), Department of Electrical Engineering, University of Western Brittany, Brest, France. His main interests are in the areas of signal processing, wireless communication, information theory, and full-duplex transmission. He can be contacted at email: [email protected].