A Fast Medical Image Super Resolution Method Based on Deep Learning
This article has been accepted for publication in a future issue of IEEE Access, but has not been fully edited. Content may change prior to final publication. DOI: 10.1109/ACCESS.2018.2871626
ABSTRACT Low-resolution medical images can seriously hamper medical diagnosis, especially in the analysis of retina images and specifically in the detection of the macula fovea. Improving the quality of medical images and speeding up their reconstruction is therefore particularly important for expert diagnosis. To deal with this engineering problem, this paper presents a fast medical image super resolution (FMISR) method in which three hidden layers perform feature extraction, as in the Super Resolution Convolutional Neural Network (SRCNN). Crucially, the well-designed deep learning network processes images in low-resolution space instead of high-resolution space, which makes super-resolution reconstruction more efficient. The addition of a sub-pixel convolution layer and the substitution of a mini-network in the hidden layers are critical for improving the image reconstruction speed, while the hidden layers ensure reconstruction quality. Our FMISR framework performs significantly faster and produces higher-resolution images. As such, the technique underlying this framework has high potential in retinal macular examination, as it provides a good platform for the segmentation of retinal images.
INDEX TERMS Super resolution, Medical imaging, Deep learning, Medical diagnosis.
2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
better for medical image analysis. But one of the fatal flaws of a deep neural network is its computational cost.
Image Super-Resolution Using Deep Convolutional Networks (SRCNN) [22] is the earliest ancestor of the deep learning methods, aiming at recovering a high-resolution image from a single low-resolution image through a Convolutional Neural Network (CNN) [23]. There had been a few studies using deep learning techniques for image restoration, but the deep models in those methods were not specifically designed as end-to-end solutions. In contrast, SRCNN optimizes an end-to-end mapping. Because of its simple convolutional neural network structure, it can also be used to cope with the issue of image segmentation, and it is fast; the method can likewise be employed in other fields such as object recognition. SRCNN first uses bicubic interpolation to amplify the input to the target size. It then performs a nonlinear mapping through a three-layer convolutional network, and the resulting output is used as the reconstructed high-resolution image. The whole process can be divided into two parts: 1) patch extraction and representation, and 2) non-linear mapping and reconstruction.
The advantage of SRCNN lies in the simplicity of its three-layer convolutional network, its ease of convergence, its low computational complexity, and its ability to quickly reconstruct a high-resolution image while maintaining high quality. Nevertheless, due to its relatively shallow network, the image features required for reconstruction cannot be extracted effectively; although larger kernels can reduce the amount of computation, a large amount of information is lost in each convolution, which results in a poor reconstruction in the end. For example, the ringing effect is caused by selecting an inappropriate image model in image restoration; its direct cause is the loss of information during image degradation, especially the loss of high-frequency information.
In various experiments, SRCNN shows extensibility and portability. The researchers investigated the impact of different datasets on model performance, explored different architecture designs of the network, and studied the relations between super-resolution performance and factors such as depth, number of filters, and filter sizes. Finally, SRCNN was extended to cope with color images, and its performance was evaluated on different channels.
There are many different ways to perform super-resolution reconstruction. Shi et al. [24] proposed a method called Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network (ESPCN). However, compared with SRCNN, the ESPCN method lacks contextual information after reconstruction and is not expressive enough to capture the characteristics of objects. In 2017, Gao et al. [4] utilized a deep convolutional network for medical image super-resolution that improves on the SRCNN algorithm; the reconstructed CT images can clearly provide an important reference for clinicians to make correct treatment decisions. Although it achieves better quality, it costs significant time. Reducing the time needed to rebuild images has therefore become a problem that urgently needs to be solved.
In this paper, we focus on shortening the time of image reconstruction and optimize the structure for speed. We propose an efficient reconstruction structure named fast medical image super resolution (FMISR), which combines a sub-pixel convolutional layer and a mini-network in order to shorten the super-resolution time. In addition, we implemented hidden layers that retain information while training on the images, improving the quality of the reconstruction. We also address the problem of how to measure the quality of an image. In particular, the Peak Signal to Noise Ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation; it is the most common and widely used objective measure of quality evaluation [25].
In the following, we first present the structure of FMISR, illustrate why it improves speed, and detail each of its components. We then describe how the experiments were conducted and report the experimental results in the next two sections. Finally, through discussion, we draw a conclusion and introduce future work.
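The two-part SRCNN pipeline sketched above (interpolate first, then map through three convolution layers) can be illustrated as follows. This is a minimal single-channel sketch with random placeholder weights, not the authors' trained model: nearest-neighbour repetition stands in for bicubic interpolation to keep the example dependency-free, and the 9-5-5 kernel sizes follow the SRCNN 9-5-5 model referenced later in the paper (real SRCNN uses 64 and 32 feature maps per layer).

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, k, relu=True):
    """Naive 'same' 2-D convolution with zero padding, optional ReLU."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0) if relu else out

def srcnn_like(lr, scale=3):
    # 1) Patch extraction starts from an upscaled image (bicubic in the
    #    paper; nearest-neighbour repetition is used here as a stand-in).
    up = np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
    # 2) Non-linear mapping through three convolution layers (9-5-5).
    h = conv2d_same(up, rng.standard_normal((9, 9)) * 0.01)
    h = conv2d_same(h, rng.standard_normal((5, 5)) * 0.01)
    # 3) Linear reconstruction layer (no activation).
    return conv2d_same(h, rng.standard_normal((5, 5)) * 0.01, relu=False)

sr = srcnn_like(rng.random((8, 8)), scale=3)
print(sr.shape)  # (24, 24)
```

Note that all convolutions here operate in high-resolution space, which is exactly the cost the FMISR design below avoids.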
[Figure 1: hidden layers and sub-pixel convolution layer]
II. Fast Medical Image Super Resolution Based on Deep Learning
As shown in Fig. 1, the proposed fast medical image super resolution method is based on a well-designed deep learning network, which comprises three components: the sub-pixel convolutional layer, the mini-network, and the hidden layers. Among these components, the mini-network and the sub-pixel convolution layer are designed to improve the reconstruction speed, since the mini-network is a small convolutional neural network and the sub-pixel convolution layer can be used directly as the super-resolution image output layer. These components are introduced in detail in the following sections.

A. SUB-PIXEL CONVOLUTION LAYER
In the last layer, we applied the sub-pixel convolution layer proposed by Shi et al. [24], who implemented an efficient neural network (ESPCN) to reconstruct a low-resolution image. In contrast to SRCNN [22], the reconstruction in the Shi model takes place directly in low-resolution space [24]. The sub-pixel convolution layer thus implements the amplification of the image indirectly: it produces a high-resolution image from low-resolution feature maps directly, with one upscaling filter for each feature map (R*R channels), as shown in Fig. 1. In a convolution layer, different convolution kernels W of size k can be activated in low-resolution space. The number of activation patterns is exactly R*R (as shown in Fig. 1), and in each activation pattern ⌈k/R⌉² weights are activated. These patterns are periodically activated during the convolution of the kernel across the image, depending on the sub-pixel location mod(x, R), mod(y, R), where (x, y) are the output pixel coordinates in high-resolution space used to rearrange the elements. The key operator of the sub-pixel convolution layer is a periodic shuffling operator that reconstructs a high-resolution image from low-resolution feature maps directly, without convolution cost, so it takes less time than other operators. Mathematically, with T denoting the feature tensor, this operation can be described as:

PS(T)_{x,y,c} = T_{⌊x/R⌋, ⌊y/R⌋, c·R·mod(y,R)+c·mod(x,R)}    (1)

B. MINI-NETWORK
In order to shorten the super-resolution time, a cascade of two 3*3 convolution kernels, named the mini-network, is nested in the hidden layers. After analyzing the SRCNN 9-5-5 model, the second layer can achieve a better feature map with the 5*5 convolution kernel. We replace the 5*5 convolution kernel with this mini-network in order to obtain the same result on a much faster basis.
A large convolution kernel achieves a greater receptive field, but also adds more parameters and thereby increases the amount of computation. Since the number of parameters is related to the convolution kernel size, a small convolution kernel is advantageous. Note that we use the ReLU activation function to extract non-linear features; its computation is cheaper than that of the Tanh activation function, because the ReLU activation function only determines whether the input is greater than zero. The details of the image are related to the receptive field extracted by the filter, and in the mini-network the cascade of two 3*3 convolution kernels ensures the same receptive field as a 5*5 convolution kernel. According to the time complexity formula (2) and the convolution output size formula (3), we compare computation and parameter counts with ESPCN, as shown in Table 2.

T = O(N² · K² · F)    (2)

Output = (I − K + 2P)/S + 1    (3)
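The periodic shuffling of equation (1) is a pure tensor rearrangement, which is why it adds no convolution cost. The sketch below implements the standard rearrangement of R²·C low-resolution feature maps into a C-channel image with R times the height and width (the function name and channel-first layout are our own choices, not the authors'):

```python
import numpy as np

def pixel_shuffle(t, r):
    """Periodic shuffling PS: rearrange (C*r*r, H, W) feature maps
    into a (C, H*r, W*r) image with no arithmetic, only indexing."""
    c_r2, h, w = t.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    out = t.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2)
    return out.reshape(c, h * r, w * r)

# 9 feature maps of size 4*4 become one 12*12 image for R = 3.
feat = np.arange(9 * 4 * 4).reshape(9, 4, 4)
hr = pixel_shuffle(feat, 3)
print(hr.shape)  # (1, 12, 12)
```

Each output pixel (x, y) simply reads one value from feature map channel determined by mod(x, R) and mod(y, R), at low-resolution position (⌊x/R⌋, ⌊y/R⌋), matching the indexing pattern of equation (1).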
layers are being added, the more features the network will learn. Moreover, compared with ESPCN, we added a new layer to exploit the inner high-frequency components.
In ESPCN, the numbers of convolution kernels are 64 and 32 respectively, and the two layers apply the Tanh activation function. Our hidden layers are composed of three convolution layers; the more convolutional layers are added, the more characteristics can be extracted from an image. Conv1 denotes the first layer of the hidden layers. Conv1 and Conv3 contain the same number of convolution kernels (32), the same convolution kernel size (3*3), and the same activation function (Tanh). The middle of the mini-network involves a cascade of two 3*3 convolution kernels with the ReLU activation function. The parameters we set are shown in Table 2.

TABLE 2. PARAMETERS OF HIDDEN LAYERS
Details of hidden layers    Parameters
Conv1                       3*3*32 and Tanh
Mini-network                (3*3+3*3)*64 and ReLU
Conv3                       3*3*32 and Tanh

where μ is the mean or expectation of the distribution, σ is the standard deviation, and σ² is the variance.
At the same time, the batch size for the training data sets is 128 and that for the testing data sets is 32. The Euclidean loss function is used to compute the loss between the predicted value and the label value, and is defined by [27]:

Euclidean Loss = (1/(2N)) · Σ_{n=1}^{N} ||ŷ_n − y_n||²_2    (5)

where N is the total number of input images, n indexes the input images, ŷ_n is the predicted value, and y_n is the label value.

B. DATASET AND PROTOCOL
During the training phase, the publicly available benchmark datasets comprise the Timofte dataset [28], with 91 training images and two test datasets, Set5 and Set14, which provide 5 and 14 images respectively. The Berkeley segmentation dataset [29] consists of BSD300 and BSD500, which provide 100 and 200 images for testing. The super texture dataset [30], which provides 136 texture images, is also used.
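The saving from replacing one 5*5 kernel with the mini-network's cascade of two 3*3 kernels can be checked directly from the parameter-count reasoning above. The channel counts (64 in, 64 out) below are illustrative assumptions, not the exact figures of Table 2:

```python
# Weight count of a convolution layer with k*k kernels (biases ignored).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

c = 64
single_5x5 = conv_params(5, c, c)        # 5*5*64*64  = 102400 weights
cascade_3x3 = 2 * conv_params(3, c, c)   # 2*3*3*64*64 = 73728 weights

# Two stacked 3*3 convolutions cover the same 5*5 receptive field:
# each layer widens the field by (k - 1), so 1 + 2*(3 - 1) = 5.
receptive_field = 1 + 2 * (3 - 1)

print(single_5x5, cascade_3x3, receptive_field)  # 102400 73728 5
```

Under these assumptions the cascade needs roughly 28% fewer weights for the same 5*5 receptive field, while the extra layer also contributes an additional non-linearity.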
FMISR algorithms are based on the Caffe code on a computer with a GTX 1050-Ti; therefore, the same experimental environment is guaranteed, ensuring that there is only one variable. The results are presented in Table 3. Compared to the ESPCN model, our model has two acceleration modules, optimized for both the number of parameters and the structural design, making it a lightweight network structure.

TABLE 3. PSNR (dB) AND SR-TIME FOR DIFFERENT DATASETS IN 300000 ITERATIONS
Dataset     Scale   Bicubic PSNR   ESPCN PSNR/SR-Time   OUR PSNR/SR-Time
Brain       3       24.553         25.080/0.298s        25.502/0.220s
Abdomen1    3       27.815         28.829/0.261s        29.891/0.248s
Abdomen2    3       26.525         27.504/0.270s        28.161/0.267s
Knee        3       32.898         35.131/0.309s        35.309/0.227s
Cell        3       26.562         27.964/0.264s        27.980/0.274s
SR-Time/s   None    None           28.901/0.280s        29.368/0.240s

D. IMAGE RECONSTRUCTION QUALITY RESULT
We record the value of the PSNR [25], which is the objective criterion for measuring image distortion or noise levels. The PSNR here is for grey-level (8-bit) images. In formula (6), given the input image f and the reconstructed image f′, both of size M*N, the PSNR between f and f′ is defined by:

PSNR = 10·log10((255)² / MSE)    (6)

where

MSE = (1/(MN)) · Σ_{i=1}^{M} Σ_{j=1}^{N} (f_{i,j} − f′_{i,j})²    (7)

A larger PSNR value between two images indicates higher image quality. The common reference is 30 dB; image deterioration is obvious below 30 dB. Fig. 2, Fig. 3 and Fig. 4 are visualizations of the super-resolution results, comparing ESPCN and our FMISR method. The process of super-resolution is also visualized in Fig. 5.
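Formulas (6) and (7) translate directly into code. The sketch below computes PSNR for 8-bit grey-level images; the toy images (identical except for one pixel off by 5 grey levels) are purely illustrative:

```python
import numpy as np

def psnr(f, f_rec):
    """PSNR (dB) between 8-bit images f and f', per formulas (6)-(7)."""
    mse = np.mean((f.astype(np.float64) - f_rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

f = np.full((8, 8), 120, dtype=np.uint8)
f_rec = f.copy()
f_rec[0, 0] = 125            # one pixel off by 5 grey levels
print(round(psnr(f, f_rec), 2))  # → 52.21
```

Note that PSNR diverges to infinity when MSE is zero, so identical images must be handled separately in practice.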
FIGURE 2. The brain image from the Brain dataset with an upscaling factor of 3.
FIGURE 3. Super-resolution examples for the abdomen from the public dataset Abdomen1 with an upscaling factor of 3; the corresponding PSNR values are shown under each sub-figure.
FIGURE 4. Knee image processing on the public Knee dataset with an upscaling factor of 3; the corresponding PSNR values are shown under each sub-figure.
[Figure panels: Original, Mask, SRCNN/37.64dB, SCN/37.90dB]
The PSNR after image reconstruction is improved by 0.95 dB compared with ESPCN on the low-resolution retinal image. In our study, we also used the masking technique to test the luminance-sensitive area. According to our experimental results, our reconstruction algorithm extracts a better receptive field during neural network training where the brightness changes markedly, which results in a better reconstruction effect.
With regard to the application of deconvolution layers, multiple deconvolution layers are now used for the visualization of neural networks. During the reconstruction process, the image magnification step is mostly implemented using deconvolution layers or direct linear interpolation. However, these two methods do not have a good effect on reconstruction. In future studies, the implementation of deconvolution layers in this field of image processing can serve as a research motivation; whether there is an optimized deconvolution layer that can retain the original image information is the question of interest. We present a reasonable hypothesis that there are some relationships between the convolution layer and the deconvolution layer, based on information from neural network training.