
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2871626, IEEE Access


A Fast Medical Image Super Resolution Method Based on Deep Learning Network
SHENGXIANG ZHANG, GAOBO LIANG, SHUWAN PAN, LIXIN ZHENG*
Fujian Provincial Academic Engineering Research Centre in Industrial Intellectual Techniques and Systems, College of Engineering, Huaqiao
University, Quanzhou, China

*Corresponding author: LIXIN ZHENG (E-mail: [email protected] ).


This work was supported in part by the Science and Technology Bureau of Xiamen under Grant 3502Z20173045, in part by Technology Bureau of Quanzhou
under Grant 2017G036.

ABSTRACT Low-resolution medical images can seriously hamper medical diagnosis, especially in the
analysis of retinal images and specifically in the detection of the macula fovea. Improving the quality
of medical images and speeding up their reconstruction is therefore particularly important for expert
diagnosis. To address this engineering problem, this paper presents a fast medical image super resolution
(FMISR) method in which three hidden layers perform feature extraction, as in the Super Resolution
Convolution Neural Network (SRCNN). Importantly, a well-designed deep learning network processes images
in the low-resolution space instead of the high-resolution space, which makes super-resolution
reconstruction more efficient. Adding a sub-pixel convolution layer and substituting a mini-network in the
hidden layers are critical for improving the image reconstruction speed, while the hidden layers are
designed to ensure reconstruction quality. As a result, our FMISR framework performs significantly faster
and produces higher-resolution images. The technique underlying this framework therefore shows high
potential for retinal macular examination, as it provides a good platform for the segmentation of retinal
images.

INDEX TERMS Super resolution, Medical imaging, Deep learning, Medical diagnosis.

I. INTRODUCTION

In medical image analysis, the imaging systems commonly used for expert diagnosis are nuclear Magnetic Resonance Imaging (MRI) [1], Computed Tomography (CT) [2], Positron Emission Computed Tomography (PET-CT) [3] and Ultrasound (US) [2]. However, these images suffer from low resolution, inherent noise, and a lack of structural information. Given the limitations of hardware devices and existing imaging technology, image super-resolution processing is favored by medical experts and researchers because it is intuitive, noninvasive, convenient and secure [4].

Single image super-resolution (SISR) techniques fall mainly into three categories: edge-based [5], image-statistics-based [6-9], and sample-based methods [10-16]. Among them, the sparse representation proposed by Yang et al. [17] has long held a dominant position in the field of super-resolution restoration. Yang et al. [18] also published a classic paper on image super-resolution, "Image super-resolution via sparse representation", in 2010. That paper studied image statistics and suggested that image patches can be well represented as a sparse linear combination of elements from an appropriately chosen over-complete dictionary. Inspired by this observation, the method seeks a sparse representation for each patch of the low-resolution input and then uses the coefficients of this representation to generate the high-resolution output. Theoretical results from compressed sensing suggest that, under mild conditions, the sparse representation can be correctly recovered from the down-sampled signals. However, in this reconstruction algorithm the effect of the coefficient regularization is not evident. Zhang et al. [19] also proposed an improved algorithm that introduces a local weighting constraint. However, when some images are processed to recover more detail, it tends to produce excessive texture.

Driven by big data such as the ImageNet database, image super-resolution methods based on deep neural networks are promising and have greatly advanced image reconstruction algorithms. In 2016, Zhao et al. [20] proposed a novel technique to address the problem of super-resolution (SR) for medical ultrasound (US) images. Tajbakhsh et al. [21] discussed whether full training or fine-tuning of convolutional neural networks is

2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

better for medical image analysis. However, a major drawback of deep neural networks is their computational cost.

Image Super-Resolution Using Deep Convolutional Networks (SRCNN) [22] is the earliest deep learning method of this kind; it aims at recovering a high-resolution image from a single low-resolution image through a Convolution Neural Network (CNN) [23]. There had been a few studies using deep learning techniques for image restoration, but the deep models in those methods were not specifically designed to be an end-to-end solution. In contrast, SRCNN optimizes an end-to-end mapping. Because of its simple convolution neural network structure, it can also be used for image segmentation. Furthermore, SRCNN is fast, and the method can be employed in other fields, such as object recognition. SRCNN first uses bicubic interpolation to enlarge the input image. It then performs a nonlinear mapping through a three-layer convolutional neural network, and the resulting output is taken as the reconstructed high-resolution image. The whole process can be divided into two parts: 1) Patch Extraction and Representation, and 2) Non-Linear Mapping and Reconstruction.

The advantages of SRCNN lie in the simplicity of its three-layer convolutional neural network, its ease of convergence, its low computational complexity, and its ability to quickly reconstruct a high-resolution image while maintaining high quality. Nevertheless, because the network is relatively shallow, the image features required for reconstruction cannot be extracted effectively; although larger kernels can reduce the amount of computation, a large amount of information is lost in each convolution, which results in a poor reconstructed image. For example, the ringing effect is caused by selecting an inappropriate image model in image restoration. Its direct cause is the loss of information during image degradation, especially the loss of high-frequency information.

In various experiments, SRCNN shows extensibility and portability. Its authors investigated the impact of different datasets on model performance. Next, they explored different architecture designs of the network and studied the relations between super-resolution performance and factors such as depth, number of filters and filter sizes. Finally, SRCNN was extended to color images, and its performance on different channels was evaluated.

There are many different approaches to super-resolution reconstruction. Shi et al. [24] proposed the Efficient Sub-Pixel Convolutional Neural Network (ESPCN) in "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network". However, compared with SRCNN, the ESPCN method lacks contextual information after reconstruction and does not sufficiently express the characteristics of objects. In 2017, Gao et al. [4] used a deep convolutional network, an improved SRCNN algorithm, for medical image super-resolution. The reconstructed CT images provide an important reference for clinicians making treatment decisions. Although the method achieves better quality, it takes significant time. Reducing the time needed to reconstruct images has become a problem that urgently needs to be solved.

In this paper, we focus on shortening the image reconstruction time and optimize the network structure for speed. We propose an efficient reconstruction structure named fast medical image super resolution (FMISR). It combines a sub-pixel convolutional layer and a mini-network in order to shorten the super-resolution time. In addition, we use hidden layers to retain information during training and thereby improve the quality of the reconstruction. We then address how to measure the quality of an image. In particular, the Peak Signal to Noise Ratio (PSNR) is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. It is the most common and widely used objective measure for quality evaluation [25].

In the following, we first present the structure of FMISR, explain why it improves speed, and describe each of its components. We then describe the experiments and their results in the next two sections. After a discussion, we conclude the paper and introduce future work.


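[Figure 1: Original image (input) -> Conv1 (3*3, 32 filters, Tanh) -> mini-network (3*3+3*3, 64 filters, ReLU) -> Conv3 (3*3, 32 filters, Tanh) -> sub-pixel convolution layer (R*R channels) -> SR image (output). Conv1, the mini-network and Conv3 form the hidden layers.]

FIGURE 1. Structure framework of FMISR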
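
As an illustration of the sub-pixel convolution output stage in Fig. 1, the following minimal Python sketch rearranges an H*W*(C*R*R) feature tensor into an RH*RW*C image. It is a sketch under stated assumptions, not the authors' Caffe implementation: the function name `periodic_shuffle`, the nested-list tensor layout, and in particular the channel ordering (input channel c*R*R + R*mod(y,R) + mod(x,R) feeds output channel c) are choices made here for illustration only.

```python
def periodic_shuffle(t, r):
    """Rearrange an H x W x (C*r*r) nested list into (r*H) x (r*W) x C.

    Assumption (not fixed by the paper): the value for output channel c at
    sub-pixel offset (y % r, x % r) is read from input channel
    c*r*r + (y % r)*r + (x % r).
    """
    h, w = len(t), len(t[0])
    c_out = len(t[0][0]) // (r * r)
    out = [[[0] * c_out for _ in range(w * r)] for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            for c in range(c_out):
                src = c * r * r + (y % r) * r + (x % r)
                # Pure index arithmetic: no multiplication of pixel values.
                out[y][x][c] = t[y // r][x // r][src]
    return out


# One low-resolution pixel with 4 channels, upscaled by r = 2.
low_res = [[[10, 11, 12, 13]]]
high_res = periodic_shuffle(low_res, 2)
```

With r = 2, the single low-resolution pixel carrying four channels becomes a 2*2 block of one-channel output pixels. The operation only moves values around, which is why the text describes it as free of convolution cost.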

II. Fast Medical Image Super Resolution Based on Deep Learning

As shown in Fig. 1, the proposed fast medical image super resolution method is based on a well-designed deep learning network comprising three components: the sub-pixel convolutional layer, the mini-network and the hidden layers. Among these, the mini-network and the sub-pixel convolution layer are designed to improve the reconstruction speed, since the mini-network is a small convolution neural network and the sub-pixel convolution layer can be used directly as the super-resolution image output layer. The following sections introduce these components in detail.

A. SUB-PIXEL CONVOLUTION LAYER

In the last layer, we apply the sub-pixel convolution layer proposed by Shi et al. [24], who implemented an efficient neural network (ESPCN) to reconstruct a low-resolution image. In contrast to SRCNN [22], the reconstruction in the Shi model is performed directly in the low-resolution space [24]. The sub-pixel convolution layer in the model implements the enlargement of the image indirectly. It produces a high-resolution image from low-resolution feature maps directly, with one upscaling filter for each feature map (R*R channels), as shown in Fig. 1. In a convolution layer, different parts of a convolution kernel W of size k are activated in the low-resolution space. The number of activation patterns is exactly R*R (as shown in Fig. 1), and each activation pattern has [k/R]^2 weights activated. These patterns are activated periodically as the kernel is convolved across the image, depending on the sub-pixel location mod(x, R), mod(y, R), where (x, y) are the output pixel coordinates in the high-resolution space. The key operator of the sub-pixel convolution layer is the periodic shuffling operator (PS), which rearranges the elements of an H*W*C*R^2 tensor into a tensor of shape RH*RW*C. Here, C is the number of color channels of the image. Because the periodic shuffling operator reconstructs a high-resolution image from low-resolution feature maps directly, without the computational cost of a convolution, it takes less time than other operators. Mathematically, this operation can be described as follows, where T denotes the input tensor:

$$PS(T)_{x,y,c} = T_{\lfloor x/R \rfloor,\ \lfloor y/R \rfloor,\ c \cdot R \cdot \mathrm{mod}(y,R) + c \cdot \mathrm{mod}(x,R)} \tag{1}$$

B. MINI-NETWORK

To shorten the super-resolution time, a cascade of two 3*3 convolution kernels, named the mini-network, is nested in the hidden layers. Our analysis of the SRCNN 9-5-5 model shows that the second layer achieves a better feature map with a 5*5 convolution kernel. We replace the 5*5 convolution kernel with the mini-network in order to obtain the same result much faster.

A large convolution kernel achieves a larger receptive field, but it also adds parameters and therefore increases the amount of computation. Since the number of parameters depends on the convolution kernel size, a small convolution kernel is advantageous. Note that we use the ReLU activation function to extract non-linear features because it is cheaper to compute than the Tanh activation function: ReLU only determines whether the input is greater than zero. The details of the image are related to the receptive field covered by the filter. In the mini-network, the cascade of two 3*3 convolution kernels provides the same receptive field as a 5*5 convolution kernel. Using the time complexity formula (2) and the convolution formula (3), we compare the computation and the number of parameters with ESPCN in Table 1.

$$T = O(N^2 \cdot K^2 \cdot F) \tag{2}$$

$$\mathit{Output} = \frac{I - K + 2P}{S} + 1 \tag{3}$$

In the time complexity formula (2), T is the time complexity, N is the size of the input image, K is the convolution kernel size, and F is the number of filters. In the


convolution formula (3), I is the input image size; K is the convolution kernel size; P is the padding, that is, extra layers of zeros added around the image so that the output image has the same size as the input; and S is the stride of the filter in both the horizontal and vertical directions in the original image. For convenience of discussion, we assume that P = 0 and S = 1.

TABLE 1. MINI-NETWORK AND 5*5 MODELS
                  ESPCN (5*5)                      Ours (3*3+3*3)
Time complexity   O(N^2*(5*5*64)) = O(1600*N^2)    O(N^2*(3*3+3*3)*64) = O(1152*N^2)
Parameters        5*5+1 = 26                       3*3+1+3*3+1 = 20
Calculation       25(I-4)^2                        9(I-2)^2 + 9(I-4)^2

We computed the time complexity and the number of parameters for the mini-network and for the 5*5 convolution kernel. The mini-network clearly has the advantage in both. Comparing the calculation results, when I is greater than 10 the mini-network requires fewer multiplications and additions than the 5*5 model for the same I.

C. HIDDEN LAYERS

A deep architecture is composed of multiple layers of parameterized non-linear modules, and the parameters of every module are learned. The more hidden layers are added, the more features the network can learn. Moreover, compared with ESPCN, we added a new layer to exploit the inner high-frequency components.

In ESPCN, the two layers have 64 and 32 convolution kernels respectively, and both use the Tanh activation function. Our hidden layers are composed of three convolution layers; with more convolution layers, more characteristics can be extracted from an image. Conv1 is the first of the hidden layers. Conv1 and Conv3 have the same number of convolution kernels (32), the same convolution kernel size (3*3) and the same activation function (Tanh). Between them, the mini-network consists of a cascade of two convolution kernels of size 3*3 with the ReLU activation function. The parameters we set are shown in Table 2.

TABLE 2. PARAMETERS OF HIDDEN LAYERS
Details of hidden layers   Parameters
Conv1                      3*3*32 and Tanh
Mini-network               (3*3+3*3)*64 and ReLU
Conv3                      3*3*32 and Tanh

III. EXPERIMENTS AND RESULTS

The experimental environment includes hardware devices and software configurations. The computer has an Intel(R) Core(TM) [email protected] processor, and the GPU is an NVIDIA GeForce GTX 1050-Ti. The experimental platform runs 64-bit Windows 7 with Caffe, MatlabR2016a, CUDA Toolkit v8.0, and Anaconda2.

A. SETUP

The whole network in our setup includes a training network and a testing network. In the training network, the base learning rate is 10^-4, the weight decay is zero and the momentum is 0.9 during iteration. The gradient descent algorithm we choose is Mini-Batch Gradient Descent (MBGD). The pseudocode is as follows:

Mini-Batch Gradient Descent
Repeat
{
    For i = 1, 11, 21, 31, ..., 991
    {
        $\theta_j := \theta_j - \alpha \frac{1}{10} \sum_{k=i}^{i+9} \left(h_\theta(x^{(k)}) - y^{(k)}\right) x_j^{(k)}$
        (for every j = 0, ..., n)
    }
}

In this pseudocode, $\sum_{k=i}^{i+9} \left(h_\theta(x^{(k)}) - y^{(k)}\right) x_j^{(k)}$ is the gradient term derived from the loss function, summed over the 10 samples of the mini-batch. If the training set has 1000 samples and each mini-batch contains 10 samples, each mini-batch is only a subset, and the entire training set is divided into 100 mini-batches. When there is a flat area in the error surface, mini-batch gradient descent can learn faster.

In every convolution layer, we initialize the weights from the Gaussian distribution given in formula (4) [26].

$$X \sim N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{4}$$

where μ is the mean or expectation of the distribution, σ is the standard deviation, and σ^2 is the variance.

The batch size is 128 for the training data sets and 32 for the testing data sets. The Euclidean loss function is used to compute the loss between the predicted value and the label value, and is defined by [27]:

$$\mathit{Euclidean\ Loss} = \frac{1}{2N} \sum_{n=1}^{N} \lVert \hat{y}_n - y_n \rVert_2^2 \tag{5}$$

where N is the total number of input images, n is the index of the input image, $\hat{y}_n$ is the predicted value and $y_n$ is the label value.

B. DATASET AND PROTOCOL

During the training phase, we use publicly available benchmark datasets: the Timofte dataset [28], which contains 91 training images and two test datasets, Set5 and Set14, providing 5 and 14 images; the Berkeley segmentation datasets [29] BSD300 and BSD500, which provide 100 and 200 images for testing; and the super texture dataset [30], which provides 136 texture images.

C. IMAGE RECONSTRUCTION SPEED RESULT

In this section, we evaluate our model's run time on the publicly available dataset IDI (I Do Imaging) [31] with an upscale factor of 3. IDI is a resource for finding free and open source medical imaging software, where one can find nearly 300 software projects that are neatly categorized, ranked, and searchable. The implementation of both the ESPCN and


FMISR algorithms is based on Caffe code running on the computer with the GTX 1050-Ti. The same experimental environment is therefore guaranteed, so that there is only one variable. The results are presented in Table 3. Compared to the ESPCN model, our model has two acceleration modules that are optimized for both the number of parameters and the size of the structure, making it a lightweight network.

TABLE 3. PSNR (dB) AND SR-TIME FOR DIFFERENT DATASETS IN 300000 ITERATIONS
Dataset    Scale  Bicubic PSNR  ESPCN PSNR/SR-Time  Ours PSNR/SR-Time
Brain      3      24.553        25.080/0.298s       25.502/0.220s
Abdomen1   3      27.815        28.829/0.261s       29.891/0.248s
Abdomen2   3      26.525        27.504/0.270s       28.161/0.267s
Knee       3      32.898        35.131/0.309s       35.309/0.227s
Cell       3      26.562        27.964/0.264s       27.980/0.274s
Average    None   None          28.901/0.280s       29.368/0.240s

D. IMAGE RECONSTRUCTION QUALITY RESULT

We record the PSNR [25], an objective criterion for measuring image distortion or noise levels. The PSNR here is for grey-level (8-bit) images. In formula (6), given an input image f and the reconstructed image f', both of size M*N, the PSNR between f and f' is defined by:

$$PSNR = 10 \log_{10} \frac{255^2}{MSE} \tag{6}$$

where

$$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(f(i,j) - f'(i,j)\right)^2 \tag{7}$$

A larger PSNR value between two images indicates higher image quality. The common reference is 30 dB; below 30 dB the image deterioration is obvious. Fig. 2, Fig. 3 and Fig. 4 visualize the super-resolution results and compare ESPCN with our FMISR method. The super-resolution process itself is visualized in Fig. 5.
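
The PSNR and MSE definitions in formulas (6) and (7) can be sketched in a few lines of Python. The function names `mse` and `psnr` and the nested-list image representation are assumptions made for this example; the default peak value of 255 corresponds to 8-bit grey-level images.

```python
import math


def mse(f, g):
    """Mean squared error between two M x N grey-level images (formula 7)."""
    m, n = len(f), len(f[0])
    total = sum((f[i][j] - g[i][j]) ** 2 for i in range(m) for j in range(n))
    return total / (m * n)


def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB (formula 6)."""
    err = mse(f, g)
    if err == 0:
        return math.inf  # identical images: no distortion to measure
    return 10.0 * math.log10(peak ** 2 / err)


# Two tiny 2 x 2 images that differ by one grey level at every pixel.
reference = [[100, 101], [102, 103]]
reconstructed = [[101, 102], [103, 104]]
quality_db = psnr(reference, reconstructed)
```

Because the MSE sits in the denominator, a smaller reconstruction error gives a larger PSNR, which matches the statement that larger values indicate higher image quality.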

[Figure 2 panels: Original; Bicubic, 24.553 dB; ESPCN, 25.080 dB; Ours, 25.502 dB]

FIGURE 2. The brain image from Brain dataset with an upscaling factor 3

[Figure 3 panels: Original; Bicubic, 26.525 dB; ESPCN, 27.504 dB; Ours, 28.161 dB]

FIGURE 3. Super-resolution examples for abdomen from public dataset Abdomen1 based on an upscaling factor of 3 with their corresponding PSNR values shown under each sub-figure.


[Figure 4 panels: Original; Bicubic, 32.898 dB; ESPCN, 35.131 dB; Ours, 35.309 dB]

FIGURE 4. Knee image processing on a public Knee dataset with an upscaling factor of 3, with the corresponding PSNR values shown under each sub-figure.

[Figure 5 panels: Input; Conv1; mini-network; Conv3; Output]

FIGURE 5. Process of the Super-Resolution technique

E. Super-Resolution in the low resolution retinal image for the detection of macula fovea

We studied the mechanism of diabetic retinopathy as follows. In the fundus image, the macula lies slightly below the 3.5mm disc and has a brown oval structure. It is the most sensitive part for visual acuity, and any damage in the center of the macula may cause vision impairment or even blindness. We reconstruct high-resolution images from low-resolution retinal images so that the macular center becomes more apparent. This not only allows medical experts to determine the cause quickly and accurately, but also has the advantage of not requiring a high-resolution camera.

Step 1. The dataset is Digital Retinal Images for Vessel Extraction (DRIVE) [32]. We truncated each picture into a 256*256 low-resolution picture using OpenCV and divided it into three channels. In the green channel, the vascular profile and edges are much clearer and the contrast is higher. The red channel is brighter, but the blood vessels are not obvious. The blue channel is less noisy than the red channel; however, the contrast between the blood vessels and the background is not significant enough, which is not conducive to the later segmentation work. We therefore choose the green channel as the input image. The three channels are shown in Fig. 6.

[Figure 6 panels: Original; Blue; Green; Red]

FIGURE 6. Dividing the original retina image into the three channels

Step 2. Reconstruct a single retinal image using multiple CNN-based reconstruction algorithms. As comparison methods, we select SCN (sparse coding-based network) [34], SRCNN [22] and ESPCN [24], because they are algorithms that learn features to reconstruct a single image.
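
The channel split described in Step 1 can be sketched as follows. This is an illustrative pure-Python version rather than the OpenCV call the authors used; it assumes each pixel is stored as a (blue, green, red) triple, which is the channel order OpenCV uses for color images, and the names `split_channels` and `green_input` are hypothetical.

```python
def split_channels(image):
    """Split an image of (blue, green, red) pixels into three grey images."""
    blue = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    red = [[pixel[2] for pixel in row] for row in image]
    return blue, green, red


def green_input(image):
    """Return only the green channel, the one chosen as the network input."""
    return split_channels(image)[1]


# Tiny 1 x 2 example image: two pixels in (blue, green, red) order.
example = [[(10, 200, 30), (40, 150, 60)]]
blue, green, red = split_channels(example)
```

Only the green plane would then be passed on to the super-resolution network; the blue and red planes are kept here just to mirror the three panels of Fig. 6.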


[Figure 7 panels: Original; Bicubic, 36.862 dB; SRCNN, 37.64 dB; SCN, 37.90 dB; ESPCN, 37.95 dB; Ours, 38.90 dB]

FIGURE 7. Different methods in reconstructing a single retinal image.

Step 3. To assess the advantages of our reconstruction algorithm in macular detection, we use a masking technique to count the high-frequency information in local regions of the image. As shown in Fig. 8, we use a mask to compute the grayscale histogram of an area near the macula, and we observe the proportion of the information from our algorithm within the histogram of the entire image. The blue curve represents the high-frequency information of the whole picture, and the red curve represents the high-frequency information of the masked area.

[Figure 8 panels: Original; Mask; Original_in_Mask; Original_out_Mask]

FIGURE 8. Grayscale histogram of an area near the macula using FMISR.

The graph in Fig. 9 shows the gray-level histograms in the mask area computed with five algorithms. The red curve corresponds to our FMISR algorithm. We find that our network structure performs better at brightness-sensitive sites than the other algorithms. We also compare the PSNR value and reconstruction time of FMISR with those of the three other CNN methods. The performance parameters are presented in Table 4.

FIGURE 9. Comparison curves of five types of algorithms.
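
The masking step in Step 3 amounts to counting grey levels only where the mask is set. A minimal sketch, assuming 8-bit grey values stored in nested lists and a binary mask of the same shape (the function name `masked_histogram` and the toy data are hypothetical, not taken from the paper):

```python
def masked_histogram(image, mask, levels=256):
    """Count grey levels of `image` only at positions where `mask` is non-zero."""
    hist = [0] * levels
    for image_row, mask_row in zip(image, mask):
        for value, inside in zip(image_row, mask_row):
            if inside:
                hist[value] += 1
    return hist


# Toy 2 x 2 image; the mask selects three of the four pixels.
image = [[0, 1], [1, 255]]
mask = [[1, 1], [0, 1]]
inside_hist = masked_histogram(image, mask)

# An all-ones mask gives the histogram of the entire image for comparison.
full_hist = masked_histogram(image, [[1, 1], [1, 1]])
```

Comparing `inside_hist` with `full_hist` gives the proportion of each grey level that falls inside the masked region, which is the kind of comparison the blue and red curves express.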


TABLE 4. COMPARING PERFORMANCE WITH DIFFERENT SUPER-RESOLUTION METHODS


Method      Scale  Bicubic  SCN     SRCNN   ESPCN   Ours
PSNR/dB     3      36.862   37.901  37.648  37.951  38.908
SR-Time/s   3      0.025    4.957   2.460   0.346   0.293

IV. DISCUSSION

A. IMAGE RECONSTRUCTION SPEED IMPROVED

In the traditional super-resolution reconstruction algorithms introduced above, reconstruction time has always been neglected by researchers; traditional algorithms have been devoted to increasing the quality of super-resolution reconstruction. With the rise of deep learning, the SRCNN algorithm surpassed most traditional algorithms in reconstruction quality. More importantly, it greatly improved the processing speed. Researchers then started to study how to achieve faster reconstruction while guaranteeing reconstruction quality.

In Table 4, we use bold font to mark the values corresponding to the better reconstruction results. We measured the reconstruction time of the algorithm on five different test datasets, and the results were positive on four of them. Our average time to reconstruct a single image from IDI (I Do Imaging) [31] on the 1050-Ti GPU is 0.24 s. Compared with the ESPCN algorithm in the Matlab code provided by [24], FMISR reduces the super-resolution reconstruction time of a single image by 40 ms.

B. IMAGE RECONSTRUCTION QUALITY IMPROVED

Improving the quality of super-resolution reconstruction is an enduring topic. From the edge-based reconstruction of traditional algorithms to reconstruction with sparse dictionaries, the emphasis has been on quality improvement. With the algorithm published by Wang et al. [34] in 2010, traditional algorithms pushed the quality of super-resolution reconstruction to a peak. But later on, the

C. APPLY TO RETINAL IMAGE FOR THE DETECTION OF MACULA FOVEA

Super-resolution reconstruction algorithms are changing rapidly and becoming more and more widely used. In the medical field, the improved SRCNN-based algorithm proposed by Zhao et al. [20] has also been applied to CT medical images. We used open medical image datasets to test a variety of super-resolution algorithms. In the case of diabetic retinopathy, we located the center of the macula by super-resolution reconstruction and achieved effective results. Applying this method to retinal images for the detection of the macula fovea greatly reduces the burden on researchers and medical experts of judging medical images. We also compared and evaluated many reconstruction algorithms; our method yields a higher PSNR than the others, and its reconstruction speed is much faster than that of traditional reconstruction algorithms.

To demonstrate the superiority of the algorithm, we carried out a set of application experiments on retinal images. For better detection of the macula fovea, we attempt to reconstruct as acceptable an image as possible. We compared three learning-based super-resolution reconstruction methods, SCN [34], SRCNN [22] and ESPCN [24], and a traditional interpolation-based method, Bicubic [33]. In Fig. 9, the red line, which corresponds to our algorithm (FMISR), is clearly higher than the other lines. This shows that, over the whole image, the high-frequency information in the image reconstructed by our algorithm is significantly increased. Conclusions can also be drawn from the data in Table 4. Our reconstruction model runs much faster than the SCN, SRCNN and ESPCN models (improvements of 4.664 s, 2.167 s and 0.053 s). In terms of reconstruction quality, our model achieves better super-resolution performance (+1.007 dB, +1.260 dB and +0.957 dB) than its counterparts.

V. CONCLUSION

In this paper, we demonstrated an efficient network model designed specifically for medical image super-resolution, which is based on an added convolution layer in order to achieve better picture reconstruction. The time for image
subsequent deep learning reconstruction algorithm proposed reconstruction has been significantly reduced by 50 ms
by Dong et al. [22] raised a new climax. The quality of compared to the ESPCN in a single low resolution retinal
reconstruction continues to rise. image reconstruction. In addition, our mini-network has a
Regarding the quality of reconstructed image, we evaluate higher speed improvement in the field of super-resolution.
the quality of reconstructed images from two aspects. On one Note that we demonstrate the positive effect of the subpixel
hand, the value of PSNR increased +0.467 dB on the dataset convolution layer as well as tanh activation function. We used
IDI. On the other hand, we visualized the reconstructed images different activation function in the same structure, and the
in Fig 2, Fig 3 and Fig 4. The process of reconstruction can results show that the Tanh activation function performs
also be visualized in Fig 5. Based on the results, the subjective incomparably better than ReLU activation.
visual perception is superior to other methods in terms of Moreover, the optimization of the structure is essential to
performance. determine an optimal receptive field, and a good receptive
field can more effectively extract the image features required.
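As a rough illustration of how the layer structure determines the receptive field, the sketch below computes the receptive field of a plain stack of convolution layers. The helper name `receptive_field` is hypothetical, and the kernel sizes are illustrative assumptions (the 9-1-5 and 5-3-3 configurations commonly reported for SRCNN [22] and ESPCN [24]); they are not the FMISR settings.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels, of a stack of convolution layers.

    Assumes ordinary (non-dilated) convolutions. `strides` defaults to 1
    for every layer, which is the usual setting in super-resolution networks
    that operate in low-resolution space.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    if len(strides) != len(kernel_sizes):
        raise ValueError("kernel_sizes and strides must have the same length")

    field = 1  # a single output pixel sees one input pixel before any layer
    jump = 1   # distance, in input pixels, between neighbouring outputs
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Illustrative configurations (assumptions, see the text above).
srcnn_like = receptive_field([9, 1, 5])  # 13
espcn_like = receptive_field([5, 3, 3])  # 9
```

With stride 1 throughout, each layer simply adds its kernel size minus one, so changing or adding a layer changes the receptive field in a directly predictable way.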

The PSNR after image reconstruction is improved by 0.95 dB compared with ESPCN on the low resolution retinal image. In our study, we also used the masking technique to test the luminance-sensitive area. According to our experimental results, in regions where the brightness changes markedly, our reconstruction algorithm learns a better receptive field during the training of the neural network, which results in a better reconstruction effect.

With regard to the application of deconvolution layers, multiple deconvolution layers are now used for the visualization of neural networks. During reconstruction, image magnification is mostly implemented using deconvolution layers or direct linear interpolation. However, neither of these two methods has a good effect on reconstruction. In future studies, the implementation of deconvolution layers in this field of image processing can serve as a research motivation. Whether there is an optimized deconvolution layer that can retain the original image information is the question of interest. We hypothesize that, based on information from neural network training, there is some relationship between the convolution layer and the deconvolution layer.
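To make the contrast with deconvolution layers concrete, the following is a minimal sketch of the rearrangement step performed by a sub-pixel convolution layer, in which r*r low-resolution feature maps are interleaved into one map that is r times larger in each dimension. It is not the FMISR implementation: the function name `pixel_shuffle` is hypothetical, and the channel ordering is an assumption that follows a common convention.

```python
def pixel_shuffle(features, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size (H*r) x (W*r).

    `features` is a nested list indexed as features[channel][row][col].
    Assumed channel ordering: input channel c*r*r + i*r + j supplies the
    output pixel at sub-pixel offset (i, j) of output channel c.
    """
    if r < 1:
        raise ValueError("upscale factor r must be a positive integer")
    in_channels = len(features)
    if in_channels == 0 or in_channels % (r * r) != 0:
        raise ValueError("channel count must be a positive multiple of r*r")
    out_channels = in_channels // (r * r)
    height = len(features[0])
    width = len(features[0][0])

    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for y in range(height * r):
            for x in range(width * r):
                # (y % r, x % r) selects the input channel;
                # (y // r, x // r) selects the low-resolution pixel.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = features[src][y // r][x // r]
    return out


# Four 1x2 low-resolution maps become one 2x4 high-resolution map (r = 2).
low_res = [[[1, 5]], [[2, 6]], [[3, 7]], [[4, 8]]]
high_res = pixel_shuffle(low_res, 2)  # [[[1, 2, 5, 6], [3, 4, 7, 8]]]
```

Because the step only moves values and involves no arithmetic, all learned filtering happens at low resolution, which is the usual explanation for the speed advantage of sub-pixel convolution over upsampling the image first.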

REFERENCES
[1] Peeters R R, Kornprobst P, Nikolova M, et al. The use of super-resolution techniques to reduce slice thickness in functional MRI[J]. International Journal of Imaging Systems & Technology, 2004, 14(3):131–138.
[2] Elad M, Feuer A. Restoration of a single super resolution image from several blurred, noisy, and undersampled measured images[M]. IEEE, 1997, 12(12).
[3] Kennedy J A, Israel O, Frenkel A, et al. Super-resolution in PET imaging[J]. IEEE Transactions on Medical Imaging, 2006, 25(2):137.
[4] Y. Gao, H. Li, J. Dong and G. Feng, "A deep convolutional network for medical image super-resolution," 2017 Chinese Automation Congress (CAC), Jinan, 2017, pp. 5310-5315. doi: 10.1109/CAC.2017.8243724.
[5] Sun J, Sun J, Xu Z, et al. Gradient Profile Prior and Its Applications in Image Super-Resolution and Enhancement[J]. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2011, 20(6):1529-1542.
[6] N. Efrat, D. Glasner, A. Apartsin, B. Nadler, and A. Levin. Accurate blur models vs. image priors in single image super-resolution. In IEEE International Conference on Computer Vision (ICCV), pages 2832–2839. IEEE, 2013.
[7] H. He and W.-C. Siu. Single image super-resolution using gaussian process regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 449–456. IEEE, 2011.
[8] J. Yang, Z. Lin, and S. Cohen. Fast image super-resolution based on in-place example regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1059–1066. IEEE, 2013.
[9] C. Fernandez-Granda and E. Candes. Super-resolution via transform invariant group-sparse regularization. In IEEE International Conference on Computer Vision (ICCV), pages 3336–3343. IEEE, 2013.
[10] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages I–I. IEEE, 2004.
[11] S. Wang, L. Zhang, Y. Liang, and Q. Pan. Semi-coupled dictionary learning with applications to image super-resolution and photosketch synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2216–2223. IEEE, 2012.
[12] K. Zhang, X. Gao, D. Tao, and X. Li. Multi-scale dictionary for single image super-resolution. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1114–1121. IEEE, 2012.
[13] X. Gao, K. Zhang, D. Tao, and X. Li. Image super-resolution with sparse neighbor embedding. IEEE Transactions on Image Processing, 21(7):3194–3205, 2012.
[14] Y. Zhu, Y. Zhang, and A. L. Yuille. Single image super-resolution using deformable patches. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2917–2924. IEEE, 2014.
[15] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision (ACCV), pages 111–126. Springer, 2014.
[16] D. Dai, R. Timofte, and L. Van Gool. Jointly optimized regressors for image super-resolution. In Eurographics, volume 7, page 8, 2015.
[17] Yang J, Wright J, Huang T, et al. Image super-resolution as sparse representation of raw image patches[C]//IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008:1-8.
[18] Yang J, Wright J, Huang T S, et al. Image super-resolution via sparse representation[J]. IEEE Transactions on Image Processing, 2010, 19(11):2861-2873.
[19] Zhang Xiaoyan, Qin Longlong, Qian Yuan, et al. An improved sparse representation super resolution Reconstruction algorithm[J]. Journal of Chongqing University of Posts and Telecommunications: Natural Science, 2016, 28(3):400-405.
[20] Zhao N, Wei Q, Basarab A, et al. Single image super-resolution of medical ultrasound images using a fast algorithm[C]//IEEE, International Symposium on Biomedical Imaging. IEEE, 2016:473-476.
[21] N. Tajbakhsh et al., "Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?," in IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1299-1312, May 2016. doi: 10.1109/TMI.2016.2535302.
[22] Dong C, Chen C L, He K, et al. Image Super-Resolution Using Deep Convolutional Networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016, 38(2):295-307.
[23] Yu D, Deng L. Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP][J]. IEEE Signal Processing Magazine, 2010, 28(1):145-154.
[24] Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., & Bishop, R., et al. (2016). Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. Computer Vision and Pattern Recognition (pp. 1874-1883). IEEE. unpublished.
[25] Hore A, Ziou D. Image Quality Metrics: PSNR vs. SSIM[C]//International Conference on Pattern Recognition. IEEE, 2010:2366-2369.
[26] Anderson T W. An introduction to multivariate statistical analysis[J]. Technometrics, 1958, 46(1):119-119.
[27] Gosling C. Encyclopedia of Distances[J]. Reference Reviews, 2009, 24(6):1-583.
[28] R. Timofte, V. De Smet, and L. Van Gool. A+: Adjusted anchored neighborhood regression for fast super-resolution. In Asian Conference on Computer Vision (ACCV), pages 111–126. Springer, 2014.
[29] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
[30] D. Dai, R. Timofte, and L. Van Gool. Jointly optimized regressors for image super-resolution. In Eurographics, volume 7, page 8, 2015.
[31] I Do Imaging (IDI), a searchable database of free and open source medical imaging software [DB/OL]. https://idoimaging.com/home.


[32] J.J. Staal, M.D. Abramoff, M. Niemeijer, M.A. Viergever, B. van Ginneken, "Ridge based vessel segmentation in color images of the retina", IEEE Transactions on Medical Imaging, 2004, vol. 23, pp. 501-509.
[33] De Boor C. Bicubic Spline Interpolation[J]. J.math.physics, 1962, 41(3):212–218.
[34] Wang Z, Liu D, Yang J, et al. Deep Networks for Image Super-Resolution with Sparse Prior[J]. 2015:370-378.

Shengxiang Zhang was born in Jiangling County, Hubei Province, China, in 1994. He is currently pursuing an M.S. degree in Computer Science and Technology at the College of Engineering, Huaqiao University. His research interests include computer vision and the application of deep learning to medical image super-resolution reconstruction. He also specializes in machine vision applied to industrial robots.

Gaobo Liang was born in DaWu County, Hubei Province, China, in 1994. He is currently pursuing an M.S. degree in Computer Science and Technology at the College of Engineering, Huaqiao University, China. From 2017 to 2018, he was a Research Assistant with the College of Engineering, Huaqiao University, China. His research interests include the application of deep learning in medical image analysis and in rapid visual identification. He also specializes in the positioning of manufacturing components used by industrial sorting robots.

Shuwan Pan was born in Jiangsu, China, in October 1982. He received his B.S. degree from Jiangsu Normal University in 2006 and a Ph.D. degree in Microelectronics and Solid State Electronics from Xiamen University in 2011. Since 2011, he has been a lecturer in the College of Engineering, Huaqiao University. His current research focuses on photoelectric detection and machine vision technology.

Lixin Zheng was born in Fujian, China, in April 1967. His research mainly focuses on image recognition and machine vision technology. Currently, he is the dean and a professor of the Engineering College of Huaqiao University, vice president of Fujian automation society, vice president of Xiamen automation society and director of Fujian power supply society.
Dr. Zheng graduated from the Department of Electronic Engineering, Huaqiao University, with a bachelor's degree in 1987. In 1990, he graduated from the Department of Mechanical Engineering, in the field of testing and automation control, with a Master's degree. In 1997, he was in Japan for further study. In 2002, he received his Ph.D. in predictive control at Tianjin University. In 2004, he won the award of "Outstanding Young and Middle-aged Teachers of Huaqiao University", and in 2006 he was awarded the title of "Outstanding Teacher of Huaqiao University". In 2007, he was supported by the Fujian excellent talent support program for his research on robotics. Dr. Zheng has published significantly in the field of industrial robot vision systems.
