
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3439542.

Image Super-Resolution Reconstruction Based on Enhanced Attention Mechanism and Gradient Correlation Loss
YANLAN SHI¹, HUAWEI YI¹, XIN ZHANG¹, LU XU², JIE LAN³

¹ School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
² IT and Products Management Department, Agricultural Bank of China, Beijing 100005, China
³ College of Science, Liaoning University of Technology, Jinzhou 121001, China

Corresponding author: Huawei Yi (e-mail: [email protected]).


This research was funded by the National Natural Science Foundation for Youth Scientists of China (No. 62203201) and the Foundation Research Project of the Educational Department of Liaoning Province (No. JYTMS20230860, No. LJKZZ20220085).

ABSTRACT In the field of super-resolution reconstruction, generative adversarial networks can generate textures that are more consistent with human visual perception, but low-resolution images often suffer from information loss and edge blurring during reconstruction. To solve this problem, this article proposes an image super-resolution reconstruction model based on an enhanced attention mechanism and gradient correlation loss, which can better focus on important details in low-resolution images and thus improve the quality of the reconstructed images. Firstly, an enhanced attention mechanism is proposed and incorporated into the generator model to reduce the amount of information lost during image feature extraction and retain more image details. Furthermore, this paper proposes a gradient correlation loss function that aims to maximize the correlation between the gradient of the generated image and the gradient of the original image, so that the generated image is more realistic and maintains a consistent edge structure. Finally, experimental results on standard datasets show that, compared with other representative algorithms, the proposed model achieves improvements in PSNR, SSIM, and LPIPS, verifying the effectiveness of the algorithm.

INDEX TERMS Image reconstruction, super-resolution, improved attention mechanism, gradient loss function.

I. INTRODUCTION

Super-resolution reconstruction of images is a technique for obtaining high-resolution images from single or multiple low-resolution images. Image super-resolution reconstruction technology is used to restore and reconstruct low-resolution images, which can effectively improve the details and quality of images. Super-resolution image reconstruction algorithms can be roughly divided into three categories: interpolation-based algorithms [1], reconstruction-based algorithms [2], and learning-based algorithms [3]. The first two categories are traditional methods, which usually suffer from overall blurring of the images and a serious lack of detail, and thus have greater limitations. In recent years, with the development of deep learning, learning-based super-resolution reconstruction technology has gradually become a hot topic. Among learning-based approaches, image super-resolution reconstruction methods based on convolutional neural networks (CNN) and generative adversarial networks (GAN) are widely used because their reconstruction performance is much better than that of the traditional algorithms.

In 2014, Dong et al. [4] proposed the super-resolution convolutional neural network (SRCNN), which used three convolutional layers for reconstruction and greatly improved the speed of reconstruction compared to traditional methods. In 2016, Kim et al. [5] proposed the deeply-recursive convolutional network (DRCN), which uses recursive layers and skip connections to further improve image quality compared to SRCNN. In 2017, Lim et al. [6] improved the residual network by removing the batch normalization (BN) layers in the residual blocks, thus improving the generalization ability of the enhanced deep SR network (EDSR).

In 2018, Zhang et al. [7] introduced the channel attention mechanism into SR and constructed the residual channel attention network (RCAN). RCAN was the first network to apply the attention mechanism to the SR problem, and the information learned by the network is more effective. In 2020, Niu et al. [8] proposed the holistic attention network (HAN), which introduces a layer attention module, learns feature values through the interrelationships between multi-scale layers, and uses the channel-spatial attention module (CSAM) to learn the channel and spatial correlations of the features in each layer.

In recent years, Generative Adversarial Networks (GAN) have been widely used in super-resolution reconstruction algorithms due to their ability to learn more meaningful loss functions through discriminators than losses based on pixel differences. In 2014, Goodfellow et al. [9], who first proposed the Generative Adversarial Network (GAN), surprised researchers with its performance in generating image data. Inspired by GAN [9], in 2017, Ledig et al. [10] applied it to the field of image super-resolution reconstruction and proposed the super-resolution generative adversarial network (SRGAN). The model combines perceptual loss and adversarial loss to recover the texture details of the image. Wang et al. [11] proposed the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), which builds on SRGAN in terms of network architecture, adversarial loss, and perceptual loss, and further improved the quality of the reconstructed image by using residual dense blocks instead of the residual blocks in the original generator and by using a relativistic discriminator. In 2018, Luo et al. [12] proposed Bi-GANs-ST, a new super-resolution GAN framework that introduces two complementary generative adversarial branches, trained with a combination of pixel loss, perceptual loss, and adversarial loss, to strike a balance between objective evaluation metrics of the image and the subjective perception of the visual effect. In 2019, Zhang et al. [13] proposed the RankSRGAN model by incorporating content ranking into the SRGAN framework, introducing a content ranking loss to optimize the quality of generated images. In 2020, Prajapati et al. [14] used GAN for unsupervised learning of the SR algorithm and introduced a new objective learning function based on the mean opinion score. Ma et al. [15] proposed SPSR, a structure-preserving super-resolution GAN that establishes a gradient feature mapping between low-resolution and high-resolution images by adding a new gradient branch, introduces a gradient loss to better maintain the geometric structure of the reconstructed image, and combines minimum absolute value loss, perceptual loss, adversarial loss, and gradient loss during training to better maintain the image's overall edge structure. Rakotonirina et al. [16] improved ESRGAN and proposed ESRGAN+, which improves the super-resolution generation capability of the generative network by adding an extra layer of skip connections in its dense blocks. In 2021, Chen et al. [17] integrated a hierarchical feature extraction module into the SRGAN framework and proposed the HSRGAN model, which extracts image features at multiple scales to hierarchically guide the reconstruction, thus enhancing the visual fidelity of super-resolution reconstructed images. Zhang et al. [18] designed a degradation model that is more suitable for real images, taking a more practical, depth-blind perspective; the algorithm incorporates more complex blur, downsampling, and noise degradation strategies. In 2022, Liang et al. [19] proposed the LDL model for the problem of artifacts in images, which locates artifact-prone regions, penalizes the generated details there, retains the useful textures, reduces the artifacts, and makes the image more realistic. Li et al. [20] argued that processing images with a single loss produces artifacts and over-smooths part of the information, and therefore proposed the one-to-many supervised Beby-GAN. In 2023, Yoo et al. [21] combined CNN and Transformer and proposed a cross-scale token attention module, allowing the Transformer branches to efficiently exploit the informative relationships between tokens at different scales. In 2024, Lee et al. [22] performed meta-learning on the information contained in the image distribution, which greatly improved the adaptation speed to new images as well as the performance in kernel estimation and image fidelity.

Although many scholars at home and abroad have achieved results in the field of single-image super-resolution reconstruction, their methods often face the problems of information loss and blurred image edges during reconstruction. To address this, this paper proposes an image super-resolution reconstruction model based on enhanced attention and gradient correlation loss, with the following specific contributions.

An enhanced attention mechanism module is designed. The feature extraction capability of the generator is enhanced by reducing the number of channels, introducing strided convolutions and pooling layers to reduce the spatial dimensions of the network, and using groups of convolutions to provide more variations and feature combinations.

A gradient correlation loss function is proposed. This loss function improves the visual effect of the reconstructed image by maximising the correlation between the gradient of the generated image and the gradient of the original image, which makes the generated image more realistic and maintains a consistent edge structure.


FIGURE 1. The EAGCL-SR model, where (a) is the generator model and (b) is the discriminator model.
II. MODEL

Based on the SPSR model proposed in the literature [15], this section proposes a super-resolution reconstruction model based on an enhanced attention mechanism and gradient correlation loss (EAGCL-SR); the overall architecture is shown in Fig. 1. Fig. 1(a) shows the EAGCL-SR generator model. In this paper, the generator is designed from two perspectives: the EA-RRDB-based reconstruction part and the gradient-map-based reconstruction part, where the latter adopts the gradient branch from the literature [15]; the focus of this section is on the former. The features obtained from the two parts are fused by a fusion block and reconstructed by a convolutional layer to obtain the reconstructed image, from which the gradient is extracted to obtain the gradient map. Fig. 1(b) shows the discriminator model of EAGCL-SR; this paper adopts the relativistic discriminator design of ESRGAN [11], where Conv denotes a regular convolutional layer, LRelu denotes the Leaky ReLU activation function, BN denotes the batch normalization layer, and Dense denotes the fully connected layer.

A. Reconstruction part of the EAGCL-SR generator based on EA-RRDB
In this section, the EA-RRDB-based reconstruction part of the EAGCL-SR generator is given. Firstly, a multilevel residual dense connection module, EA-RRDB, based on an enhanced attention mechanism is proposed, and five EA-RRDBs are combined to obtain the MERF (Multi EA-RRDB Fusion) module. Fig. 1(a) gives the framework of this reconstruction part, which mainly performs reconstruction operations on low-resolution images (LR). Firstly, the LR image is fed to a convolutional layer for shallow feature extraction. Then, the extracted features are passed to the first MERF; the features output from each MERF are passed to the next MERF and are also passed to the corresponding Gradient Block (GB) of the gradient-branch reconstruction part, and so on. The gradient block (GB) can be any basic block for extracting higher-level features; in the experiments of this paper, a 3×3 convolution kernel was used. After the last MERF is executed, the features required for this part of the reconstruction are obtained through convolution and upsampling operations in turn.

1) ENHANCED ATTENTION MECHANISM
When performing feature extraction on images, the problem of detail loss is often faced. To solve this problem, attention mechanisms have been combined with reconstruction models so that the details of the image are attended to and not easily lost. However, ordinary attention mechanisms focus only on features in certain regions of the image and ignore other key details, so some important details or features are still blurred or lost.

To solve the above problems, this article proposes an Enhanced Attention Block (EA), which aims to further reduce detail loss during image feature extraction. The module enables the model to focus on feature-rich regions and extract more representative features, thus improving the reconstruction of the image by enhancing the detail information.

When designing the EA module, it should be lightweight, considering that it needs to be inserted into multiple modules of the generator. The attention block also needs a large receptive field to perform the image super-resolution reconstruction task well [23]. The EA module is designed as shown in Fig. 2.


Firstly, the feature x is input into a convolutional layer (Conv) that uses a 1×1 convolutional kernel to reduce the channel dimension. Then, a strided convolution (Strided conv) with a step size of 2 is used to expand the receptive field, and subsequently the features are enlarged again using a deconvolution (Deconv) to obtain richer high-frequency information; the combination of strided convolution and deconvolution quickly reduces the spatial dimensionality of the network. Next, different combinations of features are obtained using 1×1, 3×3, and 5×5 channel-wise (depthwise) convolution operations to enhance the expressive power of the model. The spatial dimension is then recovered using an upsampling layer and the channel dimension is recovered using a 1×1 convolutional layer (Conv); finally, after a Softmax layer, the deeper features of the image are obtained and fused with the initial feature x to obtain the final features. The module can effectively retain the detailed information in the image and promote the flow of gradients, thus improving the effective transfer of features and the stability of network training.

FIGURE 2. The EA module.
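To make the data flow of Fig. 2 concrete, the following PyTorch sketch shows one possible implementation of the EA block as described above. The paper specifies only the sequence of operations; the channel-reduction ratio, the exact kernel sizes of the strided/transposed convolutions, and the softmax-weighted fusion rule with the input are assumptions here, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EABlock(nn.Module):
    """Sketch of the Enhanced Attention (EA) block of Fig. 2.

    Assumed: reduction ratio 4 and the fusion rule x * attn + x;
    the paper only names the operation sequence.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1)   # 1x1: shrink channels
        self.down = nn.Conv2d(mid, mid, kernel_size=3,
                              stride=2, padding=1)              # strided conv: wider receptive field
        self.up = nn.ConvTranspose2d(mid, mid, kernel_size=4,
                                     stride=2, padding=1)       # deconv: restore resolution
        # parallel 1x1 / 3x3 / 5x5 depthwise convolutions
        self.dw = nn.ModuleList([
            nn.Conv2d(mid, mid, k, padding=k // 2, groups=mid)
            for k in (1, 3, 5)
        ])
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)   # 1x1: recover channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.reduce(x)
        h = self.up(self.down(h))
        h = sum(conv(h) for conv in self.dw)                    # combine multi-scale depthwise features
        h = F.interpolate(h, size=x.shape[-2:], mode="nearest") # upsample back to the input size
        attn = torch.softmax(self.expand(h), dim=1)             # softmax over channels -> attention map
        return x * attn + x                                     # fuse attention with the identity path

if __name__ == "__main__":
    block = EABlock(64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```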
2) EA-RRDB MODULE

FIGURE 3. The EA-RRDB module.
The EA proposed above is fused with the multi-level residual dense block (RRDB) of the SPSR model [15] to construct the multilevel residual-in-residual dense block based on the enhanced attention mechanism (EA-RRDB), shown in Fig. 3. Each EA-RRDB module consists of three Enhanced Attention Mechanism-based Dense Connection Blocks (EADBs). Each EADB consists of multiple densely connected residual blocks, and the input of each residual block includes the input and output of the previous-level residual block. This densely connected approach helps to capture detailed information in the image, gives more paths for gradients to pass through, and mitigates the problem of gradient vanishing. On this basis, the introduction of the enhanced attention mechanism allows the EADB to focus on important image regions in a targeted manner and strengthen feature extraction in these regions. This design enables the EA-RRDB module to extract effective features, which helps to improve image quality and detail retention during reconstruction.
approach helps to capture detailed information in the image 3) ADVERSARIAL LOSS
and has more chances to pass gradients and can mitigate the A binary cross entropy loss function is used to measure the
problem of gradient vanishing. Based on this, the introduction probability that an image generated by the generator is
of an enhanced attention mechanism allows the EADB to correctly discriminated as a real image or a fake image. This
focus on important image regions in a targeted manner and is shown in equation (3) below:
enhance feature extraction from these regions. This design 𝑙𝑔𝑎𝑛 = − 𝑙𝑜𝑔(𝐷(𝑥) − 𝑙𝑜𝑔(1 − 𝐷(𝐺(𝑧))) (3)
enables the EA-RRDB module to extract effective features,
x denotes the real sample, 𝐷(𝑥) denotes the judgement
result of the discriminator on the real sample, 𝐺(𝑧) denotes

3) ADVERSARIAL LOSS
A binary cross-entropy loss function is used to measure the probability that an image generated by the generator is correctly discriminated as a real or a fake image, as shown in equation (3) below:

$l_{gan} = -\log\left(D(x)\right) - \log\left(1 - D(G(z))\right)$  (3)

where $x$ denotes a real sample, $D(x)$ denotes the discriminator's judgement of the real sample, $G(z)$ denotes the fake sample produced by the generator, and $D(G(z))$ denotes the discriminator's judgement of the fake sample. The goal of the discriminator is to minimize the adversarial loss so that its judgement of real samples is close to 1 and its judgement of fake samples is close to 0.
4) GRADIENT CORRELATION LOSS FUNCTION
It has been shown that using only the classical loss functions during training easily leads to over-smoothing of the reconstructed image, i.e., the trained model struggles to reconstruct the edges of low-resolution images to the desired quality. The root cause is that classical loss functions (e.g., mean squared error) focus on minimizing global pixel-level differences during optimization while ignoring the importance of image details and edges. The model therefore tends to generate excessively smooth images that lack sharp edge features. To solve this problem, a gradient correlation loss function is proposed in this paper.

The concern of the proposed gradient loss function is to ensure that the generated image is aligned with the original image in the gradient direction, so as to maintain edge and texture consistency. Its effect is strengthened by maximizing the correlation between the gradient of the generated image and the gradient of the original image. This loss places a stronger constraint on the super-resolution model, effectively maintains the structural information of the image, and helps the generated high-resolution image to be more realistic and structurally consistent in its details and edges, thus improving the quality and visual effect of the reconstructed image. The gradient correlation loss is calculated as shown in equation (4):

$r_{LG} = \frac{\mathrm{cov}\left( H(G(I^{LR})),\, H(I^{HR}) \right)}{\sqrt{\sigma\left( H(G(I^{LR})) \right)} \cdot \sqrt{\sigma\left( H(I^{HR}) \right)}}$  (4)

In Equation (4), $r_{LG}$ represents the correlation coefficient: a value of 1 indicates complete positive correlation (optimal performance of the loss function), -1 indicates negative correlation, and 0 indicates no correlation. $\mathrm{cov}(\cdot)$ denotes the covariance between the gradient of the generated image and the gradient of the original high-resolution image; $H(G(I^{LR}))$ is the gradient map of the image reconstructed from the low-resolution input, and $H(I^{HR})$ is the gradient map of the high-resolution image; $\sigma(\cdot)$ denotes the variance of the corresponding gradient map.
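The sketch below implements Eq. (4) in PyTorch. The paper does not specify the gradient extractor H(·), so a simple finite-difference gradient magnitude is assumed here.

```python
import torch
import torch.nn.functional as F

def gradient_map(img: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """H(.): per-pixel gradient magnitude via horizontal/vertical differences.

    A plain finite-difference operator is an assumption; the paper does not
    name the exact gradient extractor.
    """
    dx = img[..., :, 1:] - img[..., :, :-1]   # horizontal differences
    dy = img[..., 1:, :] - img[..., :-1, :]   # vertical differences
    dx = F.pad(dx, (0, 1, 0, 0))              # pad back to the input size
    dy = F.pad(dy, (0, 0, 0, 1))
    return torch.sqrt(dx ** 2 + dy ** 2 + eps)

def gradient_correlation(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """r_LG of Eq. (4): Pearson correlation between the two gradient maps."""
    g_sr = gradient_map(sr).flatten(1)        # one gradient vector per image
    g_hr = gradient_map(hr).flatten(1)
    g_sr = g_sr - g_sr.mean(dim=1, keepdim=True)
    g_hr = g_hr - g_hr.mean(dim=1, keepdim=True)
    cov = (g_sr * g_hr).mean(dim=1)           # cov(H(G(I_LR)), H(I_HR))
    denom = (g_sr.std(dim=1, unbiased=False)
             * g_hr.std(dim=1, unbiased=False) + 1e-8)
    return (cov / denom).mean()               # averaged over the batch
```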
The three classical loss functions in (1), (2), and (3) and the gradient correlation loss function in (4) are fused to obtain the final loss function, shown in equation (5) below:

$L_G = l_{Per} + \alpha\, l_{MAE} + \beta\, l_{gan} + \eta\, r_{LG}$  (5)

where $l_{gan}$ denotes the adversarial loss, $l_{Per}$ denotes the perceptual loss, and $r_{LG}$ denotes the gradient correlation loss; $\alpha$ and $\beta$ are the weights of the reconstructed-image losses and $\eta$ is the weight of the gradient correlation term. In this paper, $\alpha = 0.01$, $\beta = 0.005$, and $\eta = 0.005$.
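A possible assembly of Eq. (5) is sketched below, reusing the loss sketches above. Note one assumption: since the correlation is to be maximized while $L_G$ is minimized, the sketch enters the term as $(1 - r_{LG})$; how the sign is handled in the authors' implementation is not stated.

```python
import torch
import torch.nn as nn

ALPHA, BETA, ETA = 0.01, 0.005, 0.005   # loss weights of Eq. (5)

l1 = nn.L1Loss()                        # l_MAE of Eq. (1)
perceptual = PerceptualLoss()           # Eq. (2), sketched above

def generator_loss(sr, hr, d_fake):
    """Weighted sum of Eq. (5); d_fake = D(G(I_LR)), assumed in (0, 1).

    The correlation term enters as (1 - r_LG) so that minimizing L_G
    maximizes the gradient correlation; this sign convention is assumed.
    """
    l_mae = l1(sr, hr)
    l_per = perceptual(sr, hr)
    l_gan = -torch.log(d_fake + 1e-8).mean()   # generator side of Eq. (3)
    r_lg = gradient_correlation(sr, hr)        # Eq. (4), sketched above
    return l_per + ALPHA * l_mae + BETA * l_gan + ETA * (1.0 - r_lg)
```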

III. EXPERIMENT

A. Experimental environment and dataset
In this article, the DIV2K [24] dataset, which includes 800 training images, 100 validation images, and 100 test images, is used for training. To avoid overfitting during training, data augmentation operations such as random rotation and horizontal flipping are performed on the training images to increase the diversity of the data. To test the model, three standard benchmark datasets (Set5 [25], Set14 [26], and BSDS100 [27]) are used as the test sets; their specific information is shown in TABLE I.

TABLE I
COMMONLY USED SUPER-RESOLUTION RECONSTRUCTION DATASETS

| Dataset | Number | Scene content | Advantages | Disadvantages |
|---|---|---|---|---|
| Set5 | 5 | Nature, people | Small dataset; convenient for quick testing | Lack of diversity; small size |
| Set14 | 14 | People, animals, landscapes, nature | Medium-sized dataset; diverse scene content | Still relatively small scale; limited coverage |
| BSDS100 | 100 | Cities, architecture, landscapes, nature | Rich scene types; diverse degradations | Scenes may be too complex; degradation issues |

B. Evaluation indicators
In this paper, PSNR, SSIM [28], and LPIPS [29] are used to evaluate the experimental results. As in equation (6), PSNR is computed from the grey-level differences between corresponding pixels of two images; the higher the PSNR, the smaller the distortion.

$PSNR(X, Y) = 10 \cdot \lg \frac{255^2 \cdot w \cdot h \cdot c}{\sum_{m=1}^{w}\sum_{n=1}^{h}\sum_{z=1}^{c}\left[ X(m,n,z) - Y(m,n,z) \right]^2}$  (6)

where $X$ denotes the original high-resolution image; $Y$ denotes the image reconstructed by the generator; $c$ denotes the number of channels; $w$ and $h$ denote the width and height of the image, respectively; $m$ indexes pixels along the width; $n$ indexes pixels along the height; and $z$ indexes the three primary color channels.

As in equation (7), SSIM evaluates the similarity of two images along three dimensions: brightness, contrast, and structure. An SSIM value close to 1 indicates that the reconstructed image is closer to the structure of the original image and is a better result.

$SSIM(X, Y) = \frac{(2\mu_X \mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)}$  (7)

where $\mu_X$ denotes the mean value of $X$ and $\mu_Y$ the mean value of $Y$; $\sigma_X^2$ denotes the variance of $X$, $\sigma_Y^2$ the variance of $Y$, and $\sigma_{XY}$ the covariance of $X$ and $Y$; $C_1$ and $C_2$ are constants.

As in equation (8), the LPIPS metric measures the perceptual difference between two images by learning an inverse mapping from the reconstructed image to the real image that prioritizes their perceived similarity. The lower the LPIPS value, the more similar the two images are.

$d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \left\lVert \omega_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0hw} \right) \right\rVert_2^2$  (8)

where $\hat{y}^{l}, \hat{y}^{l}_{0} \in \mathbb{R}^{H_l \times W_l \times C_l}$ denote the unit-normalized activations of layer $l$ when the two inputs are sent through the feature-extraction network, and $\omega_l$ denotes the learned channel weights of layer $l$.
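For reference, Eq. (6) reduces to the familiar peak-to-MSE form, which the short sketch below implements; SSIM and LPIPS are usually computed with existing implementations (for LPIPS, the reference code released with [29]) rather than re-implemented.

```python
import torch

def psnr(x: torch.Tensor, y: torch.Tensor, peak: float = 255.0) -> torch.Tensor:
    """PSNR of Eq. (6). The triple sum divided by w*h*c is the MSE,
    so the expression reduces to 10 * lg(255^2 / MSE)."""
    mse = torch.mean((x.float() - y.float()) ** 2)
    return 10.0 * torch.log10(peak ** 2 / mse)

# Example: a noisy copy of an image gives a finite PSNR value.
hr = torch.randint(0, 256, (1, 3, 64, 64)).float()
noisy = (hr + torch.randn_like(hr) * 5).clamp(0, 255)
print(psnr(hr, noisy))
```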

C. Training details
In order to ensure the fairness of the experimental results, all the experiments in this paper use a 4× scale factor and are conducted in the same hardware environment. The hardware parameters used in this paper are: CPU: Intel(R) Xeon(R) CPU E5-2680 v4, RAM: 12 GB, number of cores: 28, GPU: 3080 Ti-12G. The experiments were carried out on a Linux operating system, using version 1.13.1 of the PyTorch framework, Python version 3.8, and accelerated with CUDA 11.3.

In the training process, the parameter batch_size is set to 4 and the size of the cropped high-resolution images is set to 128×128. The training is divided into two stages: first, the improved attention mechanism is added to the RRDB module of the model, which is trained to obtain a pre-trained model. Then, the pre-trained model is used to initialise the generator, and the generator is trained using the loss function. During training, the learning rate is set to 1e-4 and decays to 0.5 times its value after every 5e4 iterations. Such a learning rate decay strategy helps the model converge better. The above experimental setup ensures the comparability and fairness of the experiments.
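As a concrete reading of these settings, the sketch below wires the schedule together. The optimizer is not named in the paper, so Adam (the usual choice for SRGAN-family models) is an assumption here, as are the placeholder objects generator, discriminator, loader, and the total step budget.

```python
import torch

def train(generator, discriminator, loader, num_steps: int = 500_000):
    """Training-loop sketch for Section III-C: lr 1e-4, halved every 5e4
    iterations; batch_size 4 with 128x128 HR crops handled by `loader`.
    Adam and the 500k-step budget are assumptions, not stated in the paper."""
    opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50_000, gamma=0.5)
    step = 0
    while step < num_steps:
        for lr_img, hr_img in loader:
            sr_img = generator(lr_img)
            loss = generator_loss(sr_img, hr_img, discriminator(sr_img))  # Eq. (5)
            opt.zero_grad()
            loss.backward()
            opt.step()
            sched.step()   # decay is applied per iteration, not per epoch
            step += 1
            if step >= num_steps:
                break
```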
D. Comparative experiments
1) QUANTITATIVE COMPARISON
This article compares the proposed model (Ours) with Bicubic, SRGAN [10], ESRGAN [11], ESRGAN+ [16], SPSR [15], Beby-GAN [20], and PDM-GAN [30] at an amplification factor of 4. The experimental results are shown in TABLE II.

TABLE II
COMPARISON OF PSNR, SSIM, AND LPIPS VALUES FOR 4× RECONSTRUCTION RESULTS OF VARIOUS ALGORITHMS

| Dataset | Metric | Bicubic | SRGAN | ESRGAN | ESRGAN+ | PDM-GAN | Beby-GAN | SPSR | Ours |
|---|---|---|---|---|---|---|---|---|---|
| Set5 | PSNR | 26.69 | 26.69 | 26.50 | 25.88 | 25.62 | 27.82 | 28.44 | 28.89 |
| | SSIM | 0.7736 | 0.7813 | 0.7565 | 0.7511 | 0.7304 | 0.8004 | 0.8241 | 0.8386 |
| | LPIPS | 0.3644 | 0.1305 | 0.1080 | 0.1178 | 0.1075 | 0.0875 | 0.0870 | 0.0802 |
| Set14 | PSNR | 26.08 | 25.88 | 25.52 | 25.01 | 23.69 | 24.69 | 24.75 | 24.89 |
| | SSIM | 0.7467 | 0.7480 | 0.7175 | 0.7159 | 0.6716 | 0.7016 | 0.6960 | 0.7025 |
| | LPIPS | 0.3870 | 0.1421 | 0.1254 | 0.1362 | 0.1398 | 0.1094 | 0.1062 | 0.0972 |
| BSDS100 | PSNR | 22.65 | 22.67 | 23.33 | 23.54 | 23.84 | 24.13 | 24.21 | 24.58 |
| | SSIM | 0.6014 | 0.6363 | 0.6133 | 0.6172 | 0.6235 | 0.6355 | 0.6554 | 0.6584 |
| | LPIPS | 0.4452 | 0.1636 | 0.1436 | 0.1434 | 0.1433 | 0.1274 | 0.1197 | 0.1125 |

From the data in the table, it can be seen that the PSNR of our method on the Set5, Set14, and BSDS100 datasets improved by 0.45 dB, 0.14 dB, and 0.37 dB compared to SPSR, respectively. This means that our algorithm performs better in terms of peak signal-to-noise ratio, and the details of the image are clearer. On the Set5, Set14, and BSDS100 datasets, the SSIM of our method improved by 0.0145, 0.0065, and 0.0030 compared to SPSR, respectively, indicating that our method performs better at maintaining image structural similarity. On the Set5, Set14, and BSDS100 datasets, the LPIPS of our method decreased by 0.0068, 0.0090, and 0.0072 compared to SPSR, respectively. In summary, evaluating the PSNR, SSIM, and LPIPS indicators together, the method proposed in this paper performs well in terms of super-resolution image quality and is superior to the other methods.

2) QUALITATIVE COMPARISON
To better convey the visual effect of the images reconstructed with the proposed model, Bicubic, SRGAN, ESRGAN, ESRGAN+, SPSR, Beby-GAN, and PDM-GAN were used as comparison algorithms. Figures 4-6 show some of the image reconstruction results.

FIGURE 4. 4× reconstruction results of each algorithm for the image "butterfly" from Set5, where (a) is the original image and (b) shows a local region of the reconstructed images.

FIGURE 5. 4× reconstruction results of each algorithm for the image "baboon" from Set14, where (a) is the original image and (b) shows a local region of the reconstructed images.

FIGURE 6. 4× reconstruction results of each algorithm for the image "8023" from BSDS100, where (a) is the original image and (b) shows a local region of the reconstructed images.

From a visual perspective, Fig. 4(a) shows the real image of the butterfly's back and wings, while Fig. 4(b) shows a local image of the wing root obtained with the various reconstruction methods. As shown in Fig. 4(b), the images generated by Bicubic are blurry and unclear, while the images generated by SRGAN, ESRGAN, and ESRGAN+ suffer from detail loss and severe sharpening. The reconstruction effect of SPSR is improved, but its reconstruction of small areas is not good. The image reconstructed by Beby-GAN has blurred wing lines and still suffers from loss of detail. After analysis, it can be seen that the image reconstructed using the model proposed in this article is closest to the real image (GT).

Fig. 5(a) shows the real picture of the baboon, and Fig. 5(b) shows the pattern of the baboon's left whiskers. By observing the texture of the whiskers, it can be seen that the image generated by Bicubic is blurry, and the reconstructions of SRGAN, Beby-GAN, and PDM-GAN show severe loss of detail and sharpening. The reconstruction effect of SPSR is relatively good, but there is still some detail loss compared to the model proposed in this article. Therefore, the image reconstructed using the proposed model is closer to the real image (GT).

Fig. 6(a) shows the real picture of the bird, while Fig. 6(b) highlights the pattern of the bird's wings. The images reconstructed by the Bicubic, SRGAN, ESRGAN, Beby-GAN, and PDM-GAN models are relatively blurry and severely sharpened. The reconstruction of the SPSR model is relatively clear, but it is still quite blurry near the first texture of the wings. In contrast, the model proposed in this article reconstructs patterns that are closer to the real image (GT). In summary, the proposed model reconstructs images better in terms of visual effect and, compared with the other algorithms, comes closer to the details and textures of the real images (GT).

E. Ablation experiments
In order to verify the necessity of each part of the proposed model, corresponding ablation experiments are conducted in this section. Since the model in this paper is based on SPSR, two variants are designed for comparison. One variant (EAGCL-SR no L) trains without the gradient correlation loss but applies the EA modules in the network. The other is the complete model proposed in this paper, which has both the enhanced attention mechanism and the gradient correlation loss. The experimental results are shown in TABLE III.

TABLE III
RESULTS OF THE ABLATION EXPERIMENTS

| Dataset | Metric | SPSR | EAGCL-SR no L | EAGCL-SR |
|---|---|---|---|---|
| Set5 | PSNR | 28.44 | 28.64 | 28.89 |
| | SSIM | 0.8241 | 0.8124 | 0.8386 |
| | LPIPS | 0.0870 | 0.0862 | 0.0802 |
| Set14 | PSNR | 24.75 | 24.72 | 24.89 |
| | SSIM | 0.6960 | 0.6916 | 0.7025 |
| | LPIPS | 0.1062 | 0.1047 | 0.0972 |
| BSDS100 | PSNR | 24.21 | 24.43 | 24.58 |
| | SSIM | 0.6554 | 0.6347 | 0.6584 |
| | LPIPS | 0.1197 | 0.1152 | 0.1125 |

FIGURE 7. Comparison of ablation experiments. Image "flowers" from Set14.
From TABLE III, it can be seen that, compared with the SPSR model, the network with the EA module improves on the original network; in addition, the enhanced attention mechanism (EA) has a particularly strong effect on the PSNR value. The results for the model with the gradient correlation loss added on this basis show that it further improves the quality of image reconstruction, so each module proposed in this paper contributes an improvement. For the full EAGCL-SR algorithm, all evaluation metric values are better than those of SPSR on the different test sets, which verifies the effectiveness of the proposed method. As shown in Fig. 7, the blurred areas produced by this paper's algorithm are also reduced, and the pattern of the calyx is clearer.
IV. SUMMARY
For SR tasks with high visual-quality requirements, this paper proposes an image super-resolution reconstruction model based on an enhanced attention mechanism and gradient correlation loss, aimed at solving the problems of information loss and edge blurring in image super-resolution reconstruction. The model improves the quality of the reconstructed image by introducing an enhanced attention mechanism that effectively focuses on the important details in the low-resolution image. Meanwhile, the gradient correlation loss function makes the generated images more realistic and maintains the consistency of the edge structure. The experimental results show that the proposed model achieves measurable improvements in the PSNR, SSIM, and LPIPS metrics, verifying its effectiveness. In future work, more effective model architectures and training strategies will be explored to improve the quality of the reconstruction results and reduce the computational cost.

V. REFERENCES
[1] S. Zhu, B. Zeng, and L. Zeng, "Image interpolation based on non-local geometric similarities," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1707-1719, 2016.
[2] V. Papyan and M. Elad, "Multi-scale patch-based image restoration," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 249-261, 2016.
[3] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in Computer Vision - ECCV 2016, B. Leibe, J. Matas, and N. Sebe, Eds., 2016, pp. 391-407.
[4] C. Dong, C. C. Loy, and K. He, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, pp. 295-307, 2016.
[5] J. Kim, J. Lee, and K. Lee, "Deeply-recursive convolutional network for image super-resolution," in Proc. IEEE CVPR, 2016.
[6] B. Lim, S. Son, and H. Kim, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE CVPR Workshops, 2017.
[7] Y. Zhang, Y. Tian, and Y. Kong, "Residual dense network for image super-resolution," in Proc. IEEE CVPR, 2018.
[8] B. Niu, W. Wen, and W. Ren, "Single image super-resolution via a holistic attention network," in Computer Vision - ECCV 2020, Cham: Springer International Publishing, 2020, pp. 191-207.
[9] I. Goodfellow, J. Pouget-Abadie, and M. Mirza, "Generative adversarial networks," Communications of the ACM, vol. 63, pp. 139-144, 2020.
[10] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, and A. Acosta, "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE CVPR, 2017.
[11] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Computer Vision - ECCV 2018 Workshops, Munich, Germany, 2018.
[12] X. Luo, R. Chen, Y. Xie, et al., "Bi-GANs-ST for perceptual image super-resolution," in Proc. ECCV Workshops, 2019, doi: 10.1007/978-3-030-11021-5_2.
[13] X. C. Zhang, Q. Chen, R. Ng, et al., "Zoom to learn, learn to zoom," arXiv e-prints, 2019, doi: 10.48550/arXiv.1905.05169.
[14] K. Prajapati, V. Chudasama, H. Patel, et al., "Unsupervised Single Image Super-Resolution Network (USISResNet) for real-world data using generative adversarial network," in Proc. IEEE, 2020.
[15] C. Ma, Y. Rao, and Y. Cheng, "Structure-preserving super resolution with gradient guidance," in Proc. IEEE/CVF CVPR, 2020.
[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further improving enhanced super-resolution generative adversarial network," in Proc. IEEE, 2020.
[17] W. Chen, Y. Ma, and X. Liu, "Hierarchical generative adversarial networks for single image super-resolution," in Proc. IEEE, 2021.
[18] K. Zhang, J. Liang, and L. Van Gool, "Designing a practical degradation model for deep blind image super-resolution," 2021.
[19] J. Liang, H. Zeng, and L. Zhang, "Details or artifacts: A locally discriminative learning approach to realistic image super-resolution," 2022.
[20] W. Li, K. Zhou, and L. Qi, "Best-buddy GANs for highly detailed image super-resolution," in Proc. AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 1412-1420.
[21] J. Yoo, T. Kim, and S. Lee, "Rich CNN-Transformer feature aggregation networks for super-resolution," arXiv e-prints, 2022.
[22] R. Lee, R. Li, and S. Venieris, "Meta-learned kernel for blind super-resolution kernel estimation," in Proc. IEEE/CVF, 2024.
[23] J. Liu, W. Zhang, and Y. Tang, "Residual feature aggregation network for image super-resolution," in Proc. IEEE/CVF, 2020.
[24] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in Proc. IEEE CVPR Workshops, 2017.
[25] M. Bevilacqua, A. Roumy, and C. Guillemot, "Neighbor embedding based single-image super-resolution using semi-nonnegative matrix factorization," in Proc. IEEE ICASSP, 2012.
[26] Y. Yuan, S. Liu, and J. Zhang, "Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks," in Proc. CVPR, 2018.
[27] D. Martin, C. Fowlkes, and D. Tal, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Eighth IEEE International Conference on Computer Vision, 2001.
[28] Z. Wang, A. C. Bovik, and H. R. Sheikh, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, pp. 600-612, 2004.
[29] R. Zhang, P. Isola, and A. Efros, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. CVPR, 2018.
[30] Z. Luo, Y. Huang, and S. Li, "Learning the degradation distribution for blind image super-resolution," 2022.


YANLAN SHI received the B.S. degree in software engineering from Lanzhou City University in 2022. She is a graduate student at the School of Electronic and Information Engineering, Liaoning University of Technology. Her main research interests are computer vision and image super-resolution reconstruction.

HUAWEI YI received the B.S. degree in computer science and technology from Liaoning University of Technology in 2003, the M.Sc. degree in communication and information systems from Lanzhou University of Technology, Lanzhou, China, in 2007, and the Ph.D. degree in computer science and technology from Yanshan University, Qinhuangdao, China, in 2017. She is currently an Associate Professor with the School of Electronics and Information Engineering, Liaoning University of Technology. Her main research interests include recommendation systems, trusted computing, and information security.

XIN ZHANG received his bachelor's degree in software engineering from Liaoning University of Technology in 2023. He is a graduate student at the School of Electronic and Information Engineering, Liaoning University of Technology. His main research interests are computer vision and image super-resolution reconstruction.

LU XU studied at Shenyang University of Technology, where he obtained his bachelor's, master's, and doctoral degrees. He currently works in the IT and Products Management Department of the Agricultural Bank of China.

JIE LAN received the B.S. degree in applied mathematics from Jilin Agricultural University, Changchun, China, in 2005, and the M.S. degree in control theory and control engineering from Liaoning University of Technology, Jinzhou, China, in 2011. She has been a doctoral student in agricultural electrification and automation at Shenyang Agricultural University, Shenyang, China, since 2019. She is currently a lecturer with the College of Science, Liaoning University of Technology, and a member of the Chinese Association of Automation. Her research interests include adaptive fuzzy control, nonlinear control, neural network control, multi-agent systems, and swarm intelligence.
