Image Super-Resolution Reconstruction Based On Enhanced Attention Mechanism and Gradient Correlation Loss
ABSTRACT In the field of super-resolution reconstruction, generative adversarial networks can generate textures that better match human visual perception, but low-resolution images often suffer from information loss and edge blurring during reconstruction. To solve this problem, this article proposes an image super-resolution reconstruction model based on an enhanced attention mechanism and a gradient correlation loss, which focuses more effectively on the important details in low-resolution images and thereby improves the quality of the reconstructed images. First, an enhanced attention mechanism is proposed and incorporated into the generator model to reduce information loss during image feature extraction and retain more image detail. Furthermore, this paper proposes a gradient correlation loss function that maximizes the correlation between the gradient of the generated image and the gradient of the original image, so that the generated image is more realistic and maintains a consistent edge structure. Finally, experimental results on standard datasets show that, compared with other representative algorithms, the proposed model achieves improvements in PSNR, SSIM, and LPIPS, verifying the effectiveness of the algorithm.
SR networks (EDSR). In 2018, Zhang et al. [7] introduced the channel attention mechanism into SR and constructed the residual channel attention network (RCAN), the first network to apply an attention mechanism to the SR problem, making the information learned by the network more effective. In 2020, Niu et al. [8] proposed the holistic attention network (HAN) to address this problem: it introduces a layer attention module that learns feature weights from the interrelationships between multi-scale layers, and uses a channel-spatial attention module (CSAM) to learn the channel and spatial correlations of the features in each layer.

In recent years, generative adversarial networks (GANs) have been widely used in super-resolution reconstruction because their discriminators can learn more meaningful loss functions than those based on pixel differences. In 2014, Goodfellow et al. [9] first proposed the generative adversarial network (GAN), whose performance in generating image data surprised researchers. Inspired by GAN [9], in 2017 Ledig et al. [10] applied it to image super-resolution reconstruction and proposed the super-resolution generative adversarial network (SRGAN), which combines perceptual loss and adversarial loss to recover the texture details of the image. Wang et al. [11] proposed the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), which builds on SRGAN in network architecture, adversarial loss, and perceptual loss; it further improves reconstruction quality by replacing the residual blocks in the original generator with residual dense blocks and by using a relativistic discriminator. In 2018, Luo et al. [12] proposed Bi-GANs-ST, a new super-resolution GAN framework with two complementary generative adversarial branches trained on a combination of pixel loss, perceptual loss, and adversarial loss, striking a balance between objective evaluation metrics and the subjective perception of the visual effect. In 2019, Zhang et al. [13] proposed the RankSRGAN model, which incorporates content ranking into the SRGAN framework and introduces a content ranking loss to optimize the quality of generated images. In 2020, Prajapati et al. [14] used GANs for unsupervised learning of SR and introduced a new objective learning function based on the mean opinion score. Ma et al. [15] proposed the structure-preserving super-resolution GAN (SPSR), which establishes a gradient feature mapping between low-resolution and high-resolution images by adding a gradient branch, introduces a gradient loss to better maintain the geometric structure of the reconstructed image, and combines minimum-absolute-value loss, perceptual loss, adversarial loss, and gradient loss during training to better preserve the overall edge structure of the image. Rakotonirina et al. [16] improved ESRGAN and proposed ESRGAN+, which strengthens the generative network of ESRGAN by adding an extra layer of skip connections in its dense blocks. In 2021, Chen et al. [17] integrated a hierarchical feature extraction module into the SRGAN framework and proposed the HSRGAN model, which extracts image features at multiple scales to hierarchically guide the reconstruction, enhancing the visual fidelity of super-resolution images. Zhang et al. [18] designed a degradation model better suited to real images, taking a more practical, blind-degradation perspective; the algorithm incorporates more complex blur, downsampling, and noise degradation strategies. In 2022, Liang et al. [19] proposed the LDL model for the artifact problem: it locates artifact-prone regions, penalizes artifact-producing generation details, retains useful textures, reduces artifacts, and makes the image more realistic. Li et al. [20] argued that supervising with a single loss produces artifacts and over-smooths part of the information, and therefore proposed the one-to-many supervised Beby-GAN. In 2023, Yoo et al. [21] combined CNN and Transformer and proposed a cross-scale token attention module, allowing the Transformer branch to efficiently exploit informative relationships between tokens at different scales. In 2024, Lee et al. [22] performed meta-learning on the information contained in the image distribution, greatly improving adaptation speed to new images as well as performance in kernel estimation and image fidelity.

Although many scholars at home and abroad have achieved results in single-image super-resolution reconstruction, these methods often suffer from information loss and blurred edges during reconstruction. To address this problem, this paper proposes an image super-resolution reconstruction model based on an enhanced attention mechanism and a gradient correlation loss, with the following contributions.

An enhanced attention mechanism module is designed. The feature extraction capability of the generator is enhanced by reducing the number of channels, introducing strided convolutions and pooling layers to reduce the spatial dimensions of the network, and using convolutional groups to provide more varied feature combinations.

A gradient correlation loss function is proposed. This loss improves the visual quality of the reconstructed image by maximizing the correlation between the gradient of the generated image and the gradient of the original image, which makes the generated image more realistic and maintains a consistent edge structure.
FIGURE 1. The EAGCL-SR model, where (a) is the generator model and (b) is the discriminator model.
II. MODEL
Building on the SPSR model of [15], this section proposes a super-resolution reconstruction model based on an enhanced attention mechanism and a gradient correlation loss (EAGCL-SR), whose overall architecture is shown in Fig. 1. Fig. 1(a) shows the EAGCL-SR generator model. The generator is designed from two perspectives: the EA-RRDB-based reconstruction part and the gradient-map-based reconstruction part, where the latter adopts the gradient branch described in [15]; the focus of this section is on the former. The features obtained from the two parts are fused by a fusion block and reconstructed by a convolutional layer to obtain the reconstructed image, from which the gradient is extracted to obtain the gradient map. Fig. 1(b) shows the discriminator model of EAGCL-SR; we adopt the relativistic discriminator design of ESRGAN [11], where Conv denotes a regular convolutional layer, LRelu the Leaky ReLU activation function, BN the batch normalization layer, and Dense the fully connected layer.

A. Reconstruction part of the EAGCL-SR generator based on EA-RRDB
This section presents the EA-RRDB-based reconstruction part of the EAGCL-SR generator. First, a multilevel residual dense connection module based on an enhanced attention mechanism, EA-RRDB, is proposed, and five EA-RRDBs are combined to obtain the MERF (Multi EA-RRDB Fusion) module. Fig. 1(a) shows the framework of this reconstruction part, which performs the reconstruction operations on low-resolution (LR) images. The LR image is first fed to a convolutional layer for shallow feature extraction. The extracted features are then passed to the first MERF; the features output by each MERF are passed both to the next MERF and to the corresponding Gradient Block (GB) of the gradient branch, and so on. The gradient block (GB) can be any basic block for extracting higher-level features; in the experiments of this paper, a 3×3 convolution kernel is used. After the last MERF, the features required for this part of the generator are obtained through convolution and upsampling operations in turn.
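To make the data flow just described concrete, here is a minimal PyTorch sketch of this reconstruction part. The module bodies are illustrative stand-ins (the paper's MERF stacks five EA-RRDBs, and GB is a 3×3 convolution in their experiments); the channel counts, block counts, skip-connection wiring, and ×4 scale are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class MERF(nn.Module):
    """Stand-in for the MERF module (five EA-RRDBs in the paper);
    here a simple residual block keeps the sketch runnable."""
    def __init__(self, nf=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(nf, nf, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(nf, nf, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class GeneratorSketch(nn.Module):
    def __init__(self, nf=64, n_blocks=4, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, nf, 3, padding=1)        # shallow feature extraction
        self.merfs = nn.ModuleList(MERF(nf) for _ in range(n_blocks))
        self.gbs = nn.ModuleList(                          # gradient branch: GB = 3x3 conv
            nn.Conv2d(nf, nf, 3, padding=1) for _ in range(n_blocks))
        self.fusion = nn.Conv2d(2 * nf, nf, 1)             # fusion block for both parts
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode='nearest'),
            nn.Conv2d(nf, nf, 3, padding=1), nn.LeakyReLU(0.2, inplace=True))
        self.tail = nn.Conv2d(nf, 3, 3, padding=1)

    def forward(self, lr):
        feat = self.head(lr)
        grad_feat = feat
        for merf, gb in zip(self.merfs, self.gbs):
            feat = merf(feat)                    # features go to the next MERF ...
            grad_feat = gb(grad_feat + feat)     # ... and also feed the gradient branch
        fused = self.fusion(torch.cat([feat, grad_feat], dim=1))
        return self.tail(self.up(fused))

sr = GeneratorSketch()(torch.randn(1, 3, 32, 32))  # -> shape (1, 3, 128, 128)
```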
1) ENHANCED ATTENTION MECHANISM
Feature extraction from images often suffers from detail loss. To mitigate this, attention mechanisms have been combined with reconstruction models so that image details are attended to and not easily lost. However, ordinary attention mechanisms tend to focus only on features in certain regions of the image while overlooking other key details, so some important details or features are still ignored or blurred and the problem of detail loss persists.

To solve this problem, this article proposes an Enhanced Attention Block (EA), which aims to further reduce detail loss during image feature extraction. The module enables the model to focus on feature-rich regions and extract more representative features, improving the reconstruction by enhancing the detail information.

The EA module is designed to be lightweight, since it must be inserted into multiple modules of the generator, while an attention block also needs a large receptive field to perform the image super-resolution task well [23]. The EA module is designed as shown in Fig. 2.
First, the feature x is input into a convolutional layer (Conv) that uses a 1×1 kernel to reduce the channel dimension. Then a strided convolution (Strided conv) with a stride of 2 is used to expand the receptive field, and the features are subsequently amplified by a deconvolution (Deconv) to obtain richer high-frequency information; the combination of strided convolution and deconvolution quickly reduces the spatial dimensionality of the network. Next, 1×1, 3×3, and 5×5 depthwise convolution (depthwise conv) operations produce different feature combinations to enhance the expressive power of the model. The spatial dimension is then recovered by an upsampling layer and the channel dimension by a 1×1 convolutional layer (Conv); finally, after a Softmax layer, the deeper features of the image are obtained and fused with the initial feature x to produce the final features. The module effectively retains the detailed information in the image and promotes gradient flow, improving feature transfer and the stability of network training.
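The EA block can be sketched in PyTorch as follows. The layer ordering follows the description above, but the EABlock name, the channel-reduction ratio, the use of interpolation as the upsampling layer, and the final residual fusion rule are assumptions; this is a sketch of the design, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EABlock(nn.Module):
    """Sketch of the Enhanced Attention (EA) block: 1x1 reduce -> strided conv
    -> deconv -> multi-size depthwise convs -> upsample -> 1x1 restore ->
    softmax -> fuse with the input feature x."""
    def __init__(self, ch=64, reduction=4):
        super().__init__()
        r = ch // reduction
        self.reduce = nn.Conv2d(ch, r, 1)                       # 1x1: shrink channels
        self.strided = nn.Conv2d(r, r, 3, stride=2, padding=1)  # enlarge receptive field
        self.deconv = nn.ConvTranspose2d(r, r, 3, stride=2, padding=1, output_padding=1)
        self.dw = nn.ModuleList(                                # depthwise convs, 3 kernel sizes
            nn.Conv2d(r, r, k, padding=k // 2, groups=r) for k in (1, 3, 5))
        self.restore = nn.Conv2d(r, ch, 1)                      # 1x1: restore channels

    def forward(self, x):
        a = self.reduce(x)
        a = self.deconv(self.strided(a))       # down then up for a large receptive field
        a = sum(conv(a) for conv in self.dw)   # combine multi-scale depthwise features
        a = F.interpolate(a, size=x.shape[-2:], mode='nearest')  # guard odd sizes
        a = torch.softmax(self.restore(a).flatten(2), dim=-1).view_as(x)  # attention map
        return x + x * a                       # fuse attention with the input (assumed rule)

y = EABlock()(torch.randn(1, 64, 48, 48))      # output has the same shape as the input
```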
FIGURE 2. EA module.
FIGURE 3. EA-RRDB module.

The EA proposed above is fused with the multi-level residual dense block (RRDB) of the SPSR model [15] to construct the multilevel residual dense block based on the enhanced attention mechanism (EA-RRDB), shown in Fig. 3. Each EA-RRDB module consists of three enhanced-attention-based dense connection blocks (EADBs). Each EADB consists of multiple densely connected residual blocks, where the input of each residual block includes the input and output of the previous residual block. This dense connectivity helps capture detailed information in the image, provides more paths for gradient propagation, and mitigates gradient vanishing. On this basis, the enhanced attention mechanism allows the EADB to focus on important image regions in a targeted manner and strengthen feature extraction from those regions. This design enables the EA-RRDB module to extract effective features, which helps to improve image quality and detail retention during reconstruction.

B. Loss function
The model in this paper uses the classical loss functions together with the proposed gradient correlation loss. The classical losses comprise the pixel-based mean absolute error (MAE) loss, the perceptual loss, and the adversarial loss; a comprehensive loss function is formed as a weighted sum of these four losses.

1) MAE LOSS FUNCTION
The MAE (Mean Absolute Error) loss calculates the absolute difference between the predicted and true values of each sample and takes the average of these absolute differences over all samples, as shown in equation (1):

$l_{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|G(I_i^{LR}) - I_i^{HR}\right|$  (1)

where m is the number of samples, $I_i^{HR}$ is the distribution of the i-th real image, and $G(I_i^{LR})$ is the distribution of the i-th generated image.

2) PERCEPTUAL LOSS
The perceptual loss compares the generated image with the target image in the feature space of a pre-trained network and measures the distance between them. Specifically, this can be expressed by equation (2):

$l_{Per} = \frac{1}{N}\sum_{i=1}^{N}\left(F_i(x) - F_i(y)\right)^2$  (2)

where x is the input image, y is the target image, $F_i(x)$ and $F_i(y)$ denote their feature representations in the i-th layer of a pre-trained neural network, and N denotes the number of feature layers. By minimizing the perceptual loss, the generator is forced to produce an image that is closer to the target image in feature space, which in turn improves the quality of the generated image.

3) ADVERSARIAL LOSS
A binary cross-entropy loss is used to measure the probability that an image generated by the generator is correctly discriminated as a real image or a fake image, as shown in equation (3):

$l_{gan} = -\log(D(x)) - \log(1 - D(G(z)))$  (3)

where x denotes a real sample, $D(x)$ the discriminator's judgement of the real sample, $G(z)$ the fake sample generated by the generator, and $D(G(z))$ the discriminator's judgement of the fake sample. The goal of the discriminator is to minimize this adversarial loss so that its output for real samples is close to 1 and its output for fake samples is close to 0.
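For reference, the three classical losses can be sketched in PyTorch as shown below. The VGG19 backbone and layer indices for the perceptual term are assumptions (the paper does not name the feature network), and the adversarial term is written in the plain non-relativistic form of equation (3), whereas the paper's discriminator follows the relativistic ESRGAN design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

mae_loss = nn.L1Loss()  # equation (1): mean absolute error between SR and HR

class PerceptualLoss(nn.Module):
    """Equation (2) sketch: MSE between features of a pre-trained network.
    VGG19 and these layer indices are illustrative assumptions."""
    def __init__(self, layers=(8, 17, 26)):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layers = vgg, set(layers)

    def forward(self, x, y):
        loss, fx, fy, last = 0.0, x, y, max(self.layers)
        for i, layer in enumerate(self.vgg):
            fx, fy = layer(fx), layer(fy)
            if i in self.layers:
                loss = loss + torch.mean((fx - fy) ** 2)
            if i == last:
                break
        return loss / len(self.layers)

def adversarial_loss(d_real, d_fake):
    """Equation (3) for the discriminator: push D(x) toward 1 and
    D(G(z)) toward 0 (non-relativistic form)."""
    eps = 1e-8
    return -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()
```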
4) GRADIENT CORRELATION LOSS FUNCTION
It has been shown that using only the classical loss functions during training easily leads to over-smoothed reconstructions; that is, the trained model struggles to reconstruct the edges of low-resolution images to the desired quality. The root cause is that classical loss functions (e.g., mean squared error) concentrate on minimizing global pixel-level differences during optimization while ignoring the importance of image details and edges, so the model tends to generate excessively smooth images that lack sharp edge features. To solve this problem, a gradient correlation loss function is proposed in this paper.

The proposed gradient loss is concerned with keeping the generated image aligned with the original image in the gradient domain, so as to maintain edge and texture consistency; its effect is strengthened by maximizing the correlation between the gradient of the generated image and the gradient of the original image. This loss imposes stronger constraints on the super-resolution model, effectively preserves the structural information of the image, and helps the generated high-resolution image remain realistic and structurally consistent in its details and edges, thus improving the quality and visual effect of the reconstructed image. The gradient correlation loss is calculated as shown in equation (4):

$r_{LG} = \frac{\mathrm{cov}\left(H(G(I^{LR})),\, H(I^{HR})\right)}{\sqrt{\sigma\left(H(G(I^{LR}))\right)} \cdot \sqrt{\sigma\left(H(I^{HR})\right)}}$  (4)

In equation (4), $r_{LG}$ is the correlation coefficient: a value of 1 indicates complete positive correlation (optimal performance of the loss function), -1 indicates negative correlation, and 0 indicates no correlation; cov(·) denotes the covariance, σ(·) the variance, and H(·) the gradient-map extraction operation.
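A minimal PyTorch sketch of equation (4) follows. The Sobel operator standing in for H(·) is an assumption (the paper does not specify its gradient extractor), and the loss is returned as 1 − r_LG so that maximizing the correlation corresponds to minimizing the loss.

```python
import torch
import torch.nn.functional as F

def gradient_map(img):
    """H(.): per-pixel gradient magnitude. A Sobel operator is assumed here;
    the paper does not specify the gradient extractor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    c = img.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(img)
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def gradient_correlation_loss(sr, hr):
    """Equation (4): Pearson correlation between the gradient maps of the
    generated (SR) and original (HR) images, returned as 1 - r_LG so that a
    perfectly correlated edge structure gives zero loss (an assumption)."""
    g_sr = gradient_map(sr).flatten(1)                 # per-sample vectors
    g_hr = gradient_map(hr).flatten(1)
    g_sr = g_sr - g_sr.mean(dim=1, keepdim=True)       # center before correlating
    g_hr = g_hr - g_hr.mean(dim=1, keepdim=True)
    cov = (g_sr * g_hr).mean(dim=1)                    # cov(H(G(I_LR)), H(I_HR))
    r = cov / (g_sr.std(dim=1, unbiased=False)
               * g_hr.std(dim=1, unbiased=False) + 1e-8)
    return (1 - r).mean()
```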
The overall objective, equation (5), is the weighted sum of the four losses, where α and β are the weights of the reconstructed-image losses and η is the weight of the gradient loss; in the experiments, α = 0.01, β = 0.005, and η = 0.005.
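For completeness, one plausible form of the overall objective is sketched below. Since equation (5) itself is not reproduced above, the pairing of each weight with its term follows the ESRGAN-style convention this line of work inherits and is an assumption; only the weight values come from the text.

```python
import torch

alpha, beta, eta = 0.01, 0.005, 0.005   # weights reported in the text

def generator_total_loss(sr, hr, d_fake, perceptual, mae, grad_corr):
    """Hypothetical equation (5): weighted sum of the four losses, reusing the
    loss sketches above. The weight-to-term pairing is an assumption."""
    adv = -torch.log(d_fake + 1e-8).mean()          # generator-side adversarial term
    return (perceptual(sr, hr)                      # unit-weight perceptual term (assumed)
            + alpha * mae(sr, hr)                   # pixel MAE term
            + beta * adv                            # adversarial term
            + eta * grad_corr(sr, hr))              # proposed gradient correlation term
```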
III. EXPERIMENT

A. Experimental environment and dataset
The DIV2K [24] dataset, which includes 800 training images, 100 validation images, and 100 test images, is used for training. To avoid overfitting, data augmentation operations such as random rotation and horizontal flipping are applied to the training images to increase the diversity of the data. To test the model, three standard benchmark datasets (Set5 [25], Set14 [26], and BSDS100 [27]) are used as the test sets; their details are shown in TABLE I.

TABLE I
COMMONLY USED SUPER-RESOLUTION RECONSTRUCTION DATASETS

Dataset | Number | Scene content | Advantages | Disadvantages
Set5 | 5 | Nature, people | Small dataset; convenient for quick testing | Lacks diversity; small size
Set14 | 14 | People, animals, landscapes, nature | Medium-sized dataset; diverse scene content | Still relatively small; limited scene coverage
BSDS100 | 100 | Cities, architecture, landscapes, nature | Rich scene types; diverse degradations | May be too complex; data degradation issues

B. Evaluation indicators
In this paper, PSNR, SSIM [28], and LPIPS [29] are used to evaluate the experimental results. As in equation (6), PSNR is computed from the grey-level differences between corresponding pixels of two images; the higher the PSNR, the smaller the distortion.

$\mathrm{PSNR}(X, Y) = 10 \cdot \lg\frac{255^2 \cdot w \cdot h \cdot c}{\sum_{m=1}^{w}\sum_{n=1}^{h}\sum_{z=1}^{c}\left[X(m,n,z) - Y(m,n,z)\right]^2}$  (6)

where X and Y are the two images being compared and w, h, and c are the image width, height, and number of channels.
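Equation (6) translates directly into code; the sketch below assumes 8-bit images stored as H×W×C float arrays with values in [0, 255].

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray) -> float:
    """Equation (6): since 255^2*w*h*c divided by the total squared difference
    equals 255^2 / MSE, PSNR reduces to 10*log10(255^2 / MSE).
    Assumes x and y are HxWxC arrays with values in [0, 255]."""
    diff = x.astype(np.float64) - y.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)  # higher PSNR = smaller distortion
```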
TABLE II
COMPARISON OF PSNR, SSIM, AND LPIPS VALUES FOR 4X RECONSTRUCTION RESULTS OF VARIOUS ALGORITHMS
Dataset Metric bicubic SRGAN ESRGAN ESRGAN+ PDM-GAN Beby-GAN SPSR Ours
PSNR 26.69 26.69 26.50 25.88 25.62 27.82 28.44 28.89
Set5 SSIM 0.7736 0.7813 0.7565 0.7511 0.7304 0.8004 0.8241 0.8386
LPIPS 0.3644 0.1305 0.1080 0.1178 0.1075 0.0875 0.0870 0.0802
PSNR 26.08 25.88 25.52 25.01 23.69 24.69 24.75 24.89
Set14 SSIM 0.7467 0.7480 0.7175 0.7159 0.6716 0.7016 0.6960 0.7025
LPIPS 0.3870 0.1421 0.1254 0.1362 0.1398 0.1094 0.1062 0.0972
PSNR 22.65 22.67 23.33 23.54 23.84 24.13 24.21 24.58
BSDS100 SSIM 0.6014 0.6363 0.6133 0.6172 0.6235 0.6355 0.6554 0.6584
LPIPS 0.4452 0.1636 0.1436 0.1434 0.1433 0.1274 0.1197 0.1125
FIGURE 4. 4× reconstruction results of each algorithm for the image "butterfly" from Set5, where (a) is the original image and (b) is a local detail of the reconstructed image.
FIGURE 5. 4× reconstruction results of each algorithm for the image "baboon" from Set14, where (a) is the original image and (b) is a local detail of the reconstructed image.
FIGURE 6. 4× reconstruction results of each algorithm for the image "8023" from BSDS100, where (a) is the original image and (b) is a local detail of the reconstructed image.
The image reconstructed by the model proposed in this article is closer to the real image (GT). Fig. 5(a) shows the real picture of the baboon, and Fig. 5(b) shows the pattern of the baboon's left whiskers. Observing the texture of the whiskers, the image generated by Bicubic is blurry, and the reconstructions of SRGAN, Beby-GAN, and PDM-GAN show severe detail loss and over-sharpening. The reconstruction of SPSR is relatively good, but still loses some detail compared with the model proposed in this article, so the image reconstructed by the proposed model is closer to the real image (GT). Fig. 6(a) shows the real picture of the bird, while Fig. 6(b) highlights the pattern of the bird's wings. The images generated by the Bicubic, SRGAN, ESRGAN, Beby-GAN, and PDM-GAN models are relatively blurry and severely over-sharpened. The SPSR reconstruction is comparatively clear, but is still quite blurry near the first texture of the wings. In contrast, the model proposed in this article reconstructs patterns that are closer to the real image (GT). In summary, the proposed model reconstructs images with better visual quality and, compared with the other algorithms, comes closer to the details and textures of the real images (GT).

E. Ablation experiments
To verify the necessity of each part of the proposed model, ablation experiments are conducted in this section. Since the model in this paper is built on SPSR, two algorithms are designed for comparison. The first (EAGCL-SR no L) trains without the gradient correlation loss but applies the EA modules in the network; the second is the complete model proposed in this paper, with both the enhanced attention mechanism and the gradient correlation loss. The experimental results are shown in TABLE III.

TABLE III
RESULTS OF THE ABLATION EXPERIMENTS

Dataset | Metric | SPSR | EAGCL-SR no L | EAGCL-SR
Set5 | PSNR | 28.44 | 28.64 | 28.89
Set5 | SSIM | 0.8241 | 0.8124 | 0.8386
Set5 | LPIPS | 0.0870 | 0.0862 | 0.0802
Set14 | PSNR | 24.75 | 24.72 | 24.89
Set14 | SSIM | 0.6960 | 0.6916 | 0.7025
Set14 | LPIPS | 0.1062 | 0.1047 | 0.0972
BSDS100 | PSNR | 24.21 | 24.43 | 24.58
BSDS100 | SSIM | 0.6554 | 0.6347 | 0.6584
BSDS100 | LPIPS | 0.1197 | 0.1152 | 0.1125

FIGURE 7. Comparison of ablation experiments. Image "flowers" from Set14.

From TABLE III, compared with the SPSR model, the network with the EA module improves on the original network; it can also be seen that the enhanced attention mechanism (EA) has a particularly strong effect on the PSNR value. The experimental results of the model that additionally uses the gradient
correlation loss on this basis show that it can effectively improve the quality of image reconstruction further; each module of this paper thus contributes an improvement. For the full EAGCL-SR algorithm, all evaluation metric values are better than those of SPSR on the different test sets, verifying the effectiveness of the method proposed in this paper. As shown in Fig. 7, the blurred areas produced by this paper's algorithm are also reduced and the pattern of the calyx is clearer.

IV. SUMMARY
For SR tasks with high visual quality requirements, this paper proposes an image super-resolution reconstruction model based on an enhanced attention mechanism and a gradient correlation loss, aiming to solve the problems of information loss and edge blurring in image super-resolution reconstruction. The model improves the quality of the reconstructed image by introducing an enhanced attention mechanism that effectively focuses on the important details in the low-resolution image. Meanwhile, the gradient correlation loss function makes the generated images more realistic and maintains the consistency of the edge structure. The experimental results show that the proposed model achieves improvements in the PSNR, SSIM, and LPIPS metrics, verifying its effectiveness. In future work, more effective model architectures and training strategies will be explored to improve the quality of the reconstruction results and reduce the computational cost.

V. REFERENCES
[1] S. Zhu, B. Zeng, and L. Zeng, "Image interpolation based on non-local geometric similarities," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1707-1719, 2016.
[2] V. Papyan and M. Elad, "Multi-scale patch-based image restoration," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 249-261, 2016.
[3] C. Dong, C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in Computer Vision - ECCV 2016, B. Leibe, J. Matas, and N. Sebe, Eds., 2016, pp. 391-407.
[4] C. Dong, C. Loy, and K. He, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, pp. 295-307, 2016.
[5] J. Kim, J. Lee, and K. Lee, "Deeply-recursive convolutional network for image super-resolution," in Proc. IEEE, 2016.
[6] B. Lim, S. Son, and H. Kim, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE, 2017.
[7] Y. Zhang, Y. Tian, and Y. Kong, "Residual dense network for image super-resolution," in Proc. IEEE, 2018.
[8] B. Niu, W. Wen, and W. Ren, "Single image super-resolution via a holistic attention network," in Computer Vision - ECCV 2020, Cham: Springer International Publishing, 2020, pp. 191-207.
[9] I. Goodfellow, J. Pouget-Abadie, and M. Mirza, "Generative adversarial networks," Communications of the ACM, vol. 63, pp. 139-144, 2020.
[10] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, and A. Acosta, "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Computer Society, 2016.
[11] X. Wang, K. Yu, S. Wu, et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. ECCV Workshops, Munich, Germany, September 8-14, 2018, Part V, 2019.
[12] X. Luo, R. Chen, Y. Xie, et al., "Bi-GANs-ST for perceptual image super-resolution," in Proc. ECCV, 2019, doi: 10.1007/978-3-030-11021-5_2.
[13] X. C. Zhang, Q. Chen, R. Ng, et al., "Zoom to learn, learn to zoom," arXiv e-prints, 2019, doi: 10.48550/arXiv.1905.05169.
[14] K. Prajapati, V. Chudasama, H. Patel, et al., "Unsupervised single image super-resolution network (USISResNet) for real-world data using generative adversarial network," in Proc. IEEE, 2020.
[15] C. Ma, Y. Rao, and Y. Cheng, "Structure-preserving super resolution with gradient guidance," in Proc. IEEE/CVF CVPR, 2020.
[16] N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further improving enhanced super-resolution generative adversarial network," in Proc. IEEE, 2020.
[17] W. Chen, Y. Ma, and X. Liu, "Hierarchical generative adversarial networks for single image super-resolution," in Proc. IEEE, 2021.
[18] K. Zhang, J. Liang, and L. Van Gool, "Designing a practical degradation model for deep blind image super-resolution," 2021.
[19] J. Liang, H. Zeng, and L. Zhang, "Supplementary material to 'Details or artifacts: A locally discriminative learning approach to realistic image super-resolution'," 2022.
[20] W. Li, K. Zhou, and L. Qi, "Best-buddy GANs for highly detailed image super-resolution," in Proc. AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 1412-1420.
[21] J. Yoo, T. Kim, and S. Lee, "Rich CNN-Transformer feature aggregation networks for super-resolution," arXiv e-prints, 2022.
[22] R. Lee, R. Li, and S. Venieris, "Meta-learned kernel for blind super-resolution kernel estimation," in Proc. IEEE/CVF, 2024.
[23] J. Liu, W. Zhang, and Y. Tang, "Residual feature aggregation network for image super-resolution," in Proc. IEEE/CVF, 2020.
[24] E. Agustsson and R. Timofte, "NTIRE 2017 challenge on single image super-resolution: Dataset and study," in Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017.
[25] M. Bevilacqua, A. Roumy, and C. Guillemot, "Neighbor embedding based single-image super-resolution using semi-nonnegative matrix factorization," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
[26] Y. Yuan, S. Liu, and J. Zhang, "Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks," in Proc. CVPR, 2018.
[27] D. Martin, C. Fowlkes, and D. Tal, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Eighth IEEE International Conference on Computer Vision, 2001.
[28] Z. Wang, A. C. Bovik, and L. Sheikh, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, pp. 600-612, 2004.
[29] R. Zhang, P. Isola, and A. Efros, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. CVPR, 2018.
[30] Z. Luo, Y. Huang, and S. Li, "Learning the degradation distribution for blind image super-resolution," 2022.
YANLAN SHI received the B.S. degree in software engineering from Lanzhou City University in 2022. She is a graduate student in the School of Electronic and Information Engineering, Liaoning University of Technology. Her main research interests are computer vision and image super-resolution reconstruction.

JIE LAN received the B.S. degree in applied mathematics from Jilin Agricultural University, Changchun, China, in 2005, and the M.S. degree in control theory and control engineering from Liaoning University of Technology, Jinzhou, China, in 2011. She has been a doctoral student in agricultural electrification and automation at Shenyang Agricultural University, Shenyang, China, since 2019. She is currently a lecturer with the College of Science, Liaoning University of Technology, and a member of the Chinese Association of Automation. Her research interests include adaptive fuzzy control, nonlinear control, neural network control, multi-agent systems, and swarm intelligence.