Image Upscaling With UNet
A PROJECT REPORT
Submitted by
VIRUTHIRAN S
17CSR231
VIGNESH S
17CSR227
VIGNESH RAAJA M S
17CSL253
BACHELOR OF ENGINEERING IN
COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that the Project report entitled IMAGE UPSCALING USING UNET is the
bonafide record of the project work done by S. VIRUTHIRAN (Register No: 17CSR231), S. VIGNESH (Register No: 17CSR227) and M. S. VIGNESH RAAJA (Register No: 17CSL253), in partial fulfillment of the requirements for the award of the Degree of Bachelor of Engineering
in Computer Science and Engineering of Anna University Chennai during the year 2020 – 2021.
Date:
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
KONGU ENGINEERING COLLEGE
(Autonomous)
DECLARATION
We affirm that the Project Report titled IMAGE UPSCALING USING UNET being submitted in partial fulfillment of the requirements for the award of the Degree of Bachelor of Engineering is the original work carried out by us. It has not formed part of any other
project report or dissertation on the basis of which a degree or award was conferred on an earlier occasion.
VIRUTHIRAN S
(Reg.No: 17CSR231)
VIGNESH S
VIGNESH RAAJA M S
I certify that the declaration made by the above candidates is true to the best of my
knowledge.
ABSTRACT
Image upscaling (increasing the quality of an image) with the help of U-Net, a deep
convolutional neural network with a series of residual blocks with skip connections.
The problem deep machine learning based super resolution is trying to solve is that
traditional algorithm based upscaling methods lack fine detail and cannot remove defects and
compression artifacts. For humans who carry out these tasks manually it is a very slow and
painstaking process.
For image upscaling, machine learning employs simulated neurons or neural networks to
predict a high-resolution image from a given low-resolution image. To predict the upscaled
images with high accuracy, a neural network model must be trained on countless images. The
deployed AI model can then take low-resolution video and produce incredible sharpness and
enhanced details no traditional scaler can recreate. As an advancement to the existing system,
we are trying to give more precise and accurate predictions using U-Net instead of RDN.
ACKNOWLEDGEMENT
Owing deeply to the supreme, we extend our thanks to the Almighty, who has
blessed us to come out successfully with our project. We take pleasure in expressing our
deep gratitude to all who supported this work.
We express our hearty and sincere thanks and gratitude to our beloved Correspondent,
Thiru. P. SACHITHANANDAN, for giving us the opportunity to pursue this course. We are
thankful to Dr. V. BALUSAMY, B.E. (Hons), M.Tech., Ph.D., for providing the necessary facilities to
complete our work. We would like to express our sincere gratitude to our respected Head of the
Department for her constant encouragement for the development of our project.
We take immense pleasure in expressing our hearty thanks and gratitude for the valuable
suggestions, which have been very helpful in the project. We are grateful to all the faculty
members of the Computer Science and Engineering Department for their support.
TABLE OF CONTENTS
CHAPTER No. TITLE PAGE No.
I ABSTRACT iii
II LIST OF FIGURES vi
1 INTRODUCTION 1
2 GENERAL DESCRIPTION 4
3 REQUIREMENTS 7
3.2 SOFTWARE REQUIREMENTS 7
3.3.1 COLABORATORY 8
3.3.2 PYTHON 8
4 DETAILED DESIGN 9
4.1 ARCHITECTURES 9
4.1.1 UNET 9
4.1.2 VGG16 12
4.2 PROCEDURE 12
5.4 OUTPUT 21

LIST OF ABBREVIATIONS
IR - Image Resolution
HQ - High Quality
LQ - Low Quality
CHAPTER 1
INTRODUCTION
When you zoom in on an image, the computer uses bi-linear interpolation to
upscale it, and the result is blurred because of over-smoothening (averaging the neighbouring
pixels). The same problem affects low-quality images captured by surveillance cameras. All of
these are uncomfortable to view because they lack fine detail.
To avoid this loss of fine detail, we need a technique that upscales an image without
losing quality. We implemented a deep convolutional neural network (U-Net), an architecture
widely used in image segmentation, to upscale images without degrading their quality. We used
a subset of the ImageNet dataset and the Oxford-IIIT Pet dataset for training and testing the
model. The prediction results are far better than the classic image upscaling methods, with
remarkably fine details. We therefore consider our model a good alternative to the classical
image upscaling methods used in image processing and image restoration.
In bi-linear filtering, a weighted average of the four surrounding texture values is computed
and applied to the screen pixel. This process is repeated for each pixel forming the object being
textured. When an image needs to be scaled up, each pixel of the original image is moved in a
certain direction based on the scale constant. However, when scaling up an image by a
non-integral scale factor, there are pixels (i.e., holes) that are not assigned appropriate pixel
values. In this case, those holes should be assigned appropriate RGB or grayscale values so
that the output image does not have non-valued pixels.
The classical method for upscaling an image is bi-linear interpolation, which
interpolates a pixel in a 2D image. Bi-linear interpolation takes the four nearest neighbours of
the currently selected pixel and outputs a weighted average of them. This traditional upscaling
method leaves the scaled image blurry because of over-smoothening, i.e. averaging the
neighbouring pixels. To avoid this loss of fine detail, we need a technique that upscales an
image without losing quality.
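To make the classical baseline concrete, the following is a minimal sketch (not part of the original implementation) of how bilinear interpolation fills one output pixel from the weighted average of its four nearest neighbours; the toy array and coordinates are purely illustrative.

import numpy as np

def bilinear_sample(img, x, y):
    # Interpolate a value at fractional coordinates (x, y) from the four
    # nearest pixels of a 2D grayscale image
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top    = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return (1 - dy) * top + dy * bottom          # weighted average of 4 neighbours

lowres = np.array([[10, 20], [30, 40]], dtype=float)  # toy 2x2 image
print(bilinear_sample(lowres, 0.5, 0.5))              # 25.0, the mean of all four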
Without a good-quality image, fields that deal with crime, medicine and similar
domains are at a disadvantage when making decisions, and a wrong call can cause far bigger
problems. The need for image upscaling becomes obvious whenever a clear image is required in
situations like these. The traditional way of doing this is slow and inefficient; modern
problems need modern solutions built on current technological advancements. That is where
deep convolutional neural networks come in: with this method, the output can be viewed in
far greater detail.
ORIGINAL IMAGE and UNET (1.2)
CHAPTER 2
LITERATURE REVIEW
1. Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, Yun Fu (2018): Residual Dense
Network for Image Super-Resolution. Residual learning makes the loss surface smoother than
that of a deep neural network with a disordered loss surface. They split the training model into
two phases, namely global feature fusion and local feature fusion. In local feature fusion, the
model learns the local features of an image, which later help it to generate the high-resolution
image. With global feature fusion, the model learns the features of an image efficiently and
generates a more authentic high-resolution image.
2. Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue (2019): This
review presents image super-resolution with deep neural networks. The surveyed works are
categorized into two groups according to their major contributions: designing efficient deep
neural networks for single image super-resolution, and optimizing redundant layers, i.e. layers
that learn little of benefit towards the target.
3. Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham,
Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe
Shi (2017): In this SRGAN method, a generative adversarial network (GAN) is used for
image super-resolution (SR). It was the first neural network architecture able to infer
photo-realistic natural images at 4x upscaling factors. A perceptual loss with an L1 error term is
used to train the model to generate authentic images. The generator generates the images
while the discriminator estimates the similarity between the generated image and the original
image under the perceptual loss.
4. Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee (2017): This
paper is about the enhanced deep super-resolution network (EDSR), whose performance
exceeded the state-of-the-art single image super-resolution methods of the time.
5. Ying Tai, Jian Yang, Xiaoming Liu (2017): This paper is about a deep convolutional
neural network model called the Deep Recursive Residual Network (DRRN), with 52
convolutional layers. In theory, adding many layers to a neural network should help minimize
the loss, but in practice the loss of a deep network tends to diverge more than that of a shallow
one because of the vanishing gradient problem. They therefore use residual blocks to avoid
the divergence of the loss while increasing the number of layers in the network.
CHAPTER 3
REQUIREMENTS
Processor: Intel/AMD
RAM: 16 GB
Software: Colaboratory
3.3.SOFTWARE DESCRIPTION
3.3.1. Colaboratory
3.3.2. Python
Python supports modules and packages, which encourages program modularity and
code reuse. The Python interpreter and the extensive standard library are available in source
or binary form without charge for all major platforms, and can be freely distributed.
CHAPTER 4
DETAILED DESIGN
The goal of this project is to upscale and improve the quality of low-resolution
images. The project was developed with UNET, a model originally created for bio-medical
segmentation, used here for single image super-resolution, together with a multi-output
version of the VGG16 network used as a deep feature extractor in the perceptual loss.
4.1. ARCHITECTURES
4.1.1. UNET
When convolutional neural nets are used with images for classification,
the image is downsampled into one or more classifications by a series of stride-two
convolutions, reducing the grid size each time.
To output a generated image of the same size as the input, or larger, there
needs to be an upsampling path to increase the grid size. This makes the network layout
resemble a U shape: in a U-Net, the downsampling/encoder path forms the left-hand side of the
U and the upsampling/decoder path forms the right-hand side of the U.
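As a rough illustration of this U shape (a simplified sketch, not the exact fastai DynamicUnet used in our training code), the toy PyTorch module below downsamples with two stride-2 convolutions, upsamples back to the input size, and joins matching resolutions with a skip connection.

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    # Deliberately small U-Net-style sketch: two stride-2 convolutions on the
    # way down, two upsampling steps on the way up, one skip connection.
    def __init__(self):
        super().__init__()
        self.down1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)   # H   -> H/2
        self.down2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # H/2 -> H/4
        self.up1   = nn.ConvTranspose2d(64, 32, 2, stride=2)    # H/4 -> H/2
        self.up2   = nn.ConvTranspose2d(64, 3, 2, stride=2)     # H/2 -> H

    def forward(self, x):
        d1 = torch.relu(self.down1(x))     # encoder: left side of the "U"
        d2 = torch.relu(self.down2(d1))
        u1 = torch.relu(self.up1(d2))      # decoder: right side of the "U"
        u1 = torch.cat([u1, d1], dim=1)    # skip connection across the "U"
        return self.up2(u1)

out = TinyUNet()(torch.randn(1, 3, 64, 64))
print(out.shape)                           # torch.Size([1, 3, 64, 64])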
This UNET model contains a workflow (shown in Figure 4.1) built from residual blocks with
skip connections, where each block computes
H(x) := F(x) + x
Here, the main advantage of adding skip connections is that if any layer hurts the
performance of the architecture, it can effectively be skipped by regularization. This makes it
possible to train very deep neural networks without the problems caused by vanishing/exploding
gradients. This was demonstrated experimentally with networks of 100-1000 layers on the
CIFAR-10 dataset. Similar approaches exist elsewhere, but they could not provide accuracy
better than the UNET architecture.
The loss surface with and without the skip connections (4.3)
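The following is a minimal PyTorch sketch of one such residual block, in which two convolutions form F(x) and the identity shortcut adds the input back to give H(x) = F(x) + x; it is illustrative only, not the exact block used inside fastai's U-Net.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Minimal residual block: two convolutions form F(x), the identity shortcut
    # adds the input back, giving H(x) = F(x) + x
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        fx = self.conv2(torch.relu(self.conv1(x)))   # F(x)
        return torch.relu(fx + x)                    # H(x) = F(x) + x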
4.1.2. VGG16
One caveat of this generative model is that it fails to reproduce valuable features
such as eyes, hair and grass in the image. This information is supplied by a pre-trained model
named VGG-16 (Visual Geometry Group), a 16-layer CNN architecture trained to
classify images based on their features.
VGG16 ARCHITECTURE (4.4)
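The full FeatureLoss used for training is listed in Appendix 1. As a simplified, hedged sketch of the underlying idea, the snippet below runs an image through a pre-trained VGG16 (with batch norm) and keeps the activations of the layers just before the last three max-pooling stages, the same blocks the appendix code selects; the helper name deep_features is ours.

import torch
from torchvision.models import vgg16_bn

# Hedged sketch: capture intermediate activations of a pre-trained VGG16;
# these "deep features" are what the perceptual loss compares.
vgg = vgg16_bn(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def deep_features(x, layer_ids=(22, 32, 42)):
    # layer_ids pick the activations just before the last three max-pool layers,
    # matching the blocks selected in Appendix 1
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_ids:
            feats.append(x)
    return feats

feats = deep_features(torch.randn(1, 3, 128, 128))
print([f.shape for f in feats])   # three feature maps at decreasing resolutions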
4.2. PROCEDURE
ImageNet is a vast dataset of human-annotated images, widely used for
computer vision research. There are more than 14 million images in the ImageNet dataset,
organized into sub-categories called "synsets" (synonym sets). ImageNet covers on the order of
100k synsets, each targeted to contain around 1000 images. The dataset can be downloaded via this
link (http://image-net.org/download-imageurls).
Each image is downscaled by compression for detail analysis, and a copy of the original is kept.
After that, the dataset is separated into training and validation sets.
A low-res input image goes through a series of convolutions, i.e. the encoder path,
which encodes the features useful for reconstruction and passes them through skip
connections to the corresponding upsampling layers. The encoded representation then
goes through a series of deconvolutions, i.e. the decoder path, which upsamples the image
with the help of the features obtained from the corresponding downsampling layers. This series
of convolutions and deconvolutions makes the architecture resemble the letter 'U', hence the
name U-Net.
In a deep neural network, the loss tends to diverge more than in a shallow
one. This is resolved by using ResBlocks, which place an identity shortcut
between every two layers to help minimize the loss; this idea is widely adopted in the machine
learning community to train very deep neural nets without worrying about vanishing gradients.
U-Net applies the same idea through its skip connections, which help the loss converge faster.
Training begins by downloading a subset of the ImageNet dataset and compressing
each image to 64 x 64 resolution at 50% quality using bi-linear interpolation. The
low-resolution images are placed in a folder called "Train" and the original high-resolution images
in another folder called "Valid", as sketched below.
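The exact preprocessing script is given in the appendices; the snippet below is a simplified, hypothetical version matching the 64 x 64, 50% quality setting described above (folder paths are placeholders).

from pathlib import Path
from PIL import Image

def make_lowres(src_dir, dest_dir, size=64, quality=50):
    # Hypothetical helper: write a 64x64, quality-50 JPEG copy of every image
    # in src_dir into dest_dir; the originals stay untouched as high-res targets.
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for fn in Path(src_dir).glob('*.jpg'):
        img = Image.open(fn).convert('RGB')
        img = img.resize((size, size), resample=Image.BILINEAR)  # bilinear downscale
        img.save(dest_dir / fn.name, quality=quality)            # JPEG compression

# e.g. make_lowres('data/imagenet/train', 'data/imagenet/small-64/train')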
Data augmentation during training is a very helpful process which allows the model to
learn more robustly and generalize better. The augmentation is done in several ways, including
compressing the quality of the image, flipping it horizontally and vertically, adjusting its
lighting and perspective, and adding noise.
DATA AUGMENTATION (4.6)
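As an illustration (not taken verbatim from our training code), these augmentations could be configured with fastai v1's get_transforms as follows; apart from max_zoom=2, which matches the appendix code, the parameter values are assumptions.

from fastai.vision import get_transforms

# Hedged sketch of the augmentation pipeline in fastai v1
tfms = get_transforms(
    do_flip=True,       # horizontal flips
    flip_vert=True,     # vertical flips
    max_lighting=0.3,   # brightness / contrast jitter
    max_warp=0.2,       # perspective warping
    max_zoom=2.0,       # zoom, as used in get_data() in Appendix 1
)
# tfms is a (train_tfms, valid_tfms) pair that is passed to .transform(...)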
The training images are then loaded into a data loader with 128 images per batch.
Each low-resolution input image goes through a series of stride-2 convolutions, each of which
halves the image size, and it is flattened into a one-dimensional vector of activations at the end
of the encoder path.
Upscaling starts from the flattened 1D vector and performs de-convolutions,
which effectively pad extra pixels around every pixel of the feature map. This is repeated until
the output reaches the size of the ground truth. The features are gradually
improved under the perceptual loss provided by VGG-16 together with the MSE pixel loss.
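A tiny, self-contained example of this upsampling step: a single transposed ("de-") convolution doubles the spatial size of a feature map, and the decoder repeats such steps until the ground-truth resolution is reached (fastai's unet_learner actually uses PixelShuffle-based upsampling, so this is illustrative only).

import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)                        # small encoded feature map
deconv = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
print(deconv(x).shape)                                # torch.Size([1, 32, 32, 32])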
We observe that the generative loss converges to a quite reasonable extent after 30 epochs
of training. The model is then ready for inference. Testing is done with another
subset of the ImageNet dataset whose images have been compressed and reduced to low quality.
What we really want is for the network to learn patterns idiosyncratic to the noise and
replace the corresponding portions with a realistic approximation of the original image,
hence preserving the precious details and delivering an image that is both clean of noise and
with a good perceptual quality.
A common effect of using a pixel-wise MSE loss function for image super-resolution
is a smooth output: minimizing the MSE for each pixel leads to "average good results",
which are not correlated with perceptually pleasing results or with the crisp details we
are looking for. The idea is therefore to use a loss function that penalizes more heavily the
reconstruction errors committed on perceptually relevant elements of the image, such as
edges, lines, shapes and textures.
CHAPTER 5
The Oxford-IIIT Pet Dataset contains 37 pet categories with roughly
200 images per class. The data contains a large number of variations in scale, pose and
lighting. All images in the set have associated ground-truth annotations
of various kinds, such as breed, head ROI and pixel-level trimap segmentation.
The ImageNet training data contains about 1.2 million images in a thousand categories, which
we use to keep the implementation simple. These images are used only for training the model,
not for validation; a random subset of over 50,000 images was picked for that.
The overall view of the data is shown here.
Precision based evaluation metrics
Among the image quality assessment metrics, MSE (Mean Squared Error) and PSNR (Peak
Signal-to-Noise Ratio) are the most widely used metrics due to their simplicity. MSE is defined
as the average pixel-wise difference:
MSE = (1 / mn) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [ I(i, j) − K(i, j) ]²
where I and K are the baseline image and the distorted image respectively. Based on
MSE, PSNR (Peak Signal-to-Noise Ratio) is defined as
PSNR = 10 · log10( MAX_I² / MSE )
where MAX_I is the maximum signal value; 255 is used in image evaluation applications.
However, MSE and PSNR do not perform well when it comes to detailed texture. Because
these metrics take a regression-to-the-mean approach, multiple potential solutions with rich
texture details are averaged into a smooth reconstruction, which causes the over-smoothing
problem in detailed textures.
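For reference, a small sketch implementing the two metrics just defined (assuming 8-bit images, so MAX_I = 255):

import numpy as np

def mse(I, K):
    # Pixel-wise mean squared error between reference I and distorted K
    return np.mean((I.astype(np.float64) - K.astype(np.float64)) ** 2)

def psnr(I, K, max_i=255.0):
    # Peak signal-to-noise ratio in dB; max_i is the maximum signal value
    m = mse(I, K)
    return float('inf') if m == 0 else 10 * np.log10(max_i ** 2 / m)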
Algorithms that use PSNR or MSE as their sole loss function always face the
problem of poor perceptual quality. While error-sensitivity-based image quality assessment
metrics can reflect the pixel-wise difference between an image and its non-distorted reference,
it is not clear that these metrics reflect the fidelity of an image. The reason is that
natural images are highly structured: there are strong dependencies within a local region
and among regions, reflecting the structures of objects. Pixel-wise differences are
unable to capture this type of structural information.
Following this philosophy, the Structural Similarity Index (SSIM) was designed to evaluate
image degradation according to changes in structural information. It evaluates three
independent components of an image: luminance, contrast and structure. The luminance
signal l(x, y) is measured as the mean intensity of the signal.
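As an aside, these three components are combined in readily available SSIM implementations; the hedged example below uses scikit-image's structural_similarity on placeholder arrays.

import numpy as np
from skimage.metrics import structural_similarity as ssim

hr = np.random.rand(128, 128)                         # placeholder ground-truth image
sr = hr + 0.05 * np.random.randn(128, 128)            # placeholder reconstruction
print(ssim(hr, sr, data_range=sr.max() - sr.min()))   # 1.0 would mean identical images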
5.3.RESULT
The trained model, with skip connections and the VGG16 feature extractor, gave
results far better than the classic models.
Figure 4.1 shows the learning rate of the model plotted against the loss, and it can be
interpreted that a low loss value tends to pull up the model's learning.
OUTPUT (5.4)
Figure 4.3 contains three images side by side: the first (left) is the low-res
input image, the middle one is the predicted output image, and the final one on the right is the
original ground truth.
5.4.DISCUSSIONS
To understand where our model generalizes well and where it does not, we extracted
patches that have high PSNR values and patches that have a low value of the metric from the
validation set.
Unsurprisingly, the best-performing patches are the ones with large flat areas, while
more complex patterns are harder to reproduce accurately. We might then want to focus on
these areas for both training and evaluation of the results. The same is also highlighted by
a heat map representing the error between the original HR image and the SR output of the
network: darker colors correspond to higher pixel-wise mean squared error, while lighter
colors correspond to lower error, i.e. better results.
We can see that areas with more patterns correspond to higher errors, but intuitively
"simpler" transition areas (cloud-sky, for instance) are also fairly dark.
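Such a heat map can be produced with a few lines of NumPy and matplotlib; the sketch below uses placeholder arrays in place of the actual HR image and SR output.

import numpy as np
import matplotlib.pyplot as plt

hr = np.random.rand(128, 128)                 # placeholder HR image
sr = hr + 0.1 * np.random.randn(128, 128)     # placeholder SR output
error_map = (hr - sr) ** 2                    # per-pixel squared error
plt.imshow(error_map, cmap='gray_r')          # darker = higher error
plt.colorbar()
plt.show()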
CHAPTER 6
CONCLUSION AND FUTURE WORK
With the advancement of AI technology, classic upscaling can no longer address the
problem, and its results are not reliable enough to satisfy users of the traditional
algorithms. This has led to the current situation where upscaling is critical in many fields.
Thus, to avoid the problems of the classic solutions while keeping up with the quality this
advancement demands, deep convolutional neural networks have become essential.
It is only a matter of time before super-resolution finds its way into every field. The U-Net
based model, initially developed for biomedical segmentation, gives a clear picture of
what needs to be done here. Before this, other models worked with only a lower prediction
quality; over the past decade, U-Net has become one of the best models for the super-resolution
task.
Our future work involves extending the model from super-resolution to repairing
images that are disoriented, noisy, overlapped, damaged, and so on. Neural networks are
becoming powerful enough to achieve results far greater than one might have imagined, and
these systems may well reshape the field again; our future work will build on that.
APPENDIX 1
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
path = untar_data(URLs.PETS)
path_hr = path/'images'
path_lr = path/'small-96'
path_mr = path/'small-256'
il = ImageList.from_folder(path_hr)
def resize_one(fn, i, path, size):
    # Save a resized, JPEG-compressed copy of one image under `path`
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)
sets = [(path_lr, 96), (path_mr, 256)]
for p,size in sets:
    if not p.exists():
        print(f"resizing to {size} into {p}")
        parallel(partial(resize_one, path=p, size=size), il.items)
bs,size=32,128
arch = models.resnet34
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.1, seed=42)
def get_data(bs,size):
    # Build an image-to-image DataBunch: low-res inputs paired with high-res targets
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs).normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data
data = get_data(bs,size)
data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))
t = data.valid_ds[0][1].data
t = torch.stack([t,t])
def gram_matrix(x):
    # Gram matrix of the feature maps, used for the style/texture part of the loss
    n,c,h,w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1,2))/(c*h*w)
gram_matrix(t)
base_loss = F.l1_loss
vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)
blocks = [i-1 for i,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]
blocks, [vgg_m[i] for i in blocks]
class FeatureLoss(nn.Module):
    # Perceptual loss: pixel L1 loss + VGG feature losses + Gram-matrix style losses
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
                            ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        # Run x through VGG and collect the hooked activations
        self.m_feat(x)
        return [(o.clone() if clone else o) for o in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target)]
        self.feat_losses += [base_loss(f_in, f_out)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()
feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])
wd = 1e-3
learn = unet_learner(data, arch, wd=wd, loss_func=feat_loss, callback_fns=LossMetrics,
                     blur=True, norm_type=NormType.Weight)
gc.collect();
learn.lr_find()
learn.recorder.plot()
lr = 1e-3
def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
    learn.fit_one_cycle(10, lrs, pct_start=pct_start)
    learn.save(save_name)
    learn.show_results(rows=1, imgsize=5)
do_fit('1a', slice(lr*10))
learn.unfreeze()
do_fit('1b', slice(1e-5,lr))
data = get_data(12,size*2)
learn.data = data
learn.freeze()
gc.collect()
learn.load('1b');
do_fit('2a')
learn.unfreeze()
do_fit('2b', slice(1e-6,1e-4), pct_start=0.3)
learn = None
gc.collect();
free = gpu_mem_get_free_no_cache()
# the max size of the test image depends on the available GPU RAM
if free > 8000: size=(1280, 1600) # > 8GB RAM
else: size=( 820, 1024) # <= 8GB RAM
print(f"using size={size}, have {free}MB of GPU RAM free")
data_mr = (ImageImageList.from_folder(path_mr).split_by_rand_pct(0.1, seed=42)
.label_from_func(lambda x: path_hr/x.name)
.transform(get_transforms(), size=size, tfm_y=True)
.databunch(bs=1).normalize(imagenet_stats, do_y=True))
data_mr.c = 3
learn = unet_learner(data, arch, loss_func=F.l1_loss, blur=True, norm_type=NormType.Weight)
learn.load('2b');
learn.data = data_mr
fn = data_mr.valid_ds.x.items[0]; fn
img = open_image(fn); img.shape
p,img_hr,b = learn.predict(img)
show_image(img, figsize=(18,15), interpolation='nearest');
Image(img_hr).show(figsize=(18,15))
(Training log columns: epoch, train_loss, valid_loss, pixel, feat_0, feat_1, feat_2, gram_0, gram_1, gram_2)
APPENDIX 2
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.utils.mem import *
from torchvision.models import vgg16_bn
torch.cuda.set_device(0)
path = Path('data/imagenet')
path_hr = path/'train'
path_lr = path/'small-64/train'
path_mr = path/'small-256/train'
path_pets = untar_data(URLs.PETS)
il = ImageList.from_folder(path_hr)
def resize_one(fn, i, path, size):
    dest = path/fn.relative_to(path_hr)
    dest.parent.mkdir(parents=True, exist_ok=True)
    img = PIL.Image.open(fn)
    targ_sz = resize_to(img, size, use_min=True)
    img = img.resize(targ_sz, resample=PIL.Image.BILINEAR).convert('RGB')
    img.save(dest, quality=60)
assert path.exists(), f"need imagenet dataset @ {path}"
# create smaller image sets the first time this nb is run
sets = [(path_lr, 64), (path_mr, 256)]
learn.save('imagenet')
learn.show_results(rows=3, imgsize=5)
learn.recorder.plot_losses()
_=learn.load('imagenet')
data_mr = (ImageImageList.from_folder(path_mr).split_by_rand_pct(0.1, seed=42)
.label_from_func(lambda x: path_hr/x.relative_to(path_mr))
.transform(get_transforms(), size=(820,1024), tfm_y=True)
.databunch(bs=2).normalize(imagenet_stats, do_y=True))
learn.data = data_mr
fn = path_pets/'other'/'dropout.jpg'
img = open_image(fn); img.shape
_,img_hr,b = learn.predict(img)
show_image(img, figsize=(18,15), interpolation='nearest');
Image(img_hr).show(figsize=(18,15))
REFERENCES
[1] Andreas Lugmayr, Martin Danelljan, Radu Timofte. NTIRE 2020 Challenge on Real-
World Image Super-Resolution: Methods and Results. 2020 CVPRW
[2] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee. Enhanced Deep
Residual Networks for Single Image Super-Resolution. In 2017 IEEE Conference on
Computer Vision and Pattern Recognition Workshops, 2 - 8
[3] Gwantae Kim, Kanghyu Lee, Junyeop Lee, Jeongki Min,Bokyeung Lee, Jaihyun Park,
David K. Han, and Hanseok Ko. Unsupervised real-world super resolution with cycle
generative adversarial network and domain discriminator. In CVPR Workshops, 2020
[4] Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. Visualizing the Loss
Landscape of Neural Nets. arXiv:1712.09913v3 [cs.LG] 7 Nov 2018
[5] He Zhang, Vishwanath Sindagi, Vishal M. Patel. Image De-raining Using a Conditional
Generative Adversarial Network. In IEEE Transactions on Circuits and Systems for Video
Technology. Volume: 30, Nov. 2020
[6] Justin Johnson, Alexandre Alahi, Li Fei-Fei. Perceptual Losses for Real-Time Style
Transfer and Super-Resolution. In European Conference on Computer Vision 2016.
[8] Radu Timofte, Rasmus Rothe, and Luc Van Gool. Seven ways to improve example-
based single image super resolution. In CVPR, pages 1865–1873. IEEE Computer Society,
2016.
[9] Vincent Dumoulin and Francesco Visin. A guide to convolution arithmetic for deep
learning. arXiv:1603.07285v2 [stat.ML] 11 Jan 2018.
[10] Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang, Jing-Hao Xue, Qingmin
Liao. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE
Transactions on Multimedia. Volume: 21, Dec. 2019
[11] Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy. Recovering Realistic Texture in
Image Super-resolution by Deep Spatial Feature Transform. arXiv:1804.02815v1 [cs.CV] 9
Apr 2018.
[12] Ying Tai, Jian Yang and Xiaoming Liu. Image Super-Resolution via Deep Recursive
Residual Network. In CVPR 2017
[13] Yiwen Huang and Ming Qin. Densely connected high order residual network for single
frame image super resolution. arXiv preprint arXiv:1804.05902, 2018.