On combining denoising with learning-based image decoding

Léo Larigauderie, Michela Testolina, and Touradj Ebrahimi

Multimedia Signal Processing Group (MMSPG)
Ecole Polytechnique Fédérale de Lausanne (EPFL)
CH-1015 Lausanne, Switzerland

ABSTRACT
Noise is an intrinsic part of any sensor and is present, in various degrees, in any content that has been captured
in real-life environments. In imaging applications, several pre- and post-processing solutions have been proposed
to cope with noise in captured images. More recently, learning-based solutions have shown impressive results
in image enhancement in general, and in image denoising in particular. In this paper, we review multiple novel
solutions for image denoising in the compressed domain, by integrating denoising operations into the decoder
of a learning-based compression method. The paper starts by explaining the advantages of such an approach
from different points of view. We then describe the proposed solutions, including both blind and non-blind
methods, and compare them to state-of-the-art methods. Finally, conclusions are drawn from the obtained results,
summarizing the advantages and drawbacks of each method.
Keywords: Image denoising, learning-based compression, latent space, image processing, deep learning

1. INTRODUCTION
Capturing images with digital devices such as smartphones, tablets or cameras has become common practice,
leading to a growing demand for storage of trillions of pictures per year.∗ The vast amount of stored data
motivates research toward novel and more efficient compression methods that could reduce the enormous
storage requirements. While a number of conventional standards have been proposed in the past, recent research
efforts are mostly devoted to learning-based compression methods.1 As an example, the JPEG Committee has
recently launched an activity with the goal of standardizing a novel learning-based compression algorithm, known
as JPEG AI. The group first reported the leading performance of learning-based methods over conventional image
compression in a Call for Evidence,2 and subsequently compared different emerging technologies in a Call for
Proposals.3
Recent trends reveal that images are nowadays intended not only for human consumption, but also for computer
vision applications. Therefore, compressed content should not only maximize perceptual similarity with its
original version, but also guarantee good performance for computer vision and image processing tasks. In this
context, JPEG AI proposed, in the “Use Cases and Requirements for JPEG AI” document,4 a framework that
allows image processing and computer vision tasks to be applied directly in the latent space of learning-based
image compression, without the need for standard decoding. In particular, the compressed stream should not
only allow reconstruction with a standard decoder that specifically targets human vision, but also allow computer
vision tasks to be applied in the compressed domain, as well as non-normative decoders for image processing
operations such as denoising or super-resolution. This framework has the advantage that it does not require
prior knowledge of the target application, and that the non-normative decoders or computer vision networks can
be updated to the most recent technology without the need to transcode or re-capture the content.
Noise is a common disturbance factor in images, which impacts their visual quality and the performance of
multiple computer vision methods, including face detection and recognition.5 Noise is often caused by both
intrinsic factors, such as the camera sensor, and extrinsic factors, such as the ambient light, and may be
impossible to avoid in many situations. This makes image denoising necessary and desirable, and it is a classical
and well-studied problem in the state of the art. Generally, the goal of image denoising is to reconstruct an image
x̂ from its noisy observation y = x + n. In the literature, the noise n is often approximated as additive white
Gaussian noise (AWGN), which is signal-independent with zero mean and standard deviation σ. Real noise is
more realistically approximated by the Poissonian-Gaussian model,6 in which the noise is modeled as the sum of
a signal-dependent Poissonian component η_p and a signal-independent Gaussian component η_g.

∗ https://blog.mylio.com/how-many-photos-will-be-taken-in-2021-stats/
In this paper, we propose and assess different non-normative decoders able to jointly reconstruct and denoise
a compressed stream generated by a learned encoder. Different blind and non-blind solutions are implemented
and compared, and the results are assessed using a number of objective quality metrics. Moreover, the benefit
of including extra information, e.g., the standard deviation of the noise σ, is discussed. All the proposed
solutions improve performance, in terms of both perceptual visual quality and computational complexity, when
compared to an anchor that performs compression and denoising in cascade.
The remainder of this paper is structured as follows: Section 2 summarizes the state of the art in learning-based
image compression, image denoising, and computer vision and image processing methods applied directly to the
latent space of image compression. Section 3 reviews the different proposed methods for combined compression
and denoising. Results are reported and discussed in Section 4, while conclusions are drawn in Section 5.

2. RELATED WORK
Following the constant growth in the total number of images taken by and stored on digital devices, new and
more efficient image compression solutions are continuously being researched. Recently, a number of image
compression solutions based on autoencoders have been investigated,7–12 reporting high performance in terms of
compression efficiency and perceived visual quality.13 In particular, Ballé et al. first proposed an autoencoder
solution using nonlinear transforms cascaded with linear convolutions,7 which was then extended by introducing
side information in the form of a hyperprior that captures spatial dependencies in the latent representation,8
and by including an autoregressive model to reduce the amount of side information.9 More recently, generative
models have been proposed, first synthesizing image details to improve performance at the lowest bitrates,11
and subsequently maximizing perceptual similarity metrics to generate images with improved visual quality.12
In conventional scenarios, image compression is combined with pre- or post-processing operations, with the
goal of limiting the distortions introduced by capture, compression and other factors. In this context, image
denoising is used as both a pre- and a post-processing operation. Multiple conventional denoising methods have
been proposed in the state of the art. As an example, wavelet thresholding14 relies on the wavelet transform
to denoise images. More recently, denoising methods based on neural networks15, 16 have achieved better
performance at the price of a higher computational cost. Notably, Zhang et al. proposed a denoising solution
based on a deep convolutional neural network (CNN), known as DnCNN, trained to estimate the residual noise
from a noisy observation,15 and subsequently improved the method in FFDNet by feeding a uniform noise level
map as an additional input to the network.16 This additional information enables the network to handle a wide
range of noise levels and to trade off noise reduction against detail preservation. Guo et al.17 later proposed
a learning-based approach, known as CBDNet, that combines a noise level estimation network with a non-blind
denoising network into a unified blind method, trained on realistic noise and with emphasis on mitigating noise
level under-estimation. Finally, Yue et al.18 proposed an innovative deep-learning-based Bayesian framework for
blind image denoising and noise modeling, based on variational inference.
In recent years, due to the large number of images intended for machine consumption, researchers in image
compression have attempted to design compression methods able to encode images that are not only visually
pleasing after reconstruction with a conventional decoder, but that also optimize the performance of computer
vision and image processing tasks.4, 19 A limited number of methods have attempted to apply computer vision and
image processing methods directly in the latent space of image compression. Early results were presented by
Torfason et al.,20 who proposed to apply image classification and semantic segmentation to the latent
representation of a learning-based image compression method, showing improvements in run-time, memory usage,
robustness and synergy, while compromising performance only at the lowest bitrates. More recently,
super-resolution algorithms have been applied to the latent space of image compression,21 showing promising
results in terms of visual quality. Preliminary work in the field of compressed-domain image denoising proposed
a non-normative decoder able to combine decoding and denoising operations while reducing the computational
complexity of the pipeline.22 A different approach to latent-space denoising was proposed by Alvar et al.,23
where a joint compression and denoising network based on a scalable latent space achieved BD-rate savings and
improved image quality simultaneously. A joint compression and denoising method designed for satellite images
was also proposed, training both the encoder and the decoder of a learning-based compression algorithm with an
alternative loss function.24 Finally, Cheng et al.25 recently proposed a pipeline for joint compression and
denoising that reduces storage space by minimizing the bits allocated to the noise information. While these last
methods have demonstrated improved performance, they are suitable only for a subset of applications; for
instance, reconstructing the original image without denoising is desirable for preserving artistic intent.
Focusing the denoising operations at the decoder side allows for a more flexible choice of the desired decoder,
without the need to store multiple versions of the same content.
Nevertheless, research on learning-based computer vision and image processing techniques applied to the
latent space of image compression is still at an early stage, and more effort is needed to design robust coding
methods suitable for both machine and human vision. Notably, the impact of different architectures on the
performance of denoising methods applied in the latent space of image compression has not yet been fully
investigated.

3. COMBINED COMPRESSION AND DENOISING

In this section, different pipelines for combined decoding and denoising are proposed. Notably, six different
methods are presented, including both blind and non-blind methods. For the experiments, the variational
autoencoder with a scale hyperprior model,8 specifically the CompressAI implementation26 pre-trained for
mean squared error (MSE), is used as a baseline, and non-normative decoders are implemented using novel
training strategies and loss functions. For all the experiments, the encoder is frozen so that only the decoder
is fine-tuned. The proposed training strategies can be applied to almost any learning-based compression
method and could therefore, in the future, be tested on upcoming learning-based image compression standards,
e.g., JPEG AI compatible coding. For all the experiments, synthetic Poissonian-Gaussian noise6 is considered
and applied to the images using the practical noise generator designed for the JPEG AI CfP.27
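To illustrate what signal-dependent noise means in practice, the sketch below corrupts one image channel so that the noise variance at a pixel with clean value x is a·x + b, as in the Poissonian-Gaussian model. This is a minimal sketch, assuming the heteroscedastic Gaussian approximation of that model (zero-mean Gaussian noise with variance a·x + b) rather than separate Poissonian and Gaussian draws, intensities in [0, 1], and no clipping. It is not a re-implementation of the JPEG AI noise generator, and the function name and the parameter values are hypothetical.

```python
import math
import random


def add_poisson_gaussian_noise(channel, a, b, rng=None):
    """Add synthetic signal-dependent noise to one image channel (sketch).

    channel: list of rows of clean intensities in [0, 1].
    a, b: Poissonian-Gaussian parameters of this channel; the noise variance
          at a pixel with clean value x is modeled as a * x + b.
    Uses the heteroscedastic Gaussian approximation of the model, so this is
    NOT the JPEG AI noise generator; no clipping to [0, 1] is applied.
    """
    rng = rng if rng is not None else random.Random()
    noisy = []
    for row in channel:
        noisy_row = []
        for x in row:
            variance = max(a * x + b, 0.0)  # clamp, since a fitted b may be negative
            noisy_row.append(x + rng.gauss(0.0, math.sqrt(variance)))
        noisy.append(noisy_row)
    return noisy


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    clean = [[0.2, 0.5, 0.8], [0.1, 0.4, 0.9]]
    print(add_poisson_gaussian_noise(clean, a=0.01, b=0.0004, rng=random.Random(0)))
```

Brighter pixels receive stronger noise whenever a > 0, which is the property that distinguishes this model from AWGN (obtained here with a = 0).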

3.1 Blind combined decoding and denoising

The first proposed solution reconstructs and denoises images simultaneously and blindly, i.e., without using
the standard deviation of the noise σ or the a and b parameters used to generate the Poissonian-Gaussian
noise.6 Since the encoder is frozen, the rate is fixed; the rate term is therefore removed from the loss function
and only the distortion is optimized. The distortion D is computed between the reconstructed image and the
original noise-free image, so that the decoder learns to denoise while decoding. The proposed loss is therefore:

$$ \mathcal{L}(x, \hat{x}_n) = D(x, \hat{x}_n) \tag{1} $$

where x is the original noise-free image and x̂_n is the reconstructed and denoised image. Both the original and
the noisy images are available during training, since synthetic random Poissonian-Gaussian noise is applied to
each batch. The distortion metric D is the one used in the original compression model, i.e., MSE. The decoder
is therefore fine-tuned with the loss in Eq. (1), i.e., by computing the distortion between the reconstructed
image and the original noise-free image. The pipeline of the proposed blind combined decoding and denoising
method is presented in Figure 1. We denote this model as blind.
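The change with respect to the baseline training objective is limited to two points: the rate term is dropped, and the distortion target becomes the clean image x instead of the decoder input. A minimal sketch on flat lists of pixel values, assuming a generic R + λ·D objective for the baseline (the exact weighting used by CompressAI is not reproduced here); all function names are hypothetical.

```python
def mse(x, x_hat):
    """Mean squared error between two flat, equally long lists of pixel values."""
    if not x or len(x) != len(x_hat):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rate_distortion_loss(rate, x, x_hat, lam):
    """Generic R + lambda * D objective of the kind used to train the baseline codec."""
    return rate + lam * mse(x, x_hat)


def blind_loss(x_clean, x_hat_denoised):
    """Eq. (1) with D = MSE: the rate term is dropped and the decoder output
    is compared with the clean image x, not with the noisy input."""
    return mse(x_clean, x_hat_denoised)


# With a frozen encoder, the rate of a given noisy input does not depend on the
# decoder, so two decoders differ in R + lambda * D only through lambda * D.
x = [0.0, 1.0]
print(blind_loss(x, [0.5, 1.0]))                      # 0.125
print(rate_distortion_loss(0.5, x, [0.5, 1.0], 2.0))  # 0.75
```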

3.2 Non-blind combined denoising and decoding

Two additional non-blind strategies for combined denoising and decoding are proposed. The decoder architecture
is extended to take as input a noise map concatenated to the image latent: if the noise map is composed of n
channels, n input channels and the corresponding filters are added to the first convolutional layer of the
learning-based decoder. The architecture of the other decoder layers remains unchanged. The noise map used as
input to the decoder is a lower-resolution version of the true point-wise noise level map in the original RGB
space, which is determined by the parameters of the Poissonian-Gaussian model and by the original image. For an
original RGB image x, in channel i and at a 2D position p, the local noise level (standard deviation) σ_i(p) is
given by:

$$ \sigma_i(p) = \sqrt{a_i \, x_i(p) + b_i} \tag{2} $$

where a_i and b_i are the parameters of the Poissonian-Gaussian noise6, 27 in channel i.

Figure 1: Training pipeline of the proposed blind combined decoding and denoising method. Here, x represents
the original noise-free image, x̃ the noisy input image, g_a the encoder, g_s the decoder, ỹ the latent
representation, and x̂ the reconstructed noise-free image.

Figure 2: Noise map computation process. The noise map with 12 channels σ_b is used in the non-blind B method,
and the noise map with 3 channels σ_s is used in the non-blind S method.

Figure 3: Training pipeline of the proposed non-blind denoising and decoding methods. Here, σ refers to the
low-resolution noise map with either 3 channels (σ_s, non-blind S) or 12 channels (σ_b, non-blind B).

Figure 4: Training pipeline of the proposed blind E denoising and decoding method. The architecture is built
upon the non-blind method, with g_s referring to the non-blind denoising decoder and g_e to the noise level
estimation network. The ground-truth uniform noise level map σ_u contains the overall noise level of the noisy
image x̃, i.e., an estimate of the standard deviation of x̃ − x over all pixels.

Figure 5: Training pipeline of the blind L method.

Equation (2) is used to compute the ground-truth point-wise noise map in the original RGB space. A lower-resolution
noise map is obtained by passing the point-wise noise map through successive 2x2 average pooling
stages, which results in a weighted average where local noise level changes have a higher contribution, while
simultaneously increasing the receptive field compared to a 16x16 average pooling or 16x16 downsampling.
The result is then reshaped to have the same height and width as the image latent, so that the two can be concatenated.
Two variants of the model are proposed, using noise maps of different resolutions. In particular, noise maps with
either 3 or 12 channels are explored, corresponding to 4 and 3 average pooling stages, respectively. We denote
the model that uses the 3-channel noise map as non-blind S, and the model with 12 channels as non-blind B.
The pipeline to obtain both noise maps is presented in Figure 2.
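The construction of the low-resolution noise map (Eq. (2), successive 2x2 average pooling, then a reshape to the latent size) can be sketched in plain Python. This is a minimal sketch, assuming images stored as nested lists [channel][row][column], image sides divisible by 16, and a 16× spatial downsampling between image and latent (the four stride-2 stages of the scale-hyperprior analysis transform). The channel ordering produced by `space_to_depth_2x2` is an assumption, since the paper does not specify it, and all function names are hypothetical.

```python
import math


def pointwise_noise_std(image, a, b):
    """Eq. (2): sigma_i(p) = sqrt(a_i * x_i(p) + b_i) for each channel i and pixel p.

    image: list of C channels, each a list of H rows of W floats (clean RGB image).
    a, b: per-channel Poissonian-Gaussian parameters (lists of length C).
    Negative variances are clamped to zero (an assumption of this sketch).
    """
    return [
        [[math.sqrt(max(a[i] * v + b[i], 0.0)) for v in row] for row in channel]
        for i, channel in enumerate(image)
    ]


def avg_pool_2x2(channel):
    """One 2x2 average pooling stage with stride 2 (height and width must be even)."""
    h, w = len(channel), len(channel[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (channel[r][c] + channel[r][c + 1] + channel[r + 1][c] + channel[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def space_to_depth_2x2(channels):
    """Fold every 2x2 block into 4 channels: C x H x W -> 4C x H/2 x W/2."""
    out = []
    for channel in channels:
        for dr in (0, 1):
            for dc in (0, 1):
                out.append([row[dc::2] for row in channel[dr::2]])
    return out


# Number of pooling stages per variant, as described in the text.
POOLING_STAGES = {"S": 4, "B": 3}


def noise_map(image, a, b, variant):
    """Noise map at 1/16 of the image resolution: 3 channels (S) or 12 channels (B)."""
    sigma = pointwise_noise_std(image, a, b)
    for _ in range(POOLING_STAGES[variant]):
        sigma = [avg_pool_2x2(ch) for ch in sigma]
    if variant == "B":
        # After 3 stages the map is at 1/8 resolution; the reshape brings it to 1/16.
        sigma = space_to_depth_2x2(sigma)
    return sigma
```

For a 32×32 RGB input, the sketch yields a 3×2×2 map for the S variant and a 12×2×2 map for the B variant, matching the channel counts of σ_s and σ_b given for Figure 2.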
During training, the MSE between the original noise-free image and the decoded image is used as the distortion
metric. This is equivalent to keeping only the distortion term of the rate-distortion trade-off in the baseline
compression model, with no perceptual transform applied to the images. The non-blind architecture is shown in
Figure 3.
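The architectural change of the non-blind decoders, namely concatenating the noise map to the latent and adding matching input channels to the first decoder layer, can be sketched on nested lists. This is a minimal sketch, assuming weights laid out as [out][in][kh][kw] (the layout of `torch.nn.Conv2d`). If the first layer is a transposed convolution (`torch.nn.ConvTranspose2d`), PyTorch stores its weights as (in, out, kh, kw), so the new kernels would be appended along the first axis instead. Zero initialization of the new kernels is a hypothetical choice, not stated in the paper, and the function names are hypothetical as well.

```python
def concat_channels(latent, noise_map):
    """Concatenate a noise map (n channels) to a latent (M channels) along the channel axis.

    Both inputs are lists of channels, each a list of rows of equal size.
    """
    if (len(latent[0]), len(latent[0][0])) != (len(noise_map[0]), len(noise_map[0][0])):
        raise ValueError("latent and noise map must have the same height and width")
    return latent + noise_map


def widen_input_channels(weight, n_extra, init=0.0):
    """Add n_extra input channels to conv weights laid out as [out][in][kh][kw].

    Pretrained kernels are copied unchanged; the new kernels are filled with
    `init`. With init = 0.0 (hypothetical choice), the widened layer initially
    ignores the noise map and reproduces the pretrained decoder's first layer.
    """
    widened = []
    for kernels_of_one_output in weight:
        kh, kw = len(kernels_of_one_output[0]), len(kernels_of_one_output[0][0])
        copied = [[list(row) for row in kernel] for kernel in kernels_of_one_output]
        extra = [[[init] * kw for _ in range(kh)] for _ in range(n_extra)]
        widened.append(copied + extra)
    return widened
```

Keeping the pretrained kernels and adding new ones, rather than re-initializing the whole layer, preserves the frozen encoder's latent interpretation learned by the baseline decoder.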
Figure 6: g_e architecture used in blind E to estimate the uniform noise level map σ_u, based on the architecture
of CNN_E from CBDNet.17

Figure 7: g_e architecture used in blind L to estimate the point-wise latent noise level map σ, based on the
architecture of CNN_E from CBDNet.17 M is the number of channels of the noisy latent ỹ.

As obtaining the ground-truth spatially variant noise level map may not always be feasible in practice, a
relaxation of the non-blind S model is proposed. During inference only, a uniform noise map σ_u containing the
empirical noise level of the image is used instead of the ground-truth spatially variant noise map. We refer to
this pipeline as non-blind U.
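The uniform map used by non-blind U replaces the spatially variant map with a single value. Following the caption of Figure 4, that value can be taken as the standard deviation of x̃ − x over all pixels. A minimal sketch, assuming flat pixel lists, the population (not sample) standard deviation, and the same value in every channel of a map shaped like the non-blind S input; the function names are hypothetical.

```python
import math


def empirical_noise_level(noisy, clean):
    """Standard deviation of (noisy - clean) over all pixel values (population std)."""
    if not noisy or len(noisy) != len(clean):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [n - c for n, c in zip(noisy, clean)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def uniform_noise_map(level, channels, height, width):
    """Constant map sigma_u with the same shape as the non-blind S noise map."""
    return [[[level] * width for _ in range(height)] for _ in range(channels)]


# Differences 1.0 and 3.0 have mean 2.0 and standard deviation 1.0.
level = empirical_noise_level([1.0, 3.0], [0.0, 0.0])
sigma_u = uniform_noise_map(level, channels=3, height=2, width=2)
print(level)  # 1.0
```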

3.3 Blind combined decoding and denoising with noise map estimation
An additional blind solution, denoted as blind E, takes advantage of an additional learned model to estimate
the noise map. Similarly to CBDNet,17 the pipeline is composed of a subnetwork that estimates the noise level
(denoted here as g_e) and a non-blind denoising subnetwork. The estimation network is trained separately from
the decoder, using the mean squared error between a uniform ground-truth noise level map and the output noise
map as the objective. The denoising network is the non-blind denoising decoder g_s presented in Section 3.2.
More specifically, the decoder is chosen depending on the dimensionality of the latent space of the baseline
compression network: the image latent is composed of either 192 or 384 channels, for which the non-blind S and
the non-blind B decoders are used, respectively.
The pipeline of the blind E method is represented in Figure 4.
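The two pieces of blind E that are stated explicitly, namely the training objective of g_e and the choice of the non-blind decoder from the latent width, can be summarized as follows. A minimal sketch, assuming the estimated map is given as nested lists and the ground-truth map is uniform, so a single scalar describes it; the network g_e itself is not modeled, and the names are hypothetical.

```python
def estimator_loss(predicted_map, target_level):
    """MSE between g_e's output map and a uniform ground-truth map filled with target_level."""
    values = [v for channel in predicted_map for row in channel for v in row]
    if not values:
        raise ValueError("empty noise map")
    return sum((v - target_level) ** 2 for v in values) / len(values)


# Decoder selection described in the text: latent width -> non-blind decoder variant.
DECODER_FOR_LATENT_CHANNELS = {192: "non-blind S", 384: "non-blind B"}


def select_nonblind_decoder(latent_channels):
    """Return the name of the non-blind decoder g_s paired with g_e in blind E."""
    try:
        return DECODER_FOR_LATENT_CHANNELS[latent_channels]
    except KeyError:
        raise ValueError(f"unsupported latent width: {latent_channels}") from None


print(estimator_loss([[[1.0, 3.0]]], 2.0))  # 1.0
print(select_nonblind_decoder(384))         # non-blind B
```

Because g_e is trained separately, any error in its estimate propagates directly into the decoder input, which is consistent with the gap between blind E and the non-blind methods discussed in Section 4.1.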

3.4 Blind combined decoding and denoising with noise modeling in the latent space
Instead of estimating the noise level information in the original RGB space, an additional method, denoted here
as blind L, infers the point-wise latent noise level directly in the latent space. We define the latent noise as the
difference between the quantized latent ỹ of the noisy image and the inversely quantized latent y of the clean
image. The relationship between the latent noise and the noise applied to the original image is unknown, as the
latent noise is shaped by the encoding and quantization operations. We therefore approximate the latent noise as
zero-mean, point-wise independent Gaussian noise added to the clean latent, as is typically done in the literature
for RGB images affected by noise with unknown properties.17, 18 The network is trained simultaneously to infer
the latent noise level map σ and to jointly denoise and decode the noisy latent ỹ, using the following loss
function:

$$ \mathcal{L}(\mathcal{D}; \theta) = \mathbb{E}_{(x, \tilde{y}) \sim \mathcal{D}} \left[ \frac{1}{2} \sum_i \left( \log(\sigma_i^2 + \varepsilon) + \frac{(\tilde{y}_i - \hat{y}_i)^2 + \varepsilon}{\sigma_i^2 + \varepsilon} + \frac{\hat{y}_i^2}{s_i^2 + \varepsilon} \right) + \lambda \lVert \hat{x} - x \rVert_2^2 \right] \tag{3} $$

where θ denotes the learned parameters of g_s and g_e, 𝒟 is the set of clean-image and noisy-latent pairs
(x, ỹ) that compose the training dataset, and λ > 0 and ε > 0 are hyperparameters. The detailed derivation of
the loss function can be found in Appendix A. In our implementation, we choose ε = 10⁻³ and use the same λ value
as in the rate-distortion trade-off of the baseline compression model. Analogously to the non-blind methods
presented in Section 3.2, the number of input channels of the first convolutional layer of the decoder is doubled,
and σ is concatenated to ỹ before decoding.
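To make the terms of the blind L objective in Eq. (3) concrete, the quantity inside the expectation can be evaluated on flat lists, and the expectation approximated by an average over a batch. This is a minimal sketch: ŷ_i and s_i are used in Eq. (3) without further definition in this section, so the sketch simply takes them as given inputs, and the function names are hypothetical. The three latent terms are the log-variance, the variance-normalized residual between noisy and estimated latent, and the ŷ_i²/(s_i² + ε) term.

```python
import math


def blind_l_sample_loss(y_noisy, y_hat, sigma, s, x_hat, x, lam, eps=1e-3):
    """Per-sample term inside the expectation of Eq. (3) (sketch on flat lists).

    y_noisy: noisy latent values (y tilde); y_hat: latent estimates (y hat);
    sigma: predicted latent noise std; s: the scales s_i of the third term;
    x_hat, x: decoded and clean image values; lam, eps: hyperparameters.
    """
    if not (len(y_noisy) == len(y_hat) == len(sigma) == len(s)):
        raise ValueError("latent-domain inputs must have equal length")
    if len(x_hat) != len(x):
        raise ValueError("image-domain inputs must have equal length")
    latent_sum = 0.0
    for yt, yh, sg, si in zip(y_noisy, y_hat, sigma, s):
        var = sg * sg + eps
        latent_sum += math.log(var)                 # log(sigma_i^2 + eps)
        latent_sum += ((yt - yh) ** 2 + eps) / var  # normalized residual term
        latent_sum += yh * yh / (si * si + eps)     # y_hat_i^2 / (s_i^2 + eps)
    distortion = sum((a - b) ** 2 for a, b in zip(x_hat, x))  # ||x_hat - x||_2^2
    return 0.5 * latent_sum + lam * distortion


def blind_l_loss(samples, lam, eps=1e-3):
    """Empirical expectation: mean per-sample loss over (y_noisy, y_hat, sigma, s, x_hat, x) tuples."""
    values = [blind_l_sample_loss(*sample, lam=lam, eps=eps) for sample in samples]
    return sum(values) / len(values)
```

The log-variance term penalizes overly large σ_i, while the normalized residual penalizes σ_i that is too small for the observed residual, so σ is learned without a ground-truth latent noise map.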

4. EXPERIMENTAL RESULTS AND DISCUSSION

Results of the proposed pipelines are computed on images from the JPEG AI noisy test set.28 The dataset
includes both noisy images and the corresponding noise-free originals, with three different noise levels (i.e.,
low, medium and high) randomly generated using the JPEG AI noise generator.27 The results of the proposed
methods are compared to two anchor pipelines, namely:

1. Original anchor: the learning-based anchor denoising method, i.e. FFDNet,16 is used to denoise the
images in the JPEG AI noisy test dataset. The denoising is applied before any compression, thus avoiding
any compression artifact.
2. Decoded anchor: the learning-based anchor denoising method, i.e. FFDNet,16 is applied in the pixel
domain after encoding and decoding the noisy test images with the variational autoencoder with a scale
hyperprior model at multiple bitrates.8

The results are reported both as rate-distortion plots and as visual examples. In this paper, only the results
for images ‘00001’ and ‘00016’ of the JPEG AI dataset28 are reported. The first image was chosen because it
presents a wide smooth area, i.e., a white background, while the second image presents high-frequency patterns,
corresponding to the feathers of a bird. The performance of the proposed methods is therefore assessed under
different conditions.
Figure 8 and Figure 9 present the objective results for image ‘00001’ and image ‘00016’, respectively. The
results are presented as rate-distortion plots for a number of metrics, namely PSNR Y (i.e., computed on the
luminance component), MS-SSIM Y, VIFp Y, FSIM, and VMAF.28 The objective metrics have been computed
using the objective quality framework provided by JPEG AI.†
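As an illustration of what "computed on the luminance component" means for PSNR Y, the sketch below converts RGB pixels to luma and computes PSNR on that channel only. This is a minimal sketch, assuming 8-bit RGB values and ITU-R BT.709 luma weights; the conversion actually used by the JPEG AI quality framework may differ, and the function names are hypothetical.

```python
import math

# Assumed ITU-R BT.709 luma weights; the JPEG AI framework may use a different conversion.
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)


def luma(rgb_pixels):
    """Convert a list of (R, G, B) tuples to luma values Y."""
    wr, wg, wb = LUMA_WEIGHTS
    return [wr * r + wg * g + wb * b for r, g, b in rgb_pixels]


def psnr_y(reference_rgb, test_rgb, peak=255.0):
    """PSNR in dB computed on the luma component only."""
    y_ref, y_test = luma(reference_rgb), luma(test_rgb)
    if not y_ref or len(y_ref) != len(y_test):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(y_ref, y_test)) / len(y_ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

A uniform error of 10 on all three color channels shifts Y by about 10, giving 10·log10(255²/100) ≈ 28.13 dB.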
Figure 10 and Figure 11, on the other hand, present visual examples of details from images decoded and
denoised with the proposed methods. Results are shown for the highest rate (approximately 1.5 bpp), as the
effects of compression are milder at such rates, so the visual differences between the denoising methods are
more prominent.

4.1 Discussion
The objective quality and visual results presented above show that all the proposed methods outperform the
decoded anchor, but generally not the original anchor. This can be explained by the fact that FFDNet was trained
only on noisy uncompressed images; its performance on decoded images is therefore expected to be lower, and
could be improved by including encoded noisy images in the training set of the network. In addition, the
following observations can be drawn from the rate-distortion plots:

• the proposed blind method, being the simplest and least complex method, shows lower performance than
the non-blind methods. This indicates that the decoder benefits from the added information about the
noise, generating better results both in terms of subjective visual quality and according to the objective
quality metrics. Regardless, the blind method has the advantage of not requiring any prior information
about the noise level, or any additional network which estimates information about the noise, making it
suitable for applications with low-latency constraints.
• the proposed non-blind methods, namely non-blind U, non-blind S and non-blind B, all improve on the
performance of the blind method. The non-blind U and non-blind S methods present similar performance,
while the non-blind B method achieves higher objective performance. Yet, the difference in visual quality
between these methods is limited.
• while the blind E method outperforms the blind method, benefiting from the noise level estimation, the
objective metrics reveal lower performance than for the non-blind methods. This shows that the estimation
network is not able to estimate the noise map accurately, and reveals the need for further research in this
direction.
• the blind L method presents improved performance, especially at lower bitrates, where the distribution
of the noise is highly distorted by the encoding and quantization operations. Visually, this method
preserves high-frequency details better than the other proposed methods.

† https://gitlab.com/wg1/jpeg-ai/jpeg-ai-qaf

Figure 8: Rate-distortion results for image ‘00001’ of the JPEG AI noisy test set. The results refer only to the
images with the highest noise level. Panels: (a) PSNR Y, (b) MS-SSIM, (c) VIFp, (d) FSIM, (e) VMAF; horizontal
axis: BPP. Curves: original anchor FFDNET, decoded anchor FFDNET, blind, non-blind U, non-blind S, non-blind B,
blind E, blind L.

Figure 9: Rate-distortion results for image ‘00016’ of the JPEG AI noisy test set. The results refer only to the
images with the highest noise level. Panels: (a) PSNR Y, (b) MS-SSIM, (c) VIFp, (d) FSIM, (e) VMAF; horizontal
axis: BPP. Curves: original anchor FFDNET, decoded anchor FFDNET, blind, non-blind U, non-blind S, non-blind B,
blind E, blind L.

Figure 10: Visual results for image ‘00001’ of the JPEG AI noisy test set. The results refer only to the images
with the highest noise level, encoded at the highest bitrate. Panels: (a) original, (b) noisy, (c) original anchor,
(d) decoded anchor, (e) blind, (f) non-blind U, (g) non-blind B, (h) blind E, (i) blind L.

Figure 11: Visual results for image ‘00016’ of the JPEG AI noisy test set. The results refer only to the images
with the highest noise level, encoded at the highest bitrate. Panels: (a) original, (b) noisy, (c) original anchor,
(d) decoded anchor, (e) blind, (f) non-blind U, (g) non-blind B, (h) blind E, (i) blind L.

5. CONCLUSIONS
In this paper, different methods for integrating denoising operations into the decoder of a learning-based
compression framework are proposed, including both blind and non-blind solutions. Experimental results reveal
that additional information about the noise distribution benefits the combined methods, which achieve higher
performance, both objectively and subjectively, than an anchor performing decoding and denoising in cascade.
While the proposed strategies are only applied to a single framework in this paper, they are flexible enough to
be adapted to a wide variety of other learning-based compression methods, e.g., the upcoming JPEG AI
learning-based codec. In this work, only the distortion metric used by the original compression model (i.e.,
MSE) is used. As future work, a trade-off between two objective metrics (e.g., MSE and SSIM) or a metric
specific to noise reduction performance assessment might be used to further improve the perceptual visual
quality of the decoded and denoised images. Additionally, more advanced approaches to estimate the properties
of latent noise might be explored.

ACKNOWLEDGMENTS
The authors would like to acknowledge support from the Swiss National Scientific Research project entitled
“Advanced Visual Representation and Coding in Augmented and Virtual Reality” under grant number
200021 178854.

REFERENCES
[1] Testolina, M., Upenik, E., and Ebrahimi, T., “Comprehensive assessment of image compression algorithms,”
in [Applications of Digital Image Processing XLIII ], 11510, 469–485, SPIE (2020).
[2] ISO/IEC JTC 1/SC29/WG1 N89022, “Report on the JPEG AI Call for Evidence Results.” 89th JPEG
Meeting, Online, October 2020.
[3] ISO/IEC JTC 1/SC29/WG1 N100250, “Report on the JPEG AI Call for Proposals Results.” 96th JPEG
Meeting, Online, July 2022.
[4] ISO/IEC JTC 1/SC29/WG1 N100094, “Use Cases and Requirements for JPEG AI.” 94th JPEG Meeting,
Online, January 2022.
[5] Lu, Y., Barras, L., and Ebrahimi, T., “A novel framework for assessment of deep face recognition systems
in realistic conditions,” in [10th European Workshop on Visual Information Processing (EUVIP)], IEEE
(2022).
[6] Foi, A., Trimeche, M., Katkovnik, V., and Egiazarian, K., “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data,” IEEE Transactions on Image Processing 17(10), 1737–1754 (2008).
[7] Ballé, J., Laparra, V., and Simoncelli, E. P., “End-to-end optimized image compression,” in [5th International Conference on Learning Representations, ICLR 2017], (2017).
[8] Ballé, J., Minnen, D., Singh, S., Hwang, S. J., and Johnston, N., “Variational image compression with a scale hyperprior,” in [International Conference on Learning Representations], (2018).
[9] Minnen, D., Ballé, J., and Toderici, G. D., “Joint autoregressive and hierarchical priors for learned image compression,” Advances in Neural Information Processing Systems 31 (2018).
[10] Cheng, Z., Sun, H., Takeuchi, M., and Katto, J., “Learned image compression with discretized Gaussian mixture likelihoods and attention modules,” in [Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition], 7939–7948 (2020).
[11] Agustsson, E., Tschannen, M., Mentzer, F., Timofte, R., and Gool, L. V., “Generative adversarial networks for extreme learned image compression,” in [Proceedings of the IEEE/CVF International Conference on Computer Vision], 221–231 (2019).
[12] Mentzer, F., Toderici, G. D., Tschannen, M., and Agustsson, E., “High-fidelity generative image compression,” Advances in Neural Information Processing Systems 33, 11913–11924 (2020).
[13] Ascenso, J., Akyazi, P., Pereira, F., and Ebrahimi, T., “Learning-based image coding: early solutions reviewing and subjective quality evaluation,” in [Optics, Photonics and Digital Technologies for Imaging Applications VI], 11353, 164–176, SPIE (2020).
[14] Chang, S. G., Yu, B., and Vetterli, M., “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing 9(9), 1532–1546 (2000).
[15] Zhang, K., Zuo, W., Chen, Y., Meng, D., and Zhang, L., “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing 26(7), 3142–3155 (2017).
[16] Zhang, K., Zuo, W., and Zhang, L., “FFDNet: Toward a fast and flexible solution for CNN-based image denoising,” IEEE Transactions on Image Processing 27(9), 4608–4622 (2018).
[17] Guo, S., Yan, Z., Zhang, K., Zuo, W., and Zhang, L., “Toward convolutional blind denoising of real photographs,” in [Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition], 1712–1722 (2019).
[18] Yue, Z., Yong, H., Zhao, Q., Meng, D., and Zhang, L., “Variational denoising network: Toward blind noise modeling and removal,” Advances in Neural Information Processing Systems 32 (2019).
[19] Choi, H. and Bajić, I. V., “Scalable image coding for humans and machines,” IEEE Transactions on Image Processing 31, 2739–2754 (2022).
[20] Torfason, R., Mentzer, F., Agustsson, E., Tschannen, M., Timofte, R., and Van Gool, L., “Towards image understanding from deep compression without decoding,” in [International Conference on Learning Representations], (2018).
[21] Upenik, E., Testolina, M., and Ebrahimi, T., “Towards super resolution in the compressed domain of learning-based image codecs,” in [Applications of Digital Image Processing XLIV], 11842, 531–541, SPIE (2021).
[22] Testolina, M., Upenik, E., and Ebrahimi, T., “Towards image denoising in the latent space of learning-based compression,” in [Applications of Digital Image Processing XLIV], 11842, 412–422, SPIE (2021).
[23] Alvar, S. R., Ulhaq, M., Choi, H., and Bajić, I. V., “Joint image compression and denoising via latent-space scalability,” arXiv preprint arXiv:2205.01874 (2022).
[24] de Oliveira, V. A., Chabert, M., Oberlin, T., Poulliat, C., Bruno, M., Latry, C., Carlavan, M., Henrot, S., Falzon, F., and Camarero, R., “Satellite image compression and denoising with neural networks,” IEEE Geoscience and Remote Sensing Letters 19, 1–5 (2022).
[25] Cheng, K. L., Xie, Y., and Chen, Q., “Optimizing image compression via joint learning with denoising,” arXiv preprint arXiv:2207.10869 (2022).
[26] Bégaint, J., Racapé, F., Feltman, S., and Pushparaja, A., “CompressAI: a PyTorch library and evaluation platform for end-to-end compression research,” arXiv preprint arXiv:2011.03029 (2020).
[27] Alvar, S. R. and Bajić, I. V., “Practical noise simulation for RGB images,” arXiv preprint arXiv:2201.12773 (2022).
[28] ISO/IEC JTC1/SC29/WG1 N100106, “JPEG AI Common Training and Testing Conditions.” 94th Meeting, Online, January 2022.
APPENDIX A. LOSS FUNCTION DERIVATION OF THE BLIND L METHOD
Notation:
$x$: original noise-free image
$y$: (unquantized) latent representation of the noise-free image, $y = g_a(x)$
$s$: noise-free latent scale hyperprior, $s = h_s(h_a(Q\{y\}))$
$\tilde{y}$: noisy latent
$\hat{x}$: reconstructed image, $\hat{x} = g_s(\tilde{y}, \sigma)$
$\hat{y}$: reconstructed latent, $\hat{y} = g_a(\hat{x}) = g_a(g_s(\tilde{y}, \sigma))$
$\sigma$: noise level map of the noisy latent, $\sigma = g_e(\tilde{y})$
$z$: unobserved noise-free latent

The combined denoising and decoding problem is first posed as modeling the data, namely pairs $(x, \tilde{y})$ of original noise-free images and noisy latents. The objective is to find the network parameter values that maximize the expected log-likelihood of the joint distribution $p(x, \tilde{y})$ over the dataset $D$ of such pairs, which decomposes as:

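$$\mathbb{E}_{(x,\tilde{y})\in D}\left[\log p(x,\tilde{y})\right] = \mathbb{E}_{(x,\tilde{y})\in D}\left[\log p(\tilde{y})\right] + \mathbb{E}_{(x,\tilde{y})\in D}\left[\log p(x\mid\tilde{y})\right] \tag{4}$$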

Under the same assumptions as in the compression framework,8 where $\lambda$ is the hyperparameter of the rate-distortion trade-off:

$$(x\mid\tilde{y}) \sim \mathcal{N}\left(\hat{x},\, (2\lambda)^{-1} I\right) \tag{5}$$

$$\log p(x\mid\tilde{y}) = -\lambda\|x-\hat{x}\|_2^2 + \text{cst.} \tag{6}$$

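Equation (6) allows the objective to be interpreted in parallel with that of a compression model:

$$\mathbb{E}_{(x,\tilde{y})\in D}\left[-\log p(x,\tilde{y})\right] = \mathbb{E}_{(x,\tilde{y})\in D}\left[-\log p(\tilde{y})\right] + \lambda\,\mathbb{E}_{(x,\tilde{y})\in D}\left[\|x-\hat{x}\|_2^2\right] + \text{cst.} = R + \lambda D + \text{cst.}$$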

Here, the term $R$ corresponds to the rate of the latent $\tilde{y}$ and $D$ to the distortion, as in a classic rate-distortion trade-off, except that the latent is perturbed by a more complex noise source before quantization. Note that, unlike in learned compression, the goal here is not to find a latent representation that minimizes the rate, but to minimize the rate given the fixed noisy latent, by inferring the parameters $\sigma$ and $\hat{y}$ of the latent distribution.
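The step from Eq. (5) to Eq. (6) can be checked numerically. The following minimal sketch in plain Python assumes no framework. It evaluates the Gaussian log-likelihood with per-element variance $(2\lambda)^{-1}$ and confirms that the result equals $-\lambda\|x-\hat{x}\|_2^2$ plus a constant that depends only on the number of elements $n$ and on $\lambda$. The function names and test values are illustrative.

```python
import math


def gaussian_log_likelihood(x, x_hat, lam):
    """log N(x; x_hat, (2*lam)^-1 I) summed over all elements, as in Eq. (5)."""
    var = 1.0 / (2.0 * lam)  # per-element variance (2*lambda)^-1
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * len(x) * math.log(2.0 * math.pi * var) - sq_err / (2.0 * var)


def eq6_constant(n, lam):
    """The 'cst.' of Eq. (6): -(n/2) * log(pi / lam), independent of x and x_hat."""
    return -0.5 * n * math.log(math.pi / lam)


lam = 0.01
x = [0.2, 0.5, 0.9]
for x_hat in ([0.1, 0.4, 1.0], [0.0, 0.0, 0.0]):
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    lhs = gaussian_log_likelihood(x, x_hat, lam)
    rhs = -lam * sq_err + eq6_constant(len(x), lam)
    print(math.isclose(lhs, rhs))  # True for any x_hat
```

The constant does not depend on $\hat{x}$. Maximizing this likelihood is therefore equivalent to minimizing the $\lambda$-weighted squared error, which is why $\lambda$ acts as the rate-distortion trade-off parameter.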
As the evidence $\log p(\tilde{y})$ is intractable, we instead consider its evidence lower bound (ELBO), using an approximation $q(z)$ of the true posterior $p(z\mid\tilde{y})$, similarly to what is proposed for VDNet.18 Note that, unlike in the framework presented by Yue et al.,18 only the noise-free latent $z$ is an unobserved variable, and not the noise level map $\sigma$.

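$$\log p(\tilde{y}) = \mathrm{ELBO}(q) + \mathrm{KL}\left(q(z)\,\|\,p(z\mid\tilde{y})\right) \geq \mathrm{ELBO}(q) \tag{7}$$

$$\mathrm{ELBO}(q) = \mathbb{E}_{z\sim q}\left[\log p(\tilde{y}\mid z)\right] - \mathrm{KL}\left(q(z)\,\|\,p(z)\right) \tag{8}$$

$$\mathrm{ELBO}(q) = \mathbb{E}_{z\sim q}\left[\log p(\tilde{y}\mid z)\right] + \mathbb{E}_{z\sim q}\left[\log p(z)\right] - \mathbb{E}_{z\sim q}\left[\log q(z)\right] \tag{9}$$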
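Eq. (7) holds for any choice of $q$. As a sanity check, the sketch below uses a toy scalar model in which every distribution is Gaussian: prior $z\sim\mathcal{N}(0, a)$, likelihood $\tilde{y}\mid z\sim\mathcal{N}(z, b)$, and $q(z)=\mathcal{N}(m, v)$. In this model $\log p(\tilde{y})$, the ELBO of Eq. (9), and the KL divergence to the exact posterior all have closed forms. The scalar setting and all numeric values are assumptions chosen for illustration. The variances $a = s^2+\varepsilon$, $b=\sigma^2+\varepsilon$ and $v=\varepsilon$ mirror one latent element of the Gaussian choices made below.

```python
import math


def log_normal(x, mean, var):
    """Log-density of the scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def elbo(y_noisy, m, v, a, b):
    """Eq. (9) for prior z ~ N(0, a), likelihood y|z ~ N(z, b) and q(z) = N(m, v)."""
    e_log_lik = -0.5 * (math.log(2.0 * math.pi * b) + ((y_noisy - m) ** 2 + v) / b)
    e_log_prior = -0.5 * (math.log(2.0 * math.pi * a) + (m ** 2 + v) / a)
    e_log_q = -0.5 * (math.log(2.0 * math.pi * v) + 1.0)
    return e_log_lik + e_log_prior - e_log_q


def kl_to_posterior(y_noisy, m, v, a, b):
    """KL(q(z) || p(z|y)); the exact posterior is Gaussian in this conjugate model."""
    post_mean = a * y_noisy / (a + b)
    post_var = a * b / (a + b)
    return 0.5 * (math.log(post_var / v) + (v + (m - post_mean) ** 2) / post_var - 1.0)


def log_evidence(y_noisy, a, b):
    """log p(y) = log N(y; 0, a + b), obtained by integrating out z."""
    return log_normal(y_noisy, 0.0, a + b)


eps = 0.01
s, sigma = 0.8, 0.3                  # illustrative prior scale and noise level
a, b = s ** 2 + eps, sigma ** 2 + eps
y_noisy, m = 1.1, 0.7                # noisy observation and mean of q
lower_bound = elbo(y_noisy, m, eps, a, b)
gap = kl_to_posterior(y_noisy, m, eps, a, b)
print(math.isclose(log_evidence(y_noisy, a, b), lower_bound + gap))  # True
print(lower_bound <= log_evidence(y_noisy, a, b))                    # True
```

The gap between the two sides is exactly the KL term. The bound is therefore tight only when $q(z)$ matches the exact posterior.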

Following the approach of the VDNet framework18 for denoising in the RGB space, but applying it here in the latent space, a prior distribution is imposed on $z$, where $\varepsilon$ is a hyperparameter. The distribution $q(z)$, which approximates $p(z\mid\tilde{y})$, is also defined:
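$$z \sim \mathcal{N}\left(0,\, S + \varepsilon I\right) \tag{10}$$

$$S_{ij} = \begin{cases} s_i^2 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

$$z \overset{q}{\sim} \mathcal{N}\left(\hat{y},\, \varepsilon I\right) \tag{12}$$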

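Finally, based on the assumption that the latent noise is Gaussian and independent from point to point, the distribution of $\tilde{y}$ given $z$ is:

$$(\tilde{y}\mid z) \sim \mathcal{N}\left(z,\, \Sigma + \varepsilon I\right) \tag{13}$$

$$\Sigma_{ij} = \begin{cases} \sigma_i^2 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{14}$$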

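The three terms of the evidence lower bound can be computed as:

$$\mathbb{E}_{z\sim q}\left[\log q(z)\right] = -\frac{1}{2}\sum_i \left[\log 2\pi + \log\varepsilon + 1\right]$$

$$\mathbb{E}_{z\sim q}\left[\log p(z)\right] = -\frac{1}{2}\sum_i \left[\log 2\pi + \log\left(s_i^2+\varepsilon\right) + \frac{\hat{y}_i^2}{s_i^2+\varepsilon} + \frac{\varepsilon}{s_i^2+\varepsilon}\right]$$

$$\mathbb{E}_{z\sim q}\left[\log p(\tilde{y}\mid z)\right] = -\frac{1}{2}\sum_i \left[\log 2\pi + \log\left(\sigma_i^2+\varepsilon\right) + \frac{(\tilde{y}_i-\hat{y}_i)^2}{\sigma_i^2+\varepsilon} + \frac{\varepsilon}{\sigma_i^2+\varepsilon}\right]$$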
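The closed forms can be cross-checked against a Monte Carlo estimate for a single latent element. The estimate draws $z$ from $q(z)=\mathcal{N}(\hat{y}_i, \varepsilon)$ and averages each log-density over the samples. This is a minimal plain-Python sketch; the element values, sample count, seed and tolerance are arbitrary assumptions.

```python
import math
import random


def log_normal(x, mean, var):
    """Log-density of the scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def closed_form_terms(y_noisy, y_hat, sigma, s, eps):
    """Per-element closed forms of E_q[log q(z)], E_q[log p(z)] and E_q[log p(y~|z)]."""
    prior_var, noise_var = s ** 2 + eps, sigma ** 2 + eps
    e_log_q = -0.5 * (math.log(2.0 * math.pi) + math.log(eps) + 1.0)
    e_log_prior = -0.5 * (math.log(2.0 * math.pi) + math.log(prior_var)
                          + y_hat ** 2 / prior_var + eps / prior_var)
    e_log_lik = -0.5 * (math.log(2.0 * math.pi) + math.log(noise_var)
                        + (y_noisy - y_hat) ** 2 / noise_var + eps / noise_var)
    return e_log_q, e_log_prior, e_log_lik


def monte_carlo_terms(y_noisy, y_hat, sigma, s, eps, n=100_000, seed=0):
    """Sample averages of the same three log-densities with z ~ N(y_hat, eps)."""
    rng = random.Random(seed)
    prior_var, noise_var = s ** 2 + eps, sigma ** 2 + eps
    totals = [0.0, 0.0, 0.0]
    for _ in range(n):
        z = rng.gauss(y_hat, math.sqrt(eps))  # random.gauss takes a standard deviation
        totals[0] += log_normal(z, y_hat, eps)
        totals[1] += log_normal(z, 0.0, prior_var)
        totals[2] += log_normal(y_noisy, z, noise_var)
    return tuple(t / n for t in totals)


args = dict(y_noisy=0.9, y_hat=0.6, sigma=0.4, s=1.2, eps=0.05)
for exact, estimate in zip(closed_form_terms(**args), monte_carlo_terms(**args)):
    print(f"closed form {exact:+.4f}   Monte Carlo {estimate:+.4f}")
```

The estimates agree with the closed forms to within Monte Carlo error. Increasing `n` shrinks the discrepancy roughly as $1/\sqrt{n}$.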

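Combining the three terms according to Eq. (9) gives the evidence lower bound:

$$\mathrm{ELBO}(q) = -\frac{1}{2}\sum_i \left[\log\left(\sigma_i^2+\varepsilon\right) + \frac{(\tilde{y}_i-\hat{y}_i)^2+\varepsilon}{\sigma_i^2+\varepsilon} + \frac{\hat{y}_i^2}{s_i^2+\varepsilon}\right] + \text{cst.}$$

Here cst. gathers the terms that depend only on $s$ and $\varepsilon$. These terms are constant with respect to the parameters of $g_s$ and $g_e$.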
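For reference, the constant can be written out explicitly. Expanding the three terms and collecting everything that involves neither $\hat{y}$ nor $\sigma$ gives the following expression, each term of which depends only on $s$ and $\varepsilon$:

$$\text{cst.} = \frac{1}{2}\sum_i \left[\log\varepsilon + 1 - \log 2\pi - \log\left(s_i^2+\varepsilon\right) - \frac{\varepsilon}{s_i^2+\varepsilon}\right]$$

The $\log 2\pi$ terms contribute $-\tfrac{1}{2}\log 2\pi$ per element because two of them enter with a negative sign and one, from $-\mathbb{E}_{z\sim q}[\log q(z)]$, with a positive sign.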

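Since $-\log p(\tilde{y}) \leq -\mathrm{ELBO}(q)$, the rate term is replaced by $-\mathrm{ELBO}(q)$ and the constant is dropped. This gives the following loss function, minimized with respect to the learned parameters $\theta$ of $g_s$ and $g_e$:

$$\mathcal{L}(D;\theta) = \mathbb{E}_{(x,\tilde{y})\sim D}\left[\frac{1}{2}\sum_i \left(\log\left(\sigma_i^2+\varepsilon\right) + \frac{(\tilde{y}_i-\hat{y}_i)^2+\varepsilon}{\sigma_i^2+\varepsilon} + \frac{\hat{y}_i^2}{s_i^2+\varepsilon}\right) + \lambda\|\hat{x}-x\|_2^2\right] \tag{15}$$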
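To make Eq. (15) concrete, the following minimal, framework-free sketch computes the per-sample loss and its average over a batch. It assumes that the network outputs $\sigma = g_e(\tilde{y})$, $\hat{x} = g_s(\tilde{y}, \sigma)$ and $\hat{y} = g_a(\hat{x})$, as well as the hyperprior scale $s$, have already been computed and flattened into Python sequences. The function names `blind_latent_loss` and `dataset_loss` and the default $\varepsilon$ are hypothetical. A practical training loop would express the same sums with tensor operations so that gradients can flow to $\theta$.

```python
import math


def blind_latent_loss(y_noisy, y_hat, sigma, s, x, x_hat, lam, eps=1e-6):
    """Bracketed term of Eq. (15) for one (x, y~) pair.

    y_noisy, y_hat, sigma, s: flat latent sequences of equal length.
    x, x_hat: flat image sequences of equal length.
    """
    if not (len(y_noisy) == len(y_hat) == len(sigma) == len(s)):
        raise ValueError("latent sequences must have the same length")
    if len(x) != len(x_hat):
        raise ValueError("x and x_hat must have the same length")

    latent_term = 0.0
    for yn, yh, sg, si in zip(y_noisy, y_hat, sigma, s):
        noise_var = sg ** 2 + eps   # sigma_i^2 + eps, from Eq. (13)
        prior_var = si ** 2 + eps   # s_i^2 + eps, from Eq. (10)
        latent_term += (math.log(noise_var)
                        + ((yn - yh) ** 2 + eps) / noise_var
                        + yh ** 2 / prior_var)

    distortion = sum((a - b) ** 2 for a, b in zip(x_hat, x))  # ||x_hat - x||_2^2
    return 0.5 * latent_term + lam * distortion


def dataset_loss(batch, lam, eps=1e-6):
    """Empirical estimate of the expectation in Eq. (15): mean over sample tuples."""
    return sum(blind_latent_loss(*sample, lam=lam, eps=eps) for sample in batch) / len(batch)


sample = ([1.0], [1.0], [1.0], [1.0], [0.0], [2.0])  # (y~, y^, sigma, s, x, x^)
print(blind_latent_loss(*sample, lam=0.25, eps=0.0))  # 1.5
print(dataset_loss([sample, sample], lam=0.25, eps=0.0))  # 1.5
```

Adding $\varepsilon$ to both variances keeps the logarithm and the divisions finite even when a predicted noise level $\sigma_i$ is zero.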
