Research on an Infrared and Visible Image Compression Fusion Algorithm Based on a Joint CNN and Transformer Network
Content overview: This paper proposes CFNet, an infrared and visible image compression fusion network that combines a variational autoencoder (VAE) image compression method with a joint CNN and Transformer network structure. The network not only fuses infrared and visible images effectively but also performs efficient data compression. Experimental results show that the method performs well in data storage and transmission, and in particular outperforms recent state-of-the-art algorithms in rate-distortion performance. The network reduces redundant information during fusion and improves data utilization while maintaining high visual quality.
Intended audience: Researchers and engineers with a background in machine learning and deep learning, especially those working on image processing, image fusion, and image compression.
Usage scenarios and goals: Practical applications that require efficient compression and high-quality image fusion, such as video surveillance, target detection, and autonomous driving, with the goal of improving the transmission efficiency and storage utilization of image data.
Additional notes: The paper also reports extensive quantitative comparison experiments that validate the performance of CFNet on multiple datasets. In addition, the network uses a region-of-interest multi-channel loss function that allocates more bits to foreground regions, thereby improving the compression ratio.
Pattern Recognition 156 (2024) 110774
https://doi.org/10.1016/j.patcog.2024.110774
Received 5 February 2024; Received in revised form 8 July 2024; Accepted 9 July 2024; Available online 14 July 2024
0031-3203/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
CFNet: An infrared and visible image compression fusion network
Mengliang Xing a, Gang Liu a,∗, Haojie Tang a, Yao Qian a, Jun Zhang b
a School of Automation Engineering, Shanghai University of Electric Power, Shanghai 200090, China
b Shanghai JA SOLAR Technology Co., Ltd, Shanghai 200436, China
∗ Corresponding author.
ARTICLE INFO
Keywords: Image fusion; Image compression; Variational autoencoder; Transformer; Region of interest
ABSTRACT
Image fusion aims to acquire a more complete image representation within a limited physical space to more
effectively support practical vision applications. Although the currently popular infrared and visible image
fusion algorithms take practical applications into consideration, they do not fully consider the
redundancy and transmission efficiency of image data. To address this limitation, this paper proposes a
compression fusion network for infrared and visible images based on joint CNN and Transformer, termed
CFNet. First of all, the idea of variational autoencoder image compression is introduced into the image fusion
framework, achieving data compression while maintaining image fusion quality and reducing redundancy.
Moreover, a joint CNN and Transformer network structure is proposed, which comprehensively considers the
local information extracted by CNN and the global long-distance dependencies emphasized by Transformer.
Finally, a multi-channel loss based on the region of interest is used to guide network training. Not only can color
visible and infrared images be fused directly, but more bits can be allocated to the foreground region of
interest, resulting in a superior compression ratio. Extensive qualitative and quantitative analyses affirm that
the proposed compression fusion algorithm achieves state-of-the-art performance. In particular, rate–distortion
performance experiments demonstrate the great advantages of the proposed algorithm for data storage and
transmission. The source code is available at https://github.com/Xiaoxing0503/CFNet.
1. Introduction
Pattern recognition technology, with its ability to identify and inter-
pret complex patterns in data, has become a crucial branch of modern
computing science. As an extension of pattern recognition, data fusion
technology is widely applied across diverse scientific and engineering
domains. Its purpose is to integrate multiple data sources to generate
more comprehensive and accurate information than any single data
source, while occupying less space, thereby greatly enhancing the accu-
racy and efficiency of decision-making systems. Similarly, the purpose
of data compression is to retain as much comprehensive information as
possible within the original space size while reducing space occupancy.
Fused images, rich in information and superior in visual effects, can
better support practical visual tasks [1] such as semantic segmentation,
video surveillance, and target detection.
Infrared and visible image fusion is a branch of data fusion. The
primary goal is to generate a fused image that retains the thermal
radiation information from the original infrared image and the texture and
gradient information from the original visible image. In general, in-
frared and visible image fusion technology can be broadly categorized
into deep learning-based and traditional algorithms. The traditional
algorithms can be roughly divided into multi-scale transformation al-
gorithms [2], sparse representation algorithms [3], subspace-based
algorithms [4], saliency-based algorithms [5], and hybrid methods [6],
etc. With the popularity of artificial intelligence, the prosperity of
infrared and visible image fusion algorithms based on deep learning has
been witnessed in recent years [7]. The mainstream deep learning al-
gorithms are end-to-end methods [8], autoencoder-based (AE)
methods [9], generative adversarial network (GAN)-based methods [10],
and Transformer-based image fusion algorithms [11].
Typical image fusion algorithms do not differ greatly in
visual quality, so image fusion algorithms tailored to actual
tasks have become mainstream. For example, RFNet [12] considers
registration during image fusion, which effectively alleviates the prob-
lem of artifacts after image fusion. SeAFusion [13] cascades the fusion
network with the semantic segmentation network, so that the fusion
task meets the requirements of advanced visual tasks. PIAFusion [14]
designs image fusion algorithms under abnormal lighting conditions
to effectively deal with extreme scenes such as underexposure and
overexposure. SDNet [8] designs a small model for image fusion to
achieve real-time fusion efficiency. However, these image fusion meth-
ods, while emphasizing actual task requirements, fail to fully consider
the redundancy and transmission efficiency of image data. A notable
instance is the application of infrared and visible image fusion in
surveillance. The substantial volume of image data poses challenges
in terms of increased storage space and associated costs. Therefore,
a comprehensive consideration of storage and transmission concerns
associated with image fusion data becomes imperative.
Fig. 1. Existing and proposed compression fusion algorithms.
As illustrated in Fig. 1(a), the existing image fusion algorithm
uses the fusion algorithm to fuse the original image, subsequently
compressing it into a bitstream for storage or transmission. Typically,
to preserve maximal information from the original image, most fused
images are stored in a lossless compression format. This practice incurs
significant storage space requirements. Alternatively, the use of lossy
compression methods jeopardizes the visual quality of the image. We
hope to design a joint network that unifies infrared and visible image
fusion and data compression into one framework. Fig. 1(b) shows the
proposed image fusion and compression algorithm idea. The original
image is fused and compressed by the encoder to obtain a data stream
for storage and transmission. The data stream passes through the fusion
and compression decoder to yield the final fused image. Image fusion
technology can improve the information content of the image, which
enables the pattern recognition system to learn from richer data and
improve the accuracy of recognition. When this is combined with an
efficient compression algorithm, effective feature extraction and recog-
nition can be achieved without sacrificing image quality. The network
that integrates image fusion and compression can reduce the dimension
and complexity of the data before processing. This approach enables the
subsequent pattern recognition algorithm to process large-scale datasets
more quickly without compromising image quality.
Based on the above discussion, we initially explore the collabora-
tive relationship between image fusion and image compression tasks.
Inevitably, we face some obstacles: (1) Designing an effective com-
pression fusion network is imperative. Image fusion methods primarily
emphasize information extraction and synthesis, whereas compression
networks target the elimination of redundancy in image data. This
design approach facilitates the comprehensive integration and opti-
mization of data. (2) Developing a new feature extraction module. The
fusion of infrared and visible images requires that the network not
only attends to local information but also possesses global modeling
capabilities [15,16]. Simultaneously, image compression requires the
extraction of relationships between features to exploit redundancy.
Although previous works use CNN and Transformer to obtain local
and global information simultaneously, their use of features is still
separate. For example, methods such as TCCFusion [17] use CNN to model local
features and then use a Transformer to mine global dependencies. (3)
Designing an efficient loss strategy suitable for color visible images.
Images contain both unimportant background areas and foreground
areas of interest. High compression ratios are used for unimportant
areas and low compression ratios for important foreground areas to
achieve efficient compression. Furthermore, most existing image fusion
algorithms, such as MFIFusion [18], typically transform color images
into YCbCr space. Only the brightness (Y) channel is used as the input
of the deep network to obtain the fused brightness channel. However,
relying solely on the brightness (Y) channel as the input for the fusion
compression network raises concerns about the compression of the
chroma channels (Cb and Cr). Ensuring comprehensive compression
across all channels is crucial for the objectives of this study.
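To make the chroma concern concrete, the sketch below shows the luminance/chrominance decomposition that Y-channel-only fusion pipelines rely on, using the standard ITU-R BT.601 conversion; the function name and the 8-bit offset convention are ours, added only for illustration.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to YCbCr (ITU-R BT.601, 8-bit offsets)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

# In a Y-channel-only pipeline, only the Y plane (and the infrared image) enters
# the deep network; Cb and Cr are carried along and re-attached after fusion, so
# the chroma channels never pass through the learned compression path.
# CFNet instead feeds all channels, so Cb and Cr are compressed jointly with Y.
```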
To address these issues, this paper proposes a compression fusion
model that joins CNN and Transformer. First, our goal is to improve the
ability of computers to store images and of networks to transmit
them while ensuring the quality of image fusion. Therefore, we in-
corporate the variational autoencoder (VAE) image compression model
into the image fusion framework. Specifically, the image is mapped
into a latent feature space through an image encoder, facilitating the
subsequent quantized encoding process. Then, the hyper-prior codec is
employed to derive the probability distribution function of the feature
points, resulting in more compact features. This eliminates statistical
dependence to the greatest extent. Quantization coding is then applied
to generate a bitstream for storage and transmission, subsequently
decoded to produce the final fused image. Second, we embed CNN
and Transformer into the same module to aggregate local and non-
local information. The noteworthy accomplishments of Transformer in
image fusion have garnered considerable attention, primarily attributed
to its adeptness in capturing non-local information. Simultaneously,
local information significantly influences image fusion performance.
Hence, there arises a necessity to amalgamate the strengths of the
convolutional neural network (CNN), which captures local context
information, and the Transformer, known for its proficiency in han-
dling global dependencies, to effectively undertake image fusion tasks.
Finally, network training is guided by multi-channel pixel loss and
multi-channel gradient loss. Additionally, to optimize the compression
ratio, the fusion loss incorporates the foreground mask of the region of
interest. This enables the compression fusion network to automatically
allocate more bits to the foreground region of interest, while assigning
fewer bits to the background. This approach serves to safeguard criti-
cal information effectively, simultaneously minimizing redundancy. In
summary, the key contributions can be summarized as:
(1) To the best of our knowledge, this is the first time the VAE com-
pression framework has been introduced into the field of image
fusion. This allows for a more compact feature representation.
While ensuring the quality of image fusion, it also improves the
efficiency of image storage and transmission.
(2) A novel joint CNN and Transformer network structure is pro-
posed. It aggregates the advantages of CNN in modeling local
information and Transformer in modeling global dependencies
at the same scale.
(3) A novel region of interest multi-channel loss function is designed
to guide network training. Not only can color visible and infrared
images be fused directly, but more bits can be allocated to the
foreground area of interest, resulting in a superior compression
ratio.
The rest of the article is organized as follows. In Section 2, we briefly introduce
related work on image fusion and image compression. In Section 3,
the proposed method is described. Section 4 illustrates the advan-
tages of the proposed method over other alternatives, especially the
rate–distortion performance. Conclusions are given in Section 5.
2. Related works
In this section, the development of image fusion algorithms based
on deep learning is briefly reviewed, and image compression methods
are briefly introduced.

2.1. Image fusion based on deep learning
2.1.1. Typical image fusion
In recent years, image fusion algorithms based on deep learning
have attracted great interest due to their powerful nonlinear fitting
capabilities. These algorithms can be categorized into four main types:
(1) AE-based image fusion methods: The AE-based method achieves
feature extraction and image reconstruction of a single source image
by training the encoder and decoder. Then, manually designed fusion
rules are used to fuse the features extracted by the encoder. The decoder
completes image reconstruction based on the fused features. As a classic
AE-based fusion algorithm, DenseFuse [9] designs dense block coding
layers to fully extract and effectively utilize features. MFIFusion [18]
uses dual-scale decomposition and multi-level feature injection to effec-
tively utilize valuable information in the middle layer of the network
and enrich the visual effect of the fused image. FSFusion [19] embeds
edge prior information into the network and designs a triple fusion
mechanism to achieve a high-contrast, clear-detail fused image.
(2) GAN-based image fusion methods: The GAN-based method takes
into account the ground truth, so that the discriminator forces the
generator to generate a fused image with infrared intensity and visible
gradient from a probabilistic perspective. FusionGAN [10] first intro-
duced the GAN into the field of image fusion, forcing the generator to
generate more texture details through the discriminator scoring of the
fused and the visible image. However, a single adversarial mechanism
can easily lead to unbalanced fusion. To this end, subsequent studies
such as DDcGAN [20] and TarDAL [21] use dual discriminators to
balance the distribution of different source images.
(3) The end-to-end fusion method: The end-to-end method directly
learns the fusion function from the original image and has become
a prevalent deep learning algorithm. As a representative end-to-end
image fusion algorithm, SDNet [8] introduces the idea of compression
decomposition to force the fusion result to contain more scene details.
ASFFuse [22] proposes an adaptive feature selection strategy to reduce
the impact of noisy feature maps on fusion results and achieve better
visual effects.
(4) Transformer-based image fusion methods: CNNs are confined to
extracting local receptive fields, lacking the capability to unveil non-
local interrelationships within source images. In response, researchers
try to integrate transformers into the domain of image fusion, intending
to achieve local to global perception of the original image. Seeking
to harness both the local feature extraction capacities of CNN and
the global modeling capabilities of the Transformer simultaneously,
SePT [23] and TCCFusion [17] initially employ CNN for local fea-
ture modeling, followed by Transformer utilization to explore global
dependencies.
2.1.2. Practice-driven image fusion
Although typical image fusion methods achieve satisfactory results
in pursuing subjective visual quality and objective evaluation metrics,
they ignore the requirements of the real world. Consequently, many
more practical image fusion algorithms have been derived in recent
years:
(1) Unregistered Image Fusion: Image fusion without strict regis-
tration often leads to artifacts. To address this, many methods have
emerged that incorporate registration into image fusion. SuperFu-
sion [24], in managing registration and fusion, considers the semantic
requirements of high-level vision tasks. Unlike treating image regis-
tration and fusion as separate problems, MURF [25] intertwines these
tasks, making them mutually reinforcing.
(2) High-level visual tasks driven image fusion: To ensure that
the fused image aligns with the demands of high-level visual tasks,
Tang et al. first proposed the semantic-driven image fusion algorithm,
named SeAFusion [13]. SeAFusion cascades the fusion network and
the semantic segmentation network, enabling the feedback of semantic
data to the fusion network. Similarly, TarDAL [21], proposed by Liu
et al., jointly optimizes image fusion and target detection, achieving
higher detection accuracy in fused images.
(3) Image fusion under extreme conditions: In practical applications,
addressing extreme scenarios, such as underexposure and overexpo-
sure, is often imperative. Consequently, image fusion algorithms tai-
lored for abnormal lighting conditions have emerged. PIAFusion [14]
designed a lighting perception sub-network, aiding in fusion network
training and adaptively integrating meaningful information based on
lighting conditions. DIVFusion [26] introduces a scene illumination
unwrapping network and a texture contrast enhancement fusion net-
work. This establishes an effective coupling and reciprocal relationship
between image fusion and image enhancement.
(4) Real-time image fusion: For high-level visual tasks, early pre-
processing processes such as image fusion often have high real-time
requirements. SDNet [8] optimizes image fusion operation efficiency
by reducing the number of model parameters. Recently, APWNet [27]
designed a simple network and incorporated a detection model to
dynamically learn pixel-wise weights for image fusion, making it the
fastest fusion network.
2.2. Image compression
Image compression technology, as one of the key technologies in
the field of image processing, holds significance in alleviating stor-
age and transmission overhead. Over the past few decades, numerous
traditional compression standards have emerged. Specifically, tradi-
tional image compression mostly uses fixed transformation methods
and quantization coding frameworks, such as discrete cosine transform
and discrete wavelet transform. As one of the compression standards,
JPEG [28] has played a pivotal role in the field of image compression in
recent decades and is still widely used today. The JPEG2000 [29] image
compression standard first proposed a region of interest compression
method. It changes the compression rate and quality of each region so
that the regions designated as important are of high quality.
Traditional compression standards rely on manual techniques. In
recent years, learned image compression methods have attracted in-
creasing attention. Ballé et al. first proposed an image compression
model by minimizing the rate–distortion trade-off. Then they intro-
duced a VAE model containing a hyper-prior network [30] to improve
image compression. In order to achieve better bitrate estimation, [31]
extended the work of [30] and used an autoregressive model to im-
prove image compression, surpassing the BPG traditional compression
algorithm in rate–distortion performance for the first time. Recently,
LIC-TCM [32] introduced an efficient parallel Transformer-CNN mod-
ule, inspiring the joint CNN and Transformer model proposed in this
article.
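As a concrete point of reference for the hyper-prior VAE codec of [30] that inspires this work, the sketch below estimates the bit rate of an image with a pretrained model from the open-source CompressAI library; the specific model, quality index, and output format are assumptions about that library, not code from the paper.

```python
import math
import torch
from compressai.zoo import bmshj2018_hyperprior  # hyper-prior model in the spirit of [30]

model = bmshj2018_hyperprior(quality=3, pretrained=True).eval()

x = torch.rand(1, 3, 256, 256)           # stand-in for a normalized RGB image
with torch.no_grad():
    out = model(x)                        # {"x_hat": ..., "likelihoods": {"y": ..., "z": ...}}

num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
bpp = sum(torch.log(l).sum() for l in out["likelihoods"].values()) / (-math.log(2) * num_pixels)
print(f"estimated rate: {bpp.item():.3f} bits per pixel")
```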
3. Method
This section provides a comprehensive introduction to the image
compression fusion network. The problem formulation is first clearly
defined and then provides a detailed description of the network frame-
work and loss function.
3.1. Problem formulation
We first define the problem that this study attempts to solve, which
is how to effectively fuse large-scale, high-dimensional, and hetero-
geneous data. In the field of multi-source image processing, infrared
and visible image fusion is applied in various practical scenarios. The
purpose is to combine the thermal radiation information of infrared
images and the color and texture information of visible images. This
provides a more comprehensive and accurate image than any single
image. At the same time, with the increasing scale and complexity of
image data, traditional image fusion technology faces huge challenges
in fusion quality and data storage.

To address these challenges, diverging from the prior emphasis
solely on fused image quality, we propose that the image fusion method
should consider the space utilization issues caused by storage and
transmission, that is, compression-oriented image fusion. The core idea
of compression-oriented fusion involves integrating image compression
technology into the image fusion process. The primary objective is to
minimize the space occupied by data while ensuring optimal fusion
quality.
In infrared and visible image fusion applications, global information
provides an overall thermal radiation picture, while local information
reveals details such as subtle thermal radiation and object edges. Tra-
ditional fusion networks may struggle to balance these aspects, so we
design a joint CNN and Transformer structure to extract both local and
global interrelationships within the source image.
In addition, within the compression concept, a crucial element is
the region of interest (ROI). Similarly, in the fusion process of infrared
and visible images, we designate ROI as the region encompassing
vital thermal radiation information and segmented foreground targets.
Conversely, the relatively unimportant background (BG) area typi-
cally contains less information. Therefore, throughout the compression
and fusion process, more resources are allocated to the ROI, thereby
retaining more important information. Building upon the preceding
discussion, by training the compression fusion model end-to-end, we
define the overall optimization loss as:
\mathcal{L} = \mathcal{R} + \lambda \cdot \mathcal{L}_f,  (1)
where \mathcal{R} is the rate, denoting the expected code length (bit rate) after compression, \mathcal{L}_f is the fusion loss, and \lambda is a trade-off coefficient used to balance rate and fusion distortion. The minimum average code length is achieved by minimizing the expectation over the data distribution p_x of the original image x. \mathcal{R} is given by the Shannon cross-entropy between the latent representations and the actual distribution:
\mathcal{R} = \mathbb{E}_{x \sim p_x}\big[ -\log_2 p_{\hat{y} \mid \hat{z}}(\hat{y} \mid \hat{z}) - \log_2 p_{\hat{z}}(\hat{z}) \big],  (2)
where x represents the original input image, and \hat{y} and \hat{z} represent the quantized latent representation and the quantized hyper latent representation, respectively.
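As an illustration of Eqs. (1)-(2), the sketch below computes the rate term from per-element likelihoods and combines it with the fusion loss; the tensor interface, the bits-per-pixel normalization, and the function names are our assumptions rather than the paper's released code.

```python
import torch

def rate_bpp(y_likelihoods: torch.Tensor, z_likelihoods: torch.Tensor, num_pixels: int) -> torch.Tensor:
    """Eq. (2): Shannon cross-entropy of the quantized latents, normalized to bits per pixel
    (the normalization is our convention; the paper states the rate as an expected code length)."""
    bits = -torch.log2(y_likelihoods).sum() - torch.log2(z_likelihoods).sum()
    return bits / num_pixels

def total_loss(y_likelihoods, z_likelihoods, fusion_loss, lam: float, num_pixels: int) -> torch.Tensor:
    """Eq. (1): L = R + lambda * L_f, trading off rate against fusion distortion."""
    return rate_bpp(y_likelihoods, z_likelihoods, num_pixels) + lam * fusion_loss
```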
The fusion loss, incorporating both the ROI loss (\mathcal{L}_{roi}) and the background (BG) loss (\mathcal{L}_{bg}), is calculated by the formula:
\mathcal{L}_f = \mathcal{L}_{roi}(I_1; I_2; M) + \alpha \cdot \mathcal{L}_{bg}(I_1; I_2; 1 - M),  (3)
where \mathcal{L}_{roi}(I_1; I_2; M) is the loss function over the region of interest, evaluating the similarity or differences between the two images I_1 and I_2 in the ROI area. The background loss \mathcal{L}_{bg}(I_1; I_2; 1 - M) uses 1 - M as a mask to evaluate the performance of the images I_1 and I_2 in the background area. I_1 and I_2 denote the original input images, M stands for the ROI foreground mask, and 1 - M is the background mask. \alpha is the trade-off coefficient between the ROI and BG areas. To focus on the ROI area, \alpha < 1.
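A minimal sketch of the masked split in Eq. (3). The per-region distance is shown as an L1 term between the fused output and both sources, which is only our stand-in for the multi-channel pixel and gradient losses described later; the tensor shapes (all inputs expanded to the same channel count) and the default weighting are assumptions.

```python
import torch

def masked_l1(fused: torch.Tensor, src1: torch.Tensor, src2: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """L1 distance between the fused image and both sources, restricted to `mask` (1 inside the region).
    All tensors are assumed to be (B, C, H, W); the single-channel infrared image is repeated over C."""
    m = mask.expand_as(fused)
    area = m.sum().clamp(min=1.0)
    return (((fused - src1).abs() + (fused - src2).abs()) * m).sum() / area

def fusion_loss(fused, ir, vi, roi_mask, alpha: float = 0.5) -> torch.Tensor:
    """Eq. (3): L_f = L_roi(I1; I2; M) + alpha * L_bg(I1; I2; 1 - M), with alpha < 1 so that the
    optimizer (and hence the bit allocation) favors the ROI foreground over the background."""
    return masked_l1(fused, ir, vi, roi_mask) + alpha * masked_l1(fused, ir, vi, 1.0 - roi_mask)
```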
3.2. Overall framework
The proposed method uses the variational autoencoder, which is the first attempt in the field of image fusion. As depicted in Fig. 2, the proposed method includes an image encoder module g_a, a hyper-prior encoder module h_a, a hyper-prior decoder module h_s, an image decoder module g_s, and two quantization and entropy coding modules U|Q_y and U|Q_z. In summary, the infrared image I_ir ∈ ℝ^{H×W×1} and the visible image I_vi ∈ ℝ^{H×W×3} are first cascaded and fed into the image encoder module g_a to derive the latent feature representation y. These features are input into the hyper-prior encoder module h_a, yielding the hyper-prior representation z. The latent feature representation y and the hyper-prior representation z pass through the quantization process and are converted into discrete forms. z is arithmetic encoded and arithmetic decoded into the quantized hyper latent representation ẑ. Passing through the hyper-prior decoder h_s, ẑ produces the mean μ and scale σ of the Gaussian condition 𝒩(μ, σ). ŷ is encoded according to the probability distribution of the features, which depends on the statistics (μ, σ) output by the hyper-prior decoder. A specific latent feature ŷ is then sampled from this probability distribution. The new ŷ passes through the image decoder g_s to yield the final fused image.
The entire process is described in detail below. The input grayscale infrared image I_ir ∈ ℝ^{H×W×1} and the color visible image I_vi ∈ ℝ^{H×W×3} are cascaded and sent to the image encoder g_a. g_a consists of three groups of downsampling residual blocks (DRB) and a joint CNN and Transformer module (JCT), and finally passes through a downsampling convolution layer. In each residual block, the input undergoes a series of transformations: it traverses a convolutional layer with a LeakyReLU activation and a stride of 2, followed by another convolutional layer with generalized divisive normalization (GDN). The output is then added to the initial input, producing the residual block output. Notably, GDN has been shown to be a nonlinear transformation layer more suitable for image reconstruction or generation [33]. The JCT module aggregates the local modeling capabilities of CNN and the global modeling capabilities of Transformer; it is introduced in detail in Section 3.3. The downsampling convolutional layer, with a stride of 2, concludes the encoding process, yielding the latent feature representation y. The image size is reduced by a factor of 16 during image encoding. The entire image encoding process can be expressed as:
y = g_a(\mathrm{Cat}(x_1, x_2)),  (4)
where Cat denotes the concatenation operation in the channel dimension.
Subsequently, the latent feature representation y is fed into the hyper-prior encoder h_a, which consists of downsampling residual blocks, JCT and downsampling convolutional layers. The output is the hyper latent feature representation z, with the feature size reduced by a factor of 4. The entire hyper-prior encoding process can be expressed as:
z = h_a(y),  (5)
where h_a is the hyper-prior encoder, which maps the input data y to an encoded representation z. This encoding z aims to capture the key information of y while removing redundant and unrelated parts, thereby achieving effective compression of the data.
Afterwards, z is quantized and arithmetic encoded to generate a hyper latent bitstream, which is used as part of storage or transmission. The bitstream is arithmetic decoded to obtain the quantized hyper latent feature representation ẑ. Note that entropy coding is lossless, so entropy coding does not appear in training. The quantization process can be expressed as:
\hat{z} = Q(z),  (6)
where Q represents quantization and entropy coding. During training, quantization is approximated by adding uniform noise U(-1/2, 1/2). The quantization method has little impact on the quality of decoding and reconstruction. Therefore, this article, like most methods, adds uniform noise, which has low computational complexity and is easy to implement.
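The training-time quantization just described, together with the Gaussian conditional likelihood that the rate term of Eq. (2) needs for ŷ, can be sketched as follows; hard rounding at inference and integrating the Gaussian over a unit-width interval are standard practice in learned compression and are assumptions here, not details quoted from the paper.

```python
import torch

def quantize(v: torch.Tensor, training: bool) -> torch.Tensor:
    """Eq. (6): additive uniform noise U(-1/2, 1/2) during training; hard rounding otherwise."""
    if training:
        return v + torch.empty_like(v).uniform_(-0.5, 0.5)
    return torch.round(v)

def gaussian_likelihood(y_hat: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor,
                        eps: float = 1e-9) -> torch.Tensor:
    """Probability mass of each quantized value under N(mu, sigma), obtained by integrating
    the Gaussian density over the unit interval centered at y_hat; it feeds -log2(.) in Eq. (2)."""
    dist = torch.distributions.Normal(mu, sigma.clamp(min=1e-6))
    likelihood = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return likelihood.clamp(min=eps)
```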
Then, the quantized hyper latent feature representation ẑ is passed through the hyper-prior decoder h_s to generate the Gaussian-conditioned mean μ and scale σ. h_s corresponds to h_a and consists of upsampling residual blocks, JCT and sub-pixel convolution layers. In the upsampling residual blocks (URB), the input initially traverses a sub-pixel convolution layer with a LeakyReLU activation, followed by a convolution layer employing inverse generalized divisive normalization (IGDN). The final output is obtained through an addition
(The remaining 15 pages of the PDF are not shown.)