Research on an Infrared and Visible Image Compression Fusion Algorithm Based on a Joint CNN and Transformer Network
Content overview: This paper proposes CFNet, an infrared and visible image compression fusion network that combines a variational autoencoder (VAE) image compression method with a joint CNN and Transformer network structure. The network not only fuses infrared and visible images effectively but also performs efficient data compression. Experimental results show that the method performs well in data storage and transmission, and in particular outperforms recent state-of-the-art algorithms in rate-distortion performance. The network reduces redundant information during fusion and improves data utilization while maintaining high visual quality.
Intended audience: Researchers and engineers with a background in machine learning and deep learning, especially those working on image processing, image fusion, and image compression.
Usage scenarios and goals: Practical applications that require efficient compression and high-quality image fusion, such as video surveillance, target detection, and autonomous driving, with the goal of improving the transmission efficiency and storage utilization of image data.
Additional notes: The paper also reports extensive quantitative comparison experiments that validate the performance of CFNet on multiple datasets. In addition, the network uses a region-of-interest multi-channel loss function that allocates more bits to foreground regions, thereby improving the compression ratio.
Pattern Recognition 156 (2024) 110774
https://doi.org/10.1016/j.patcog.2024.110774
Received 5 February 2024; Received in revised form 8 July 2024; Accepted 9 July 2024; Available online 14 July 2024
0031-3203/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
CFNet: An infrared and visible image compression fusion network
Mengliang Xing a, Gang Liu a,∗, Haojie Tang a, Yao Qian a, Jun Zhang b
a School of Automation Engineering, Shanghai University of Electric Power, Shanghai 200090, China
b Shanghai JA SOLAR Technology Co., Ltd, Shanghai 200436, China
∗ Corresponding author.
ARTICLE INFO
Keywords: Image fusion; Image compression; Variational autoencoder; Transformer; Region of interest
ABSTRACT
Image fusion aims to acquire a more complete image representation within a limited physical space to more
effectively support practical vision applications. Although the currently popular infrared and visible image
fusion algorithms take practical applications into consideration, they do not fully consider the
redundancy and transmission efficiency of image data. To address this limitation, this paper proposes a
compression fusion network for infrared and visible images based on joint CNN and Transformer, termed
CFNet. First of all, the idea of variational autoencoder image compression is introduced into the image fusion
framework, achieving data compression while maintaining image fusion quality and reducing redundancy.
Moreover, a joint CNN and Transformer network structure is proposed, which comprehensively considers the
local information extracted by CNN and the global long-distance dependencies emphasized by Transformer.
Finally, a multi-channel loss based on the region of interest is used to guide network training. Not only can color
visible and infrared images be fused directly, but more bits can be allocated to the foreground region of
interest, resulting in a superior compression ratio. Extensive qualitative and quantitative analyses affirm that
the proposed compression fusion algorithm achieves state-of-the-art performance. In particular, rate–distortion
performance experiments demonstrate the great advantages of the proposed algorithm for data storage and
transmission. The source code is available at https://github.com/Xiaoxing0503/CFNet.
1. Introduction
Pattern recognition technology, with its ability to identify and inter-
pret complex patterns in data, has become a crucial branch of modern
computing science. As an extension of pattern recognition, data fusion
technology is widely applied across diverse scientific and engineering
domains. Its purpose is to integrate multiple data sources to generate
more comprehensive and accurate information than any single data
source, while occupying less space, thereby greatly enhancing the accu-
racy and efficiency of decision-making systems. Similarly, the purpose
of data compression is to retain as much comprehensive information as
possible within the original space size while reducing space occupancy.
Fused images, rich in information and superior in visual effects, can
better support practical visual tasks [1] such as semantic segmentation,
video surveillance, and target detection.
Infrared and visible image fusion is a branch of data fusion. The
primary goal is to generate a fused image that retains the thermal
radiation information from the original infrared image and the texture and
gradient information from the original visible image. In general, in-
frared and visible image fusion technology can be broadly categorized
into deep learning-based and traditional algorithms. The traditional
algorithms can be roughly divided into multi-scale transformation al-
gorithms [2], sparse representation algorithms [3], subspace-based
algorithms [4], saliency-based algorithms [5], and hybrid methods [6],
etc. With the popularity of artificial intelligence, the prosperity of
infrared and visible image fusion algorithms based on deep learning has
been witnessed in recent years [7]. The mainstream deep learning al-
gorithms are end-to-end methods [8], autoencoder-based (AE)
methods [9], generative adversarial network (GAN)-based methods [10],
and Transformer-based image fusion algorithms [11].
Typical image fusion algorithms do not differ greatly in
visual quality, so image fusion algorithms tailored to actual
tasks have become mainstream. For example, RFNet [12] considers
registration during image fusion, which effectively alleviates the prob-
lem of artifacts after image fusion. SeAFusion [13] cascades the fusion
network with the semantic segmentation network, so that the fusion
task meets the requirements of advanced visual tasks. PIAFusion [14]
designs image fusion algorithms under abnormal lighting conditions
to effectively deal with extreme scenes such as underexposure and
overexposure. SDNet [8] designs a small model for image fusion to
achieve real-time fusion efficiency. However, these image fusion meth-
ods, while emphasizing actual task requirements, fail to fully consider
the redundancy and transmission efficiency of image data. A notable
instance is the application of infrared and visible image fusion in
surveillance. The substantial volume of image data poses challenges
in terms of increased storage space and associated costs. Therefore,
a comprehensive consideration of storage and transmission concerns
associated with image fusion data becomes imperative.
Fig. 1. Existing and proposed compression fusion algorithms.
As illustrated in Fig. 1(a), the existing image fusion algorithm
uses the fusion algorithm to fuse the original image, subsequently
compressing it into a bitstream for storage or transmission. Typically,
to preserve maximal information from the original image, most fused
images are stored in a lossless compression format. This practice incurs
significant storage space requirements. Alternatively, the use of lossy
compression methods jeopardizes the visual quality of the image. We
hope to design a joint network that unifies infrared and visible image
fusion and data compression into one framework. Fig. 1(b) shows the
proposed image fusion and compression algorithm idea. The original
image is fused and compressed by the encoder to obtain a data stream
for storage and transmission. The data stream passes through the fusion
and compression decoder to yield the final fused image. Image fusion
technology can improve the information content of the image, which
enables the pattern recognition system to learn from richer data and
improve the accuracy of recognition. When this is combined with an
efficient compression algorithm, effective feature extraction and recog-
nition can be achieved without sacrificing image quality. The network
that integrates image fusion and compression can reduce the dimension
and complexity of the data before processing. This approach enables the
subsequent pattern recognition algorithm to process large-scale datasets
more quickly without compromising image quality.
Based on the above discussion, we initially explore the collabora-
tive relationship between image fusion and image compression tasks.
Inevitably, we face some obstacles: (1) Designing an effective com-
pression fusion network is imperative. Image fusion methods primarily
emphasize information extraction and synthesis, whereas compression
networks target the elimination of redundancy in image data. This
design approach facilitates the comprehensive integration and opti-
mization of data. (2) Developing a new feature extraction module. The
fusion of infrared and visible images requires that the network not
only attends to local information but also possesses global modeling
capabilities [15,16]. Simultaneously, image compression requires the
extraction of relationships between features to exploit redundancy.
Although previous works use CNN and Transformer to obtain local
and global information simultaneously, their use of features is still
separate. For example, methods such as TCCFusion [17] use CNN to model local
features and then use a Transformer to mine global dependencies. (3)
Designing an efficient loss strategy suitable for color visible images.
Images contain both unimportant background areas and foreground
areas of interest. High compression ratios are used for unimportant
areas and low compression ratios for important foreground areas to
achieve efficient compression. Furthermore, most existing image fusion
algorithms, such as MFIFusion [18], typically transform color images
into YCbCr space. Only the brightness (Y) channel is used as the input
of the deep network to obtain the fused brightness channel. However,
relying solely on the brightness (Y) channel as the input for the fusion
compression network raises concerns about the compression of the
chroma channels (Cb and Cr). Ensuring comprehensive compression
across all channels is crucial for the objectives of this study.
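To make the chroma concern concrete, the sketch below shows the luminance/chrominance decomposition that Y-channel-only fusion pipelines rely on, using the standard ITU-R BT.601 conversion; the function name and the 8-bit offset convention are ours, added only for illustration.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to YCbCr (ITU-R BT.601, 8-bit offsets)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

# In a Y-channel-only pipeline, only the Y plane (and the infrared image) enters
# the deep network; Cb and Cr are carried along and re-attached after fusion, so
# the chroma channels never pass through the learned compression path.
# CFNet instead feeds all channels, so Cb and Cr are compressed jointly with Y.
```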
To address these issues, this paper proposes a compression fusion
model that joins CNN and Transformer. First, our goal is to improve the
ability of computers to store images and of networks to transmit
them while ensuring the quality of image fusion. Therefore, we in-
corporate the variational autoencoder (VAE) image compression model
into the image fusion framework. Specifically, the image is mapped
into a latent feature space through an image encoder, facilitating the
subsequent quantized encoding process. Then, the hyper-prior codec is
employed to derive the probability distribution function of the feature
points, resulting in more compact features. This eliminates statistical
dependence to the greatest extent. Quantization coding is then applied
to generate a bitstream for storage and transmission, subsequently
decoded to produce the final fused image. Second, we embed CNN
and Transformer into the same module to aggregate local and non-
local information. The noteworthy accomplishments of Transformer in
image fusion have garnered considerable attention, primarily attributed
to its adeptness in capturing non-local information. Simultaneously,
local information significantly influences image fusion performance.
Hence, there arises a necessity to amalgamate the strengths of the
convolutional neural network (CNN), which captures local context
information, and the Transformer, known for its proficiency in han-
dling global dependencies, to effectively undertake image fusion tasks.
Finally, network training is guided by multi-channel pixel loss and
multi-channel gradient loss. Additionally, to optimize the compression
ratio, the fusion loss incorporates the foreground mask of the region of
interest. This enables the compression fusion network to automatically
allocate more bits to the foreground region of interest, while assigning
fewer bits to the background. This approach serves to safeguard criti-
cal information effectively, simultaneously minimizing redundancy. In
summary, the key contributions can be summarized as:
(1) To the best of our knowledge, this is the first time the VAE com-
pression framework has been introduced into the field of image
fusion. This allows for a more compact feature representation.
While ensuring the quality of image fusion, it also improves the
efficiency of image storage and transmission.
(2) A novel joint CNN and Transformer network structure is pro-
posed. It aggregates the advantages of CNN in modeling local
information and Transformer in modeling global dependencies
at the same scale.
(3) A novel region of interest multi-channel loss function is designed
to guide network training. Not only can color visible and infrared
images be fused directly, but more bits can be allocated to the
foreground area of interest, resulting in a superior compression
ratio.
The rest of the article is organized as follows. In Section 2, we briefly introduce
related work on image fusion and image compression. In Section 3,
the proposed method is described. Section 4 illustrates the advan-
tages of the proposed method over other alternatives, especially the
rate–distortion performance. Conclusions are given in Section 5.
2. Related works
In this section, the development of image fusion algorithms based
on deep learning is briefly reviewed, and image compression methods
are briefly introduced.

2.1. Image fusion based on deep learning
2.1.1. Typical image fusion
In recent years, image fusion algorithms based on deep learning
have attracted great interest due to their powerful nonlinear fitting
capabilities. These algorithms can be categorized into four main types:
(1) AE-based image fusion methods: The AE-based method achieves
feature extraction and image reconstruction of a single source image
by training the encoder and decoder. Then, manually designed fusion
rules are used to fuse the features extracted by the encoder. The decoder
completes image reconstruction based on the fused features. As a classic
AE-based fusion algorithm, DenseFuse [9] designs dense block coding
layers to fully extract and effectively utilize features. MFIFusion [18]
uses dual-scale decomposition and multi-level feature injection to effec-
tively utilize valuable information in the middle layer of the network
and enrich the visual effect of the fused image. FSFusion [19] embeds
edge prior information into the network and designs a triple fusion
mechanism to achieve a high-contrast, clear-detail fused image.
(2) GAN-based image fusion methods: The GAN-based method takes
into account the ground truth, so that the discriminator forces the
generator to generate a fused image with infrared intensity and visible
gradient from a probabilistic perspective. FusionGAN [10] first intro-
duced the GAN into the field of image fusion, forcing the generator to
generate more texture details through the discriminator scoring of the
fused and the visible image. However, a single adversarial mechanism
can easily lead to unbalanced fusion. To this end, subsequent studies
such as DDcGAN [20] and TarDAL [21] use dual discriminators to
balance the distribution of different source images.
(3) The end-to-end fusion method: The end-to-end method directly
learns the fusion function from the original image and has become
a prevalent deep learning algorithm. As a representative end-to-end
image fusion algorithm, SDNet [8] introduces the idea of compression
decomposition to force the fusion result to contain more scene details.
ASFFuse [22] proposes an adaptive feature selection strategy to reduce
the impact of noisy feature maps on fusion results and achieve better
visual effects.
(4) Transformer-based image fusion methods: CNNs are confined to
extracting local receptive fields, lacking the capability to unveil non-
local interrelationships within source images. In response, researchers
try to integrate transformers into the domain of image fusion, intending
to achieve local to global perception of the original image. Seeking
to harness both the local feature extraction capacities of CNN and
the global modeling capabilities of the Transformer simultaneously,
SePT [23] and TCCFusion [17] initially employ CNN for local fea-
ture modeling, followed by Transformer utilization to explore global
dependencies.
2.1.2. Practice-driven image fusion
Although typical image fusion methods achieve satisfactory results
in pursuing subjective visual quality and objective evaluation metrics,
they ignore the requirements of the real world. Consequently, many
more practical image fusion algorithms have been derived in recent
years:
(1) Unregistered Image Fusion: Image fusion without strict regis-
tration often leads to artifacts. To address this, many methods have
emerged that incorporate registration into image fusion. SuperFu-
sion [24], in managing registration and fusion, considers the semantic
requirements of high-level vision tasks. Unlike treating image regis-
tration and fusion as separate problems, MURF [25] intertwines these
tasks, making them mutually reinforcing.
(2) High-level visual tasks driven image fusion: To ensure that
the fused image aligns with the demands of high-level visual tasks,
Tang et al. first proposed the semantic-driven image fusion algorithm,
named SeAFusion [13]. SeAFusion cascades the fusion network and
the semantic segmentation network, enabling the feedback of semantic
data to the fusion network. Similarly, TarDAL [21], proposed by Liu
et al., jointly optimizes image fusion and target detection, achieving
higher detection accuracy in fused images.
(3) Image fusion under extreme conditions: In practical applications,
addressing extreme scenarios, such as underexposure and overexpo-
sure, is often imperative. Consequently, image fusion algorithms tai-
lored for abnormal lighting conditions have emerged. PIAFusion [14]
designed a lighting perception sub-network, aiding in fusion network
training and adaptively integrating meaningful information based on
lighting conditions. DIVFusion [26] introduces a scene illumination
unwrapping network and a texture contrast enhancement fusion net-
work. This establishes an effective coupling and reciprocal relationship
between image fusion and image enhancement.
(4) Real-time image fusion: For high-level visual tasks, early pre-
processing processes such as image fusion often have high real-time
requirements. SDNet [8] optimizes image fusion operation efficiency
by reducing the number of model parameters. Recently, APWNet [27]
designed a simple network and incorporated a detection model to
dynamically learn pixel-wise weights for image fusion, making it the
fastest fusion network.
2.2. Image compression
Image compression technology, as one of the key technologies in
the field of image processing, holds significance in alleviating stor-
age and transmission overhead. Over the past few decades, numerous
traditional compression standards have emerged. Specifically, tradi-
tional image compression mostly uses fixed transformation methods
and quantization coding frameworks, such as discrete cosine transform
and discrete wavelet transform. As one of the compression standards,
JPEG [28] has played a pivotal role in the field of image compression in
recent decades and is still widely used today. The JPEG2000 [29] image
compression standard first proposed a region of interest compression
method. It changes the compression rate and quality of each region so
that the regions designated as important are of high quality.
Traditional compression standards rely on manual techniques. In
recent years, learned image compression methods have attracted in-
creasing attention. Ballé et al. first proposed an image compression
model by minimizing the rate–distortion trade-off. Then they intro-
duced a VAE model containing a hyper-prior network [30] to improve
image compression. In order to achieve better bitrate estimation, [31]
extended the work of [30] and used an autoregressive model to im-
prove image compression, surpassing the BPG traditional compression
algorithm in rate–distortion performance for the first time. Recently,
LIC-TCM [32] introduced an efficient parallel Transformer-CNN mod-
ule, inspiring the joint CNN and Transformer model proposed in this
article.
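As a concrete point of reference for the hyper-prior VAE codec of [30] that inspires this work, the sketch below estimates the bit rate of an image with a pretrained model from the open-source CompressAI library; the specific model, quality index, and output format are assumptions about that library, not code from the paper.

```python
import math
import torch
from compressai.zoo import bmshj2018_hyperprior  # hyper-prior model in the spirit of [30]

model = bmshj2018_hyperprior(quality=3, pretrained=True).eval()

x = torch.rand(1, 3, 256, 256)           # stand-in for a normalized RGB image
with torch.no_grad():
    out = model(x)                        # {"x_hat": ..., "likelihoods": {"y": ..., "z": ...}}

num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
bpp = sum(torch.log(l).sum() for l in out["likelihoods"].values()) / (-math.log(2) * num_pixels)
print(f"estimated rate: {bpp.item():.3f} bits per pixel")
```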
3. Method
This section provides a comprehensive introduction to the image
compression fusion network. The problem formulation is first clearly
defined and then provides a detailed description of the network frame-
work and loss function.
3.1. Problem formulation
We first define the problem that this study attempts to solve, which
is how to effectively fuse large-scale, high-dimensional, and hetero-
geneous data. In the field of multi-source image processing, infrared
and visible image fusion is applied in various practical scenarios. The
purpose is to combine the thermal radiation information of infrared
images and the color and texture information of visible images. This
provides a more comprehensive and accurate image than any single
image. At the same time, with the increasing scale and complexity of
image data, traditional image fusion technology faces huge challenges
in fusion quality and data storage.

To address these challenges, diverging from the prior emphasis
solely on fused image quality, we propose that the image fusion method
should consider the space utilization issues caused by storage and
transmission, that is, compression-oriented image fusion. The core idea
of compression-oriented fusion involves integrating image compression
technology into the image fusion process. The primary objective is to
minimize the space occupied by data while ensuring optimal fusion
quality.
In infrared and visible image fusion applications, global information
provides an overall thermal radiation picture, while local information
reveals details such as subtle thermal radiation and object edges. Tra-
ditional fusion networks may struggle to balance these aspects, so we
design a joint CNN and Transformer structure to extract both local and
global interrelationships within the source image.
In addition, within the compression concept, a crucial element is
the region of interest (ROI). Similarly, in the fusion process of infrared
and visible images, we designate ROI as the region encompassing
vital thermal radiation information and segmented foreground targets.
Conversely, the relatively unimportant background (BG) area typi-
cally contains less information. Therefore, throughout the compression
and fusion process, more resources are allocated to the ROI, thereby
retaining more important information. Building upon the preceding
discussion, by training the compression fusion model end-to-end, we
define the overall optimization loss as:
\mathcal{L} = \mathcal{R} + \lambda \cdot \mathcal{L}_f,  (1)
where \mathcal{R} is the rate, denoting the expected code length (bit rate) after compression, \mathcal{L}_f is the fusion loss, and \lambda is a trade-off coefficient used to balance rate and fusion distortion. The minimum average code length is achieved by minimizing the expectation over the data distribution p_x of the original image x. \mathcal{R} is given by the Shannon cross-entropy between the latent representations and the actual distribution:
\mathcal{R} = \mathbb{E}_{x \sim p_x}\big[ -\log_2 p_{\hat{y} \mid \hat{z}}(\hat{y} \mid \hat{z}) - \log_2 p_{\hat{z}}(\hat{z}) \big],  (2)
where x represents the original input image, and \hat{y} and \hat{z} represent the quantized latent representation and the quantized hyper latent representation, respectively.
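As an illustration of Eqs. (1)-(2), the sketch below computes the rate term from per-element likelihoods and combines it with the fusion loss; the tensor interface, the bits-per-pixel normalization, and the function names are our assumptions rather than the paper's released code.

```python
import torch

def rate_bpp(y_likelihoods: torch.Tensor, z_likelihoods: torch.Tensor, num_pixels: int) -> torch.Tensor:
    """Eq. (2): Shannon cross-entropy of the quantized latents, normalized to bits per pixel
    (the normalization is our convention; the paper states the rate as an expected code length)."""
    bits = -torch.log2(y_likelihoods).sum() - torch.log2(z_likelihoods).sum()
    return bits / num_pixels

def total_loss(y_likelihoods, z_likelihoods, fusion_loss, lam: float, num_pixels: int) -> torch.Tensor:
    """Eq. (1): L = R + lambda * L_f, trading off rate against fusion distortion."""
    return rate_bpp(y_likelihoods, z_likelihoods, num_pixels) + lam * fusion_loss
```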
The fusion loss, incorporating both the ROI loss (\mathcal{L}_{roi}) and the background (BG) loss (\mathcal{L}_{bg}), is calculated by the formula:
\mathcal{L}_f = \mathcal{L}_{roi}(I_1; I_2; M) + \alpha \cdot \mathcal{L}_{bg}(I_1; I_2; 1 - M),  (3)
where \mathcal{L}_{roi}(I_1; I_2; M) is the loss function over the region of interest, evaluating the similarity or differences between the two images I_1 and I_2 in the ROI area. The background loss \mathcal{L}_{bg}(I_1; I_2; 1 - M) uses 1 - M as a mask to evaluate the performance of the images I_1 and I_2 in the background area. I_1 and I_2 denote the original input images, M stands for the ROI foreground mask, and 1 - M is the background mask. \alpha is the trade-off coefficient between the ROI and BG areas. To focus on the ROI area, \alpha < 1.
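A minimal sketch of the masked split in Eq. (3). The per-region distance is shown as an L1 term between the fused output and both sources, which is only our stand-in for the multi-channel pixel and gradient losses described later; the tensor shapes (all inputs expanded to the same channel count) and the default weighting are assumptions.

```python
import torch

def masked_l1(fused: torch.Tensor, src1: torch.Tensor, src2: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """L1 distance between the fused image and both sources, restricted to `mask` (1 inside the region).
    All tensors are assumed to be (B, C, H, W); the single-channel infrared image is repeated over C."""
    m = mask.expand_as(fused)
    area = m.sum().clamp(min=1.0)
    return (((fused - src1).abs() + (fused - src2).abs()) * m).sum() / area

def fusion_loss(fused, ir, vi, roi_mask, alpha: float = 0.5) -> torch.Tensor:
    """Eq. (3): L_f = L_roi(I1; I2; M) + alpha * L_bg(I1; I2; 1 - M), with alpha < 1 so that the
    optimizer (and hence the bit allocation) favors the ROI foreground over the background."""
    return masked_l1(fused, ir, vi, roi_mask) + alpha * masked_l1(fused, ir, vi, 1.0 - roi_mask)
```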
3.2. Overall framework
The proposed method uses the variational autoencoder, which is the first attempt in the field of image fusion. As depicted in Fig. 2, the proposed method includes an image encoder module g_a, a hyper-prior encoder module h_a, a hyper-prior decoder module h_s, an image decoder module g_s, and two quantization and entropy coding modules U|Q_y and U|Q_z. In summary, the infrared image I_ir ∈ ℝ^{H×W×1} and the visible image I_vi ∈ ℝ^{H×W×3} are first cascaded and fed into the image encoder module g_a to derive the latent feature representation y. These features are input into the hyper-prior encoder module h_a, yielding the hyper-prior representation z. The latent feature representation y and the hyper-prior representation z pass through the quantization process and are converted into discrete forms. z is arithmetic encoded and arithmetic decoded into the quantized hyper latent representation ẑ. Passing through the hyper-prior decoder h_s, ẑ produces the mean μ and scale σ of the Gaussian condition 𝒩(μ, σ). ŷ is encoded according to the probability distribution of the features, which depends on the statistics (μ, σ) output by the hyper-prior decoder. A specific latent feature ŷ is then sampled from this probability distribution. The new ŷ passes through the image decoder g_s to yield the final fused image.
The entire process is described in detail below. The input grayscale infrared image I_ir ∈ ℝ^{H×W×1} and the color visible image I_vi ∈ ℝ^{H×W×3} are cascaded and sent to the image encoder g_a. g_a consists of three groups of downsampling residual blocks (DRB) and a joint CNN and Transformer module (JCT), and finally passes through a downsampling convolution layer. In each residual block, the input undergoes a series of transformations: it traverses a convolutional layer with a LeakyReLU activation and a stride of 2, followed by another convolutional layer with generalized divisive normalization (GDN). The output is then added to the initial input, producing the residual block output. Notably, GDN has been shown to be a nonlinear transformation layer more suitable for image reconstruction or generation [33]. The JCT module aggregates the local modeling capabilities of CNN and the global modeling capabilities of Transformer; it is introduced in detail in Section 3.3. The downsampling convolutional layer, with a stride of 2, concludes the encoding process, yielding the latent feature representation y. The image size is reduced by a factor of 16 during image encoding. The entire image encoding process can be expressed as:
y = g_a(\mathrm{Cat}(x_1, x_2)),  (4)
where Cat denotes the concatenation operation in the channel dimension.
Subsequently, the latent feature representation y is fed into the hyper-prior encoder h_a, which consists of downsampling residual blocks, JCT and downsampling convolutional layers. The output is the hyper latent feature representation z, with the feature size reduced by a factor of 4. The entire hyper-prior encoding process can be expressed as:
z = h_a(y),  (5)
where h_a is the hyper-prior encoder, which maps the input data y to an encoded representation z. This encoding z aims to capture the key information of y while removing redundant and unrelated parts, thereby achieving effective compression of the data.
Afterwards, z is quantized and arithmetic encoded to generate a hyper latent bitstream, which is used as part of storage or transmission. The bitstream is arithmetic decoded to obtain the quantized hyper latent feature representation ẑ. Note that entropy coding is lossless, so entropy coding does not appear in training. The quantization process can be expressed as:
\hat{z} = Q(z),  (6)
where Q represents quantization and entropy coding. During training, quantization is approximated by adding uniform noise U(-1/2, 1/2). The quantization method has little impact on the quality of decoding and reconstruction. Therefore, this article, like most methods, adds uniform noise, which has low computational complexity and is easy to implement.
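The training-time quantization just described, together with the Gaussian conditional likelihood that the rate term of Eq. (2) needs for ŷ, can be sketched as follows; hard rounding at inference and integrating the Gaussian over a unit-width interval are standard practice in learned compression and are assumptions here, not details quoted from the paper.

```python
import torch

def quantize(v: torch.Tensor, training: bool) -> torch.Tensor:
    """Eq. (6): additive uniform noise U(-1/2, 1/2) during training; hard rounding otherwise."""
    if training:
        return v + torch.empty_like(v).uniform_(-0.5, 0.5)
    return torch.round(v)

def gaussian_likelihood(y_hat: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor,
                        eps: float = 1e-9) -> torch.Tensor:
    """Probability mass of each quantized value under N(mu, sigma), obtained by integrating
    the Gaussian density over the unit interval centered at y_hat; it feeds -log2(.) in Eq. (2)."""
    dist = torch.distributions.Normal(mu, sigma.clamp(min=1e-6))
    likelihood = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)
    return likelihood.clamp(min=eps)
```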
Then, the quantized hyper latent feature representation ẑ is passed through the hyper-prior decoder h_s to generate the Gaussian-conditioned mean μ and scale σ. h_s corresponds to h_a and consists of upsampling residual blocks, JCT and sub-pixel convolution layers. In the upsampling residual blocks (URB), the input initially traverses a sub-pixel convolution layer with a LeakyReLU activation, followed by a convolution layer employing inverse generalized divisive normalization (IGDN). The final output is obtained through an addition
(The remaining 15 pages of the PDF are not shown.)