13、Deconstructing Denoising Diffusion Models for Self-Supervised Learning

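This study finds that the representation learning ability of denoising diffusion models (DDMs) comes mainly from the denoising process rather than the diffusion process, and that the simplified l-DAE performs well in self-supervised learning. The key ingredient is adding noise in a low-dimensional latent space. The resulting method resembles a classical DAE and needs no complex components such as class conditioning or an advanced tokenizer.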


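Introduction

The paper studies the representation learning ability of denoising diffusion models (DDMs), which were originally designed for image generation.
It deconstructs a DDM, gradually transforming it into a classical denoising autoencoder (DAE).
It explores how each component of a modern DDM affects self-supervised representation learning.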

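Conclusions:
Only a few modern components are critical for learning good representations; many others are unnecessary.
The representation ability of a DDM is gained mainly through the denoising-driven process, not the diffusion-driven process.
Result: a highly simplified method that largely resembles a classical DAE, called l-DAE (latent Denoising Autoencoder).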

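[Figure: overview of l-DAE]
The input image is projected into latent space with the principal component basis (V), noise is added in latent space, and the noisy latent is projected back to image space with the inverse principal component basis.

The figure (middle, bottom) shows an example image with noise added in latent space.

This noisy image is used as the network input, so a standard ViT can be applied that operates directly on images, as if there were no tokenizer.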

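Background

Denoising Diffusion Models (DDMs) achieve impressive image generation quality, especially for high-resolution, photo-realistic images, which suggests that they hold strong recognition representations for understanding visual content.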

The noisy image at time step $t$ is
$$x_t = \gamma_t x_0 + \sigma_t \epsilon$$
where $\epsilon \sim N(0, I)$ and $\gamma_t^2 + \sigma_t^2 = 1$.

The network is trained to predict the noise that was added:
$$\|\epsilon - \mathrm{net}(x_t)\|^2$$
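The noising step and the noise-prediction objective described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names `add_noise` and `noise_prediction_loss`, the toy array shapes, and the use of a mean (rather than a sum) over elements are assumptions made for the example.

```python
import numpy as np

def add_noise(x0, gamma_t, sigma_t, rng):
    """Return (x_t, eps) with x_t = gamma_t * x0 + sigma_t * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return gamma_t * x0 + sigma_t * eps, eps

def noise_prediction_loss(eps, eps_pred):
    """Squared error between the added noise and the predicted noise.

    Averaged over elements here (an assumption); a sum differs only by a constant.
    """
    return float(np.mean((eps - eps_pred) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))           # toy stand-in for a batch of clean inputs
gamma_t = np.sqrt(0.7)                      # any value with gamma_t^2 + sigma_t^2 = 1
sigma_t = np.sqrt(1.0 - gamma_t ** 2)
xt, eps = add_noise(x0, gamma_t, sigma_t, rng)
```

A perfect noise predictor would return `eps` itself, giving a loss of zero.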

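Experiments

Noise

Remove class conditioning from the DDM

The hypothesis is that conditioning the model directly on class labels reduces its need to encode label-related information. Removing the class condition can force the model to learn more semantics.

Adopt a linearly decaying noise schedule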
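As a rough sketch of what a linear-decay schedule could look like, the snippet below assumes that the quantity decaying linearly is $\gamma_t^2$, falling from just below 1 to 0, with $\sigma_t^2 = 1 - \gamma_t^2$. The function name, the step indexing, and the number of steps are hypothetical choices for illustration only.

```python
import math

def linear_decay_schedule(num_steps):
    """Return a list of (gamma_t, sigma_t) pairs for t = 1..num_steps.

    Assumption: gamma_t^2 = 1 - t / num_steps, so 1 > gamma_t^2 >= 0,
    and sigma_t^2 = 1 - gamma_t^2 keeps the total variance at 1.
    """
    schedule = []
    for t in range(1, num_steps + 1):
        gamma_sq = 1.0 - t / num_steps
        schedule.append((math.sqrt(gamma_sq), math.sqrt(1.0 - gamma_sq)))
    return schedule

schedule = linear_decay_schedule(1000)  # 1000 steps is an illustrative value
```

Early steps keep the signal almost intact, while the last step is pure noise.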

Results

Conclusion: self-supervised learning performance is not correlated with generation quality.

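Tokenizer

The tokenizer is swapped among four variants: convolutional VAE, patch-wise VAE, patch-wise AE, and patch-wise PCA.

Conclusion: the latent dimension of the tokenizer is crucial for the DDM to work well in self-supervised learning.

The convolutional VAE tokenizer is neither necessary nor favorable. In contrast, all patch-wise tokenizers, in which each patch is encoded independently, perform similarly to one another and consistently outperform the Conv VAE variant. In addition, the KL regularization term is unnecessary, since both the AE and PCA variants work well.

Conclusion: high-resolution, pixel-based DDMs are inferior for self-supervised learning.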

Autoencoders

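Unlike a DDM, which predicts the noise, a classical DAE directly predicts the clean image:
$$\lambda_t \|x_0 - \mathrm{net}(x_t)\|^2$$
Choosing $\lambda_t = \gamma_t^2 / \sigma_t^2$ makes this equivalent to noise prediction; in the experiments, setting $\lambda_t = \gamma_t^2$ works better.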
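To make the two weightings concrete, here is a small sketch of the clean-image prediction loss with a selectable $\lambda_t$. The function name `clean_prediction_loss`, the `weighting` argument, and the element-wise mean are assumptions for this example and are not taken from the paper's code.

```python
import numpy as np

def clean_prediction_loss(x0, x0_pred, gamma_t, weighting="gamma_sq"):
    """Weighted squared error lambda_t * ||x0 - x0_pred||^2 (mean over elements).

    weighting="snr"      -> lambda_t = gamma_t^2 / sigma_t^2 (matches noise prediction)
    weighting="gamma_sq" -> lambda_t = gamma_t^2
    Assumes gamma_t^2 + sigma_t^2 = 1; "snr" is undefined when gamma_t = 1.
    """
    gamma_sq = gamma_t ** 2
    sigma_sq = 1.0 - gamma_sq
    if weighting == "snr":
        lam = gamma_sq / sigma_sq
    elif weighting == "gamma_sq":
        lam = gamma_sq
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    return float(lam * np.mean((x0 - x0_pred) ** 2))
```

With the `"snr"` weighting the loss grows without bound as the noise level approaches zero, whereas the `"gamma_sq"` weighting stays between 0 and the unweighted error.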

Remove input scaling
Set $\gamma_t = 1$, let $\sigma_t$ vary linearly from 0 to $\sqrt{2}$, and use $\lambda_t = 1/(1+\sigma_t^2)$.
Conclusion: scaling the data by $\gamma_t$ is unnecessary.
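The weight $1/(1+\sigma_t^2)$ plays the role that $\gamma_t^2$ had before: dividing a scaled input $\gamma_t x_0 + \sigma_t \epsilon$ by $\gamma_t$ gives $x_0 + \sigma' \epsilon$ with $\sigma' = \sigma_t/\gamma_t$, and $\gamma_t^2 + \sigma_t^2 = 1$ then implies $\gamma_t^2 = 1/(1+\sigma'^2)$. A minimal sketch of the unscaled schedule follows; the function name and the number of levels are hypothetical.

```python
import math

def unscaled_noise_schedule(num_levels):
    """Return (sigma_t, lambda_t) pairs with gamma_t fixed to 1.

    sigma_t is spaced linearly from 0 to sqrt(2), and
    lambda_t = 1 / (1 + sigma_t^2). Requires num_levels >= 2.
    """
    sigma_max = math.sqrt(2.0)
    levels = []
    for t in range(num_levels):
        sigma_t = sigma_max * t / (num_levels - 1)
        levels.append((sigma_t, 1.0 / (1.0 + sigma_t ** 2)))
    return levels

levels = unscaled_noise_schedule(5)  # 5 levels chosen only for illustration
```

The loss weight falls from 1 at zero noise to 1/3 at the largest noise level.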

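Operating on image space with inverse PCA

The input image is projected into latent space with the principal component basis (V), noise is added in latent space, and the noisy latent is projected back to image space with the inverse principal component basis. This noisy image is then used as the network input, so a standard ViT can operate directly on images, as if there were no tokenizer.

Conclusion: operating on image space via inverse PCA achieves results similar to operating on latent space.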
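The project, add noise, and project back procedure can be sketched with NumPy as follows. This is an illustrative sketch under several assumptions: PCA is fit with an SVD on mean-centered, flattened patches, the latent is not rescaled before noise is added, and the function names and toy sizes are invented for the example.

```python
import numpy as np

def fit_pca_basis(patches):
    """Return (mean, V) where the rows of V are orthonormal principal directions."""
    mean = patches.mean(axis=0)
    # SVD of the centered data; rows of vt are sorted by decreasing variance.
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt

def noisy_patches_via_pca(patches, mean, V, d, sigma, rng):
    """Add Gaussian noise in the d-dim PCA latent, then map back to patch space."""
    Vd = V[:d]                                    # (d, D) top-d basis
    latent = (patches - mean) @ Vd.T              # project to latent space
    latent_noisy = latent + sigma * rng.standard_normal(latent.shape)
    return latent_noisy @ Vd + mean               # inverse PCA back to image space

rng = np.random.default_rng(0)
patches = rng.standard_normal((200, 12))          # toy data: 200 patches, D = 12
mean, V = fit_pca_basis(patches)
noisy = noisy_patches_via_pca(patches, mean, V, d=4, sigma=1.0, rng=rng)
```

Because the noise is generated in the top-d subspace, the returned patches differ from the noise-free PCA reconstruction only along those d directions.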

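Predicting the original image (l-DAE)

PCA is a lossy encoder for any reduced dimension $d$.

When the network is asked to predict the original image, the "noise" it must remove has two parts: (i) the additive Gaussian noise, whose intrinsic dimension is $d$; and (ii) the PCA reconstruction error, whose intrinsic dimension is $D - d$ ($D$ is 768).

Given the clean original image $x_0$ and the network prediction $\mathrm{net}(x_t)$, the residual projected onto the full PCA space is $r \triangleq V(x_0 - \mathrm{net}(x_t))$, where $V$ is the $D \times D$ matrix of the full PCA basis.

The loss function is:
$$L(x_0, \mathrm{net}(x_t)) = \sum_{i=1}^{D} w_i r_i^2$$
Here $i$ indexes the dimensions of $r$. The per-dimension weight $w_i$ is 1 for $i \le d$ and 0.1 for $d < i \le D$, so $w_i$ down-weights the loss on the PCA reconstruction error.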
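The weighted loss above can be written compactly in NumPy. The sketch below assumes that images are flattened into row vectors, that the rows of `V` are the PCA basis vectors, and that the loss is averaged over the batch; the function name `l_dae_loss` and the `tail_weight` argument are hypothetical.

```python
import numpy as np

def l_dae_loss(x0, x_pred, V, d, tail_weight=0.1):
    """Sum_i w_i * r_i^2 with r = V (x0 - x_pred), averaged over the batch.

    w_i = 1 for the first d PCA dimensions, tail_weight for the remaining D - d.
    """
    r = (x0 - x_pred) @ V.T                 # residual in full PCA coordinates, (N, D)
    D = V.shape[0]
    w = np.where(np.arange(D) < d, 1.0, tail_weight)
    return float(np.mean(np.sum(w * r ** 2, axis=-1)))
```

Setting `tail_weight=1.0` recovers the plain squared error, since an orthonormal `V` preserves the norm of the residual.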


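Single-level noise
With a fixed noise level of $\sigma = \sqrt{1/3}$, the metric drops to 61.5%, three points lower than with multi-level noise.

Conclusions:
Using multiple noise levels acts like a form of data augmentation in a DAE: it is beneficial, but it is not an enabling factor.
The representation ability of a DDM is gained mainly through the denoising-driven process, not the diffusion-driven process.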

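Summary

Visualizations help in understanding how l-DAE learns good representations.

l-DAE, which largely resembles a classical DAE, can perform competitively in self-supervised learning. The critical component is a low-dimensional latent space to which noise is added.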
