[NeurIPS 2023] Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

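Paper: Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Code: GitHub - MedARC-AI/fMRI-reconstruction-NSD: fMRI-to-image reconstruction on the NSD dataset.

The English here is typed entirely by hand! It is a summary and paraphrase of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments! This post is closer to personal notes, so read it with caution.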

Contents

1. Takeaways

2. Close reading of the paper

2.1. Abstract

2.2. Introduction

2.3. MindEye

2.3.1. High-Level (Semantic) Pipeline

2.3.2. Low-Level (Perceptual) Pipeline

2.4. Results

2.4.1. Image/Brain Retrieval

2.4.2. fMRI-to-Image Reconstruction

2.4.3. Ablations

2.5. Related Work

2.6. Conclusion

3. Reference

1. Takeaways

(1) Hmm, this is a conceptually fairly simple paper, so a quick skim is enough.

2. Close reading of the paper

2.1. Abstract

        ①MindEye maps fMRI signals to CLIP image space

2.2. Introduction

        ①Example reconstructed images:

2.3. MindEye

        ①Framework:

2.3.1. High-Level (Semantic) Pipeline

        ①Components of MLP backbone: a linear layer, 4 residual blocks and a linear layer
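A minimal sketch of the backbone layout described above (a linear layer, 4 residual blocks, a linear layer). The notes do not give the hidden size, activation, or any normalization/dropout, so the ReLU activation, the plain `h + block(h)` residual form, the tiny dimensions, and the class name `MLPBackbone` are all assumptions made for illustration only.

```python
import random


def init_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / (n_in ** 0.5)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(x, weights, bias):
    """Dense layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class MLPBackbone:
    """Linear -> 4 residual blocks -> linear (activation choice is assumed)."""

    def __init__(self, n_voxels, hidden, out_dim, n_blocks=4, seed=0):
        rng = random.Random(seed)
        self.inp = init_linear(n_voxels, hidden, rng)
        self.blocks = [init_linear(hidden, hidden, rng) for _ in range(n_blocks)]
        self.out = init_linear(hidden, out_dim, rng)

    def forward(self, x):
        h = relu(linear(x, *self.inp))
        for weights, bias in self.blocks:
            # Residual connection: add the block output back onto its input.
            h = [a + b for a, b in zip(h, relu(linear(h, weights, bias)))]
        return linear(h, *self.out)


model = MLPBackbone(n_voxels=6, hidden=8, out_dim=3)
embedding = model.forward([0.1] * 6)
```

Here `embedding` plays the role of the predicted CLIP image embedding; the real model maps thousands of voxels to a much larger embedding.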

        ②Loss: MSE and bidirectional CLIP loss

        ③They replace the CLIP loss with a bidirectional MixCo loss (BiMixCo):

x_{\mathrm{mix}_{i,k_{i}}}=\lambda_{i}\cdot x_{i}+(1-\lambda_{i})\cdot x_{k_{i}},\quad p_{i}^{*}=f(x_{\mathrm{mix}_{i,k_{i}}}),\quad p_{i}=f(x_{i}),\quad t_{i}=\mathrm{CLIP}_{\mathrm{Image}}(y_{i})

where \lambda_{i} is sampled from a Beta distribution with \alpha=\beta=0.15 and k_{i} is the index of the sample mixed with sample i,

\begin{gathered} \mathcal{L}_{\mathrm{BiMixCo}}=-\sum_{i=1}^{N}\left[\lambda_{i}\cdot\log\left(\frac{\exp\left(\frac{p_{i}^{*}\cdot t_{i}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}^{*}\cdot t_{m}}{\tau}\right)}\right)+(1-\lambda_{i})\cdot\log\left(\frac{\exp\left(\frac{p_{i}^{*}\cdot t_{k_{i}}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}^{*}\cdot t_{m}}{\tau}\right)}\right)\right] \\ -\sum_{j=1}^{N}\left[\lambda_{j}\cdot\log\left(\frac{\exp\left(\frac{p_{j}^{*}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{m}^{*}\cdot t_{j}}{\tau}\right)}\right)+\sum_{\{l|k_{l}=j\}}(1-\lambda_{l})\cdot\log\left(\frac{\exp\left(\frac{p_{l}^{*}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{m}^{*}\cdot t_{j}}{\tau}\right)}\right)\right] \end{gathered}

where \tau is a temperature hyperparameter and N denotes the batch size
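To make the mixing step and the two-directional loss above concrete, here is a rough pure-Python sketch. It follows the formula term by term, but several things are assumptions: the partner index k_i is drawn uniformly from the batch, `tau=0.1` is a placeholder temperature (the notes give no value), and the function names `mix_voxels` and `bimixco_loss` are made up for this example. In practice `p_star` would be the MLP output f(x_mix) and `t` the CLIP image embeddings.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(values):
    """Numerically stable log-softmax over a list of logits."""
    m = max(values)
    lse = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - lse for v in values]


def mix_voxels(voxels, alpha=0.15, beta=0.15, rng=None):
    """x_mix_i = lambda_i * x_i + (1 - lambda_i) * x_{k_i}."""
    rng = rng or random.Random()
    n = len(voxels)
    lambdas = [rng.betavariate(alpha, beta) for _ in range(n)]
    partners = [rng.randrange(n) for _ in range(n)]  # k_i, assumed uniform
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(voxels[i], voxels[k])]
        for i, (lam, k) in enumerate(zip(lambdas, partners))
    ]
    return mixed, lambdas, partners


def bimixco_loss(p_star, t, lambdas, partners, tau=0.1):
    """Bidirectional MixCo loss for mixed embeddings p_star and targets t."""
    n = len(p_star)
    logits = [[dot(p_star[i], t[j]) / tau for j in range(n)] for i in range(n)]
    # Brain -> image: normalize over image targets m for each mixed sample i.
    row_lsm = [log_softmax(row) for row in logits]
    # Image -> brain: normalize over mixed samples m for each image j.
    col_lsm = [log_softmax([logits[m][j] for m in range(n)]) for j in range(n)]
    loss = 0.0
    for i in range(n):
        lam, k = lambdas[i], partners[i]
        loss -= lam * row_lsm[i][i] + (1.0 - lam) * row_lsm[i][k]
        loss -= lam * col_lsm[i][i]
        # Sample i is one of the l with k_l = j for image j = k.
        loss -= (1.0 - lam) * col_lsm[k][i]
    return loss
```

When every \lambda_{i} equals 1 no mixing happens and the expression reduces to the ordinary bidirectional CLIP loss, which is a handy sanity check.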

        ④They stop using mixup one third of the way through training and switch from the hard contrastive loss (BiMixCo) to a soft contrastive loss (SoftCLIP). Mixup greatly improves retrieval (↑↑) but hurts reconstruction (↓), so the switch is meant to balance the two

        ⑤Soft contrastive loss:

\mathcal{L}_{\mathrm{SoftCLIP}}=-\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\frac{\exp\left(\frac{t_{i}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{t_{i}\cdot t_{m}}{\tau}\right)}\cdot\log\left(\frac{\exp\left(\frac{p_{i}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}\cdot t_{m}}{\tau}\right)}\right)\right]
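A small sketch of the soft contrastive loss above, written directly from the formula: the soft targets are a softmax over target-to-target similarities, and the loss is the cross-entropy between those targets and the softmax over prediction-to-target similarities. The temperature default `tau=0.1` and the function name `softclip_loss` are assumptions, not values from the paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax(values):
    """Numerically stable log-softmax over a list of logits."""
    m = max(values)
    lse = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - lse for v in values]


def softclip_loss(p, t, tau=0.1):
    """Cross-entropy between soft targets softmax(t_i . t_j / tau)
    and predictions softmax(p_i . t_j / tau), summed over the batch."""
    n = len(p)
    loss = 0.0
    for i in range(n):
        target_logits = [dot(t[i], t[j]) / tau for j in range(n)]
        pred_logits = [dot(p[i], t[j]) / tau for j in range(n)]
        soft_targets = [math.exp(v) for v in log_softmax(target_logits)]
        pred_log_probs = log_softmax(pred_logits)
        loss -= sum(s * lp for s, lp in zip(soft_targets, pred_log_probs))
    return loss
```

Because each row is a cross-entropy, the loss is smallest when the predicted distribution matches the soft target distribution, for example when p_i equals t_i.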

        ⑥Total loss:

\mathcal{L}=\mathcal{L}_{\text{BiMixCo|SoftCLIP}}+\alpha\cdot\mathcal{L}_{\mathrm{prior}}

where \mathcal{L}_{\mathrm{prior}} is the diffusion prior loss from DALL-E 2 and \alpha=0.3

        ⑦Training settings: 240 epochs with a batch size of 32
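The loss schedule and the weighted total loss from the notes above can be sketched as follows. The exact switch point is an assumption: this sketch switches at epoch index `num_epochs // 3` (epoch 80 of 240, 0-indexed), and the helper names `contrastive_loss_name` and `total_loss` are hypothetical.

```python
def contrastive_loss_name(epoch, num_epochs=240):
    """Pick the contrastive term: BiMixCo for the first third, then SoftCLIP."""
    switch_epoch = num_epochs // 3  # assumed switch point (0-indexed epochs)
    return "BiMixCo" if epoch < switch_epoch else "SoftCLIP"


def total_loss(contrastive_loss, prior_loss, alpha=0.3):
    """L = L_{BiMixCo|SoftCLIP} + alpha * L_prior."""
    return contrastive_loss + alpha * prior_loss


schedule = [contrastive_loss_name(e) for e in range(240)]
```

With these settings `schedule` contains 80 BiMixCo epochs followed by 160 SoftCLIP epochs.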

2.3.2. Low-Level (Perceptual) Pipeline

        ①Employ img2img

2.4. Results

        ①Dataset: the Natural Scenes Dataset (NSD)

        ②Subjects: 4

        ③Training/test split: 24980/982, where the fMRI responses for each test image are averaged over its 3 repetitions
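As a tiny illustration of the test-time averaging, the snippet below averages the repeated fMRI responses to one image voxel by voxel. The data layout (one list of voxel values per repetition) and the function name `average_repetitions` are assumptions for this example.

```python
def average_repetitions(repeats):
    """Average several fMRI responses to the same image, voxel by voxel.

    repeats: list of equal-length lists, one per repetition.
    """
    n = len(repeats)
    return [sum(values) / n for values in zip(*repeats)]


averaged = average_repetitions([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```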

2.4.1. Image/Brain Retrieval

        ①Image retrieval performance:

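(For retrieval among a small set of candidates, as in the first row, MindEye can pinpoint the exact image the subject actually saw. It also scales to the much larger LAION-5B dataset, retrieving the most similar images out of five billion.)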
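Retrieval of this kind can be sketched as a nearest-neighbor search in embedding space: embed the brain activity, compare it with every candidate image embedding, and return the best match. The use of cosine similarity and the function names below are assumptions for illustration; the vectors are toy values, not real CLIP embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query, candidates):
    """Return the index of the candidate most similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


def top1_accuracy(brain_embeddings, image_embeddings):
    """Fraction of brain embeddings whose nearest image is the matching one."""
    hits = sum(
        1 for i, b in enumerate(brain_embeddings)
        if retrieve(b, image_embeddings) == i
    )
    return hits / len(brain_embeddings)


images = [[1.0, 0.0], [0.0, 1.0]]
brains = [[0.9, 0.1], [0.2, 0.8]]
accuracy = top1_accuracy(brains, images)
```

Swapping the roles of `brains` and `images` gives the brain-retrieval direction, where the image embedding is the query.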

        ②Performance:

2.4.2. fMRI-to-Image Reconstruction

        ①Reconstruction performance:

2.4.3. Ablations

(1)Architectural Improvements

        ①MLP size ablation:

(2)Training Strategies (Losses and Data Augmentations)

        ①Loss ablation:

(3)Reconstruction Strategies

        ①Effects of diffusion prior and MLP projector on reconstruction and retrieval metrics:

2.5. Related Work

        ①I won't paraphrase this part; there is a bit too much of it

2.6. Conclusion

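        ①Why put so much analysis of the current state of the field in the conclusion? It would be better to move it to the appendix and put some experiments here instead. It covers things like information leakage and data security

        ②They set up a discussion community: MedARC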

3. Reference

@inproceedings{NEURIPS2023_4ddab70b,
 author = {Scotti, Paul and Banerjee, Atmadeep and Goode, Jimmie and Shabalin, Stepan and Nguyen, Alex and Cohen, Ethan and Dempster, Aidan and Verlinde, Nathalie and Yundler, Elad and Weisberg, David and Norman, Kenneth and Abraham, Tanishq},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
 pages = {24705--24728},
 publisher = {Curran Associates, Inc.},
 title = {Reconstructing the Mind\textquotesingle s Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors},
 url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/4ddab70bf41ffe5d423840644d3357f4-Paper-Conference.pdf},
 volume = {36},
 year = {2023}
}
