[NeurIPS 2023] Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Article link: https://blog.csdn.net/Sherlily/article/details/147190953

Paper: Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Code: GitHub - MedARC-AI/fMRI-reconstruction-NSD: fMRI-to-image reconstruction on the NSD dataset.

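The English is typed by hand! It summarizes and paraphrases the original paper, so some spelling and grammar mistakes are hard to avoid; corrections in the comments are welcome. This post is closer to personal notes, so read with caution.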


1. Takeaways

(1) Emm, this paper is conceptually fairly simple and can be skimmed quickly.

2. Paragraph-by-Paragraph Close Reading

2.1. Abstract

①MindEye maps fMRI signals to the CLIP image embedding space

2.2. Introduction

①Example reconstructed images:

2.3. MindEye

①Framework:

2.3.1. High-Level (Semantic) Pipeline

①Components of the MLP backbone: a linear layer, 4 residual blocks, and a final linear layer
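To make the "linear, 4 residual blocks, linear" layout concrete, here is a minimal forward-pass sketch in numpy. The layer sizes are toy values, the weights are random, and the inner structure of each residual block (Linear, LayerNorm, GELU plus a skip connection) is my assumption for illustration, not a copy of the official code.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # normalize each row over its feature dimension (no learned scale/shift here)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class MLPBackbone:
    """Linear -> n residual blocks -> linear; forward pass only, random weights."""

    def __init__(self, in_dim, hidden_dim, out_dim, n_blocks=4, seed=0):
        rng = np.random.default_rng(seed)

        def init(fan_in, fan_out):
            w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
            return w, np.zeros(fan_out)

        self.w_in, self.b_in = init(in_dim, hidden_dim)
        self.blocks = [init(hidden_dim, hidden_dim) for _ in range(n_blocks)]
        self.w_out, self.b_out = init(hidden_dim, out_dim)

    def __call__(self, voxels):
        h = voxels @ self.w_in + self.b_in          # flattened voxels -> hidden
        for w, b in self.blocks:
            # assumed residual block: Linear -> LayerNorm -> GELU, plus skip connection
            h = h + gelu(layer_norm(h @ w + b))
        return h @ self.w_out + self.b_out          # hidden -> CLIP-sized output

backbone = MLPBackbone(in_dim=1000, hidden_dim=256, out_dim=768)
emb = backbone(np.random.default_rng(1).normal(size=(8, 1000)))
print(emb.shape)  # (8, 768)
```

The input size (1000 "voxels") and output size (768) are placeholders; the real model takes the flattened voxels of one subject and predicts CLIP image embeddings.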

②Loss: an MSE loss (for the diffusion prior) and a bidirectional CLIP (contrastive) loss

③They replace the standard CLIP loss with MixCo, a mixup-based contrastive loss, applied in both directions (BiMixCo):

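$x_{\mathrm{mix}_{i,k_{i}}}=\lambda_{i}\cdot x_{i}+(1-\lambda_{i})\cdot x_{k_{i}},\quad p_{i}^{*}=f(x_{\mathrm{mix}_{i,k_{i}}}),\quad p_{i}=f(x_{i}),\quad t_{i}=\mathrm{CLIP}_{\mathrm{Image}}(y_{i})$

where $x_i$ and $y_i$ are the $i$-th fMRI sample and its image, $k_i$ is the index of the sample mixed with $x_i$, $f$ is the MLP that maps fMRI into CLIP space, and $\lambda_i$ is sampled from a Beta distribution with $\alpha=\beta=0.15$.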
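The mixing step itself is only a few lines. Below is a minimal numpy sketch; drawing the partner indices $k_i$ as a random permutation of the batch is my assumption (the formula only needs some index $k_i$ per sample), and the function name `mixco_mix` is hypothetical.

```python
import numpy as np

def mixco_mix(x, alpha=0.15, beta=0.15, rng=None):
    """Mix each fMRI sample x_i with a partner x_{k_i} (hypothetical helper).

    Returns the mixed batch, the coefficients lambda_i, and the indices k_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    lam = rng.beta(alpha, beta, size=n)   # lambda_i ~ Beta(0.15, 0.15)
    k = rng.permutation(n)                # assumed: partners form a random permutation
    x_mix = lam[:, None] * x + (1.0 - lam[:, None]) * x[k]
    return x_mix, lam, k

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 100))            # toy batch: 32 samples, 100 "voxels"
x_mix, lam, k = mixco_mix(x, rng=rng)
```

With $\alpha=\beta=0.15$ the Beta density is U-shaped, so most $\lambda_i$ land near 0 or 1 and most mixed samples stay close to one of the two originals.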

$\begin{gathered} \mathcal{L}_{\mathrm{BiMixCo}}=-\sum_{i=1}^{N}\left[\lambda_{i}\cdot\log\left(\frac{\exp\left(\frac{p_{i}^{*}\cdot t_{i}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}^{*}\cdot t_{m}}{\tau}\right)}\right)+(1-\lambda_{i})\cdot\log\left(\frac{\exp\left(\frac{p_{i}^{*}\cdot t_{k_{i}}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}^{*}\cdot t_{m}}{\tau}\right)}\right)\right] \\ -\sum_{j=1}^{N}\left[\lambda_{j}\cdot\log\left(\frac{\exp\left(\frac{p_{j}^{*}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{m}^{*}\cdot t_{j}}{\tau}\right)}\right)+\sum_{\{l|k_{l}=j\}}(1-\lambda_{l})\cdot\log\left(\frac{\exp\left(\frac{p_{l}^{*}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{m}^{*}\cdot t_{j}}{\tau}\right)}\right)\right] \end{gathered}$

where $\tau$ is a temperature hyperparameter and $N$ is the batch size
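A direct numpy transcription of $\mathcal{L}_{\mathrm{BiMixCo}}$ helps check the indices. This is my own sketch of the formula above (summed over the batch, not averaged), not the official implementation. One useful observation: in the second sum, the inner term over $\{l \mid k_l=j\}$ touches each $l$ exactly once, so it can be regrouped as one term per sample, which makes both directions look symmetric.

```python
import numpy as np

def log_softmax(z, axis):
    # numerically stable log-softmax along `axis`
    z_max = z.max(axis=axis, keepdims=True)
    shifted = z - z_max
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bimixco_loss(p_star, t, lam, k, tau):
    """Summed BiMixCo loss for one batch.

    p_star: (N, D) embeddings of the mixed fMRI inputs, p*_i
    t:      (N, D) CLIP image embeddings, t_i
    lam:    (N,)   mixing coefficients lambda_i
    k:      (N,)   partner indices k_i
    """
    idx = np.arange(len(lam))
    logits = p_star @ t.T / tau              # logits[i, m] = p*_i . t_m / tau
    lsm_fmri = log_softmax(logits, axis=1)   # denominator sums over images t_m
    lsm_img = log_softmax(logits, axis=0)    # denominator sums over fMRI p*_m
    # fMRI -> image: weight lambda_i on t_i and 1 - lambda_i on t_{k_i}
    fwd = -np.sum(lam * lsm_fmri[idx, idx] + (1 - lam) * lsm_fmri[idx, k])
    # image -> fMRI: the sum over {l | k_l = j} regrouped as one term per l
    bwd = -np.sum(lam * lsm_img[idx, idx] + (1 - lam) * lsm_img[idx, k])
    return float(fwd + bwd)
```

When every $\lambda_i = 1$ (no mixing), the partner terms vanish and the loss reduces to the ordinary bidirectional CLIP (InfoNCE) loss, which is a quick sanity check on the transcription. L2-normalizing the embeddings before calling it is an assumption on my side, as is common for CLIP-style losses.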

④One-third of the way through training, they stop using mixup and switch from the hard contrastive loss (BiMixCo) to a soft contrastive loss (SoftCLIP). Mixup greatly improves retrieval (↑↑) but slightly hurts reconstruction (↓), so this schedule balances the two.

⑤Soft contrastive loss (SoftCLIP):

$\mathcal{L}_{\mathrm{SoftCLIP}}=-\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\frac{\exp\left(\frac{t_{i}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{t_{i}\cdot t_{m}}{\tau}\right)}\cdot\log\left(\frac{\exp\left(\frac{p_{i}\cdot t_{j}}{\tau}\right)}{\sum_{m=1}^{N}\exp\left(\frac{p_{i}\cdot t_{m}}{\tau}\right)}\right)\right]$
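SoftCLIP differs from the hard loss only in its targets: instead of a one-hot label, row $i$ is supervised with the softmax of image-image similarities $t_i \cdot t_j / \tau$. A minimal numpy sketch of the formula above (summed, one direction as written; my transcription, not the official code):

```python
import numpy as np

def log_softmax(z, axis=1):
    # numerically stable log-softmax along `axis`
    z_max = z.max(axis=axis, keepdims=True)
    shifted = z - z_max
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def soft_clip_loss(p, t, tau):
    """Summed SoftCLIP loss: cross-entropy against soft image-image targets."""
    targets = np.exp(log_softmax(t @ t.T / tau, axis=1))  # row i: softmax_j(t_i . t_j / tau)
    log_probs = log_softmax(p @ t.T / tau, axis=1)        # row i: log softmax_j(p_i . t_j / tau)
    return float(-np.sum(targets * log_probs))
```

Because this is a cross-entropy against a soft distribution, its minimum is not zero: it is reached when $p_i = t_i$, where the loss equals the entropy of the targets. Semantically similar images in a batch therefore are not pushed apart as hard as with one-hot targets.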

⑥Total loss:

$\mathcal{L}=\mathcal{L}_{\text{BiMixCo|SoftCLIP}}+\alpha\cdot\mathcal{L}_{\mathrm{prior}}$

where $\mathcal{L}_{\mathrm{prior}}$ is the diffusion prior loss (an MSE loss) adopted from DALL-E 2, and $\alpha = 0.3$

⑦Training settings: 240 epochs with a batch size of 32
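Putting ④, ⑥ and ⑦ together: with 240 epochs, the switch happens at epoch 80. The sketch below shows the schedule and the total-loss arithmetic. It simplifies $\mathcal{L}_{\mathrm{prior}}$ to a plain MSE between predicted and target CLIP image embeddings (the real prior is a diffusion model trained on noised inputs), and all names are hypothetical.

```python
import numpy as np

TOTAL_EPOCHS = 240
ALPHA_PRIOR = 0.3   # weight alpha on the prior loss

def contrastive_loss_name(epoch, total_epochs=TOTAL_EPOCHS):
    # first third of training: BiMixCo (with mixup); afterwards: SoftCLIP (no mixup)
    return "BiMixCo" if epoch < total_epochs // 3 else "SoftCLIP"

def prior_mse(pred_emb, target_emb):
    # simplified stand-in for L_prior: mean squared error between embeddings
    return float(np.mean((pred_emb - target_emb) ** 2))

def total_loss(contrastive, pred_emb, target_emb, alpha=ALPHA_PRIOR):
    # L = L_{BiMixCo|SoftCLIP} + alpha * L_prior
    return contrastive + alpha * prior_mse(pred_emb, target_emb)

print(contrastive_loss_name(79), contrastive_loss_name(80))  # BiMixCo SoftCLIP
```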

2.3.2. Low-Level (Perceptual) Pipeline

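①They employ img2img: a blurry low-level reconstruction predicted from fMRI serves as the initial image for the diffusion model, so the final output keeps its layout.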
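How img2img uses an initial image can be sketched with the standard DDPM forward process: the init latent is noised up to an intermediate timestep chosen by a "strength" value, and denoising starts from there instead of from pure noise. The linear beta schedule and the strength-to-timestep mapping below are my assumptions for illustration; the denoising model itself is not shown.

```python
import numpy as np

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # cumulative product of (1 - beta_t) for an assumed linear beta schedule
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def img2img_init(init_latent, strength, alpha_bar, rng):
    """Noise the init latent to the timestep picked by `strength` (0 = keep, 1 = pure noise)."""
    t = int(round(strength * (len(alpha_bar) - 1)))
    noise = rng.standard_normal(init_latent.shape)
    # q(x_t | x_0): sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    x_t = np.sqrt(alpha_bar[t]) * init_latent + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, t   # a diffusion model would denoise from step t back to 0

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
blurry = np.ones((4, 8, 8))   # hypothetical blurry latent from the low-level pipeline
x_t, t = img2img_init(blurry, 0.5, alpha_bar, rng)
```

Low strength keeps most of the blurry layout; high strength gives the semantic pipeline more freedom but discards more of the low-level information.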

2.4. Results

①Dataset: the Natural Scenes Dataset (NSD)

②Number of subjects: 4

③Training/test split: 24,980/982 samples; each test image was shown 3 times, and the fMRI responses to the 3 repetitions are averaged

2.4.1. Image/Brain Retrieval

①Image retrieval performance:

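(For retrieval among a small number of candidates, as in the first row, MindEye can precisely pick out the image the subject actually saw. It also scales to the much larger LAION-5B dataset, retrieving the most similar images among five billion.)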
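Retrieval in both directions boils down to cosine similarity plus a top-1 check. A minimal numpy sketch follows; using the whole batch as the candidate pool is my simplification, since the pool size is not given in these notes, and the function name is hypothetical.

```python
import numpy as np

def top1_retrieval_accuracy(brain_emb, image_emb, direction="image"):
    """Top-1 retrieval accuracy; row i of both matrices comes from the same trial.

    direction="image": for each fMRI embedding, retrieve the most similar image.
    direction="brain": for each image embedding, retrieve the most similar fMRI embedding.
    """
    if direction not in ("image", "brain"):
        raise ValueError("direction must be 'image' or 'brain'")
    b = brain_emb / np.linalg.norm(brain_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = b @ v.T                               # sims[i, j] = cos(brain_i, image_j)
    axis = 1 if direction == "image" else 0
    hits = sims.argmax(axis=axis) == np.arange(len(b))
    return float(hits.mean())
```

This brute-force version compares against every candidate; at LAION-5B scale one would presumably use an approximate nearest-neighbour index instead.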

②Performance:

2.4.2. fMRI-to-Image Reconstruction

①Reconstruction performance:

2.4.3. Ablations

(1) Architectural Improvements

①MLP size ablation:

(2) Training Strategies (Losses and Data Augmentations)

①Loss ablation:

(3) Reconstruction Strategies

①Effects of diffusion prior and MLP projector on reconstruction and retrieval metrics:

2.5. Related Work

①I won't paraphrase this part; there is a bit too much of it.

2.6. Conclusion

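①Why put so much analysis of the current landscape (information leakage, data security, and so on) into the conclusion? It would be better moved to the appendix, with some experiments brought here instead.

②They set up a discussion community: MedARC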

3. Reference

@inproceedings{NEURIPS2023_4ddab70b,
author = {Scotti, Paul and Banerjee, Atmadeep and Goode, Jimmie and Shabalin, Stepan and Nguyen, Alex and Cohen, Ethan and Dempster, Aidan and Verlinde, Nathalie and Yundler, Elad and Weisberg, David and Norman, Kenneth and Abraham, Tanishq},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
pages = {24705--24728},
publisher = {Curran Associates, Inc.},
title = {Reconstructing the Mind\textquotesingle s Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/4ddab70bf41ffe5d423840644d3357f4-Paper-Conference.pdf},
volume = {36},
year = {2023}
}