[ICML 2025] MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

Paper link: [2502.15786] MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are unavoidable; if you spot any, corrections in the comments are welcome. This post is written as study notes, so read it with that in mind.

Contents

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Works

2.4. Method

2.4.1. Method Overview

2.4.2. fMRI Encoder

2.4.3. Brain Instruction Tuning (BIT)

2.5. Experiments

2.5.1. Settings

2.5.2. Brain Captioning

2.5.3. Versatile Decoding

2.5.4. Unseen Subject Generalization

2.5.5. Adapting to New Tasks

2.5.6. Ablation Study

2.5.7. Visualizations and Interpretations

2.6. Conclusion

1. Takeaways

(1) The paper covers a lot of ground

2. Close Reading of the Paper

2.1. Abstract

        ①Challenges of existing fMRI-to-text decoders: suboptimal performance, limited task variety, and poor generalization across subjects

2.2. Introduction

        ①Design and implementation of MindLLM:

        ②Selecting responsive voxels improves performance but yields a different voxel count for each subject; pooling or sampling down to a shared count may lose information (see the sketch at the end of this section)

        ③Their method aims to cover tasks spanning perception & scene understanding, memory & knowledge retrieval, language & symbolic processing, and complex reasoning

prosthesis  n. an artificial body part, such as an artificial limb, eye, or tooth
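A minimal sketch (mine, not the paper's) of the problem in ②: once two subjects' voxel vectors are pooled to a shared length, positional identity is gone, and which brain location produced each value can no longer be recovered. The target length 8192 is an arbitrary assumption.

```python
import torch

# Two subjects whose responsive-voxel counts differ (the N range reported
# in the paper is roughly 12682 to 17907).
v_subj1 = torch.randn(12682)
v_subj2 = torch.randn(17907)

# Naive alignment: adaptively average-pool both to a shared length.
# Afterwards the i-th entry of u1 and u2 no longer corresponds to any
# fixed brain location, so spatial / individual information is lost.
pool = torch.nn.AdaptiveAvgPool1d(8192)
u1 = pool(v_subj1.view(1, 1, -1)).squeeze()  # shape (8192,)
u2 = pool(v_subj2.view(1, 1, -1)).squeeze()  # shape (8192,)
```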

2.3. Related Works

        ①⭐VQA-style decoding can return answers that are not actually relevant to the β values

        ②⭐Cross-subject methods do not handle inter-subject voxel differences well; flattening or sampling may cause loss of spatial/individual information:

        ③Designing a different encoder for each subject is itself a limitation, and relying on caption annotations alone is another

2.4. Method

2.4.1. Method Overview

        ①Overall framework of MindLLM:

where the LLM is Vicuna-7b (suited to open-ended dialogue?? long-text comprehension??)

        ②Input brain signal of each subject: \boldsymbol{v}=[v_{1},v_{2},\cdots,v_{N}]\in\mathbb{R}^{N}, where N\in[12682,\,17907] denotes the number of voxels (it varies across subjects)

        ③The fMRI encoder f_\theta encodes \boldsymbol{v} into fMRI tokens X_{v}=[\boldsymbol{x}_{v,1},\boldsymbol{x}_{v,2},\cdots,\boldsymbol{x}_{v,L}]\in\mathbb{R}^{L\times d} with L tokens of dimension d
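A small sketch of how these pieces fit together (L and the instruction length are my assumptions; d = 4096 is Vicuna-7b's hidden size): the L fMRI tokens are treated like word embeddings and concatenated in front of the embedded instruction before being fed to the LLM.

```python
import torch

L, d = 32, 4096             # assumed number of fMRI tokens; Vicuna-7b hidden size
X_v = torch.randn(L, d)     # fMRI tokens produced by the encoder f_theta
X_inst = torch.randn(9, d)  # embedded instruction, e.g. "Describe the image."

# The multimodal prompt: fMRI tokens act as a soft prefix for the LLM.
inputs_embeds = torch.cat([X_v, X_inst], dim=0)  # shape (L + 9, d)
```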

2.4.2. fMRI Encoder

        ①In the attention layer, the value V is a voxel's activation, while the key K for that voxel concatenates its Fourier-encoded coordinates with several region embeddings, one per ROI parcellation from different brain atlases:

k_i=k_i^{\mathrm{pos}}\,\|\,k_i^{\mathrm{reg},\mathcal{P}^1}\,\|\,k_i^{\mathrm{reg},\mathcal{P}^2}\,\|\cdots

        ②\boldsymbol{z}_{q}\in\mathbb{R}^{N_q} is the output of the attention layer, to which an MLP is then applied:

X_{v}=\mathrm{reshape}\left(\mathrm{MLP}(\{\boldsymbol{z}_{q}\})\right)\in\mathbb{R}^{L\times d}
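Putting 2.4.2 together in code, a minimal sketch of the neuroscience-informed attention (all dimensions are my assumptions, and only one parcellation \mathcal{P}^1 is used for brevity where the paper concatenates several): values carry voxel activations, keys carry only where each voxel sits (Fourier features of its coordinates plus an atlas-region embedding), and a fixed set of learnable queries makes the output shape independent of N.

```python
import torch
import torch.nn as nn

class VoxelAttentionEncoder(nn.Module):
    """Sketch of f_theta: (N,) voxel activations -> (L, d) fMRI tokens."""
    def __init__(self, n_regions=400, d_reg=16, n_freq=8, d_model=64,
                 n_queries=256, L=32, d_llm=4096):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freq))
        self.reg_emb = nn.Embedding(n_regions, d_reg)  # region embedding k_i^{reg,P^1}
        self.key_proj = nn.Linear(3 * 2 * n_freq + d_reg, d_model)
        self.val_proj = nn.Linear(1, d_model)          # value = the voxel's activation
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.mlp = nn.Linear(n_queries * d_model, L * d_llm)
        self.L, self.d_llm = L, d_llm

    def forward(self, v, coords, region_ids):
        # v: (N,) activations; coords: (N, 3) in [0, 1]; region_ids: (N,)
        ang = coords.unsqueeze(-1) * self.freqs                      # (N, 3, n_freq)
        pos = torch.cat([ang.sin(), ang.cos()], -1).flatten(1)       # k_i^{pos}
        keys = self.key_proj(torch.cat([pos, self.reg_emb(region_ids)], -1))
        vals = self.val_proj(v.unsqueeze(-1))
        z, _ = self.attn(self.queries.unsqueeze(0), keys.unsqueeze(0),
                         vals.unsqueeze(0))                          # (1, N_q, d_model)
        return self.mlp(z.flatten(1)).view(self.L, self.d_llm)      # X_v in R^{L x d}

# Usage: the same module handles any subject regardless of voxel count N.
enc = VoxelAttentionEncoder()
N = 15000
X_v = enc(torch.randn(N), torch.rand(N, 3), torch.randint(0, 400, (N,)))
print(X_v.shape)  # torch.Size([32, 4096])
```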

2.4.3. Brain Instruction Tuning (BIT)

        ①Tasks of MindLLM:

signifier  n. the form that a linguistic sign takes (as opposed to the signified concept)

        ②Each sample \boldsymbol{v} is paired with a multi-turn conversation X_{t}=(X_u^1,X_a^1,\cdots,X_u^T,X_a^T) with T\geq1 turns, where X_a^t is a message from the assistant and X_u^t is a message from the user

        ③Training objective:

\arg\max_\theta p(X_a|X_v,X_{\mathrm{inst}})=\prod_{t=1}^{T}p(X_a^t\mid X_u^{\leq t},X_a^{<t},X_{\mathrm{inst}},X_v)
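In practice this objective is ordinary next-token cross-entropy computed only on assistant tokens; a minimal sketch (the masking helper and shapes are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def bit_loss(logits, input_ids, assistant_mask):
    """Next-token loss over the sequence [X_v, X_inst, X_u^1, X_a^1, ...],
    supervising only tokens that belong to assistant turns X_a^t."""
    # logits: (S, V); input_ids: (S,); assistant_mask: (S,) bool
    shift_logits = logits[:-1]                # position t predicts token t+1
    shift_labels = input_ids[1:].clone()
    shift_labels[~assistant_mask[1:]] = -100  # ignore user/instruction/fMRI slots
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

# Toy call with random tensors, just to show the shapes.
S, V = 20, 32000
loss = bit_loss(torch.randn(S, V), torch.randint(0, V, (S,)),
                torch.rand(S) > 0.5)
```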

        ④Examples of Q&A:

2.5. Experiments

2.5.1. Settings

        ①Datasets: NSD (Natural Scenes Dataset) plus other downstream datasets

2.5.2. Brain Captioning

        ①Captioning performance:

where CIDEr is scaled by a factor of 100

2.5.3. Versatile Decoding

        ①Performance of versatile decoding:

2.5.4. Unseen Subject Generalization

        ①Trained on subjects 1–7 and evaluated on the 8th:

2.5.5. Adapting to New Tasks

        ①Performance on sentiment understanding and utility/affordance tasks:

2.5.6. Ablation Study

        ①Ablation of the positional encoding:

2.5.7. Visualizations and Interpretations

        ①Attention maps over brain voxels:

2.6. Conclusion

        ~
