[ICML 2025] MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

Paper link: [2502.15786] MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are unavoidable; if you spot any, corrections in the comments are welcome. This post is written as study notes, so read it with that in mind.

Contents

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Works

2.4. Method

2.4.1. Method Overview

2.4.2. fMRI Encoder

2.4.3. Brain Instruction Tuning (BIT)

2.5. Experiments

2.5.1. Settings

2.5.2. Brain Captioning

2.5.3. Versatile Decoding

2.5.4. Unseen Subject Generalization

2.5.5. Adapting to New Tasks

2.5.6. Ablation Study

2.5.7. Visualizations and Interpretations

2.6. Conclusion

1. Takeaways

(1) The paper covers a lot of ground

2. Close Reading of the Paper

2.1. Abstract

        ①Challenges of existing fMRI-to-text decoders: suboptimal performance, limited task variety, and poor generalization across subjects

2.2. Introduction

        ①Design and implementation of MindLLM:

        ②Selecting responsive voxels improves performance but yields a different voxel count for each subject; pooling or sampling down to a shared count may lose information (see the sketch at the end of this section)

        ③Their method aims to cover tasks spanning perception & scene understanding, memory & knowledge retrieval, language & symbolic processing, and complex reasoning

prosthesis  n. an artificial body part, such as an artificial limb, eye, or tooth
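A minimal sketch (mine, not the paper's) of the problem in ②: once two subjects' voxel vectors are pooled to a shared length, positional identity is gone, and which brain location produced each value can no longer be recovered. The target length 8192 is an arbitrary assumption.

```python
import torch

# Two subjects whose responsive-voxel counts differ (the N range reported
# in the paper is roughly 12682 to 17907).
v_subj1 = torch.randn(12682)
v_subj2 = torch.randn(17907)

# Naive alignment: adaptively average-pool both to a shared length.
# Afterwards the i-th entry of u1 and u2 no longer corresponds to any
# fixed brain location, so spatial / individual information is lost.
pool = torch.nn.AdaptiveAvgPool1d(8192)
u1 = pool(v_subj1.view(1, 1, -1)).squeeze()  # shape (8192,)
u2 = pool(v_subj2.view(1, 1, -1)).squeeze()  # shape (8192,)
```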

2.3. Related Works

        ①⭐VQA-style decoding can return answers that are not actually relevant to the β values

        ②⭐Cross-subject methods do not handle inter-subject voxel differences well; flattening or sampling may cause loss of spatial/individual information:

        ③Designing a different encoder for each subject is itself a limitation, and relying on caption annotations alone is another

2.4. Method

2.4.1. Method Overview

        ①Overall framework of MindLLM:

where the LLM is Vicuna-7b (suited to open-ended dialogue?? long-text comprehension??)

        ②Input brain signal of each subject: \boldsymbol{v}=[v_{1},v_{2},\cdots,v_{N}]\in\mathbb{R}^{N}, where N\in[12682,\,17907] denotes the number of voxels (it varies across subjects)

        ③The fMRI encoder f_\theta encodes \boldsymbol{v} into fMRI tokens X_{v}=[\boldsymbol{x}_{v,1},\boldsymbol{x}_{v,2},\cdots,\boldsymbol{x}_{v,L}]\in\mathbb{R}^{L\times d} with L tokens of dimension d
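A small sketch of how these pieces fit together (L and the instruction length are my assumptions; d = 4096 is Vicuna-7b's hidden size): the L fMRI tokens are treated like word embeddings and concatenated in front of the embedded instruction before being fed to the LLM.

```python
import torch

L, d = 32, 4096             # assumed number of fMRI tokens; Vicuna-7b hidden size
X_v = torch.randn(L, d)     # fMRI tokens produced by the encoder f_theta
X_inst = torch.randn(9, d)  # embedded instruction, e.g. "Describe the image."

# The multimodal prompt: fMRI tokens act as a soft prefix for the LLM.
inputs_embeds = torch.cat([X_v, X_inst], dim=0)  # shape (L + 9, d)
```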

2.4.2. fMRI Encoder

        ①In the attention layer, the value V is a voxel's activation, while the key K for that voxel concatenates its Fourier-encoded coordinates with several region embeddings, one per ROI parcellation from different brain atlases:

k_i=k_i^{\mathrm{pos}}\,\|\,k_i^{\mathrm{reg},\mathcal{P}^1}\,\|\,k_i^{\mathrm{reg},\mathcal{P}^2}\,\|\cdots

        ②\boldsymbol{z}_{q}\in\mathbb{R}^{N_q} is the output of the attention layer, to which an MLP is then applied:

X_{v}=\mathrm{reshape}\left(\mathrm{MLP}(\{\boldsymbol{z}_{q}\})\right)\in\mathbb{R}^{L\times d}
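Putting 2.4.2 together in code, a minimal sketch of the neuroscience-informed attention (all dimensions are my assumptions, and only one parcellation \mathcal{P}^1 is used for brevity where the paper concatenates several): values carry voxel activations, keys carry only where each voxel sits (Fourier features of its coordinates plus an atlas-region embedding), and a fixed set of learnable queries makes the output shape independent of N.

```python
import torch
import torch.nn as nn

class VoxelAttentionEncoder(nn.Module):
    """Sketch of f_theta: (N,) voxel activations -> (L, d) fMRI tokens."""
    def __init__(self, n_regions=400, d_reg=16, n_freq=8, d_model=64,
                 n_queries=256, L=32, d_llm=4096):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freq))
        self.reg_emb = nn.Embedding(n_regions, d_reg)  # region embedding k_i^{reg,P^1}
        self.key_proj = nn.Linear(3 * 2 * n_freq + d_reg, d_model)
        self.val_proj = nn.Linear(1, d_model)          # value = the voxel's activation
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.mlp = nn.Linear(n_queries * d_model, L * d_llm)
        self.L, self.d_llm = L, d_llm

    def forward(self, v, coords, region_ids):
        # v: (N,) activations; coords: (N, 3) in [0, 1]; region_ids: (N,)
        ang = coords.unsqueeze(-1) * self.freqs                      # (N, 3, n_freq)
        pos = torch.cat([ang.sin(), ang.cos()], -1).flatten(1)       # k_i^{pos}
        keys = self.key_proj(torch.cat([pos, self.reg_emb(region_ids)], -1))
        vals = self.val_proj(v.unsqueeze(-1))
        z, _ = self.attn(self.queries.unsqueeze(0), keys.unsqueeze(0),
                         vals.unsqueeze(0))                          # (1, N_q, d_model)
        return self.mlp(z.flatten(1)).view(self.L, self.d_llm)      # X_v in R^{L x d}

# Usage: the same module handles any subject regardless of voxel count N.
enc = VoxelAttentionEncoder()
N = 15000
X_v = enc(torch.randn(N), torch.rand(N, 3), torch.randint(0, 400, (N,)))
print(X_v.shape)  # torch.Size([32, 4096])
```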

2.4.3. Brain Instruction Tuning (BIT)

        ①Tasks of MindLLM:

signifier  n. the form that a linguistic sign takes (as opposed to the signified concept)

        ②Each sample \boldsymbol{v} is paired with a multi-turn conversation X_{t}=(X_u^1,X_a^1,\cdots,X_u^T,X_a^T) with T\geq1 turns, where X_a^t is a message from the assistant and X_u^t is a message from the user

        ③Training objective:

\arg\max_\theta p(X_a|X_v,X_{\mathrm{inst}})=\prod_{t=1}^{T}p(X_a^t\mid X_u^{\leq t},X_a^{<t},X_{\mathrm{inst}},X_v)
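In practice this objective is ordinary next-token cross-entropy computed only on assistant tokens; a minimal sketch (the masking helper and shapes are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def bit_loss(logits, input_ids, assistant_mask):
    """Next-token loss over the sequence [X_v, X_inst, X_u^1, X_a^1, ...],
    supervising only tokens that belong to assistant turns X_a^t."""
    # logits: (S, V); input_ids: (S,); assistant_mask: (S,) bool
    shift_logits = logits[:-1]                # position t predicts token t+1
    shift_labels = input_ids[1:].clone()
    shift_labels[~assistant_mask[1:]] = -100  # ignore user/instruction/fMRI slots
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

# Toy call with random tensors, just to show the shapes.
S, V = 20, 32000
loss = bit_loss(torch.randn(S, V), torch.randint(0, V, (S,)),
                torch.rand(S) > 0.5)
```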

        ④Examples of Q&A:

2.5. Experiments

2.5.1. Settings

        ①Datasets: NSD (Natural Scenes Dataset) plus other downstream datasets

2.5.2. Brain Captioning

        ①Captioning performance:

where CIDEr is scaled by a factor of 100

2.5.3. Versatile Decoding

        ①Performance of versatile decoding:

2.5.4. Unseen Subject Generalization

        ①Trained on subjects 1–7 and evaluated on the 8th:

2.5.5. Adapting to New Tasks

        ①Performance on sentiment understanding and utility/affordance tasks:

2.5.6. Ablation Study

        ①Ablation of the positional encoding:

2.5.7. Visualizations and Interpretations

        ①Attention maps over brain voxels:

2.6. Conclusion

        ~
