[IEEE ICIP 2020] Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised Text Latent Space

Paper link: Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised Text Latent Space | IEEE Conference Publication | IEEE Xplore

The English below is typed by hand! It is my summarizing and paraphrasing of the original paper, so occasional spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans toward personal notes, so read it with that in mind.

Contents

1. Takeaways

2. Close Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Generation of Viewed Image Captions From Human Brain Activity

2.3.1. Construction of Image Captioning Model

2.3.2. Conversion of fMRI data into Text Features

2.3.3. Text Feature Transformation with unlabeled images

2.4. Experiments

2.4.1. Experimental Settings

2.4.2. Experimental Results

2.5. Conclusion

3. Reference

1. Takeaways

(1) Why is the writing in the last few papers I've read so opaque?

2. Close Reading of the Paper

2.1. Abstract

        ①They incorporate richer semantic features, via an unsupervised text latent space, to generate captions of viewed images from brain activity

2.2. Introduction

        ①Because paired training data are limited, they adopt an unsupervised approach to capture features from unlabeled data

2.3. Generation of Viewed Image Captions From Human Brain Activity

        ①An overview of their method:

2.3.1. Construction of Image Captioning Model

        ①For each image I^{i}\ (i=1,2,...,N_{c}), the image embedding v^{i}\in\mathbb{R}^{D_{v}} is obtained by a pretrained DNN (where exactly do the images appear in the main figure?? Where is the DNN, and where is the linear layer?):

v^i=\mathrm{DNN}(I^i)

        ②The dimension of v^{i} is reduced by a linear layer to v^{\prime i}\in\mathbb{R}^{D_{v}^{\prime}} with D_{v}^{\prime}<D_{v}:

v^{\prime i}=W_\mathrm{linear}v^i

        ③The image captioning network consists of LSTM units l^{j}(\cdot)\ (j=0,1,...,N_{l}), and words are converted to vectors by word2vec. For the caption words S_{n}^{i}\ (n=0,1,...,N_{s}^{i}), generation starts from:

t^i=l^0(v^{\prime i},\mathrm{word2vec}(S_0))

        ④Caption training uses cross-entropy (CE) loss and the Adam optimizer. A minimal sketch of steps ①-③ follows this list.
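As a reading aid, here is a minimal PyTorch sketch of steps ①-③ under stated assumptions: the backbone choice (ResNet-50), every dimension, and the fusion of v'^i with the word2vec vector by concatenation are mine, not the paper's; the paper only fixes the symbols above. Training per step ④ would wrap the LSTM outputs in a word-level CE loss optimized with Adam.

```python
import torch
import torch.nn as nn
from torchvision import models

# Step ①: pretrained DNN as feature extractor, v^i = DNN(I^i).
# A ResNet-50 with its classifier removed is an assumption for illustration.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()                     # output dimension D_v = 2048
backbone.eval()

D_v, D_v_prime, D_w, D_t = 2048, 512, 300, 512  # hypothetical dims, D'_v < D_v

# Step ②: linear dimensionality reduction, v'^i = W_linear v^i.
W_linear = nn.Linear(D_v, D_v_prime, bias=False)

# Step ③: the first LSTM unit l^0 takes v'^i and word2vec(S_0), outputs t^i.
l0 = nn.LSTMCell(input_size=D_v_prime + D_w, hidden_size=D_t)

with torch.no_grad():
    I = torch.randn(1, 3, 224, 224)             # stand-in for an image I^i
    v = backbone(I)                             # v^i ∈ R^{D_v}
    v_prime = W_linear(v)                       # v'^i = W_linear v^i ∈ R^{D'_v}
    w_s0 = torch.randn(1, D_w)                  # stand-in for word2vec(S_0)
    t_i, _ = l0(torch.cat([v_prime, w_s0], dim=1))  # t^i = l^0(v'^i, word2vec(S_0))
```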

2.3.2. Conversion of fMRI data into Text Features

        ①Previous methods first convert fMRI data into image features and then convert those image features into text. The authors argue that this two-stage conversion is cumbersome and loses information, so they use a single-stage conversion instead

        ②For fMRI data x^{l}\in\mathbb{R}^{D_{f}}\ (l=1,2,...,N_{f}), the target text feature comes from the first LSTM unit:

t^l=l^0(v^{\prime l},S_0)

        ③The regression that predicts the text feature from fMRI data:

\hat{t}^l=W^\top x^l+b

(The paper writes the regression output as t^l again, the same symbol as the LSTM target in ②; the two are only approximately equal, which is what ④ below enforces, so a hat is added here to disambiguate.)

        ④W and b are fitted by ridge regression (a sketch follows this list):

\min_{\boldsymbol{W},\boldsymbol{b}}\sum_{l=1}^{N_f}\|\boldsymbol{t}^l-(\boldsymbol{W}^\top\boldsymbol{x}^l+\boldsymbol{b})\|_2^2+\alpha\|\boldsymbol{W}\|_2^2

(So ② and ③ are not asserted to be equal: the objective minimizes the squared distance between the LSTM target t^l and the prediction W^\top x^l+b.)
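A minimal sketch of the fMRI-to-text regression, assuming scikit-learn's Ridge (whose objective, squared error plus \alpha\|W\|_2^2, matches the formula above); all array sizes are placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

N_f, D_f, D_t = 1200, 4000, 512     # hypothetical numbers of samples / dims
X = np.random.randn(N_f, D_f)       # rows: fMRI samples x^l
T = np.random.randn(N_f, D_t)       # rows: LSTM text-feature targets t^l

reg = Ridge(alpha=1.0)              # alpha = regularization weight in the objective
reg.fit(X, T)                       # learns W (reg.coef_) and b (reg.intercept_)

x_test = np.random.randn(1, D_f)
t_hat = reg.predict(x_test)         # predicted text feature W^T x_test + b
```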

2.3.3. Text Feature Transformation with unlabeled images

        ①Text features \tilde{t}^{m}\in\mathbb{R}^{D_{t}} are extracted from unlabeled images \tilde{I}^{m}\ (m=1,2,...,N_{a})

        ②The embedding of the test fMRI data x_{\mathrm{test}}:

z=W^\top x_{\mathrm{test}}+b

        ③Calculate the Euclidean distance d^{m} between each \tilde{t}^{m} and z, then combine z with its k nearest neighbors to obtain the refined feature (the sum runs over those k neighbors; a sketch follows):

y=\beta z+\frac{1-\beta}{k}\sum_{m=1}^k\tilde{t}^m
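A minimal NumPy sketch of steps ②-③; \beta, k, and all dimensions are placeholder values chosen for illustration:

```python
import numpy as np

def refine(z, t_tilde, k=5, beta=0.5):
    """z: (D_t,) decoded fMRI feature; t_tilde: (N_a, D_t) unlabeled-image text features."""
    d = np.linalg.norm(t_tilde - z, axis=1)   # Euclidean distances d^m
    nn_idx = np.argsort(d)[:k]                # indices of the k nearest neighbors
    # y = beta * z + (1 - beta)/k * sum of the k nearest features (i.e. their mean)
    return beta * z + (1.0 - beta) * t_tilde[nn_idx].mean(axis=0)

D_t = 512
z = np.random.randn(D_t)                      # z = W^T x_test + b from step ②
t_tilde = np.random.randn(38532, D_t)         # \tilde{t}^m from the unlabeled images
y = refine(z, t_tilde)                        # refined text feature y
```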

2.4. Experiments

2.4.1. Experimental Settings

        ①Dataset:

T. Horikawa and Y. Kamitani, "Generic decoding of seen and imagined objects using hierarchical visual features," Nature Communications, vol. 8, p. 15037, 2017.

        ②Image categories: 150 for training / 50 unseen categories for testing

        ③Number of images: 1,200 for training / 50 for testing

        ④Unlabeled images: 38,532

        ⑤Image captions: from MSCOCO

        ⑥Caption evaluation: cosine similarity between Sent2Vec embeddings (sketch after this list)
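A minimal sketch of the cosine-similarity evaluation; the vectors are assumed to be Sent2Vec embeddings of the generated and reference captions, obtained elsewhere, and the embedding dimension is a placeholder:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(a, b) = a·b / (|a||b|); small epsilon guards against zero vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# e_gen, e_ref stand in for Sent2Vec embeddings of generated / reference captions
e_gen = np.random.randn(600)              # placeholder embedding dimension
e_ref = np.random.randn(600)
score = cosine_similarity(e_gen, e_ref)   # closer to 1 = more similar captions
```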

2.4.2. Experimental Results

        ①An example of a generated caption:

        ②Performance comparison:

2.5. Conclusion

        ~

3. Reference

@INPROCEEDINGS{9191262,
  author={Takada, Saya and Togo, Ren and Ogawa, Takahiro and Haseyama, Miki},
  booktitle={2020 IEEE International Conference on Image Processing (ICIP)}, 
  title={Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised Text Latent Space}, 
  year={2020},
  volume={},
  number={},
  pages={2521-2525},
  keywords={Functional magnetic resonance imaging;Semantics;Training;Feature extraction;Brain modeling;Computer architecture;Image captioning;deep neural network (DNN);neuroscience;functional magnetic resonance imaging (fMRI).},
  doi={10.1109/ICIP40778.2020.9191262}}
