The English here is typed entirely by hand! It is my summary and paraphrase of the original paper. Some unavoidable spelling and grammar mistakes may slip in; if you spot any, feel free to point them out in the comments! This post is closer to a reading note, so take it with a grain of salt.
1. Thoughts
(1) Why is the writing in the last few papers I've read so abstract?
2. Section-by-Section Close Reading
2.1. Abstract
①They combine richer semantic features to generate captions
2.2. Introduction
①Due to the limited amount of paired data, they adopt an unsupervised approach to capture features from unlabeled images
2.3. Generation of Viewed Image Captions From Human Brain Activity
①An overview of their method:
2.3.1. Construction of Image Captioning Model
①For an image $I$, the image embedding $x$ is obtained from a pretrained DNN (where does the image actually appear in the overview figure?? Where is the DNN? And where is the linear layer?):
$x = \mathrm{DNN}(I)$
②The dimension of $x$ is reduced by a linear layer with weight $W$ and bias $b$:
$\tilde{x} = Wx + b$
③The image captioning network consists of LSTM units, and words are converted into vectors by Word2vec. For words $w_1, \dots, w_T$, the caption is generated word by word conditioned on the reduced image feature:
$p(w_1, \dots, w_T \mid \tilde{x}) = \prod_{t=1}^{T} p\left(w_t \mid \tilde{x}, w_1, \dots, w_{t-1}\right)$
④Loss and optimizer for caption training: cross-entropy loss and Adam (a PyTorch sketch of steps ①-④ follows this list)
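A minimal end-to-end sketch of steps ①-④ in PyTorch, assuming a VGG19 backbone, a 512-d hidden state, and frozen Word2vec vectors as input embeddings (none of these choices are pinned down in this note; the toy data only makes the snippet runnable):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Step ①: pretrained DNN as image encoder; the 4096-d fc7 activation of a
# VGG19 (an assumed choice) serves as the image embedding x = DNN(I).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
encoder = nn.Sequential(
    vgg.features, vgg.avgpool, nn.Flatten(),
    *list(vgg.classifier.children())[:-1],  # drop the 1000-way class layer
)

class CaptionDecoder(nn.Module):
    """Steps ②-③: linear dimension reduction + LSTM caption decoder."""
    def __init__(self, word2vec: torch.Tensor, img_dim=4096, hid_dim=512):
        super().__init__()
        vocab_size, emb_dim = word2vec.shape
        self.reduce = nn.Linear(img_dim, hid_dim)  # step ②: x~ = Wx + b
        # Words are represented by (frozen) Word2vec vectors.
        self.embed = nn.Embedding.from_pretrained(word2vec, freeze=True)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_emb, captions):
        # The reduced image feature initializes the LSTM state.
        h0 = torch.tanh(self.reduce(img_emb)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        inp = self.embed(captions[:, :-1])      # teacher forcing: words < t
        h, _ = self.lstm(inp, (h0, c0))
        return self.out(h)                      # logits for word t

# Step ④: cross-entropy loss and Adam (toy data for a runnable example).
w2v = torch.randn(1000, 300)                    # placeholder Word2vec matrix
decoder = CaptionDecoder(w2v)
optim = torch.optim.Adam(decoder.parameters(), lr=1e-3)

imgs = torch.randn(4, 3, 224, 224)              # toy image batch
caps = torch.randint(0, 1000, (4, 12))          # toy token-id captions
with torch.no_grad():
    x = encoder(imgs)                           # step ①: image embeddings
logits = decoder(x, caps)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), caps[:, 1:].reshape(-1))
loss.backward()
optim.step()
```

Initializing the LSTM state with the reduced image feature is one common way to condition the decoder; feeding $\tilde{x}$ as the first input token would be an equally plausible reading of the paper.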
2.3.2. Conversion of fMRI data into Text Features
①Previous methods converted fMRI into image features and then converted those image features into text; the authors argue that such two-stage conversion is cumbersome and loses information, so they use only a one-stage conversion from fMRI directly to text features
②For fMRI data $r$, the text feature predicted from brain activity is:
$\hat{y} = f(r)$
③The regression process is a linear mapping:
$\hat{y} = Wr$
(Why does this symbol show up again? What is it? And these two formulas aren't even the same!)
④Optimization, a ridge-style regularized least squares (a regression sketch follows this list):
$\hat{W} = \arg\min_{W} \lVert Y - WR \rVert_2^2 + \lambda \lVert W \rVert_2^2$
(Seriously?? Aren't those two things on the left-hand side equal?)
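A minimal sketch of the one-stage conversion, assuming a standard ridge-regularized linear regression from fMRI voxel patterns to text features (all array sizes and the data are placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-ins: 1200 training samples, 5000 voxels, 300-d text features.
rng = np.random.default_rng(0)
R_train = rng.normal(size=(1200, 5000))   # fMRI voxel patterns (placeholder)
Y_train = rng.normal(size=(1200, 300))    # text features of viewed images
R_test = rng.normal(size=(50, 5000))

# One-stage conversion: fMRI -> text features directly.
# Ridge solves min_W ||Y - R W||^2 + alpha * ||W||^2.
reg = Ridge(alpha=1.0)                    # alpha (lambda) is a hyperparameter
reg.fit(R_train, Y_train)
Y_hat = reg.predict(R_test)               # predicted text features y_hat
print(Y_hat.shape)                        # (50, 300)
```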
2.3.3. Text Feature Transformation With Unlabeled Images
①Extract text features from unlabeled images
②The fMRI embedding, i.e., the text feature predicted from brain activity:
$\hat{y} = Wr$
③Calculate the Euclidean distance between $\hat{y}$ and each unlabeled-image text feature $y_i$, and choose the $k$ nearest neighbors to get the new feature (a sketch follows this list):
$\hat{y}' = \frac{1}{k} \sum_{i \in \mathcal{N}_k(\hat{y})} y_i$
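A minimal sketch of the neighbor search, assuming the refined feature is the plain average of the $k$ nearest unlabeled-image text features (the value of $k$ and the averaging are my assumptions):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
Y_unlabeled = rng.normal(size=(38532, 300))  # text features of unlabeled images
y_hat = rng.normal(size=(1, 300))            # text feature predicted from fMRI

# Step ③: Euclidean k-nearest neighbors in the text latent space.
k = 5                                        # assumed neighborhood size
nn_index = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(Y_unlabeled)
_, idx = nn_index.kneighbors(y_hat)

# Aggregate the neighbors into the refined feature y_hat'.
y_refined = Y_unlabeled[idx[0]].mean(axis=0)
print(y_refined.shape)                       # (300,)
```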
2.4. Experiments
2.4.1. Experimental Settings
①Dataset:
T. Horikawa and Y. Kamitani, "Generic decoding of seen and imagined objects using hierarchical visual features", Nature communications, vol. 8, pp. 15037, 2017.
②Image categories: 150 for training / 50 unseen categories for testing
③Number of images: 1,200 for training / 50 for testing
④Unlabeled images: 38,532
⑤Image captions: MSCOCO
⑥Caption evaluation: cosine similarity between Sent2Vec sentence embeddings (a sketch follows this list)
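A minimal sketch of this metric, assuming the epfml/sent2vec Python bindings and a pretrained model file (the file name is a placeholder):

```python
import numpy as np
import sent2vec  # https://github.com/epfml/sent2vec

model = sent2vec.Sent2vecModel()
model.load_model("wiki_unigrams.bin")  # placeholder pretrained model file

def caption_similarity(generated: str, reference: str) -> float:
    """Cosine similarity between the Sent2Vec embeddings of two captions."""
    a, b = model.embed_sentences([generated, reference])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(caption_similarity("a man riding a wave on a surfboard",
                         "a surfer rides a large wave in the ocean"))
```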
2.4.2. Experimental Results
①Examples of generated captions:
②Performance comparison:
2.5. Conclusion
~
3. Reference
@INPROCEEDINGS{9191262,
  author={Takada, Saya and Togo, Ren and Ogawa, Takahiro and Haseyama, Miki},
  booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
  title={Generation of Viewed Image Captions From Human Brain Activity Via Unsupervised Text Latent Space},
  year={2020},
  pages={2521-2525},
  doi={10.1109/ICIP40778.2020.9191262}}