Visual Question Answering in TensorFlow: Hands-On

Author: 法相

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/weixin_38569817/article/details/82659122

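This post records the whole process of setting up a TensorFlow-based visual question answering (VQA) project on Windows, including working around the empty data download script, dealing with Python version compatibility, and modifying the source code.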

Main reference: https://github.com/paarthneekhara/neural-vqa-tensorflow
Paper: https://arxiv.org/abs/1505.02074

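Project files (Baidu Netdisk): https://pan.baidu.com/s/1d47Hxu5Xl71UYniKWPDfpQ
Now for the pitfalls.
The server runs Linux but has no network access, so everything had to be done on Windows.
Following the README steps, the first task is getting the data.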

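Pitfall 1: Download the MSCOCO train+val images and VQA data using Data/download_data.sh. Extract all the downloaded zip files inside the Data folder.
In this repository download_data.sh is empty. Fix: the other version, the Torch implementation of neural-VQA, ships a download_data.sh that does contain the links. Open it, copy the links, and download the files manually.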
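
Windows has no unzip command by default, so the extraction step can be scripted. This is a minimal sketch, assuming the zip files have already been downloaded by hand into the Data folder. The helper name extract_all_zips is made up for illustration and is not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_all_zips(data_dir="Data"):
    """Extract every .zip archive found directly inside data_dir into data_dir.

    Hypothetical helper, not part of neural-vqa-tensorflow.
    Returns the file names of the archives that were extracted.
    """
    data_path = Path(data_dir)
    extracted = []
    for archive in sorted(data_path.glob("*.zip")):
        # Each archive is unpacked next to itself, inside the Data folder.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)
        extracted.append(archive.name)
    return extracted
```

Calling extract_all_zips("Data") from the project root should unpack every archive in place. The image archives are large, so this may take a while.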

Pitfall 2:
Extract the fc-7 image features using:
python extract_fc7.py --split=train
python extract_fc7.py --split=val
My Python on Windows is version 3, so the source code needs to be modified.

Pitfall 3:
data_loader.py:
if __name__ == "__main__":
    prepare_training_data()
Add the code above to data_loader.py so that running the file executes the data preprocessing.
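
The guard makes the preprocessing run only when data_loader.py is executed directly (python data_loader.py), and not when scripts such as extract_fc7.py import it. A minimal sketch of the pattern follows. The body of prepare_training_data here is a placeholder, not the repository's code.

```python
def prepare_training_data():
    # Placeholder body standing in for the real preprocessing
    # in data_loader.py (an assumption for illustration only).
    print("preparing training data ...")
    return True


# __name__ equals "__main__" only when this file is run as a script,
# e.g. `python data_loader.py`. On import it holds the module name instead,
# so importing the module does not trigger the preprocessing.
if __name__ == "__main__":
    prepare_training_data()
```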
extract_fc7.py:
all_data = data_loader.load_questions_answers(args)
-> all_data = data_loader.load_questions_answers()
predict.py:
vocab_data = data_loader.get_question_answer_vocab(args.data_dir)
-> vocab_data = data_loader.get_question_answer_vocab()
predict.py: parser.add_argument('--model_path', type=str, default='Data/Models/model133.ckpt',
The path has to be hard-coded here, otherwise loading the model file fails.
evaluate.py: parser.add_argument('--model_path', type=str, default='Data/Models/model133.ckpt'
Same here: hard-code the path, otherwise loading the model file fails.
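
One way to make a wrong model path fail early, rather than deep inside the checkpoint restore, is to check it right after parsing. The sketch below assumes the same --model_path flag and default as above. The help text and the helpers parse_model_path and checkpoint_exists are illustrative and not part of the repository. TensorFlow checkpoints may be stored as several files sharing the prefix (for example model133.ckpt.index), which is why the check matches on the prefix.

```python
import argparse
import glob


def parse_model_path(argv=None):
    """Parse --model_path with a hard-coded default (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_path', type=str,
                        default='Data/Models/model133.ckpt',
                        help='Checkpoint path or prefix')  # help text is an assumption
    args = parser.parse_args(argv)
    return args.model_path


def checkpoint_exists(model_path):
    """Return True if a file equal to, or starting with, model_path exists."""
    # glob.escape guards against special characters in the path itself.
    return bool(glob.glob(glob.escape(model_path) + '*'))
```

Calling checkpoint_exists(parse_model_path()) before building the graph gives a quick yes/no on whether the hard-coded path points at real files.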

python predict.py --image_path="Data/test/8.jpg" --question="What are they doing?"