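This post records how I got the open-source project FastReID running on an RTX 3070 GPU.
The code used is at commit c9bc3ceb2f7a6438b62fb515ea3df6d1e999e95d.
Some of the content draws on the articles "行人识别fastreid项目官方数据训练测试" (person re-identification: training and testing FastReID on the official data) and "FastReID使用教程、踩坑记录" (FastReID tutorial and pitfalls).
Environment setup
With the officially recommended Python 3.7 and PyTorch 1.6.0, PyTorch warns that this GPU is not supported: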
UserWarning: NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible with
the current PyTorch installation. The current PyTorch install supports CUDA
capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the
NVIDIA GeForce RTX 3070 GPU with PyTorch, please check the instructions at
https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
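The RTX 3070 (sm_86) needs a PyTorch build that targets a newer CUDA version, so I created the conda environment with the following commands: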
conda create -n FastReID python=3.8
conda activate FastReID
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r docs/requirements.txt
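After installing, it can be worth confirming that the new PyTorch build actually covers the card. The sketch below is a simplified check, not PyTorch's own logic: it assumes a compiled binary architecture can run on a device with the same major capability and an equal or higher minor version, and it ignores PTX entries such as compute_37. The helper names parse_arch and is_capability_supported are made up for this example.

```python
def parse_arch(arch):
    """Turn 'sm_86' into (8, 6); return None for non-binary entries like 'compute_37'."""
    if not arch.startswith("sm_"):
        return None
    # Keep digits only, so a suffixed name such as 'sm_90a' still parses.
    digits = "".join(ch for ch in arch[len("sm_"):] if ch.isdigit())
    return int(digits[:-1]), int(digits[-1])


def is_capability_supported(capability, arch_list):
    """Simplified rule: same major version, compiled minor <= device minor."""
    cap_major, cap_minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue  # PTX-only entries are ignored in this sketch
        major, minor = parsed
        if major == cap_major and minor <= cap_minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print(torch.__version__, cap, archs, is_capability_supported(cap, archs))
    else:
        print("PyTorch with CUDA is not available in this environment")
```

Fed the architecture list from the warning above, the check reports that capability (8, 6) is not covered, which matches what PyTorch 1.6.0 printed.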
Testing
The official demo command is:
python demo/visualize_result.py --config-file logs/dukemtmc/mgn_R50-ibn/config.yaml \
--parallel --vis-label --dataset-name DukeMTMC --output logs/mgn_duke_vis \
--opts MODEL.WEIGHTS logs/dukemtmc/mgn_R50-ibn/model_final.pth
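This command has several problems:
- Wrong config file location: the config files are actually stored under configs/.
- Wrong model location: if you do not train a model yourself, change MODEL.WEIGHTS to the location of the model you downloaded.
- Missing sort settings: this part follows the article "行人识别fastreid项目官方数据训练测试".
By default, the official code sorts the results in ascending order: given a query set, the queries are arranged from the lowest retrieval accuracy to the highest, so the results you see first are all wrong matches. The following arguments change this behavior:
--num-vis N # visualize the retrieval results of N query samples
--rank-sort descending --label-sort descending # sort the results by accuracy in descending rather than ascending order
--max-rank 10 # number of results to retrieve for each query sample
Passing --num-vis or --max-rank on the command line may raise this error: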
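To see why the sort order matters so much, here is a small illustration. It is not FastReID's code: it assumes each query has some per-query score (for example an average precision), and the function name pick_queries and the score values are made up for the example.

```python
def pick_queries(scores, num_vis, order="ascending"):
    """Return the indices of the first num_vis queries after sorting by score."""
    indices = sorted(
        range(len(scores)),
        key=lambda i: scores[i],
        reverse=(order == "descending"),
    )
    return indices[:num_vis]


# Made-up per-query scores, purely for illustration.
ap = [0.95, 0.10, 0.60, 0.30]

worst_first = pick_queries(ap, 2, "ascending")   # lowest-scoring queries
best_first = pick_queries(ap, 2, "descending")   # highest-scoring queries
print(worst_first, best_first)
```

With an ascending sort and a small number of visualized queries, only the hardest queries are shown, which is why the output looks uniformly wrong until the order is switched to descending.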
Traceback (most recent call last):
File "/work/2.ChiPeak/3.ReidAbout/1.PesonReid/1.fast_reid/1.fast-reid-init-master/demo/visualize_result.py", line 139, in <module>
args.rank_sort, args.label_sort, args.max_rank)
File "/work/2.ChiPeak/3.ReidAbout/1.PesonReid/1.fast_reid/1.fast-reid-init-master/fastreid/utils/visualizer.py", line 158, in vis_rank_list
query_indices = query_indices[:num_vis]
TypeError: slice indices must be integers or None or have an __index__ method
The values arrive as strings when passed on the command line. To fix this, add type=int to both the num_vis and max_rank arguments in visualize_result.py:
# type=int added to both lines of the original source
parser.add_argument("--num-vis", type=int, default=100, help="number of query images to be visualized")
parser.add_argument("--max-rank", type=int, default=10, help="maximum number of rank list to be visualized")
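The failure can be reproduced without FastReID. The sketch below uses only argparse and a plain list: without type=int, a value given on the command line is parsed as a string, and slicing with a string raises the same TypeError. The integer default never triggers it, which is why the error only appears once the argument is passed explicitly. The helper build_parser is made up for this example.

```python
import argparse


def build_parser(with_type):
    """Build a parser for --num-vis, optionally converting the value to int."""
    parser = argparse.ArgumentParser()
    kwargs = {"type": int} if with_type else {}
    parser.add_argument("--num-vis", default=100, **kwargs)
    return parser


items = list(range(10))

# Without type=int, "3" stays a string and cannot be used as a slice index.
args = build_parser(with_type=False).parse_args(["--num-vis", "3"])
try:
    items[:args.num_vis]
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# With type=int, the same command line works.
args_fixed = build_parser(with_type=True).parse_args(["--num-vis", "3"])
selected = items[:args_fixed.num_vis]
print(selected)
```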
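- The pretrained model is missing parameters
Running the official pretrained model directly reports that the parameters do not match: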
Some model parameters or buffers are not found in the checkpoint:
heads.bottleneck.0.{running_mean, running_var, bias, weight} heads.weight
[06/30 21:46:18 fastreid.utils.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
pixel_mean pixel_std heads.bnneck.{weight, bias, running_mean, running_var, num_batches_tracked} heads.classifier.weight
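This happens because the official code was modified later, but no pretrained models matching the modified code were released. The fix is to train a model yourself, following the training instructions in the official GETTING_STARTED.md, and use that one. On my RTX 3070, training took only about one hour.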
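The two log messages amount to a set difference between the parameter names the current model expects and the names stored in the checkpoint. The sketch below shows that comparison with plain lists. A few key names are taken from the log above, while backbone.conv1.weight and the function diff_state_dict_keys are made up for illustration. In a real run, the model side would come from model.state_dict().keys().

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the two log messages."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model wants, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model ignores
    return missing, unexpected


# Names expected by the newer code (partly hypothetical).
model_keys = [
    "backbone.conv1.weight",
    "heads.bottleneck.0.weight",
    "heads.bottleneck.0.bias",
    "heads.weight",
]
# Names stored in the older pretrained checkpoint (partly hypothetical).
checkpoint_keys = [
    "backbone.conv1.weight",
    "heads.bnneck.weight",
    "heads.bnneck.bias",
    "heads.classifier.weight",
    "pixel_mean",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("unused checkpoint keys:", unexpected)
```

Because the head layers were renamed, their weights are left at their initial values when the old checkpoint is loaded, so retraining is the straightforward way out.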
Training error
Training may fail with AssertionError: No inf checks were recorded for this optimizer. To fix it, open engine/defaults.py, find the build_optimizer function, and change its return statement from return build_optimizer(cfg, model) to return build_optimizer(cfg, model, contiguous=False).