Vision Mamba project link: https://github.com/hustvl/Vim
1.1 Server
The server used here is a GPU instance rented from AutoDL, a cloud GPU rental platform.
1.2 Clone the project onto the server
Open a terminal on the server and change to the data directory:
cd autodl-tmp/
Clone the repository with git:
git clone https://github.com/hustvl/Vim.git
1.3 Set up the environment
(1) Create a virtual environment
conda create -n vim python=3.10.13
conda init bash && source /root/.bashrc  # reload the shell configuration
conda activate vim
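(2) Install PyTorch 2.1.1 for CUDA 11.8
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 -i https://mirrors.aliyun.com/pypi/simple/
If torch.version.cuda does not report 11.8 afterwards, install from the PyTorch cu118 wheel index instead (--index-url https://download.pytorch.org/whl/cu118).
(3) Install the remaining dependencies (run from the root of the cloned Vim repository)
pip install -r vim/vim_requirements.txt -i https://mirrors.aliyun.com/pypi/simple/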
(4) Install causal_conv1d and mamba. On AutoDL, first enable the network accelerator in the terminal: source /etc/network_turbo
wget https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.1.3.post1/causal_conv1d-1.1.3.post1+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
wget https://github.com/state-spaces/mamba/releases/download/v1.1.1/mamba_ssm-1.1.1+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install causal_conv1d-1.1.3.post1+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install mamba_ssm-1.1.1+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
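The wheel filenames above encode the CUDA version, the PyTorch version, and the CPython version they were built for, so a mismatch with the environment from steps (1) and (2) can be spotted before installing. The following is a minimal sketch that reads those tags out of a filename; the helper names parse_wheel_tags and matches_interpreter are hypothetical, and the pattern only assumes the naming scheme visible in the two filenames above.

```python
import re
import sys


def parse_wheel_tags(filename):
    """Extract CUDA, torch, ABI, and CPython tags from a prebuilt wheel filename."""
    pattern = re.compile(
        r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
        r"cxx11abi(?P<abi>TRUE|FALSE)-cp(?P<py>\d+)-"
    )
    match = pattern.search(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


def matches_interpreter(tags, version_info=sys.version_info):
    """Return True if the wheel's cpXY tag matches the given Python version."""
    return tags["py"] == f"{version_info[0]}{version_info[1]}"


wheel = "causal_conv1d-1.1.3.post1+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
tags = parse_wheel_tags(wheel)
print(tags)
print("matches this interpreter:", matches_interpreter(tags))
```

Run inside the activated vim environment, the second line should report True for Python 3.10; the cuda and torch tags can be compared by eye against torch.version.cuda and torch.__version__.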
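(5) Replace the mamba_ssm package installed in the vim environment with the copy shipped in the Vim repository (check that your environment is actually named vim). Run from the repository root:
cp -rf mamba-1p1p1/mamba_ssm /root/miniconda3/envs/vim/lib/python3.10/site-packages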
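The copy target above is hard-coded for a Miniconda install under /root with an environment named vim. As a hedged alternative, the sketch below asks the running interpreter where its site-packages directory is and where mamba_ssm is currently imported from; site_packages_dir and installed_location are hypothetical helper names, and the script should be run with the environment's own python.

```python
import importlib.util
import os
import sysconfig


def site_packages_dir():
    """Return the pure-Python site-packages directory of the running interpreter."""
    return sysconfig.get_paths()["purelib"]


def installed_location(package):
    """Return the directory a package is imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


print("copy target:", site_packages_dir())
print("mamba_ssm currently at:", installed_location("mamba_ssm"))
```

If the first line does not match the path used in the cp command, the copy went into a different environment than the one that will run training.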
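1.4 Test that it runs
Open Vim/vim/datasets.py and add download=True to the CIFAR100 dataset constructor so that the dataset is downloaded automatically. Then launch a short test run from the Vim/vim directory: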
CUDA_VISIBLE_DEVICES=0 torchrun --master_port=6666 --nproc_per_node=1 main.py \
--model vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --batch-size 2 \
--drop-path 0.05 --weight-decay 0.05 --lr 1e-3 --num_workers 1 \
--data-set CIFAR \
--data-path /media/amax/c08a625b-023d-436f-b33e-9652dc1bc7c0/DATA/liyuehang/Vim-main/vim/cifar-100-python \
--output_dir ./output/vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 \
--no_amp
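The download=True change used for this test run can be sketched as follows. This assumes datasets.py builds the dataset with torchvision's CIFAR100 class; the wrapper name build_cifar100 is hypothetical and the actual call in the repository may be laid out differently.

```python
def build_cifar100(data_path, is_train, transform=None):
    """Build the CIFAR-100 dataset, downloading it under data_path if it is missing."""
    # Imported lazily so the sketch can be read without torchvision installed.
    from torchvision import datasets

    return datasets.CIFAR100(
        data_path,            # root directory; torchvision looks for <root>/cifar-100-python
        train=is_train,
        transform=transform,
        download=True,        # the added argument
    )
```

Because torchvision appends cifar-100-python to the root it is given, a --data-path that already ends in cifar-100-python makes the loader look one level deeper than intended; pointing --data-path at the parent directory avoids that.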
If you already have the dataset, pass its local path to --data-path:
CUDA_VISIBLE_DEVICES=0 torchrun --master_port=6688 --nproc_per_node=1 main.py --model vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --batch-size 2 --drop-path 0.05 --weight-decay 0.05 --lr 1e-3 --num_workers 1 --data-path /root/autodl-tmp/Vim/cifar-100-python --output_dir ./output/vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2 --no_amp
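To upload the dataset from your own machine, use FileZilla.
If you have already downloaded the dataset manually, upload the archive to the target path and extract it there.
# Download the dataset with wget (the official archive is also hosted at https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz)
wget https://mirrors.tuna.tsinghua.edu.cn/cifar-100-python.tar.gz -P /path/to/save/dataset
# Assuming the CIFAR-100 tar.gz archive has already been downloaded
tar -zxvf /path/to/downloaded/cifar-100-python.tar.gz -C /path/to/dataset/folder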
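The extraction step can also be done from Python, which is convenient when the shell is not at hand (for example inside a notebook on the server). This is a minimal sketch using only the standard library; extract_archive is a hypothetical helper and the paths in the usage comment are the same placeholders as in the tar command above.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the top-level entries."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


# Usage (placeholder paths):
# extract_archive("/path/to/downloaded/cifar-100-python.tar.gz", "/path/to/dataset/folder")
```

Only extract archives from a source you trust, since tar archives can contain paths that point outside the destination directory.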
Verify the directory structure: after extraction, check that /root/autodl-tmp/Vim/cifar-100-python/ contains the following entries:
/root/autodl-tmp/Vim/cifar-100-python/
├── train
├── test
└── meta
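To see what the extracted entries actually are (a directory of images, or a single pickled file), they can be inspected directly. The sketch below is a hedged helper, with the hypothetical name describe_entry, that assumes any non-directory entry is a pickled dictionary, as in the Python version of CIFAR-100.

```python
import os
import pickle


def describe_entry(path):
    """Report whether path is a directory or a pickled dict, and list the dict's keys."""
    if os.path.isdir(path):
        return {"kind": "directory", "children": len(os.listdir(path))}
    with open(path, "rb") as f:
        obj = pickle.load(f, encoding="latin1")
    return {"kind": "pickle", "keys": sorted(obj.keys())}


# Usage (path from the listing above):
# print(describe_entry("/root/autodl-tmp/Vim/cifar-100-python/train"))
```

A result of kind "pickle" with keys such as data and fine_labels means the split is still in its binary form and has not been unpacked into per-class folders.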
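For an ImageFolder-style loader, the train directory should contain 100 subdirectories, one per class, each holding that class's image files.
If train and test are instead single binary (pickled) files with no per-class folders, as they are in the archive distributed by the CIFAR authors, you can use the following Python script to convert the binary data into the standard ImageFolder directory structure. The script creates one subfolder per class and saves each image into the matching subfolder.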
import os
import pickle

from PIL import Image
from tqdm import tqdm

# Path to the extracted CIFAR-100 dataset
data_path = '/root/autodl-tmp/Vim/cifar-100-python'

# Load the CIFAR-100 training split
with open(os.path.join(data_path, 'train'), 'rb') as f:
    data_dict = pickle.load(f, encoding='latin1')

# Image data (one flat row of 3072 values per image) and fine-grained labels
images = data_dict['data']
labels = data_dict['fine_labels']

# Root folder for the converted images
output_dir = os.path.join(data_path, 'train_images')
os.makedirs(output_dir, exist_ok=True)

# Read the class names
with open(os.path.join(data_path, 'meta'), 'rb') as f:
    meta_dict = pickle.load(f, encoding='latin1')
class_names = meta_dict['fine_label_names']

# Create one subdirectory per class
for class_name in class_names:
    os.makedirs(os.path.join(output_dir, class_name), exist_ok=True)

# Save the image files
for i in tqdm(range(len(images))):
    image = images[i]
    label = labels[i]
    # Reshape the flat row into a 32x32x3 (height, width, channel) array
    image = image.reshape(3, 32, 32).transpose(1, 2, 0)
    # Save the image into the folder of its class
    image_path = os.path.join(output_dir, class_names[label], f'{i}.png')
    Image.fromarray(image).save(image_path)
After running the script, check that the output directory (train_images) has the following structure:
/root/autodl-tmp/Vim/cifar-100-python/train_images/
├── apple
├── orange
├── pear
├── ...
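Rather than checking the tree by eye, the class folders and image counts can be tallied. The sketch below uses a hypothetical helper, summarize_image_folder; for the CIFAR-100 training split the expected result is 100 class folders with 500 images each (50,000 in total).

```python
import os


def summarize_image_folder(root):
    """Map each class subfolder under root to the number of files it contains."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts


# Usage (path from the listing above):
# counts = summarize_image_folder("/root/autodl-tmp/Vim/cifar-100-python/train_images")
# print(len(counts), "classes,", sum(counts.values()), "images")
```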
2.1 Debugging in VS Code
Add a debug configuration to .vscode/launch.json, then set breakpoints and start debugging:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debug: torchrun with Custom Args",
            "type": "debugpy",
            "request": "launch",
            "module": "torch.distributed.run",
            "env": {
                "CUDA_VISIBLE_DEVICES": "0",
                "RANK": "0",
                "WORLD_SIZE": "1",
                "LOCAL_RANK": "0"
            },
            "args": [
                "--nproc_per_node", "1",
                "--master_port", "6688",
                "${workspaceFolder}/main.py",
                "--model", "vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2",
                "--batch-size", "2",
                "--drop-path", "0.05",
                "--weight-decay", "0.05",
                "--lr", "1e-3",
                "--num_workers", "1",
                "--data-path", "/root/autodl-tmp/Vim/cifar-100-python",
                "--output_dir", "./output/vim_small_patch16_224_bimambav2_final_pool_mean_abs_pos_embed_with_midclstok_div2",
                "--no_amp"
            ],
            "console": "integratedTerminal",
            "subProcess": true
        }
    ]
}
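Writing the "args" list by hand is error-prone because every flag and value has to become its own string. As a hedged convenience, the sketch below splits a torchrun command line into that list with the standard library; command_to_launch_args is a hypothetical helper, and the command is a shortened version of the one used in section 1.4.

```python
import json
import shlex


def command_to_launch_args(command):
    """Split a torchrun command line into the "args" list for launch.json."""
    tokens = shlex.split(command)
    # Drop leading VAR=value environment assignments; they belong in "env".
    while tokens and "=" in tokens[0] and not tokens[0].startswith("-"):
        tokens.pop(0)
    # Drop the torchrun executable; the "module" field takes its place.
    if tokens and tokens[0] == "torchrun":
        tokens.pop(0)
    return tokens


command = (
    "CUDA_VISIBLE_DEVICES=0 torchrun --master_port=6688 --nproc_per_node=1 "
    "main.py --batch-size 2 --lr 1e-3 --no_amp"
)
print(json.dumps(command_to_launch_args(command), indent=4))
```

The printed JSON array can be pasted into the "args" field; flags written as --flag=value stay as a single string, which argument parsers generally accept as well.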
AttributeError: 'Namespace' object has no attribute 'gpu'
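This error means the args object has no gpu attribute at the point where the code reads it. It is not clear why this happens only under the debugger; as a workaround, set a default value at the top of init_distributed_mode: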
def init_distributed_mode(args):
    if not hasattr(args, 'gpu'):
        args.gpu = 0  # set a default value
    if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
        # rest of the original function unchanged
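The error and the workaround can be reproduced in isolation with a plain argparse.Namespace, which raises AttributeError for any attribute that was never set. This is a minimal sketch; ensure_gpu_default is a hypothetical helper that mirrors the hasattr check above and is not part of the Vim code base.

```python
import argparse


def ensure_gpu_default(args, default=0):
    """Give args a gpu attribute if nothing has set one yet."""
    if not hasattr(args, "gpu"):
        args.gpu = default
    return args


args = argparse.Namespace(distributed=False)
try:
    args.gpu
except AttributeError as exc:
    print("before:", exc)
print("after:", ensure_gpu_default(args).gpu)
```

An existing value is left alone, so the default only takes effect on code paths that never assign args.gpu.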
Inspecting input and output shapes
if __name__ == "__main__":
    import torch
    from mamba_ssm import Mamba  # not needed if this block is appended to the file that defines Mamba

    # Use the GPU when CUDA is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Input tensor dimensions
    batch_size, seq_length, d_model = 2, 197, 384

    # Create the input tensor directly on the target device
    x = torch.randn(batch_size, seq_length, d_model, device=device)

    # Instantiate the Mamba block on the same device
    # Note: adjust these parameters to match your setup
    mamba_model = Mamba(
        d_model=d_model,
        d_state=16,
        d_conv=4,
        expand=2,
        dt_rank="auto",
        dt_min=0.001,
        dt_max=0.1,
        dt_init="random",
        dt_scale=1.0,
        dt_init_floor=1e-4,
        conv_bias=True,
        bias=False,
        use_fast_path=True,
        layer_idx=None,
        device=device,
        dtype=torch.float32,  # keep the dtype compatible with the device
        bimamba_type="v2",
        if_devide_out=False,
        init_layer_scale=None,
    ).to(device)

    # Run the forward pass
    output = mamba_model(x)

    # Print the shape of the output tensor
    print(output.shape)
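The sequence length of 197 in the snippet above is consistent with the model name used throughout (patch16_224): a 224x224 image cut into 16x16 patches gives 14 x 14 = 196 patch tokens, plus one class token. The sketch below works through that arithmetic; vit_sequence_length is a hypothetical helper, and the single class token is an assumption inferred from the number 197 rather than something stated in the text.

```python
def vit_sequence_length(image_size, patch_size, num_cls_tokens=1):
    """Token count for a square image split into square patches, plus class tokens."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + num_cls_tokens


seq_length = vit_sequence_length(224, 16)
print(seq_length)  # 14 * 14 + 1 = 197
```

Changing the input resolution or patch size changes seq_length accordingly, while d_model (384 in the snippet) is independent of it.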