Notes on running image captioning and VQA baselines with BLIP-2.
GitHub repo: salesforce/LAVIS (as of 2024.12.17)
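Environment setup
The installation section of the README has not been updated in a long time. Following it does not produce a working environment, and it took me a long while to work through the pitfalls...
- Create a virtual environment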
conda create -n lavis python=3.10
conda activate lavis
- Install the lavis package in development (editable) mode
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
- open3d error
[Error]
ERROR: Could not find a version that satisfies the requirement open3d==0.13.0 (from salesforce-lavis) (from versions: 0.16.0, 0.17.0, 0.18.0)
ERROR: No matching distribution found for open3d==0.13.0
[Fix] (see https://github.com/salesforce/LAVIS/issues/641) Quote the requirement so that the shell does not treat ">" as an output redirection.
pip install "open3d>=0.13.0"
- opencv-python and numpy are incompatible
[Error]
RuntimeError: module compiled against ABI version 0x1000009 but this version of numpy is 0x2000000
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/LAVIS/lavis/__init__.py", line 15, in <module>
from lavis.datasets.builders import *
File "/home/LAVIS/lavis/datasets/builders/__init__.py", line 8, in <module>
from lavis.datasets.builders.base_dataset_builder import load_dataset_config
File "/home/LAVIS/lavis/datasets/builders/base_dataset_builder.py", line 17, in <module>
from lavis.datasets.data_utils import extract_archive
File "/home/LAVIS/lavis/datasets/data_utils.py", line 14, in <module>
import cv2
File "/home/.conda/envs/lavis/lib/python3.10/site-packages/cv2/__init__.py", line 8, in <module>
from .cv2 import *
ImportError: numpy.core.multiarray failed to import
[Fix] Reinstall opencv-python so that it is compatible with numpy 2.2.0 (see https://github.com/salesforce/LAVIS/issues/762)
pip uninstall opencv-python-headless
pip cache purge # clear pip's package cache
pip install opencv-python # installs the latest version, which is compatible with numpy 2.2.0
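To make the error message above easier to read, here is a minimal Python sketch that pulls the two ABI numbers out of it. The helper names parse_abi_mismatch and abi_major are made up for illustration, and reading the top byte as the ABI generation (1 for numpy 1.x builds, 2 for numpy 2.x) is my assumption based on the values in this message, not something taken from the LAVIS or OpenCV docs.

```python
import re


def parse_abi_mismatch(message):
    """Return (compiled_against, running) ABI versions found in NumPy's error text.

    Returns None when the message does not contain the expected pattern.
    """
    match = re.search(
        r"compiled against ABI version (0x[0-9a-fA-F]+) "
        r"but this version of numpy is (0x[0-9a-fA-F]+)",
        message,
    )
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def abi_major(abi_version):
    # Assumption: the top byte encodes the ABI generation.
    return abi_version >> 24


msg = (
    "RuntimeError: module compiled against ABI version 0x1000009 "
    "but this version of numpy is 0x2000000"
)
compiled, running = parse_abi_mismatch(msg)
print(abi_major(compiled), abi_major(running))  # prints: 1 2
```

Under that reading, the installed cv2 binary was built against numpy 1.x while the environment has numpy 2.x, which is why reinstalling a newer opencv-python wheel resolves it.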
- moviepy error
[Error]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/LAVIS/lavis/__init__.py", line 15, in <module>
from lavis.datasets.builders import *
File "/home/LAVIS/lavis/datasets/builders/__init__.py", line 8, in <module>
from lavis.datasets.builders.base_dataset_builder import load_dataset_config
File "/home/LAVIS/lavis/datasets/builders/base_dataset_builder.py", line 18, in <module>
from lavis.processors.base_processor import BaseProcessor
File "/home/LAVIS/lavis/processors/__init__.py", line 29, in <module>
from lavis.processors.audio_processors import BeatsAudioProcessor
File "/home/LAVIS/lavis/processors/audio_processors.py", line 11, in <module>
from moviepy.editor import VideoFileClip
ModuleNotFoundError: No module named 'moviepy.editor'
[Fix] requirements.txt does not pin a moviepy version, so the latest release gets installed. The error above appears because that release made breaking changes and removed the editor module (see the MoviePy documentation, https://zulko.github.io/moviepy/). The fix is to reinstall an older moviepy (the version can also be pinned in requirements.txt accordingly).
pip install moviepy==1.0.3 --force-reinstall
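An alternative to downgrading, which I have not tried against LAVIS, would be to make the import tolerant of both layouts. The sketch below assumes that moviepy 1.x exposes VideoFileClip through moviepy.editor and that moviepy 2.x exposes it from the top-level moviepy package. The function name load_video_file_clip is hypothetical. Other API changes in moviepy 2.x could still break LAVIS even if this import succeeds.

```python
import importlib


def load_video_file_clip():
    """Return moviepy's VideoFileClip class, or None if it cannot be imported."""
    # Try the 1.x location first, then the 2.x top-level package.
    for module_name in ("moviepy.editor", "moviepy"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        clip_class = getattr(module, "VideoFileClip", None)
        if clip_class is not None:
            return clip_class
    return None


VideoFileClip = load_video_file_clip()
print("VideoFileClip available:", VideoFileClip is not None)
```

Using this would mean editing lavis/processors/audio_processors.py, so pinning moviepy==1.0.3 remains the less invasive option.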
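- transformers, peft, diffusers, and huggingface_hub are incompatible
When using the Hugging Face packages to run inference with or fine-tune LLMs, VLMs, or even diffusion models, the commonly used libraries are transformers, huggingface_hub, diffusers, and peft (for fine-tuning). LAVIS pins transformers (4.33.2) and diffusers (<) in requirements.txt but does not pin peft or huggingface_hub. Both of those are updated quickly, so the latest versions that get installed conflict with the pinned transformers and diffusers, and peft and huggingface_hub have to be downgraded. As for which versions to downgrade to, my current approach is to read the source code each error points to and find the release whose source is compatible... Another annoyance: downgrading peft resolves its dependencies and automatically upgrades transformers to the latest version, so after downgrading peft the pinned transformers has to be reinstalled.
(see https://stackoverflow.com/questions/79273647/cannot-import-name-encoderdecodercache-from-transformers, …)
(Salesforce, is there really nobody maintaining this repo's dependencies?!! Furious!!! 💢)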
pip install peft==0.10.0 --force-reinstall
pip install transformers==4.33.2 --force-reinstall # checked the source: this transformers version only requires huggingface_hub >= 0.15.0
pip install huggingface_hub==0.22.0 --force-reinstall
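Because each reinstall can silently move another package, it helps to check what actually ended up installed. Below is a minimal sketch using the standard-library importlib.metadata. The pins are the ones from the three commands above, while the names EXPECTED, installed_version, and find_mismatches are made up for illustration.

```python
from importlib import metadata

# Versions taken from the pip commands in this post.
EXPECTED = {
    "peft": "0.10.0",
    "transformers": "4.33.2",
    "huggingface_hub": "0.22.0",
}


def installed_version(package, lookup=metadata.version):
    """Return the installed version string, or None if the package is missing."""
    try:
        return lookup(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=metadata.version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package, lookup)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

If it prints nothing, the three packages match the pins. Running it after every --force-reinstall shows right away when pip has upgraded transformers again.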
At this point the code below prints model_zoo successfully, so this wretched environment is finally set up.
from lavis.models import model_zoo
print(model_zoo)
# ==================================================
# Architectures Types
# ==================================================
# albef_classification ve
# albef_feature_extractor base
# albef_nlvr nlvr
# albef_pretrain base
# albef_retrieval coco, flickr
# albef_vqa vqav2
# alpro_qa msrvtt, msvd
# alpro_retrieval msrvtt, didemo
# blip_caption base_coco, large_coco
# blip_classification base
# blip_feature_extractor base
# blip_nlvr nlvr
# blip_pretrain base
# blip_retrieval coco, flickr
# blip_vqa vqav2, okvqa, aokvqa
# clip_feature_extractor ViT-B-32, ViT-B-16, ViT-L-14, ViT-L-14-336, RN50
# clip ViT-B-32, ViT-B-16, ViT-L-14, ViT-L-14-336, RN50
# gpt_dialogue base
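As a quick smoke test beyond printing the zoo, one of the listed entries can be loaded and run on a single image. The sketch below follows the load_model_and_preprocess usage shown in the LAVIS README, with the blip_caption / base_coco pair from the listing above. The function name caption_image and the image path are placeholders, and I am assuming the first call can download the pretrained weights.

```python
def caption_image(image_path, name="blip_caption", model_type="base_coco"):
    """Generate captions for one image with a model from the LAVIS zoo."""
    # Local imports so this file can be imported without LAVIS installed.
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Returns the model plus its visual and text preprocessors.
    model, vis_processors, _ = load_model_and_preprocess(
        name=name, model_type=model_type, is_eval=True, device=device
    )
    raw_image = Image.open(image_path).convert("RGB")
    # Preprocess, add a batch dimension, and move to the target device.
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    return model.generate({"image": image})


if __name__ == "__main__":
    import sys

    # Usage: python caption_demo.py path/to/image.jpg (file name is hypothetical)
    if len(sys.argv) > 1:
        print(caption_image(sys.argv[1]))
```

Swapping name and model_type for another row of the table should work the same way for other captioning models, though I have only written this against the entry shown.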
Data preparation
to fill
Model training and inference
to fill
Model evaluation
to fill