Deploying Qwen2-VL with vLLM
Developers who want to deploy the Qwen2-VL model with vLLM can draw on a range of detailed learning resources and tutorials [^1]. These materials cover not only the theoretical background but also hands-on deployment guides.
#### Install dependencies
First, install the Python packages needed for the steps that follow:
```bash
pip install torch transformers vllm
```
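Qwen2-VL support only landed in relatively recent releases of both transformers and vLLM, so a quick sanity check of the installed versions can save debugging time later. A minimal check (not an exhaustive requirements list):
```python
# Print the installed versions of the key libraries (sanity check only)
import torch
import transformers
import vllm

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)
```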
#### Load the pretrained model
Load the intended Qwen2-VL checkpoint through Hugging Face Transformers, which is a convenient way to confirm that the weights download and load correctly. Note that Qwen2-VL is a vision-language model, so recent Transformers releases expose it through its own model class and a processor rather than a plain tokenizer:
```python
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# The processor bundles the tokenizer and the image processor
model_name_or_path = "Qwen/Qwen2-VL-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_name_or_path)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name_or_path)
```
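When serving with vLLM, the engine loads the model weights itself, so the Transformers load above is mainly a verification step. As a minimal sketch (assuming the same `Qwen/Qwen2-VL-7B-Instruct` checkpoint and a GPU with enough memory), you can also run a quick offline, text-only generation through vLLM's `LLM` class before wiring up a server:
```python
from vllm import LLM, SamplingParams

# Offline (non-server) inference: vLLM loads and manages the weights itself.
# A text-only prompt is shown here; image inputs need extra multimodal handling.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Describe what a vision-language model can do."], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```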
#### Initialize the vLLM engine and start the HTTP service
Create a vLLM engine instance from the chosen engine arguments, then start an HTTP server listening on a specific address and port to accept requests:
```python
import uuid

import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

model_name_or_path = "Qwen/Qwen2-VL-7B-Instruct"

app = FastAPI()

# from_engine_args is a synchronous classmethod, so no await is needed here
engine_args = AsyncEngineArgs(model=model_name_or_path)
engine = AsyncLLMEngine.from_engine_args(engine_args)

class Request(BaseModel):
    prompt: str
    max_tokens: int

@app.post("/generate")
async def generate(request: Request):
    # engine.generate yields intermediate RequestOutput objects;
    # iterate to the last one to obtain the finished generation
    sampling_params = SamplingParams(max_tokens=request.max_tokens)
    request_id = str(uuid.uuid4())
    final_output = None
    async for output in engine.generate(request.prompt, sampling_params, request_id):
        final_output = output
    return {"result": final_output.outputs[0].text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
This snippet shows how to expose a simple web API endpoint so that the generation capability can be invoked remotely [^2].
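Once the server is running, the endpoint can be exercised from any HTTP client. A minimal client sketch using `requests`, assuming the server is reachable at `localhost:8000`:
```python
import requests

# Send a text prompt to the /generate endpoint defined above
response = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Introduce the Qwen2-VL model in one sentence.", "max_tokens": 64},
)
response.raise_for_status()
print(response.json()["result"])
```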