Step 1: Install the environment
pip install modelscope
Step 2: Download the model
modelscope download --model deepseek-ai/deepseek-coder-6.7b-instruct --local_dir /root/autodl-tmp/deepseek-ai/deepseek-coder-6.7b-instruct
Step 3: Install dependencies
pip install transformers peft diffusers
Step 4: Run the model
# File: start-deepseek-coder-6.7b.py
import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

# ModelScope model ID; the local directory from Step 2 can also be passed here to avoid a second download
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# The role must be the English name "user"; the chat template does not recognize a localized name such as "用户"
messages = [
    {'role': 'user', 'content': "Write a quicksort algorithm in Python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# tokenizer.eos_token_id is the id of the <|EOT|> token
# do_sample=False means greedy decoding, so top_k/top_p have no effect here
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
python start-deepseek-coder-6.7b.py
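Chat templates such as DeepSeek-Coder's compare each message's role against English names like "user", so a localized role string (for example "用户") is not formatted as a user turn. A minimal pre-flight check, assuming the only valid roles are the OpenAI-style "system", "user" and "assistant"; the helper name `validate_messages` is hypothetical:

```python
# Hypothetical helper: reject messages whose role is not one of the English
# role names that chat templates normally expect (assumed set below).
VALID_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return messages unchanged, or raise ValueError on a bad role or content."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(
                f"message {i}: unsupported role {role!r}; "
                f"expected one of {sorted(VALID_ROLES)}"
            )
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages
```

Call it on `messages` just before `tokenizer.apply_chat_template(...)` so a wrong role fails loudly instead of silently producing an oddly formatted prompt.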
If you get NameError: name 'torch' is not defined, the script is missing the torch import. Add it at the top of the file:
import torch
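A NameError means the `import torch` line is missing; if torch itself is not installed, Python raises ModuleNotFoundError instead. Step 3 does not install torch explicitly (it may already be present on the GPU image), so a quick check of which packages are importable can save a failed model load. A sketch using only the standard library; the package list is an assumption based on the imports in this guide:

```python
# Pre-flight sketch: report which packages used in this guide cannot be found,
# without actually importing them (the REQUIRED list is an assumption).
import importlib.util

REQUIRED = ("torch", "modelscope", "transformers")


def missing_packages(names=REQUIRED):
    """Return the subset of top-level module names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `print(missing_packages())` prints an empty list when everything is installed; otherwise `pip install` the names it reports.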
Step 5: Check the output
Step 6: Install vLLM to serve the model as an API, which makes it easy to integrate into projects
pip install vllm
Step 7: Start the DeepSeek model with vLLM
nohup python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 2701 \
  --model /root/autodl-tmp/deepseek-ai/deepseek-coder-6.7b-instruct \
  --trust-remote-code \
  --gpu-memory-utilization 0.8 \
  --max-model-len 10000 \
  --api-key 123456 \
  --served-model-name 6.7Bcoder > vllm.log 2>&1 &
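The same launch can be driven from Python, for example from a small supervisor script. A sketch that assembles the Step 7 flags into an argv list and starts the server in the background with stdout and stderr sent to a log file; the helper names `build_vllm_cmd` and `launch` are hypothetical, and the defaults simply mirror the values used above:

```python
# Sketch: build the Step 7 vLLM command as an argv list and launch it.
# Helper names are hypothetical; flag values mirror the command above.
import subprocess
import sys


def build_vllm_cmd(model_path, served_name, port=2701, api_key="123456",
                   gpu_mem=0.8, max_model_len=10000, host="0.0.0.0"):
    """Return argv for vLLM's OpenAI-compatible API server."""
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--host", host,
        "--port", str(port),
        "--model", model_path,
        "--trust-remote-code",
        "--gpu-memory-utilization", str(gpu_mem),
        "--max-model-len", str(max_model_len),
        "--api-key", api_key,
        "--served-model-name", served_name,
    ]


def launch(cmd, log_path="vllm.log"):
    """Start the server in the background, appending stdout and stderr to log_path."""
    with open(log_path, "ab") as log:
        # The child process keeps its own copy of the file descriptor.
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

Use `shlex.join(build_vllm_cmd(...))` to inspect the resulting command line before calling `launch`.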
Step 8: Test the API
Method 1: Use curl
Test basic connectivity:
curl -X GET "http://localhost:2701/v1/models" \
  -H "Authorization: Bearer 123456"
Expected response (abridged; vLLM also returns fields such as created and owned_by):
{ "object": "list", "data": [{"id": "6.7Bcoder", "object": "model"}] }
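The connectivity check can also be done from Python with only the standard library, which is handy inside an application's health check. A sketch assuming the host, port and API key from Step 7; `list_models` and `parse_model_ids` are hypothetical names:

```python
# Stdlib-only sketch of the /v1/models check (base URL and key assumed from Step 7).
import json
import urllib.request


def parse_model_ids(body):
    """Extract the "id" of each entry in an OpenAI-style /v1/models response."""
    payload = json.loads(body)
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url="http://localhost:2701/v1", api_key="123456", timeout=10):
    """Query the server; raises urllib.error.URLError if it is unreachable."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Once the server is up, `list_models()` should contain the value passed to `--served-model-name`.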
Method 2: Test text completion (/v1/completions)
curl -X POST "http://localhost:2701/v1/completions" \
  -H "Authorization: Bearer 123456" \
  -H "Content-Type: application/json" \
  -d '{ "model": "6.7Bcoder", "prompt": "def fibonacci(n):", "max_tokens": 100, "temperature": 0.2 }'
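The completion request above translates directly to Python. A sketch that builds the same POST with urllib (so the payload can be inspected before sending) and a small wrapper that returns the first choice's text; function names and defaults are assumptions mirroring the curl call:

```python
# Sketch: the Method 2 request built with urllib (names and defaults assumed).
import json
import urllib.request


def completion_request(prompt, base_url="http://localhost:2701/v1", api_key="123456",
                       model="6.7Bcoder", max_tokens=100, temperature=0.2):
    """Build (but do not send) a POST to /v1/completions."""
    body = {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=60, **kwargs):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(completion_request(prompt, **kwargs), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["choices"][0]["text"]
```

For example, `complete("def fibonacci(n):")` returns only the continuation the model generated after the prompt.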
Method 3: Test the chat endpoint (/v1/chat/completions)
curl -X POST "http://localhost:2701/v1/chat/completions" \
  -H "Authorization: Bearer 123456" \
  -H "Content-Type: application/json" \
  -d '{ "model": "6.7Bcoder", "messages": [ {"role": "user", "content": "Implement quicksort in Python"} ], "temperature": 0.7 }'
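The chat endpoint also accepts "stream": true, in which case the reply arrives as server-sent events: lines of the form `data: {json chunk}`, each carrying a piece of text in `choices[0].delta.content`, followed by `data: [DONE]`. A sketch that reassembles such a stream; `collect_stream` is a hypothetical name and only the first choice is collected:

```python
# Sketch: reassemble a streamed chat reply from OpenAI-style SSE lines.
import json


def collect_stream(lines):
    """Concatenate choices[0].delta.content across 'data:' lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0:
                # The first chunk usually carries only the role, so content may be missing.
                parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

To use it, add "stream": true to the Method 3 request body, open it with `urllib.request.urlopen`, and pass the response object to `collect_stream`; iterating an HTTP response yields it line by line.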
Method 4: Test with a Python script (requires the openai package: pip install openai)
import openai

client = openai.OpenAI(
    base_url="http://localhost:2701/v1",
    api_key="123456",
)
response = client.chat.completions.create(
    model="6.7Bcoder",
    messages=[{"role": "user", "content": "Explain quantum entanglement"}],
)
print(response.choices[0].message.content)
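The endpoint is stateless, so follow-up questions only have context if the earlier turns are sent again. A sketch that keeps a running history; `client` is assumed to be an `openai.OpenAI` instance configured as in Method 4, and `ask` is a hypothetical helper:

```python
# Sketch: multi-turn chat by resending the accumulated history on every call.
# `client` is assumed to be an openai.OpenAI instance pointed at the vLLM server.
def ask(client, history, question, model="6.7Bcoder", **params):
    """Append the question, call the chat endpoint, append and return the reply."""
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=history, **params)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

For example, start with `history = []`, call `ask(client, history, "Implement quicksort in Python")`, then `ask(client, history, "Now add type hints")`; the second request carries all three earlier messages. Keep an eye on total length, since the server was started with a 10000-token context limit.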
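Step 9: Map the port to the business server
Forward the port (run on the business server). -L forwards local port 2701 to 127.0.0.1:2701 on the GPU instance, -N runs no remote command, -C enables compression, and -g lets other hosts connect to the forwarded port: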
ssh -CNg -L 2701:127.0.0.1:2701 root@connect.westb.seetacloud.com -p 23799
Password: 123456
Verify that the forwarded port works:
curl -X GET "http://localhost:2701/v1/models" \
  -H "Authorization: Bearer 123456"
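Because the SSH tunnel can drop silently, the business service may want to confirm that something is listening on the forwarded port before sending requests. A minimal TCP-level sketch (it only proves the port accepts connections, not that vLLM is healthy); `port_open` is a hypothetical name:

```python
# Sketch: check that a TCP listener is reachable on the forwarded port.
import socket


def port_open(host="127.0.0.1", port=2701, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

If `port_open()` returns False, re-run the ssh command above; if it returns True but the curl check fails, look at vllm.log on the GPU instance instead.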
Step 10: Bind a domain name (Nginx reverse proxy)
server {
    listen 80;
    server_name deepseekcoder.guifanhua.com;
    location / {
        proxy_pass http://localhost:2701;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_cache_bypass $http_upgrade;
        proxy_read_timeout 3600s;
    }
}
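When the same setup is repeated for other domains or ports, the server block can be generated from a few parameters. A sketch of such a renderer (the helper name `render_server_block` is hypothetical). Its optional `streaming` flag adds `proxy_buffering off;`, which is not in the original config: it is the usual Nginx change when "stream": true responses should reach the client chunk by chunk instead of being buffered:

```python
# Hypothetical helper: render the Step 10 Nginx server block from parameters.
def render_server_block(domain, upstream_port=2701, read_timeout="3600s", streaming=False):
    """Return an Nginx server block proxying `domain` to localhost:upstream_port."""
    lines = [
        "server {",
        "    listen 80;",
        f"    server_name {domain};",
        "    location / {",
        f"        proxy_pass http://localhost:{upstream_port};",
        "        proxy_set_header Host $host;",
        "        proxy_set_header X-Real-IP $remote_addr;",
        "        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;",
        "        proxy_set_header X-Forwarded-Proto $scheme;",
        "        proxy_http_version 1.1;",
        "        proxy_set_header Upgrade $http_upgrade;",
        '        proxy_set_header Connection "upgrade";',
        "        proxy_cache_bypass $http_upgrade;",
        f"        proxy_read_timeout {read_timeout};",
    ]
    if streaming:
        # Forward streamed chunks immediately instead of buffering the response.
        lines.append("        proxy_buffering off;")
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"
```

Write the result to a file under the Nginx configuration directory, then validate it with `nginx -t` before reloading.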
Step 11: Test the API through the domain
curl -X GET "http://deepseekcoder.guifanhua.com/v1/models" \
  -H "Authorization: Bearer 123456"
curl -X POST "http://deepseekcoder.guifanhua.com/v1/chat/completions" \
  -H "Authorization: Bearer 123456" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "6.7Bcoder",
    "messages": [
      {"role": "user", "content": "用 Python 实现快速排序"}
    ],
    "temperature": 0.7
  }'
Sample response (the prompt above is Chinese for "Implement quicksort in Python", so the model replies in Chinese):
{"id":"chatcmpl-814e1e194d9d4c6e9dce8196d97bb572","object":"chat.completion","created":1751867668,"model":"6.7Bcoder","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"快速排序是一种分治算法,它通过选择一个元素作为枢轴,将要排序的数据分成两部分,然后对这两部分进行递归处理。\n\n以下是一个 Python 实现的快速排序:\n\n```python\ndef quick_sort(arr):\n if len(arr) <= 1:\n return arr\n else:\n pivot = arr[len(arr) // 2]\n left = [x for x in arr if x < pivot]\n middle = [x for x in arr if x == pivot]\n right = [x for x in arr if x > pivot]\n return quick_sort(left) + middle + quick_sort(right)\n```\n\n在这个算法中,我们首先检查输入的数组(或列表)是否只包含少于或等于 1 个元素。如果是,我们直接返回这个数组,因为它已经是排序好的。\n\n如果数组包含多个元素,我们选择数组的中间元素作为枢轴。然后,我们创建两个新的列表:left(存放小于枢轴的元素)和 right(存放大于枢轴的元素)。\n\n最后,我们递归地对 left 和 right 列表进行快速排序,并将结果合并。\n","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":75,"total_tokens":379,"completion_tokens":304,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}
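When integrating the API into a project you usually need only a few fields from a chat.completion response like the sample above: the assistant text, the finish reason, token usage, and any fenced code the model returned. A sketch of that extraction; `summarize_chat_response` is a hypothetical name, and the fence regex assumes the model uses Markdown code fences as in the sample:

```python
# Sketch: pull the useful fields out of a chat.completion JSON body.
import json
import re

# Matches a Markdown fence (three backticks plus an optional language tag) and captures its body.
FENCE = re.compile(r"`{3}[\w+-]*\n(.*?)`{3}", re.DOTALL)


def summarize_chat_response(body):
    """Return content, finish_reason, fenced code blocks and token counts."""
    payload = json.loads(body)
    choice = payload["choices"][0]
    content = choice["message"]["content"] or ""
    usage = payload.get("usage") or {}
    return {
        "content": content,
        "finish_reason": choice.get("finish_reason"),
        "code_blocks": FENCE.findall(content),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }
```

A finish_reason of "length" instead of "stop" means the reply hit the token limit and was cut off.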