Streaming output with LangChain LCEL chains
The sync stream and async astream methods are the default streaming implementations: they stream the final output of a chain.
Basic workflow: deploy a local open-source model with vLLM; once the server is up, access it through the OpenAI-compatible API; then compose a prompt + model chain with LangChain and stream its output.
Step 1: Deploy the model
The model must be exposed as an OpenAI-compatible API server; either Ollama or vLLM can be used for deployment. With vLLM:
python -m vllm.entrypoints.openai.api_server --model modelscope/hub/qwen/Qwen2-7B-Instruct --gpu_memory_utilization=0.3
When the deployment succeeds, the log shows:
Uvicorn running on http://0.0.0.0:8000
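Before wiring up LangChain, it helps to confirm the server is reachable and to check the exact model id it registered, because that id must match the model parameter in every later request. A minimal sketch using the requests library (assumes the default host and port shown above):
import requests

# List the models served by the vLLM OpenAI-compatible server;
# the returned id is what later requests must pass as "model".
for m in requests.get("http://localhost:8000/v1/models").json()["data"]:
    print(m["id"])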
After deployment, send a quick test request with curl:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "modelscope/hub/qwen/Qwen2-7B-Instruct",
  "messages": [
    {"role": "system", "content": "你是一个AI营养师,专门为大家提供专业的营养健康建议"},
    {"role": "user", "content": "我缺少维生素D,给我2个简短的建议"}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "max_tokens": 512
}'
Step 2: Access the vLLM-served model through the OpenAI-compatible API
Option 1: via LangChain's ChatOpenAI (this is the form the LangChain chain below uses)
import os
from langchain_community.chat_models import ChatOpenAI
from langchain.chains.llm import LLMChain
from langchain_core.prompts import PromptTemplate

# Point the OpenAI client at the local vLLM server; the API key can be any placeholder
os.environ['OPENAI_API_KEY'] = 'none'
os.environ['OPENAI_BASE_URL'] = 'http://localhost:8000/v1'

# Qwen2-7B-Instruct (an int4-quantized build in this setup)
llm = ChatOpenAI(temperature=0, model='modelscope/hub/qwen/Qwen2-7B-Instruct')

_prompt = """你是一个发言友好的AI助理。现在回答用户的提问:{question}。"""
prompt = PromptTemplate.from_template(_prompt)
chat_chain = LLMChain(llm=llm, prompt=prompt, verbose=True)

q = "你好,你有什么功能?"  # the end user's question string
response = chat_chain.run(question=q)
print(response)
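Note that LLMChain is deprecated in recent LangChain releases; the same call can be written in LCEL by piping the prompt into the model, which is also the form used for streaming in Step 3. A minimal sketch reusing llm and prompt from above:
from langchain_core.output_parsers import StrOutputParser

# LCEL equivalent of the LLMChain call above: prompt -> model -> plain string
lcel_chain = prompt | llm | StrOutputParser()
print(lcel_chain.invoke({"question": "你好,你有什么功能?"}))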
Option 2: via the raw OpenAI Python client
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="modelscope/hub/qwen/Qwen2-7B-Instruct",  # must match the model name the server was started with
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "Tell me something about large language models."},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
    },
)
print("Chat response:", chat_response)
Step 3: With the model loaded, build the chain and apply streaming output
prompt = ChatPromptTemplate.from_template("讲一个关于 {topic}的笑话")
output_parser = StrOutputParser()
chain = prompt | model | output_parser

import asyncio

# Async version: consume the async generator returned by astream
async def main():
    async for chunk in chain.astream({"topic": "冰淇淋"}):
        print(chunk, end="|", flush=True)

# Run the async entry point
asyncio.run(main())
Or, synchronously:
prompt = ChatPromptTemplate.from_template("讲一个关于 {topic}的笑话")
output_parser = StrOutputParser()
chain = prompt | model | output_parser

# Sync version: iterate over the chunks yielded by stream
for chunk in chain.stream({"topic": "冰淇淋"}):
    print(chunk, end="|", flush=True)
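For comparison, invoke() on the same chain blocks until the complete answer is ready instead of yielding it piece by piece:
# Non-streaming call on the same chain, for contrast with stream()
print(chain.invoke({"topic": "冰淇淋"}))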
The complete code:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models import ChatOpenAI
import os
import asyncio  # only needed for the async variant below

# Point the OpenAI client at the local vLLM server
os.environ['OPENAI_API_KEY'] = 'none'
os.environ['OPENAI_BASE_URL'] = 'http://localhost:8000/v1'

model = ChatOpenAI(temperature=0, model='modelscope/hub/qwen/Qwen2-7B-Instruct')

prompt = ChatPromptTemplate.from_template("讲一个关于 {topic}的笑话")
output_parser = StrOutputParser()
chain = prompt | model | output_parser

# Async version: consume the async generator returned by astream
# async def main():
#     async for chunk in chain.astream({"topic": "冰淇淋"}):
#         print(chunk, end="|", flush=True)
# asyncio.run(main())

# Sync version: iterate over the chunks yielded by stream
for chunk in chain.stream({"topic": "冰淇淋"}):
    print(chunk, end="|", flush=True)
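One caveat on the commented-out async variant: asyncio.run(main()) only works in a plain Python script. Inside a Jupyter notebook an event loop is already running, so it raises a RuntimeError; there you would await the coroutine directly:
# In a notebook (event loop already running), await the coroutine instead:
# await main()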