When evaluating RAG output with Ragas, the default reliance on GPT models is a hassle for users in mainland China. So, drawing on the official Ragas and LangChain docs, I modified this creator's code: a free ZhipuAI API key plus a small local embedding model replace the default GPT models, and the original functionality still works (the results will of course be somewhat worse).
I strongly recommend watching this video first; most of the code below comes from this creator, and the video quality is excellent:
How to use RAGAs to evaluate the quality of a RAG system (如何利用RAGAs评估RAG系统的好坏): https://www.bilibili.com/video/BV1Jz421Q7Lw?vd_source=c5c396652c0c83be15efe54e0c348c90
Now to the main point: here is my modified code in full (if anything is unclear, watch the video linked above first).
from langchain_community.document_loaders import ArxivLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.embeddings import Embeddings
from sentence_transformers import SentenceTransformer
from typing import List
from langchain_core.prompts import PromptTemplate
from openai import OpenAI
from datasets import Dataset
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from langchain_community.chat_models import ChatZhipuAI
import pandas as pd
import os
os.environ["ZHIPUAI_API_KEY"] = "###################################"  # replace with your own ZhipuAI API key (see my previous post if you don't know how to get one)
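# Optional: rather than hardcoding the key, you can keep it in a .env file and
# load it with python-dotenv (a sketch; requires `pip install python-dotenv`
# and a .env file containing a ZHIPUAI_API_KEY=... line):
# from dotenv import load_dotenv
# load_dotenv()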
paper_docs = ArxivLoader(query="2309.15217", load_max_docs=1).load()
# Because we load a local embedding model directly, we wrap it in a class that implements LangChain's Embeddings interface (the two methods below)
class HuggingFaceEmbeddings(Embeddings):
    def __init__(self, model_path: str):
        self.model = SentenceTransformer(model_path)

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Embed a batch of documents; Chroma and Ragas expect plain Python lists
        return self.model.encode(texts).tolist()

    def embed_query(self, text: str) -> List[float]:
        # Embed a single query string
        return self.model.encode([text])[0].tolist()
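# Quick sanity check of the wrapper (an optional sketch, once model_path is
# defined below): the methods should return plain lists of floats, which is
# what Chroma and Ragas consume.
# emb = HuggingFaceEmbeddings(model_path)
# print(len(emb.embed_query("hello")))  # prints the model's embedding dimension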
text_splitter = RecursiveCharacterTextSplitter(chunk_size=600)
docs = text_splitter.split_documents(paper_docs)
# You need to download the corresponding model from HuggingFace first; it is small, so don't worry about the size
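# One way to fetch it programmatically (a sketch using huggingface_hub; you
# can just as well download it manually from the website as described above):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="sentence-transformers/sentence-t5-large",
#                   local_dir=r"E:\HuggingFace\sentence-transformers\sentence-t5-large")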
model_path = r"E:\HuggingFace\sentence-transformers\sentence-t5-large"  # raw string so the backslashes in the Windows path are not treated as escapes
embeddings = HuggingFaceEmbeddings(model_path)
vectorstore = Chroma.from_documents(docs, embeddings)
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
relevant_docs = base_retriever.get_relevant_documents("What is Retrieval Augmented Generation?")
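# Optional: eyeball the retrieved chunks before wiring them into the prompt
# (a quick sketch; Document.page_content holds the chunk text).
# for i, d in enumerate(relevant_docs):
#     print(i, d.page_content[:200])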
template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Question: {question}
Context: {context}
Answer:
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)
client = OpenAI(
    api_key="###################################",  # replace with your own ZhipuAI API key (see my previous post if you don't know how to get one)
    base_url="https://open.bigmodel.cn/api/paas/v4/"  # Zhipu's OpenAI-compatible endpoint; leave this URL as-is
)
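# Optional connectivity check (a sketch): one minimal call to confirm the key
# and endpoint work before running the full generation loop.
# test = client.chat.completions.create(
#     model="GLM-4-Flash-250414",
#     messages=[{"role": "user", "content": "ping"}]
# )
# print(test.choices[0].message.content)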
# Ragas expects the dataset to have the columns ['question', 'answer', 'contexts', 'ground_truth']
'''
{
    "question": [],     <-- questions grounded in the context
    "answer": [],       <-- answers generated by the LLM
    "contexts": [],     <-- the retrieved context passages
    "ground_truth": []  <-- the reference answers
}
'''
questions = [
    "What is faithfulness ?",
    "How many pages are included in the WikiEval dataset, and which years do they cover information from?",
    "Why is evaluating Retrieval Augmented Generation (RAG) systems challenging?",
]
ground_truths = [
    "Faithfulness refers to the idea that the answer should be grounded in the given context.",
    "To construct the dataset, we first selected 50 Wikipedia pages covering events that have happened since the start of 2022.",
    "Evaluating RAG architectures is, however, challenging because there are several dimensions to consider: the ability of the retrieval system to identify relevant and focused context passages, the ability of the LLM to exploit such passages in a faithful way, or the quality of the generation itself.",
]
answers = []
contexts = []
# Generate an answer for each question
for query in questions:
    retrieved_docs = base_retriever.get_relevant_documents(query)
    context_text = "\n\n".join(d.page_content for d in retrieved_docs)
    enhanced_prompt = prompt.format(question=query, context=context_text)
    # Build the request; see Zhipu's official docs if anything here is unclear
    completion = client.chat.completions.create(
        model="GLM-4-Flash-250414",
        messages=[
            {
                "role": "system",
                "content": "You are an assistant that answers questions according to the user's instructions."
            },
            {
                "role": "user",
                "content": enhanced_prompt
            }
        ],
        top_p=0.7,
        temperature=0.9,
        max_tokens=1000
    )
    answers.append(completion.choices[0].message.content)
    # Reuse the documents retrieved above instead of calling the retriever a second time
    contexts.append([d.page_content for d in retrieved_docs])
# Assemble the evaluation data
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truths
}
dataset = Dataset.from_dict(data)
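# Optional: inspect the assembled dataset before evaluating.
# print(dataset)  # should report the 4 columns above and 3 rows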
llm = ChatZhipuAI(
    model="GLM-4-Flash-250414",
    temperature=0.5,
)
evaluator_llm = LangchainLLMWrapper(llm)
# If no metrics are specified, the default is metrics = [answer_relevancy, context_precision, faithfulness, context_recall]; if in doubt, read the source of evaluate(), it's very clearly written
# We must pass in our custom llm and our custom embedding model (omitting them raises an error, because some metrics need them); again, the evaluate() source makes this clear
# If you're wondering how I knew what to change, see the official Ragas docs at https://docs.ragas.io/en/latest/getstarted/rag_eval/#collect-evaluation-data, where it is all explained
# Does the embedding model here need to match the embedding model of the llm I passed in? If anyone knows, please leave a comment
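# To pin the metrics explicitly rather than relying on the default, something
# like the following should work (a sketch based on the default metric list
# mentioned above):
# from ragas.metrics import answer_relevancy, context_precision, faithfulness, context_recall
# result = evaluate(dataset=dataset,
#                   metrics=[answer_relevancy, context_precision, faithfulness, context_recall],
#                   llm=evaluator_llm, embeddings=embeddings)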
result = evaluate(dataset=dataset, llm=evaluator_llm, embeddings=embeddings)
pd.set_option("display.max_colwidth", None)
df = result.to_pandas()
print(df)
The run produces the following output (there is one warning, which you can safely ignore):