{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "OsXAs2gcIpbC" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "7ZX50cNFOFBt" }, "source": [ " # Evaluate LangChain | Gen AI Evaluation SDK Tutorial\n", "\n", " \n", " \n", " \n", " \n", " \n", "
\n", " \n", " \"Google
Open in Colab\n", "
\n", "
\n", " \n", " \"Google
Open in Colab Enterprise\n", "
\n", "
\n", " \n", " \"Vertex
Open in Workbench\n", "
\n", "
\n", " \n", " \"GitHub
View on GitHub\n", "
\n", "
\n", "\n", "
\n", "\n", "Share to:\n", "\n", "\n", " \"LinkedIn\n", "\n", "\n", "\n", " \"Bluesky\n", "\n", "\n", "\n", " \"X\n", "\n", "\n", "\n", " \"Reddit\n", "\n", "\n", "\n", " \"Facebook\n", " " ] }, { "cell_type": "markdown", "metadata": { "id": "usd0d_LiOFBt" }, "source": [ "| | |\n", "|-|-|\n", "|Author(s) | [Elia Secchi](https://github.com/eliasecchig) |" ] }, { "cell_type": "markdown", "metadata": { "id": "MjDmmmDaOFBt" }, "source": [ "## Overview\n", "\n", "With this tutorial, you learn how to evaluate the performance of a conversational LangChain chain using the *Vertex AI Python SDK for Gen AI Evaluation Service*. The notebook utilizes a dummy chatbot designed to provide recipe suggestions.\n", "\n", "The tutorial goes trough:\n", "1. Data preparation\n", "2. Setting up the LangChain chain\n", "3. Set-up a custom metric\n", "4. Run evaluation with a combination of custom metrics and built-in metrics.\n", "5. Log results into an experiment run and analyze different runs.\n", "\n", "### Costs\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "- Vertex AI\n", "\n", "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "w-OcPSC8_FUX" }, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "metadata": { "id": "-7Jso8-FO4N8" }, "source": [ "### Install Vertex AI SDK for Rapid Evaluation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tUat7NRq5JDC" }, "outputs": [], "source": [ "%pip install --upgrade --user --quiet langchain-core langchain-google-vertexai langchain\n", "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[evaluation]\"" ] }, { "cell_type": "markdown", "metadata": { "id": "R5Xep4W9lq-Z" }, "source": [ "### Restart runtime\n", "\n", "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n", "\n", "The restart might take a minute or longer. After it's restarted, continue to the next step." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XRvKdaPDTznN" }, "outputs": [], "source": [ "import IPython\n", "\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "SbmM4z7FOBpM" }, "source": [ "
\n", "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "id": "dmWOrTJ3gx13" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you're running this notebook on Google Colab, run the cell below to authenticate your environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NyKGtVQjgx13" }, "outputs": [], "source": [ "# import sys\n", "\n", "# if \"google.colab\" in sys.modules:\n", "# from google.colab import auth\n", "\n", "# auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "DF4l8DTdWgPY" }, "source": [ "### Set Google Cloud project information and initialize Vertex AI SDK\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", "\n", "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Nqwi-5ufWp_B" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "LOCATION = \"us-central1\" # @param {type:\"string\"}\n", "\n", "\n", "import vertexai\n", "\n", "vertexai.init(project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "dvhI92xhQTzk" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qP4ihOCkEBje" }, "outputs": [], "source": [ "from concurrent.futures import ThreadPoolExecutor\n", "from functools import partial\n", "import json\n", "import logging\n", "from typing import Any\n", "\n", "from google.cloud import aiplatform\n", "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n", "from langchain_google_vertexai import ChatVertexAI\n", "\n", "# Main\n", "import pandas as pd\n", "from tqdm import tqdm\n", "import vertexai\n", "from vertexai.evaluation import CustomMetric, EvalTask\n", "\n", "# General\n", "import yaml" ] }, { "cell_type": "markdown", "metadata": { "id": "l26gX-cHOFBu" }, "source": [ "### Helper functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gT_OJBHfCg4Q" }, "outputs": [], "source": [ "def generate_multiturn_history(df: pd.DataFrame):\n", " \"\"\"Processes a DataFrame of messages to add conversation history for each message.\n", "\n", " This function takes a DataFrame containing message data and iterates through each row.\n", " For each message in a row, it constructs the conversation history up to that point by\n", " accumulating previous user and AI messages. This conversation history is then added\n", " to the message data, and the processed messages are returned as a new DataFrame.\n", "\n", " Args:\n", " df: A DataFrame containing message data. It is expected to have a column named\n", " \"messages\" where each entry is a list of dictionaries representing messages in\n", " a conversation. Each message dictionary should have \"user\" and \"reference\" keys.\n", "\n", " Returns:\n", " A DataFrame with the processed messages. Each message dictionary will now have an\n", " additional \"conversation_history\" key containing a list of tuples representing the\n", " conversation history leading up to that message. The tuples are of the form\n", " (\"user\", message_text) or (\"ai\", message_text).\n", " \"\"\"\n", " processed_messages = []\n", " for i, row in df.iterrows():\n", " conversation_history = []\n", " for message in row[\"messages\"]:\n", " message[\"conversation_history\"] = conversation_history\n", " processed_messages.append(message)\n", " conversation_history = conversation_history + [\n", " (\"user\", message[\"user\"]),\n", " (\"ai\", message[\"reference\"]),\n", " ]\n", " return pd.DataFrame(processed_messages)\n", "\n", "\n", "def pairwise(iterable):\n", " \"\"\"Creates an iterable with tuples paired together\n", " e.g s -> (s0, s1), (s2, s3), (s4, s5), ...\n", " \"\"\"\n", " a = iter(iterable)\n", " return zip(a, a)\n", "\n", "\n", "def batch_generate_message(row: dict, callable: Any) -> dict:\n", " \"\"\"\n", " Predicts a response from a chat agent.\n", "\n", " Args:\n", " callable (ChatAgent): A chat agent.\n", " row (dict): A message.\n", " Returns:\n", " dict: The predicted response.\n", " \"\"\"\n", " index, message = row\n", "\n", " messages = []\n", " for user_message, ground_truth in pairwise(message.get(\"conversation_history\", [])):\n", " messages.append((\"user\", user_message))\n", " messages.append((\"ai\", ground_truth))\n", " messages.append((\"user\", message[\"user\"]))\n", " input_callable = {\"messages\": messages, **message.get(\"callable_kwargs\", {})}\n", " response = callable.invoke(input_callable)\n", " message[\"response\"] = response.content\n", " message[\"response_obj\"] = response\n", " return message\n", "\n", "\n", "def batch_generate_messages(\n", " messages: pd.DataFrame, callable: Any, max_workers: int = 4\n", ") -> pd.DataFrame:\n", " \"\"\"\n", " Generates AI-powered responses to a series of user messages using a provided callable.\n", "\n", " This function efficiently processes a Pandas DataFrame containing user-AI message pairs,\n", " utilizing the specified callable (either a LangChain Chain or a custom class with an\n", " `invoke` method) to predict AI responses in parallel.\n", "\n", " Args:\n", " callable (callable): A callable object (e.g., LangChain Chain, custom class) used\n", " for response generation. Must have an `invoke(messages: dict) ->\n", " langchain_core.messages.ai.AIMessage` method.\n", " The `messages` dict follows this structure:\n", " {\"messages\" [(\"user\", \"first\"),(\"ai\", \"a response\"), (\"user\", \"a follow up\")]}\n", "\n", " messages (pd.DataFrame): A DataFrame with one column named 'messages' containing\n", " the list of user-AI message pairs as described above.\n", "\n", " max_workers (int, optional): The number of worker processes to use for parallel\n", " prediction. Defaults to the number of available CPU cores.\n", "\n", " Returns:\n", " pd.DataFrame: A DataFrame containing the original messages and a new column with the predicted AI responses.\n", "\n", " Example:\n", " ```python\n", " messages_df = pd.DataFrame({\n", " \"messages\": [\n", " [{\"user\": \"What's the weather today?\", \"reference\": \"It's sunny.\"}],\n", " [{\"user\": \"Tell me a joke.\", \"reference\": \"Why did the scarecrow win an award?...\n", " Because he was outstanding in his field!\"}]\n", " ]\n", " })\n", "\n", " responses_df = batch_generate_messages(my_callable, messages_df)\n", " ```\n", " \"\"\"\n", " logging.info(\"Executing batch scoring\")\n", " predicted_messages = []\n", " with ThreadPoolExecutor(max_workers) as pool:\n", " partial_func = partial(batch_generate_message, callable=callable)\n", " for message in tqdm(\n", " pool.map(partial_func, messages.iterrows()), total=len(messages)\n", " ):\n", " predicted_messages.append(message)\n", " return pd.DataFrame(predicted_messages)" ] }, { "cell_type": "markdown", "metadata": { "id": "SZMXiJhKD2TY" }, "source": [ "## Import ground truth data for evaluation\n", "\n", "In this sample, we will use 2 conversations as a ground truth. Every message in the conversations, along with the relative history, will be used to produce a response with a foundational model.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jwqlusT7OJ4J" }, "outputs": [], "source": [ "%%writefile chats.yaml\n", "- messages:\n", " - user: Hi\n", " reference: Hi, how can I help you?\n", " - user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n", " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", " - user: I'm not vegetarian or vegan, but I am gluten-free.\n", " reference: 'Okay, I ll keep that in mind. Here are a few recipes that I think you might like:\n", " * **Grilled Salmon with Roasted Vegetables:** This is a delicious and healthy recipe that is perfect for a weeknight meal. The salmon is grilled to perfection and the roasted vegetables add a touch of sweetness.\n", " * **Chicken Stir-Fry:** This is a quick and easy stir-fry that is perfect for busy weeknights. The chicken is cooked with vegetables and a light sauce.\n", " * **Lentil Soup:** This is a hearty and healthy soup that is perfect for a cold winter day. The lentils are packed with protein and fiber, and the soup is also gluten-free.'\n", " - user: Those all sound great! I think I'm going to try the grilled salmon with roasted vegetables.\n", " reference: That's a great choice! I hope you enjoy it.\n", " - user: Thanks for your help!\n", " reference: You're welcome! Is there anything else I can help you with today?\n", " - user: No, that's all. Thanks again!\n", " reference: You're welcome! Have a great day!\n", "- messages:\n", " - user: Hi\n", " reference: Hi, how can I help you?\n", " - user: I'm looking for a recipe for a romantic dinner. Do you have any recommendations?\n", " reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", " - user: I'm vegetarian.\n", " reference: 'Sure, I can help you find a healthy vegetarian dinner recipe. Here are a few ideas:\n", " * **Burnt aubergine veggie chilli:** This is a hearty and flavorful dish that is packed with nutrients. The roasted aubergine gives it a smoky flavor, and the lentils and beans add protein and fiber.\n", " * **Simple mushroom curry:** This is a quick and easy curry that is perfect for a weeknight meal. The mushrooms are cooked in a creamy sauce with spices, and the whole dish is ready in under 30 minutes.\n", " * **Vegetarian enchiladas:** This is a classic Mexican dish that is easy to make vegetarian. The enchiladas are filled with a variety of vegetables, and they are topped with a delicious sauce.\n", " * **Braised sesame tofu:** This is a flavorful and satisfying dish that is perfect for a cold night. The tofu is braised in a sauce with sesame, ginger, and garlic, and it is served over rice or noodles.\n", " * **Roast garlic & tahini spinach:** This is a light and healthy dish that is perfect for a spring or summer meal. The spinach is roasted with garlic and tahini, and it is served with a side of pita bread.\n", "\n", " These are just a few ideas to get you started. There are many other great vegetarian dinner recipes out there, so you are sure to find something that you will enjoy.'\n", " - user: Those all sound great! I like the Burnt aubergine veggie chilli\n", " reference: That's a great choice! I hope you enjoy it." ] }, { "cell_type": "markdown", "metadata": { "id": "ie7KTd_OjwdE" }, "source": [ "Let's now load all the messages into a Pandas DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "j0eSy_avGCW7" }, "outputs": [], "source": [ "y = yaml.safe_load(open(\"chats.yaml\"))\n", "df = pd.DataFrame(y)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "xA7G0bDTj84z" }, "source": [ "**Decomposing the chat message input/output pairs**\n", "\n", "We are now ready for decomposing multi-turn history. This is essential to enable batch prediction.\n", "\n", "We decompose the `messages` list into single input/output pairs. The input is always composed by `user message`, `reference message` and `conversation_history`.\n", "\n", "**Example:**\n", "\n", "Given the following chat:\n", "\n", "```yaml\n", "- messages:\n", "- user: Hi\n", "reference: Hi, how can I help you?\n", "- user: I'm looking for a recipe for a healthy dinner. Do you have any recommendations?\n", "reference: Sure, I can help you with that. What are your dietary restrictions? Are you vegetarian, vegan, gluten-free, or anything else?\n", "```\n", "\n", "We can generate these two input/output samples:\n", "\n", "```yaml\n", "- user: Hi\n", " reference: Hi, how can I help you?\n", " conversation_history: []\n", "\n", "- user: I'm looking for a recipe for a healthy dinner....\n", " reference: Sure, I can help you with that. What are your ...\n", " conversation_history:\n", " - user: Hi\n", " reference: Hi, how can I help you?\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1FL0R3YXPj3e" }, "outputs": [], "source": [ "df_processed = generate_multiturn_history(df)\n", "df_processed" ] }, { "cell_type": "markdown", "metadata": { "id": "8OhQLR1lGsEq" }, "source": [ "## Let's define our LangChain chain!\n", "\n", "We now need to define our LangChain Chain. For this tutorial, we will create a simple conversational chain capable of producing cooking recipes for users." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "u5B0ufczjfA_" }, "outputs": [], "source": [ "llm = ChatVertexAI(model_name=\"gemini-2.0-flash\", temperature=0)\n", "\n", "template = ChatPromptTemplate.from_messages(\n", " [\n", " (\n", " \"system\",\n", " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\"\"\",\n", " ),\n", " MessagesPlaceholder(variable_name=\"messages\"),\n", " ]\n", ")\n", "\n", "chain = template | llm" ] }, { "cell_type": "markdown", "metadata": { "id": "MS9rlK6eq5qY" }, "source": [ "We can test our chain:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qzMotN92qrqv" }, "outputs": [], "source": [ "chain.invoke([(\"human\", \"Hi there!\")])" ] }, { "cell_type": "markdown", "metadata": { "id": "NqdVpTMkq_pc" }, "source": [ "## Batch scoring\n", "\n", "We are now ready to perform batch scoring. To perform batch scoring we will leverage the utility function `batch_generate_messages`\n", "\n", "Have a look at the definition to see the expected input format." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Qc4DiVYOKk6H" }, "outputs": [], "source": [ "help(batch_generate_messages)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Pip0WjopXMsu" }, "outputs": [], "source": [ "scored_data = batch_generate_messages(messages=df_processed, callable=chain)\n", "scored_data" ] }, { "cell_type": "markdown", "metadata": { "id": "7Tf1bssiSBk7" }, "source": [ "## Evaluation\n", "\n", "We'll utilize [Vertex AI Rapid Evaluation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/rapid-evaluation) to assess our generative AI model's performance. This service within Vertex AI streamlines the evaluation process, integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments) for tracking, and offers a range of [pre-built metrics](https://cloud.google.com/vertex-ai/generative-ai/docs/models/determine-eval#task-and-metrics) and the capability to define custom ones.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5ZSzjLI_rBSp" }, "source": [ "#### Define a CustomMetric using Gemini model\n", "\n", "Define a customized Gemini model-based metric function, with explanations for the score. The registered custom metrics are computed on the client side, without using online evaluation service APIs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aT0uclHrSBlC" }, "outputs": [], "source": [ "evaluator_llm = ChatVertexAI(\n", " model_name=\"gemini-2.0-flash\",\n", " temperature=0,\n", " response_mime_type=\"application/json\",\n", ")\n", "\n", "\n", "def custom_faithfulness(instance):\n", " prompt = f\"\"\"You are examining written text content. Here is the text:\n", "************\n", "Written content: {instance[\"response\"]}\n", "************\n", "Original source data: {instance[\"reference\"]}\n", "\n", "Examine the text and determine whether the text is faithful or not.\n", "Faithfulness refers to how accurately a generated summary reflects the essential information and key concepts present in the original source document.\n", "A faithful summary stays true to the facts and meaning of the source text, without introducing distortions, hallucinations, or information that wasn't originally there.\n", "\n", "Your response must be an explanation of your thinking along with single integer number on a scale of 0-5, 0\n", "the least faithful and 5 being the most faithful.\n", "\n", "Produce results in JSON\n", "\n", "Expected format:\n", "\n", "```json\n", "{{\n", " \"explanation\": \"< your explanation>\",\n", " \"custom_faithfulness\": \n", "}}\n", "```\n", "\"\"\"\n", "\n", " result = evaluator_llm.invoke([(\"human\", prompt)])\n", " result = json.loads(result.content)\n", " return result\n", "\n", "\n", "# Register Custom Metric\n", "custom_faithfulness_metric = CustomMetric(\n", " name=\"custom_faithfulness\",\n", " metric_function=custom_faithfulness,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "JpdRqYlLq53t" }, "source": [ "### Run Evaluation with CustomMetric" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "w4iU_mIhoY93" }, "outputs": [], "source": [ "experiment_name = \"rapid-eval-langchain-eval\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "U3r3NiKNqn3u" }, "source": [ "We are now ready to run the evaluation. We will use different metrics, combining the custom metric we defined above with some pre-built metrics.\n", "\n", "Results of the evaluation will be automatically tagged into the experiment_name we defined.\n", "\n", "You can click `View Experiment`, to see the experiment in Google Cloud Console." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zeVra-g1rAFV" }, "outputs": [], "source": [ "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n", "\n", "eval_task = EvalTask(\n", " dataset=scored_data,\n", " metrics=metrics,\n", " experiment=experiment_name,\n", " metric_column_mapping={\"prompt\": \"user\"},\n", ")\n", "eval_result = eval_task.evaluate()" ] }, { "cell_type": "markdown", "metadata": { "id": "ka3tZCL_uurD" }, "source": [ "Once an eval result is produced, we are able to display summary metrics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KheOvIvtiRlz" }, "outputs": [], "source": [ "eval_result.summary_metrics" ] }, { "cell_type": "markdown", "metadata": { "id": "JcALGGlwu0p_" }, "source": [ "We are also able to display a pandas dataframe containing a detailed summary of how our eval dataset performed and relative granular metrics." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9zJ686YYiWJC" }, "outputs": [], "source": [ "eval_result.metrics_table" ] }, { "cell_type": "markdown", "metadata": { "id": "L1NsUKA3vEu8" }, "source": [ "## Iterating over the prompt\n", "\n", "Let's perform some simple changes to our chain to see how our evaluation results change." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "O_8PnvLSv7Nu" }, "outputs": [], "source": [ "template = ChatPromptTemplate.from_messages(\n", " [\n", " (\n", " \"system\",\n", " \"\"\"You are a conversational bot that produce nice recipes for users based on a question.\n", "Before suggesting a recipe, you should ask for the dietary requirements..\"\"\",\n", " ),\n", " MessagesPlaceholder(variable_name=\"messages\"),\n", " ]\n", ")\n", "\n", "new_chain = template | llm" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JF_Po81twPbr" }, "outputs": [], "source": [ "scored_data = batch_generate_messages(messages=df_processed, callable=new_chain)\n", "scored_data.rename(columns={\"text\": \"response\"}, inplace=True)\n", "scored_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "snIQ0itfwUZa" }, "outputs": [], "source": [ "metrics = [\"fluency\", \"coherence\", \"safety\", custom_faithfulness_metric]\n", "eval_task = EvalTask(\n", " dataset=scored_data,\n", " metrics=metrics,\n", " experiment=experiment_name,\n", " metric_column_mapping={\"prompt\": \"user\"},\n", ")\n", "eval_result = eval_task.evaluate()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CT6Ma5FLwfbF" }, "outputs": [], "source": [ "eval_result.summary_metrics" ] }, { "cell_type": "markdown", "metadata": { "id": "b42l5juJsyyY" }, "source": [ "#### Let's compare both eval results\n", "\n", "We can do that by using the method `display_runs` for a given `eval task` object to see which prompt template performed best on our dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HP0Zcm1yvh95" }, "outputs": [], "source": [ "df = vertexai.preview.get_experiment_df(experiment_name).T\n", "df" ] }, { "cell_type": "markdown", "metadata": { "id": "TpV-iwP9qw9c" }, "source": [ "## Cleaning up\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sx_vKniMq9ZX" }, "outputs": [], "source": [ "import os\n", "\n", "# Delete Experiments\n", "delete_experiments = True\n", "if delete_experiments or os.getenv(\"IS_TESTING\"):\n", " experiments_list = aiplatform.Experiment.list()\n", " for experiment in experiments_list:\n", " experiment.delete()" ] } ], "metadata": { "colab": { "name": "evaluate_langchain_chains.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }