{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# RAG Fusion\n", "\n", "You can also run this notebook online [at Noteable.io](https://app.noteable.io/published/d9902d51-c5e9-4d89-bcb1-f82521ab4497/rag_fusion).\n", "\n", "This notebook shows off a LangChain JS port of [this Github repo](https://github.com/Raudaschl/rag-fusion) - all credit to the original author!\n", "\n", "> RAG-Fusion, a search methodology that aims to bridge the gap between traditional search paradigms and the multifaceted dimensions of human queries. Inspired by the capabilities of Retrieval Augmented Generation (RAG), this project goes a step further by employing multiple query generation and Reciprocal Rank Fusion to re-rank search results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "For this example we'll use an in memory store as our vectorstore/retriever, and some fake data. You can swap out the vectorstore for your [preferred LangChain.js option](https://js.langchain.com/docs/integrations/vectorstores) later.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "// Deno.env.set(\"OPENAI_API_KEY\", \"\");\n", "\n", "import { OpenAIEmbeddings } from \"npm:langchain@0.0.172/embeddings/openai\";\n", "import { MemoryVectorStore } from \"npm:langchain@0.0.172/vectorstores/memory\";" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "/** Define our fake data */\n", "const allDocuments = [\n", " { id: \"doc1\", text: \"Climate change and economic impact.\" },\n", " { id: \"doc2\", text: \"Public health concerns due to climate change.\" },\n", " { id: \"doc3\", text: \"Climate change: A social perspective.\" },\n", " { id: \"doc4\", text: \"Technological solutions to climate change.\" },\n", " { id: \"doc5\", text: \"Policy changes needed to combat climate change.\" },\n", " { id: \"doc6\", text: \"Climate change and its impact on biodiversity.\" },\n", " { id: \"doc7\", text: \"Climate change: The science and models.\" },\n", " { id: \"doc8\", text: \"Global warming: A subset of climate change.\" },\n", " { id: \"doc9\", text: \"How climate change affects daily weather.\" },\n", " { id: \"doc10\", text: \"The history of climate change activism.\" },\n", "];" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "/** Initialize our vector store with the fake data and OpenAI embeddings. */\n", "const vectorStore = await MemoryVectorStore.fromTexts(\n", " allDocuments.map(({ text }) => text),\n", " allDocuments.map(({ id }) => ({ id })),\n", " new OpenAIEmbeddings()\n", ");\n", "/** Create the retriever */\n", "const retriever = vectorStore.asRetriever();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the Query Generator\n", "\n", "We will now define a chain to do the query generation\n", "This chain [pulls a prompt](https://smith.langchain.com/hub/langchain-ai/rag-fusion-query-generation) from the [LangChain Hub](https://smith.langchain.com/hub) that when provided a query, it tasks the model to generate multiple search queries related to the original. In our case, we're asking for 4 additional queries." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import { ChatOpenAI } from \"npm:langchain@0.0.172/chat_models/openai\";\n", "import { pull } from \"npm:langchain@0.0.172/hub\";\n", "import { StringOutputParser } from \"npm:langchain@0.0.172/schema/output_parser\";\n", "import { RunnableLambda, RunnableSequence } from \"npm:langchain@0.0.172/schema/runnable\";" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "/** Define the chat model */\n", "const model = new ChatOpenAI({\n", " temperature: 0,\n", "});" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "/** Pull a prompt from the hub */\n", "const prompt = await pull(\"langchain-ai/rag-fusion-query-generation\");\n", "// const prompt = ChatPromptTemplate.fromMessages([\n", "// [\"system\", \"You are a helpful assistant that generates multiple search queries based on a single input query.\"],\n", "// [\"user\", \"Generate multiple search queries related to: {original_query}\"],\n", "// [\"user\", \"OUTPUT (4 queries):\"],\n", "// ]);" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "/** Define our chain for generating queries */\n", "const generateQueries = RunnableSequence.from([\n", " prompt,\n", " model,\n", " new StringOutputParser(),\n", " (output) => output.split(\"\\n\"),\n", "]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Construct the Reciprocal Rank Fusion function\n", "This function is used for combining the results of multiple search queries to produce a single ranked list of results. This is a common technique in information retrieval known as data fusion or result merging." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import { Document } from \"npm:langchain@0.0.172/document\";" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "const reciprocalRankFusion = (results: Document[][], k = 60) => {\n", " const fusedScores: Record = {};\n", " for (const result of results) {\n", " // Assumes the docs are returned in sorted order of relevance\n", " result.forEach((item, index) => {\n", " const docString = item.pageContent;\n", " if (!(docString in fusedScores)) {\n", " fusedScores[docString] = 0;\n", " }\n", " fusedScores[docString] += 1 / (index + k);\n", " });\n", " }\n", "\n", " const rerankedResults = Object.entries(fusedScores)\n", " .sort((a, b) => b[1] - a[1])\n", " .map(\n", " ([doc, score]) => new Document({ pageContent: doc, metadata: { score } })\n", " );\n", " return rerankedResults;\n", "};" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Define the full chain\n", "Now we can put all our pieces together in one chain.\n", "The chain performs the following steps:\n", "\n", "1. Generate 4 search queries based on the original query\n", "2. Perform lookups with the retriever for each generated query\n", "3. Pass the results of the vector store lookup to the `reciprocalRankFusion` function\n", "\n", "The `.map()` step on the retriever runs it on each query generated by `generateQueries`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "const chain = RunnableSequence.from([\n", " generateQueries,\n", " retriever.map(),\n", " reciprocalRankFusion,\n", "]);" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[\n", " Document {\n", " pageContent: \"Climate change and economic impact.\",\n", " metadata: { score: 0.06558258417063283 }\n", " },\n", " Document {\n", " pageContent: \"Climate change: A social perspective.\",\n", " metadata: { score: 0.06400409626216078 }\n", " },\n", " Document {\n", " pageContent: \"How climate change affects daily weather.\",\n", " metadata: { score: 0.04787506400409626 }\n", " },\n", " Document {\n", " pageContent: \"Climate change and its impact on biodiversity.\",\n", " metadata: { score: 0.03306010928961749 }\n", " },\n", " Document {\n", " pageContent: \"Public health concerns due to climate change.\",\n", " metadata: { score: 0.016666666666666666 }\n", " },\n", " Document {\n", " pageContent: \"Technological solutions to climate change.\",\n", " metadata: { score: 0.016666666666666666 }\n", " },\n", " Document {\n", " pageContent: \"Policy changes needed to combat climate change.\",\n", " metadata: { score: 0.01639344262295082 }\n", " }\n", "]\n" ] } ], "source": [ "const originalQuery = \"impact of climate change\";\n", "\n", "const result = await chain.invoke({\n", " original_query: originalQuery,\n", "});\n", "\n", "console.log(result);" ] } ], "metadata": { "kernelspec": { "display_name": "Deno", "language": "typescript", "name": "deno" }, "language_info": { "file_extension": ".ts", "mimetype": "text/x.typescript", "name": "typescript", "nb_converter": "script", "pygments_lexer": "typescript", "version": "5.2.2" } }, "nbformat": 4, "nbformat_minor": 2 }