{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chatbot Example with Self Query Retriever\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)\n",
    "\n",
    "This workbook demonstrates example of Elasticsearch's [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) to convert a question into a structured query and apply structured query to Elasticsearch index. \n",
    "\n",
    "Before we begin, we first split the documents into chunks with `langchain` and then using [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents), we create a `vectorstore` and index data to elasticsearch.\n",
    "\n",
    "\n",
    "We will then see few examples query demonstrating full power of elasticsearch powered self-query retriever.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages and import modules\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!python3 -m pip install -qU lark elasticsearch langchain langchain-elasticsearch openai\n",
    "\n",
    "from langchain.schema import Document\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain_elasticsearch import ElasticsearchStore\n",
    "from langchain.llms import OpenAI\n",
    "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
    "from langchain.chains.query_constructor.base import AttributeInfo\n",
    "from getpass import getpass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create documents \n",
    "Next, we will create list of documents with summary of movies using [langchain Schema Document](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html), containing each document's `page_content` and `metadata` .\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = [\n",
    "    Document(\n",
    "        page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
    "        metadata={\n",
    "            \"year\": 1993,\n",
    "            \"rating\": 7.7,\n",
    "            \"genre\": \"science fiction\",\n",
    "            \"director\": \"Steven Spielberg\",\n",
    "            \"title\": \"Jurassic Park\",\n",
    "        },\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
    "        metadata={\n",
    "            \"year\": 2010,\n",
    "            \"director\": \"Christopher Nolan\",\n",
    "            \"rating\": 8.2,\n",
    "            \"title\": \"Inception\",\n",
    "        },\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
    "        metadata={\n",
    "            \"year\": 2006,\n",
    "            \"director\": \"Satoshi Kon\",\n",
    "            \"rating\": 8.6,\n",
    "            \"title\": \"Paprika\",\n",
    "        },\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
    "        metadata={\n",
    "            \"year\": 2019,\n",
    "            \"director\": \"Greta Gerwig\",\n",
    "            \"rating\": 8.3,\n",
    "            \"title\": \"Little Women\",\n",
    "        },\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Toys come alive and have a blast doing so\",\n",
    "        metadata={\n",
    "            \"year\": 1995,\n",
    "            \"genre\": \"animated\",\n",
    "            \"director\": \"John Lasseter\",\n",
    "            \"rating\": 8.3,\n",
    "            \"title\": \"Toy Story\",\n",
    "        },\n",
    "    ),\n",
    "    Document(\n",
    "        page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
    "        metadata={\n",
    "            \"year\": 1979,\n",
    "            \"rating\": 9.9,\n",
    "            \"director\": \"Andrei Tarkovsky\",\n",
    "            \"genre\": \"science fiction\",\n",
    "            \"rating\": 9.9,\n",
    "            \"title\": \"Stalker\",\n",
    "        },\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to Elasticsearch\n",
    "\n",
    "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. \n",
    "\n",
    "We'll use the **Cloud ID** to identify our deployment, because we are using Elastic Cloud deployment. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
    "\n",
    "\n",
    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our elastic cloud deployment, This would help create and index data easily.  We would also send list of documents that we created in the previous step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
    "\n",
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
    "ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
    "\n",
    "# https://platform.openai.com/api-keys\n",
    "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n",
    "\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n",
    "\n",
    "\n",
    "vectorstore = ElasticsearchStore.from_documents(\n",
    "    docs,\n",
    "    embeddings,\n",
    "    index_name=\"elasticsearch-self-query-demo\",\n",
    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup query retriever\n",
    "\n",
    "Next we will instantiate self-query retriever by providing a bit information about our document attributes and a short description about the document. \n",
    "\n",
    "We will then instantiate retriever with [SelfQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add details about metadata fields\n",
    "metadata_field_info = [\n",
    "    AttributeInfo(\n",
    "        name=\"genre\",\n",
    "        description=\"The genre of the movie. Can be either 'science fiction' or 'animated'.\",\n",
    "        type=\"string or list[string]\",\n",
    "    ),\n",
    "    AttributeInfo(\n",
    "        name=\"year\",\n",
    "        description=\"The year the movie was released\",\n",
    "        type=\"integer\",\n",
    "    ),\n",
    "    AttributeInfo(\n",
    "        name=\"director\",\n",
    "        description=\"The name of the movie director\",\n",
    "        type=\"string\",\n",
    "    ),\n",
    "    AttributeInfo(\n",
    "        name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
    "    ),\n",
    "]\n",
    "\n",
    "document_content_description = \"Brief summary of a movie\"\n",
    "\n",
    "# Set up openAI llm with sampling temperature 0\n",
    "llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n",
    "\n",
    "# instantiate retriever\n",
    "retriever = SelfQueryRetriever.from_llm(\n",
    "    llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Question Answering with Self-Query Retriever\n",
    "\n",
    "We will now demonstrate how to use self-query retriever for RAG."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='Inception (2010)')"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.schema.runnable import RunnableParallel, RunnablePassthrough\n",
    "from langchain.prompts import ChatPromptTemplate, PromptTemplate\n",
    "from langchain.schema import format_document\n",
    "\n",
    "LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(\n",
    "    \"\"\"\n",
    "Use the following context movies that matched the user question. Use the movies below only to answer the user's question.\n",
    "\n",
    "If you don't know the answer, just say that you don't know, don't try to make up an answer.\n",
    "\n",
    "----\n",
    "{context}\n",
    "----\n",
    "Question: {question}\n",
    "Answer:\n",
    "\"\"\"\n",
    ")\n",
    "\n",
    "DOCUMENT_PROMPT = PromptTemplate.from_template(\n",
    "    \"\"\"\n",
    "---\n",
    "title: {title}                                                                                   \n",
    "year: {year}  \n",
    "director: {director}    \n",
    "---\n",
    "\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "def _combine_documents(\n",
    "    docs, document_prompt=DOCUMENT_PROMPT, document_separator=\"\\n\\n\"\n",
    "):\n",
    "    doc_strings = [format_document(doc, document_prompt) for doc in docs]\n",
    "    return document_separator.join(doc_strings)\n",
    "\n",
    "\n",
    "_context = RunnableParallel(\n",
    "    context=retriever | _combine_documents,\n",
    "    question=RunnablePassthrough(),\n",
    ")\n",
    "\n",
    "chain = _context | LLM_CONTEXT_PROMPT | llm\n",
    "\n",
    "chain.invoke(\n",
    "    \"What movies are about dreams and was released after the year 2007 but before 2012?\"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.11.4 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.3"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}