# Chatbot Example with Self Query Retriever
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/self-query-retriever-examples/chatbot-example.ipynb)

This notebook demonstrates how to use LangChain's [Self-query retriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html) with Elasticsearch to convert a natural-language question into a structured query and apply that structured query to an Elasticsearch index.

Before we begin, we create a list of documents with `langchain` and then use [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents) to create a `vectorstore` and index the data into Elasticsearch.


We will then run an example query that demonstrates the Elasticsearch-powered self-query retriever.
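
To make the idea of a "structured query" concrete, here is a minimal plain-Python sketch. It is not LangChain's implementation: `StructuredQuerySketch` and the toy `movies` list are hypothetical stand-ins that only illustrate how a question splits into a free-text part and a metadata filter.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for a structured query:
# a free-text part plus a predicate over document metadata.
@dataclass
class StructuredQuerySketch:
    query: str
    metadata_filter: Callable[[dict], bool]


# Toy records that mirror the metadata shape used later in this notebook
movies = [
    {"title": "Inception", "year": 2010},
    {"title": "Paprika", "year": 2006},
    {"title": "Little Women", "year": 2019},
]

# "movies about dreams released after 2007 but before 2012"
sq = StructuredQuerySketch(
    query="dreams",
    metadata_filter=lambda m: 2007 < m["year"] < 2012,
)

# Only the metadata filter is applied here; the semantic match on
# `sq.query` is what the vector store would handle.
matching_titles = [m["title"] for m in movies if sq.metadata_filter(m)]
print(matching_titles)
```

In the real retriever, an LLM produces both parts from the question, and the vector store evaluates them together.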


## Install packages and import modules


In [30]:
!python3 -m pip install -qU lark elasticsearch langchain langchain-elasticsearch openai

from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain_elasticsearch import ElasticsearchStore
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from getpass import getpass




## Create documents 
Next, we create a list of documents containing movie summaries using LangChain's [`Document`](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html) schema. Each document has a `page_content` and `metadata`.



In [67]:
docs = [
 Document(
 page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
 metadata={
 "year": 1993,
 "rating": 7.7,
 "genre": "science fiction",
 "director": "Steven Spielberg",
 "title": "Jurassic Park",
 },
 ),
 Document(
 page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
 metadata={
 "year": 2010,
 "director": "Christopher Nolan",
 "rating": 8.2,
 "title": "Inception",
 },
 ),
 Document(
 page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
 metadata={
 "year": 2006,
 "director": "Satoshi Kon",
 "rating": 8.6,
 "title": "Paprika",
 },
 ),
 Document(
 page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
 metadata={
 "year": 2019,
 "director": "Greta Gerwig",
 "rating": 8.3,
 "title": "Little Women",
 },
 ),
 Document(
 page_content="Toys come alive and have a blast doing so",
 metadata={
 "year": 1995,
 "genre": "animated",
 "director": "John Lasseter",
 "rating": 8.3,
 "title": "Toy Story",
 },
 ),
 Document(
 page_content="Three men walk into the Zone, three men walk out of the Zone",
 metadata={
 "year": 1979,
 "rating": 9.9,
 "director": "Andrei Tarkovsky",
 "genre": "science fiction",
 "title": "Stalker",
 },
 ),
]
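
Not every document above defines every metadata field (several have no `genre`), and a filter on a missing field cannot match those documents. The sketch below shows one way to spot such gaps. It uses stand-in dictionaries rather than the `Document` objects above, and `expected_fields` is an assumption made for illustration.

```python
# Stand-in metadata dicts; with the notebook's objects you would read doc.metadata
sample_metadata = [
    {"title": "Jurassic Park", "year": 1993, "genre": "science fiction"},
    {"title": "Inception", "year": 2010},
    {"title": "Toy Story", "year": 1995, "genre": "animated"},
]

# Fields we expect to filter on (an assumption for this sketch)
expected_fields = {"title", "year", "genre"}

# Map each title to the expected fields it does not define
missing = {
    m["title"]: sorted(expected_fields - set(m))
    for m in sample_metadata
    if expected_fields - set(m)
}
print(missing)
```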

## Connect to Elasticsearch

ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial. 

Because we are using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.


We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment, which makes it easy to create an index and index data. We also pass in the list of documents that we created in the previous step.

In [68]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic API Key: ")

# https://platform.openai.com/api-keys
OPENAI_API_KEY = getpass("OpenAI API key: ")

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)


vectorstore = ElasticsearchStore.from_documents(
 docs,
 embeddings,
 index_name="elasticsearch-self-query-demo",
 es_cloud_id=ELASTIC_CLOUD_ID,
 es_api_key=ELASTIC_API_KEY,
)

## Setup query retriever

Next, we describe our documents for the self-query retriever by providing some information about the metadata attributes and a short description of the document content.

We then instantiate the retriever with [SelfQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.self_query.base.SelfQueryRetriever.html).

In [80]:
# Add details about metadata fields
metadata_field_info = [
 AttributeInfo(
 name="genre",
 description="The genre of the movie. Can be either 'science fiction' or 'animated'.",
 type="string or list[string]",
 ),
 AttributeInfo(
 name="year",
 description="The year the movie was released",
 type="integer",
 ),
 AttributeInfo(
 name="director",
 description="The name of the movie director",
 type="string",
 ),
 AttributeInfo(
 name="rating", description="A 1-10 rating for the movie", type="float"
 ),
]

document_content_description = "Brief summary of a movie"

# Set up the OpenAI LLM with sampling temperature 0
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)

# Instantiate the self-query retriever
retriever = SelfQueryRetriever.from_llm(
 llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)
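
The retriever's metadata filter ultimately has to be expressed in Elasticsearch query DSL. The following is a rough sketch of what such a filter could look like. It assumes that metadata is stored under a `metadata` object field, that range comparisons map to `range` clauses, and that exact matches map to `term` clauses. `comparison` and `all_of` are hypothetical helpers, not LangChain functions, and the query LangChain actually generates may differ between versions.

```python
import json


def comparison(field, op, value):
    """Build one Elasticsearch clause for a metadata comparison (hypothetical helper)."""
    path = f"metadata.{field}"
    if op in {"gt", "gte", "lt", "lte"}:
        return {"range": {path: {op: value}}}
    if op == "eq":
        # Assumption: exact string matches target a `.keyword` sub-field
        if isinstance(value, str):
            path = f"{path}.keyword"
        return {"term": {path: value}}
    raise ValueError(f"unsupported operator: {op}")


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return {"bool": {"must": list(clauses)}}


# "released after the year 2007 but before 2012"
es_filter = all_of(
    comparison("year", "gt", 2007),
    comparison("year", "lt", 2012),
)
print(json.dumps(es_filter, indent=2))
```

Because `verbose=True` is passed to `SelfQueryRetriever.from_llm`, the generated structured query should be logged when the retriever runs, which helps when comparing it against a sketch like this.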

## Question Answering with Self-Query Retriever

We will now demonstrate how to use the self-query retriever for retrieval-augmented generation (RAG).

In [77]:
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document

LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template(
 """
The following movies matched the user's question. Use only the movies below to answer the user's question.

If you don't know the answer, just say that you don't know, don't try to make up an answer.

----
{context}
----
Question: {question}
Answer:
"""
)

DOCUMENT_PROMPT = PromptTemplate.from_template(
 """
---
title: {title} 
year: {year} 
director: {director} 
---
"""
)


def _combine_documents(
 docs, document_prompt=DOCUMENT_PROMPT, document_separator="\n\n"
):
 doc_strings = [format_document(doc, document_prompt) for doc in docs]
 return document_separator.join(doc_strings)


_context = RunnableParallel(
 context=retriever | _combine_documents,
 question=RunnablePassthrough(),
)

chain = _context | LLM_CONTEXT_PROMPT | llm

chain.invoke(
 "What movies are about dreams and was released after the year 2007 but before 2012?"
)

AIMessage(content='Inception (2010)')
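
To see what the LLM receives as `{context}`, the sketch below reproduces the formatting step with plain string formatting. It is an approximation of `_combine_documents`, not the LangChain code path: `DOCUMENT_TEMPLATE` and `combine_metadata` are hypothetical names, and the retrieved record is hard-coded for illustration.

```python
# Same fields as DOCUMENT_PROMPT above, written as a plain format string
DOCUMENT_TEMPLATE = "\n---\ntitle: {title}\nyear: {year}\ndirector: {director}\n---\n"


def combine_metadata(records, separator="\n\n"):
    """Render each record with the template and join the results."""
    return separator.join(DOCUMENT_TEMPLATE.format(**r) for r in records)


# Hard-coded stand-in for what the retriever might return
retrieved = [
    {
        "title": "Inception",
        "year": 2010,
        "director": "Christopher Nolan",
        "rating": 8.2,
    }
]

context = combine_metadata(retrieved)
print(context)
```

Only `title`, `year`, and `director` appear in the rendered context. The summary text and the rating are not passed to the LLM unless they are added to the document prompt.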