{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IQt5lMKvxios"
   },
   "source": [
    "# Use Amazon Bedrock\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/amazon-bedrock/langchain-qa-example.ipynb)\n",
    "\n",
    "This notebook demonstrates how to use LangChain with [Amazon Bedrock](https://aws.amazon.com/bedrock/). Amazon Bedrock is a managed service that makes foundation models from leading AI startups, along with Amazon's own Titan models, available through APIs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "fWuHgEHjyRMt"
   },
   "source": [
    "## Install packages and import modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7byqCX6VyWYW"
   },
   "outputs": [],
   "source": [
    "# install packages\n",
    "!python3 -m pip install -qU langchain langchain-elasticsearch langchain_community boto3 tiktoken\n",
    "\n",
    "# import modules\n",
    "from getpass import getpass\n",
    "from urllib.request import urlopen\n",
    "from langchain_elasticsearch import ElasticsearchStore\n",
    "from langchain_community.embeddings.bedrock import BedrockEmbeddings\n",
    "from langchain_community.llms import Bedrock\n",
    "from langchain.chains import RetrievalQA\n",
    "import boto3\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bWCXMAi58M3G"
   },
   "source": [
    "Note: boto3 is the AWS SDK for Python and is required to use the Bedrock LLM."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "F84cH96QqG6_"
   },
   "source": [
    "## Init Bedrock client\n",
    "\n",
    "To authenticate with AWS, we can either store credentials in a local file such as `~/.aws/config` (see [configuring credentials](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials)) or pass `AWS_ACCESS_KEY`, `AWS_SECRET_KEY`, and `AWS_REGION` directly to the boto3 client.\n",
    "\n",
    "We use the second approach in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kG76APtmp6dH"
   },
   "outputs": [],
   "source": [
    "default_region = \"us-east-1\"\n",
    "AWS_ACCESS_KEY = getpass(\"AWS Access key: \")\n",
    "AWS_SECRET_KEY = getpass(\"AWS Secret key: \")\n",
    "AWS_REGION = input(f\"AWS Region [default: {default_region}]: \") or default_region\n",
    "\n",
    "bedrock_client = boto3.client(\n",
    "    service_name=\"bedrock-runtime\",\n",
    "    region_name=AWS_REGION,\n",
    "    aws_access_key_id=AWS_ACCESS_KEY,\n",
    "    aws_secret_access_key=AWS_SECRET_KEY,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Utg-ZqS_QS1G"
   },
   "source": [
    "## Connect to Elasticsearch\n",
    "\n",
    "ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.\n",
    "\n",
    "Because we are using an Elastic Cloud deployment, we'll identify it by its **Cloud ID**. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
    "\n",
    "We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment, which makes it easy to create the index and add data to it. In the ElasticsearchStore instance, we set `embedding` to [BedrockEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.bedrock.BedrockEmbeddings.html) to embed the texts, and `index_name` to the Elasticsearch index used in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "idJiMEZpQfP7"
   },
   "outputs": [],
   "source": [
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
    "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
    "\n",
    "# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
    "ELASTIC_API_KEY = getpass(\"Elastic API Key: \")\n",
    "\n",
    "bedrock_embedding = BedrockEmbeddings(client=bedrock_client)\n",
    "\n",
    "vector_store = ElasticsearchStore(\n",
    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    "    index_name=\"workplace_index\",\n",
    "    embedding=bedrock_embedding,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "qAkNwd_lQ7HZ"
   },
   "source": [
    "## Download the dataset\n",
    "\n",
    "Let's download the sample dataset and deserialize the document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sjwpw_IxQ72L"
   },
   "outputs": [],
   "source": [
    "url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/example-apps/chatbot-rag-app/data/data.json\"\n",
    "\n",
    "response = urlopen(url)\n",
    "\n",
    "workplace_docs = json.loads(response.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YWCTPOgnRHiZ"
   },
   "source": [
    "## Split Documents into Passages\n",
    "\n",
    "We’ll chunk documents into passages to make retrieval more specific and to ensure that several passages fit within the context window of the final question-answering prompt.\n",
    "\n",
    "Here we split documents into passages of up to 512 tokens, with an overlap of 256 tokens between consecutive passages.\n",
    "\n",
    "We use a simple splitter here, but LangChain offers more advanced splitters that reduce the chance of losing context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "mAtGD7GjRIIf"
   },
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "metadata = []\n",
    "content = []\n",
    "\n",
    "for doc in workplace_docs:\n",
    "    content.append(doc[\"content\"])\n",
    "    metadata.append(\n",
    "        {\n",
    "            \"name\": doc[\"name\"],\n",
    "            \"summary\": doc[\"summary\"],\n",
    "            \"rolePermissions\": doc[\"rolePermissions\"],\n",
    "        }\n",
    "    )\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
    "    chunk_size=512, chunk_overlap=256\n",
    ")\n",
    "docs = text_splitter.create_documents(content, metadatas=metadata)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "MRXt1tnXRK_M"
   },
   "source": [
    "## Index data into Elasticsearch\n",
    "\n",
    "Next, we index the data into Elasticsearch using [ElasticsearchStore.from_documents](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents). We pass the Cloud ID, API key, and index name used in the `Connect to Elasticsearch` step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-T2P8_ltRNgy"
   },
   "outputs": [],
   "source": [
    "documents = vector_store.from_documents(\n",
    "    docs,\n",
    "    es_cloud_id=ELASTIC_CLOUD_ID,\n",
    "    es_api_key=ELASTIC_API_KEY,\n",
    "    index_name=\"workplace_index\",\n",
    "    embedding=bedrock_embedding,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "azqaOaChswVv"
   },
   "source": [
    "## Init Bedrock LLM\n",
    "\n",
    "Next, we initialize the Bedrock LLM. We pass the `bedrock_client` and a specific `model_id` to the Bedrock instance, for example `amazon.titan-text-express-v1`, `ai21.j2-ultra-v1`, `anthropic.claude-v2`, or `cohere.command-text-v14`. The list of available base models is in the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids-arns.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fRtZ_dfXsjaL"
   },
   "outputs": [],
   "source": [
    "default_model_id = \"amazon.titan-text-express-v1\"\n",
    "AWS_MODEL_ID = input(f\"AWS model [default: {default_model_id}]: \") or default_model_id\n",
    "llm = Bedrock(client=bedrock_client, model_id=AWS_MODEL_ID)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9UZskhJRRTQV"
   },
   "source": [
    "## Asking a question\n",
    "Now that the passages are stored in Elasticsearch and the LLM is initialized, we can ask a question. The chain retrieves the relevant passages and the LLM answers based on them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gWGfbz2TkuJt",
    "outputId": "12af9c94-9113-4f34-b9de-f60681951206"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question: What is our work from home policy?\n",
      "\n",
      "\u001b[92m ---- Answer ---- \u001b[0m\n",
      " We have a full-time work from home policy that provides guidelines and support for employees to work remotely, ensuring the continuity and productivity of business operations during the COVID-19 pandemic and beyond.\n",
      "\n",
      "\u001b[94m ---- Sources ---- \u001b[0m\n",
      "Name: Work From Home Policy\n",
      "Content: Effective: March 2020\n",
      "Purpose\n",
      "\n",
      "The purpose of this full-time work-from-home policy is to provide guidelines and support for employees to conduct their work remotely, ensuring the continuity and productivity of business operations during the COVID-19 pandemic and beyond.\n",
      "Scope\n",
      "\n",
      "This policy applies to all employees who are eligible for remote work as determined by their role and responsibilities. It is designed to allow employees to work from home full time while maintaining the same level of performance and collaboration as they would in the office.\n",
      "Eligibility\n",
      "\n",
      "Employees who can perform their work duties remotely and have received approval from their direct supervisor and the HR department are eligible for this work-from-home arrangement.\n",
      "Equipment and Resources\n",
      "-------\n",
      "\n",
      "Name: Work From Home Policy\n",
      "Content: The company encourages employees to prioritize their health and well-being while working from home. This includes taking regular breaks, maintaining a work-life balance, and seeking support from supervisors and colleagues when needed.\n",
      "Policy Review and Updates\n",
      "\n",
      "This work-from-home policy will be reviewed periodically and updated as necessary, taking into account changes in public health guidance, business needs, and employee feedback.\n",
      "Questions and Concerns\n",
      "\n",
      "Employees are encouraged to direct any questions or concerns about this policy to their supervisor or the HR department.\n",
      "-------\n",
      "\n",
      "Name: Work From Home Policy\n",
      "Content: Employees are required to accurately track their work hours using the company's time tracking system. Non-exempt employees must obtain approval from their supervisor before working overtime.\n",
      "Confidentiality and Data Security\n",
      "\n",
      "Employees must adhere to the company's confidentiality and data security policies while working from home. This includes safeguarding sensitive information, securing personal devices and internet connections, and reporting any security breaches to the IT department.\n",
      "Health and Well-being\n",
      "\n",
      "The company encourages employees to prioritize their health and well-being while working from home. This includes taking regular breaks, maintaining a work-life balance, and seeking support from supervisors and colleagues when needed.\n",
      "Policy Review and Updates\n",
      "-------\n",
      "\n",
      "Name: Work From Home Policy\n",
      "Content: Employees who can perform their work duties remotely and have received approval from their direct supervisor and the HR department are eligible for this work-from-home arrangement.\n",
      "Equipment and Resources\n",
      "\n",
      "The necessary equipment and resources will be provided to employees for remote work, including a company-issued laptop, software licenses, and access to secure communication tools. Employees are responsible for maintaining and protecting the company's equipment and data.\n",
      "Workspace\n",
      "\n",
      "Employees working from home are responsible for creating a comfortable and safe workspace that is conducive to productivity. This includes ensuring that their home office is ergonomically designed, well-lit, and free from distractions.\n",
      "Communication\n",
      "-------"
     ]
    }
   ],
   "source": [
    "retriever = vector_store.as_retriever()\n",
    "\n",
    "qa = RetrievalQA.from_llm(llm=llm, retriever=retriever, return_source_documents=True)\n",
    "\n",
    "questions = [\n",
    "    \"What is the nasa sales team?\",\n",
    "    \"What is our work from home policy?\",\n",
    "    \"Does the company own my personal project?\",\n",
    "    \"What job openings do we have?\",\n",
    "    \"How does compensation work?\",\n",
    "]\n",
    "question = questions[1]\n",
    "print(f\"Question: {question}\")\n",
    "\n",
    "ans = qa.invoke({\"query\": question})\n",
    "\n",
    "print(\"\\033[92m ---- Answer ---- \\033[0m\")\n",
    "print(ans[\"result\"] + \"\\n\")\n",
    "print(\"\\033[94m ---- Sources ---- \\033[0m\")\n",
    "for doc in ans[\"source_documents\"]:\n",
    "    print(\"Name: \" + doc.metadata[\"name\"])\n",
    "    print(\"Content: \" + doc.page_content)\n",
    "    print(\"-------\")"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}