RAG and LangChain

The document discusses using LangChain and OpenAI to perform retrieval question answering (RetrievalQA) on PDF documents. It covers loading documents, chunking text, storing chunks in a vector database, performing similarity search on the database, and using different 'chain types' to pass retrieved chunks to an LLM for question answering.

RAG_and_LangChain_RetrievalQA

December 7, 2023

1 Install libs
[ ]: !pip install langchain
!pip install pypdf
!pip install openai

[3]: from google.colab import userdata

openai_api_key = userdata.get('OPENAI_API_KEY')

2 Loading PDFs
[4]: from langchain.document_loaders import PyPDFLoader

# I will load this summary of the "Deep Work" book
# (https://briefer.com/books/deep-work/pdf);
# `path` is assumed to have been set to the local data directory earlier.
pdf1 = path + "Deep_Work_summary.pdf"

# and also the RAG paper, to diversify the source documents
pdf2 = "https://arxiv.org/pdf/2005.11401.pdf"

loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader(pdf1),
    PyPDFLoader(pdf1),
    PyPDFLoader(pdf2),
]

docs = []
for i, loader in enumerate(loaders):
    pages = loader.load()
    print(f"For doc = {i}, number of pages: {len(pages)}")
    docs.extend(pages)  # reuse the loaded pages instead of calling load() twice

print(f"length of docs {len(docs)}")

For doc = 0, number of pages: 8
For doc = 1, number of pages: 8
For doc = 2, number of pages: 19
length of docs 35

3 Chunking documents
[5]: from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150,
    separators=['. ']
)

chunks = text_splitter.split_documents(docs)
len(chunks)

[5]: 80
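
As a quick sanity check (my addition, not in the original run), you can peek at one chunk to confirm the size cap and that page metadata survived the split:

[ ]: # Each chunk is a Document: content capped near chunk_size (1500),
# with source/page metadata carried over from the PDF loader.
print(len(chunks[0].page_content))
print(chunks[0].metadata)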

4 Storing docs using Vector stores + Embeddings


[ ]: !pip install chromadb
!pip install tiktoken

[9]: from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(openai_api_key=openai_api_key)

[12]: persist_directory = '/content/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/chroma/'

# !rm -rf persist_directory  # remove old database files if any

vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory=persist_directory
)

[13]: print(vectordb._collection.count())

80
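
If you want the index to survive a runtime restart, the Chroma wrapper in this LangChain version exposes a persist() call (a small optional step, assuming this legacy API):

[ ]: # Flush the collection to disk; it can later be reloaded with
# Chroma(persist_directory=persist_directory, embedding_function=embedding)
vectordb.persist()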

5 RetrievalQA
5.1 Retriever
A simple similarity search to test our question against the vector store.

[14]: question = "What is a deep work"
docs_similarity_search = vectordb.similarity_search(question, k=3)

for doc in docs_similarity_search:
    print(doc.page_content[:200], f"==> metadata = {doc.metadata}")

All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
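
Note that the first two hits are identical, a consequence of loading the same PDF twice. As a preview of the fix used later, the vector store also exposes a diversity-aware search directly (a quick sketch, assuming this wrapper's max_marginal_relevance_search method):

[ ]: # MMR first fetches fetch_k candidates by similarity, then picks k of
# them that are both relevant and mutually diverse, dropping duplicates.
docs_mmr = vectordb.max_marginal_relevance_search(question, k=3, fetch_k=6)
for doc in docs_mmr:
    print(doc.page_content[:100], doc.metadata)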

5.2 Initialize LLM using GPT-3.5-Turbo


We initialize the LLM that we'll use to answer the question.
[18]: from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)
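
One optional tweak (my addition): setting temperature=0 makes the answers more deterministic, which helps when comparing chain types below:

[ ]: # temperature=0 removes most sampling randomness, so reruns of the
# same query over the same chunks give near-identical answers.
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0,
                 openai_api_key=openai_api_key)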

5.3 RetrievalQA chain: include chunks in the context window for QA

This chain performs question answering by retrieving chunks from the vector store and passing them to the LLM.
There are different ways to send (chain) the retrieved docs to the LLM, set via chain_type:
• stuff : the base chain
• map_reduce
• refine
• map_rerank
The supported chain types: dict_keys(['stuff', 'map_reduce', 'refine', 'map_rerank'])
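
For reference, these names come from the question-answering chain loader that RetrievalQA delegates to; you can also build such a chain directly (a minimal sketch of that lower-level API):

[ ]: from langchain.chains.question_answering import load_qa_chain

# Build only the combine-documents chain; RetrievalQA wraps one of
# these together with a retriever.
qa = load_qa_chain(llm, chain_type="stuff")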

5.4 1- Base chain: Include the whole context in the query to the LLM
By default, the chain type is "stuff".
It processes a list of documents (in our case 4) by combining them into a single prompt and then
submits that combined prompt to a language model.
It’s well-suited for applications where documents are small.

[16]: from langchain.chains import RetrievalQA

Base retriever
[19]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It is a term coined by author and professor Cal
Newport in his book "Deep Work: Rules for Focused Success in a Distracted
World." Deep work involves eliminating distractions, such as social media or
constant interruptions, and dedicating uninterrupted time to work on tasks that
require intense focus and cognitive effort. The goal of deep work is to maximize
productivity, creativity, and the quality of work output.
If you take a closer look at the result object, we have 3 keys: query, result, and source_documents (which contains the context from the retriever).
[20]: # A closer look at "source_documents" ==> there are 4 documents
for doc in result['source_documents']:
    print(doc.page_content[:200], f"==> metadata = {doc.metadata}\n")

All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

One can see that there are redundant documents that you don't want to pass to the LLM, since you pay per token. We can avoid this by using the MMR retriever, as explained in the previous notebook, which gives more diversified chunks to use in the context.

MMR retriever
[21]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr"),
    return_source_documents=True,
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It is the ability to work in a state of flow, where
one can fully immerse themselves in their work and produce high-quality,
valuable output. Deep work requires eliminating distractions, such as social
media or interruptions, and dedicating uninterrupted time to engage in intense
cognitive activities. It is contrasted with shallow work, which consists of low-
value, easily replicable tasks that can be done while distracted. Deep work is
considered crucial for producing meaningful and impactful work.

[22]: # A closer look at "source_documents" ==> again 4 documents, now more diverse
for doc in result['source_documents']:
    print(doc.page_content[:200], f"==> metadata = {doc.metadata}\n")

All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

Cal Newport, Associate Professor in computer science, popular author,
and social media avoider, delves into the world of work, focus, and
productivity. By distinguishing the two fundamental types of w ==> metadata =
{'page': 1, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

. This kind of work
means we don't create anything of value. So why is it that we gravitate
towards shallow work?
The truth is that shallow work is easy, and deep work is difficult.
Furthermore, shall ==> metadata = {'page': 1, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

. Living in the digital age means that
we're hyper-connected, but ironically, this can disconnect us from
completing the essential tasks at hand. ==> metadata = {'page': 2, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}

I'll let you compare the LLM's answers from the two retrievers yourself: result['result']

[33]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr"),
    return_source_documents=True,
    chain_type="stuff"
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to the ability to focus without distraction on a cognitively
demanding task. It is a state of flow where you can fully immerse yourself in
your work and produce high-quality and valuable output. Deep work requires
extended periods of uninterrupted concentration and intense focus, allowing you
to push your cognitive abilities to their limits. Unlike shallow work, which
consists of mundane and easily replicable tasks, deep work involves tackling
complex problems, generating new ideas, and producing meaningful work that
requires deep thinking and creativity.
Base chain:
Deep work refers to a state of focused and uninterrupted concentration on a cognitively demanding
task. It is the ability to work in a state of flow, where one can fully immerse themselves in their
work and produce high-quality, valuable output. Deep work requires eliminating distractions, such
as social media or interruptions, and dedicating uninterrupted time to engage in intense cognitive
activities. It is contrasted with shallow work, which consists of low-value, easily replicable tasks
that can be done while distracted. Deep work is considered crucial for producing meaningful and
impactful work.
Stuff chain:
Deep work refers to the ability to focus without distraction on a cognitively demanding task. It
is a state of flow where you can fully immerse yourself in your work and produce high-quality and
valuable output. Deep work requires extended periods of uninterrupted concentration and intense
focus, allowing you to push your cognitive abilities to their limits. Unlike shallow work, which
consists of mundane and easily replicable tasks, deep work involves tackling complex problems,
generating new ideas, and producing meaningful work that requires deep thinking and creativity.
==> we get almost the same results, as expected, since "stuff" is the default chain type

5.5 2- Map-reduce chain

Each individual chunk is sent to the LLM to get a base answer. Then those answers are combined to get the final answer.
As seen above, the retriever returns 4 source documents each time.
So RetrievalQA with map_reduce will make 4 calls to the OpenAI model, one call per document.
Then it gathers the 4 intermediate answers and makes a final call:
**Inputs:**

**System**: Given the following extracted parts of a long document and a question, create a final answer.
If you don't know the answer, just say that you don't know. Don't try to make up an answer.

*********************
<summary of question made to doc 1>
<summary of question made to doc 2>
<summary of question made to doc 3>
<summary of question made to doc 4>
*********************

**Human**: What is a deep work

Then we get the model output:
**ASSISTANT**: There is no clear answer to this question....
[3]: from IPython import display
# Diagram of the map_reduce chain (source: LangChain documentation);
# image not reproduced here, `path_image` points to a local copy.
display.Image(path_image)
[25]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It involves working on a task without any
distractions or interruptions, allowing for maximum productivity and high-
quality output. Deep work requires a state of flow, where the individual is
fully immersed in the task at hand and able to work at their highest level of
cognitive ability. This type of work is often associated with creativity,
problem-solving, and producing high-value work.
In the result object there are no source_documents this time, only the answer (we did not pass return_source_documents=True):
[26]: result

[26]: {'query': 'What is a deep work',
'result': 'Deep work refers to a state of focused and uninterrupted
concentration on a cognitively demanding task. It involves working on a task
without any distractions or interruptions, allowing for maximum productivity and
high-quality output. Deep work requires a state of flow, where the individual is
fully immersed in the task at hand and able to work at their highest level of
cognitive ability. This type of work is often associated with creativity,
problem-solving, and producing high-value work.'}

[27]: qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr"),
    chain_type="map_reduce"
)

question = "What is a deep work"

result = qa_chain_mr({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to the ability to focus without distraction on a cognitively
demanding task. It is a state of flow where one can fully engage in meaningful
work, free from interruptions and distractions. Deep work requires intense
concentration and can lead to high-quality outputs and significant progress in
one's work.
Cons of map_reduce ==>
When using map_reduce, since we send each chunk separately to the LLM, there is a possibility that the answer to our question is split between 2 different chunks (the end of one chunk and the beginning of another). This could result in the LLM being unable to find a relevant answer, leading to responses like "I don't know"… One mitigation is sketched below.
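
A simple mitigation (my suggestion, not from the original notebook) is to retrieve more chunks, so that, combined with the 150-character chunk_overlap configured in the splitter, an answer straddling a boundary still appears whole in at least one chunk:

[ ]: # Retrieve 6 chunks instead of the default 4; with the splitter's
# overlap this lowers the odds that an answer is cut across chunks.
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 6}),
    chain_type="map_reduce"
)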

5.6 3- Refine chain


We can improve on map-reduce results by using another chain type, "refine", which makes sequential calls to the OpenAI API.
With this chain, we also call the OpenAI Chat API 4 times, but in a different way from map_reduce.
Each time we call the LLM, we give it:
• the current document,
• the LLM's answer from the previous call (on the previous document),
• a prompt template adapted to explicitly ask the LLM to refine the answer with the new context (the current document).
Here are the steps:
FIRST CALL
SYSTEM: Context information is below. ****<doc1>****
Given the context information and not prior knowledge, answer any questions.
HUMAN: "What is a deep work"

Model output:
ASSISTANT: answer1

SECOND CALL: a second sequence of messages that contains the former answer from the model:
HUMAN: "What is a deep work"
AI (could be assistant role): answer1
HUMAN (could be system role): We have the opportunity to refine the existing answer (only if needed) with some more context below.
****<doc2>****
Given the new context, refine the original answer to better answer the question. If the context isn't useful, return the original answer.

Model output:
ASSISTANT: answer2

THIRD CALL: a third sequence of messages that contains the former answer from the model:
HUMAN: "What is a deep work"
AI (could be assistant role): answer2
HUMAN (could be system role): We have the opportunity to refine the existing answer (only if needed) with some more context below.
****<doc3>****
Given the new context, refine the original answer to better answer the question. If the context isn't useful, return the original answer.

Model output:
ASSISTANT: answer3

[5]: from IPython import display
# Diagram of the refine chain (source: LangChain documentation);
# image not reproduced here, `path_image` points to a local copy.
display.Image(path_image)
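
If you want to customize those refine prompts, chain_type_kwargs lets you pass your own (a sketch, assuming this legacy refine chain accepts question_prompt and refine_prompt; the templates here are illustrative, not LangChain's defaults):

[ ]: from langchain.prompts import PromptTemplate

# Illustrative templates; the refine chain fills {context_str},
# {question} and {existing_answer} at each step.
question_prompt = PromptTemplate.from_template(
    "Context:\n{context_str}\nAnswer the question: {question}"
)
refine_prompt = PromptTemplate.from_template(
    "Question: {question}\nExisting answer: {existing_answer}\n"
    "Refine it with this new context if useful:\n{context_str}"
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine",
    chain_type_kwargs={
        "question_prompt": question_prompt,
        "refine_prompt": refine_prompt,
    },
)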

[28]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work, as described by Cal Newport in his book "Deep Work," is a concept
that emphasizes the importance of focused attention and eliminating distractions
to produce high-quality and creative work. It encourages individuals to work
smarter rather than harder by prioritizing deep, concentrated work over shallow,
easily interruptible tasks. Newport provides practical tips to boost focus and
productivity, such as making deep work a routine, scheduling dedicated time for
it, finding a distraction-free environment, and practicing digital minimalism.
By incorporating deep work into their routine and creating a dedicated space,
individuals can enhance their ability to produce meaningful work and maximize
their output.
Refine gives a better answer than map_reduce. This is because each call incorporates the answer from the previous context, which carries information through the chain.

5.7 4- Map rerank

With map_rerank, we also call the LLM multiple times (once per document). The difference from the other methods is that the prompt asks the model both to answer the question and to score its answer ("How certain is the LLM in its answer?"). The answer with the highest score is then returned as the final answer.
[7]: from IPython import display
# Diagram of the map_rerank chain (source: LangChain documentation);
# image not reproduced here, `path_image` points to a local copy.
display.Image(path_image)

[32]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_rerank"
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

/usr/local/lib/python3.10/dist-packages/langchain/chains/llm.py:344:
UserWarning: The apply_and_parse method is deprecated, instead pass an output
parser directly to LLMChain.
warnings.warn(
Answer:
Deep Work is a guide that helps individuals regain control of their time,
eliminate distractions, and improve their overall focus. It emphasizes the
importance of clear focus and careful attention in producing the best and most
creative work. Deep Work suggests that by practicing deep work and incorporating
restorative rest, individuals can enhance their ability to do meaningful work.
The book emphasizes that focus, not time, is the key to accomplishing important
tasks.
Here are the different results:
map_reduce: base retriever:
Deep work refers to a state of focused and uninterrupted concentration on a cognitively demanding
task. It involves working on a task without any distractions or interruptions, allowing for maximum
productivity and high-quality output. Deep work requires a state of flow, where the individual is
fully immersed in the task at hand and able to work at their highest level of cognitive ability. This
type of work is often associated with creativity, problem-solving, and producing high-value work.
map_reduce: MMR:
Deep work refers to the ability to focus without distraction on a cognitively demanding task. It
is a state of flow where one can fully engage in meaningful work, free from interruptions and
distractions. Deep work requires intense concentration and can lead to high-quality outputs and
significant progress in one’s work.
refine:
Deep work, as described by Cal Newport in his book “Deep Work,” is a concept that emphasizes the
importance of focused attention and eliminating distractions to produce high-quality and creative
work. It encourages individuals to work smarter rather than harder by prioritizing deep, concen-
trated work over shallow, easily interruptible tasks. Newport provides practical tips to boost focus
and productivity, such as making deep work a routine, scheduling dedicated time for it, finding
a distraction-free environment, and practicing digital minimalism. By incorporating deep work
into their routine and creating a dedicated space, individuals can enhance their ability to produce
meaningful work and maximize their output.

map_rerank:
Deep Work is a guide that helps individuals regain control of their time, eliminate distractions, and
improve their overall focus. It emphasizes the importance of clear focus and careful attention in
producing the best and most creative work. Deep Work suggests that by practicing deep work and
incorporating restorative rest, individuals can enhance their ability to do meaningful work. The
book emphasizes that focus, not time, is the key to accomplishing important tasks.

6 Prompt Template: Under the hood


LangChain uses a prompt that combines the question with the context retrieved from the vector store. Here is an example of how to use your own prompt with RetrievalQA.
[23]: from langchain.prompts import PromptTemplate

template = """Use the provided context to respond to the question posed at the end.
If you're unsure of the answer, please feel free to acknowledge that you don't know rather than attempting to provide a fabricated response.
Please provide a brief and concise response.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
QA_CHAIN_PROMPT

[23]: PromptTemplate(input_variables=['context', 'question'], template="Use the
provided context to respond to the question posed at the end. \nIf you're unsure
of the answer, please feel free to acknowledge that you don't know rather than
attempting to provide a fabricated response.\nPlease provide a brief and concise
response.\n{context}\nQuestion: {question}\nHelpful Answer:")

Use this template to ask questions to the LLM:


[24]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

question = "What is a deep work"

result = qa_chain({"query": question})
print(f"Answer:\n {result['result']}")

Answer:
Deep work refers to a state of focused and uninterrupted work that allows for
maximum productivity and creativity. It involves eliminating distractions and
dedicating substantial time and effort to tasks that require deep concentration
and attention.
You can see that the answer is more concise than the other examples.
