RAG and LangChain
December 7, 2023
1 Install libs
[ ]: !pip install langchain
!pip install pypdf
!pip install openai
2 Loading PDFs
[4]: from langchain.document_loaders import PyPDFLoader

# pdf1 and pdf2 are PDF file paths, assumed to be defined beforehand
loaders = [
    # Duplicate documents on purpose - messy data
    PyPDFLoader(pdf1),
    PyPDFLoader(pdf1),
    PyPDFLoader(pdf2),
]
docs = []
for i, loader in enumerate(loaders):
    pages = loader.load()
    print(f"For doc = {i}, number of pages: {len(pages)}")
    docs.extend(pages)  # reuse the pages already loaded instead of parsing the PDF twice
For doc = 2, number of pages: 19
length of docs 35
3 Chunking documents
[5]: from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=150,
    separators=['. ']
)
chunks = text_splitter.split_documents(docs)
len(chunks)
[5]: 80
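To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of overlapping chunks. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter cuts on separators first); the function name `sliding_chunks` and the sample text are hypothetical and only illustrate how consecutive chunks share a few characters.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in toy_chunks:
    print(chunk)
```

The last 3 characters of each chunk reappear at the start of the next one, which is the role `chunk_overlap=150` plays in the cell above: a sentence cut at a chunk boundary still appears in full in at least one chunk, as long as it is shorter than the overlap.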
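from langchain.vectorstores import Chroma

# embedding and persist_directory are assumed to be defined in an earlier cell
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embedding,
    persist_directory=persist_directory
)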
[13]: print(vectordb._collection.count())
80
5 RetrieverQA
5.1 Retriever
A simple retriever to test our question against the vector store
[14]: question = "What is a deep work"
docs_similarity_search = vectordb.similarity_search(question, k=3)
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
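The output above returns the same page-7 chunk twice because the first PDF was loaded twice. As a rough illustration of why a plain top-k similarity search does this, here is a minimal sketch with toy 2-dimensional vectors. The names `cosine`, `top_k`, and `toy_store` are hypothetical, the vectors are invented, and Chroma's actual index and distance metric are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k):
    """Return the texts of the k entries most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy store: the first chunk appears twice, like the duplicated PDF.
toy_store = [
    ("chunk A (page 7)", [1.0, 0.0]),
    ("chunk A (page 7)", [1.0, 0.0]),
    ("chunk B (page 4)", [0.8, 0.6]),
    ("chunk C (page 1)", [0.0, 1.0]),
]
hits = top_k([1.0, 0.1], toy_store, k=3)
print(hits)
```

Two identical chunks have identical scores, so both land in the top 3 and push out a chunk that would have added new information.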
5.4 1- Base chain: Include the whole context in the query to the LLM
By default, the base chain type is stuff.
It takes the list of retrieved documents (4 in our case), combines them into a single prompt, and
submits that combined prompt to the language model.
It is well suited to applications where the documents are small enough to fit together in the model's context window.
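The idea behind stuff can be sketched in a few lines. This is an illustration only: `build_stuff_prompt` is a hypothetical helper, and the prompt wording is an assumption, not the prompt LangChain actually sends.

```python
def build_stuff_prompt(question: str, documents: list) -> str:
    """Concatenate every retrieved document into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


toy_docs = ["Deep work is focused work.", "Shallow work is easily replicated."]
prompt = build_stuff_prompt("What is a deep work", toy_docs)
print(prompt)
```

Everything goes to the model in a single call, so the prompt grows with the number and size of the retrieved documents.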
[16]: from langchain.chains import RetrievalQA
Base retriever
[19]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
)
Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It is a term coined by author and professor Cal
Newport in his book "Deep Work: Rules for Focused Success in a Distracted
World." Deep work involves eliminating distractions, such as social media or
constant interruptions, and dedicating uninterrupted time to work on tasks that
require intense focus and cognitive effort. The goal of deep work is to maximize
productivity, creativity, and the quality of work output.
If you take a closer look at the result object, it has 3 keys:
* query
* result
* source_documents: contains the context returned by the retriever
[20]: # A closer look at "source_documents" ==> there are 4 documents
for doc in result['source_documents']:
    print(doc.page_content[:200], f"==> metadata = {doc.metadata}\n")
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
We've all heard the phrase, "work smarter, not harder." It's a big
adjustment to make, because we've put so much value into working
longer hours. Just because you're spending more time at the office,
==> metadata = {'page': 4, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
As you can see, there are redundant documents. You don't want to pass these to the LLM, since
you pay for every token in the prompt. We can avoid this by using an MMR (maximal marginal
relevance) retriever, as explained in the previous notebook, which returns more diverse chunks to use as context.
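To show how MMR trades relevance against redundancy, here is a minimal pure-Python sketch. It is not LangChain's or Chroma's implementation: the names `mmr_select` and `toy_store`, the toy vectors, and the value `lambda_mult=0.5` are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k, lambda_mult=0.5):
    """Greedy maximal marginal relevance over (text, vector) candidates."""
    selected = []  # indices of the chunks picked so far
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query_vec, candidates[i][1])
        # Redundancy: similarity to the closest chunk that was already picked.
        redundancy = max(
            (cosine(candidates[i][1], candidates[j][1]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i][0] for i in selected]


# Toy store: the first chunk appears twice, like the duplicated PDF.
toy_store = [
    ("chunk A (page 7)", [1.0, 0.0]),
    ("chunk A (page 7)", [1.0, 0.0]),
    ("chunk B (page 4)", [0.8, 0.6]),
    ("chunk C (page 1)", [0.0, 1.0]),
]
picked = mmr_select([1.0, 0.1], toy_store, k=3)
print(picked)
```

Once one copy of chunk A is selected, the second copy is penalised for being identical to it, so the three slots go to three different chunks. The price is that a less relevant chunk (C) can be picked for the sake of diversity.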
MMR retriever
[21]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(search_type="mmr"),
    return_source_documents=True,
)
Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It is the ability to work in a state of flow, where
one can fully immerse themselves in their work and produce high-quality,
valuable output. Deep work requires eliminating distractions, such as social
media or interruptions, and dedicating uninterrupted time to engage in intense
cognitive activities. It is contrasted with shallow work, which consists of low-
value, easily replicable tasks that can be done while distracted. Deep work is
considered crucial for producing meaningful and impactful work.
[22]: # A closer look at "source_documents" ==> there are 4 documents
for doc in result['source_documents']:
    print(doc.page_content[:200], f"==> metadata = {doc.metadata}\n")
All of the best, and most creative work, emerges from a state of clear
focus and careful attention. So, perhaps deep work, along with restorative
rest is just the antidote we need. Deep Work is a guid ==> metadata = {'page':
7, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
{'page': 1, 'source':
'/drive/MyDrive/02-Articles_ChatGPT/03_notebooks/data/Deep_Work_summary.pdf'}
You can compare the LLM's answers from both queries via result['result']
Answer:
Deep work refers to the ability to focus without distraction on a cognitively
demanding task. It is a state of flow where you can fully immerse yourself in
your work and produce high-quality and valuable output. Deep work requires
extended periods of uninterrupted concentration and intense focus, allowing you
to push your cognitive abilities to their limits. Unlike shallow work, which
consists of mundane and easily replicable tasks, deep work involves tackling
complex problems, generating new ideas, and producing meaningful work that
requires deep thinking and creativity.
Base chain:
Deep work refers to a state of focused and uninterrupted concentration on a cognitively demanding
task. It is the ability to work in a state of flow, where one can fully immerse themselves in their
work and produce high-quality, valuable output. Deep work requires eliminating distractions, such
as social media or interruptions, and dedicating uninterrupted time to engage in intense cognitive
activities. It is contrasted with shallow work, which consists of low-value, easily replicable tasks
that can be done while distracted. Deep work is considered crucial for producing meaningful and
impactful work.
Stuff
Deep work refers to the ability to focus without distraction on a cognitively demanding task. It
is a state of flow where you can fully immerse yourself in your work and produce high-quality and
valuable output. Deep work requires extended periods of uninterrupted concentration and intense
focus, allowing you to push your cognitive abilities to their limits. Unlike shallow work, which
consists of mundane and easily replicable tasks, deep work involves tackling complex problems,
generating new ideas, and producing meaningful work that requires deep thinking and creativity.
==> The results are almost the same
*********************
>\<summary of question made to doc 1\>
>\<summary of question made to doc 2\>
>\<summary of question made to doc 3\>
>\<summary of question made to doc 4\>
*********************
[25]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
Answer:
Deep work refers to a state of focused and uninterrupted concentration on a
cognitively demanding task. It involves working on a task without any
distractions or interruptions, allowing for maximum productivity and high-
quality output. Deep work requires a state of flow, where the individual is
fully immersed in the task at hand and able to work at their highest level of
cognitive ability. This type of work is often associated with creativity,
problem-solving, and producing high-value work.
The result object contains no source_documents here, only the answer, because return_source_documents was not set
[26]: result
chain_type="map_reduce"
)
Answer:
Deep work refers to the ability to focus without distraction on a cognitively
demanding task. It is a state of flow where one can fully engage in meaningful
work, free from interruptions and distractions. Deep work requires intense
concentration and can lead to high-quality outputs and significant progress in
one's work.
Cons of map_reduce:
Because map_reduce sends each chunk to the LLM separately, the answer to our question may be
split between 2 different chunks (the end of one chunk and the beginning of the next). The LLM
may then be unable to find a relevant answer in either chunk, leading to responses like
“I don’t know”.
Model output:
ASSISTANT: answer1
SECOND CALL: a second sequence of messages, which contains the previous answer from the
model:
HUMAN: “What is a deep work”
AI (could be assistant role): answer1
HUMAN (could be system role): We have the opportunity to refine the existing answer (only if
needed) with some more context below.
****<doc2>****
Given the new context, refine the original answer to better answer the question. If the context isn’t
useful, return the original answer.
Model output:
ASSISTANT: answer2
THIRD CALL: a third sequence of messages, which contains the previous answer from the model:
HUMAN: “What is a deep work”
AI (could be assistant role): answer2
HUMAN (could be system role): We have the opportunity to refine the existing answer (only if
needed) with some more context below.
****<doc3>****
Given the new context, refine the original answer to better answer the question. If the context isn’t
useful, return the original answer.
Model output:
ASSISTANT: answer3
…
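The sequence of calls above amounts to a loop that carries the running answer from one document to the next. Here is a minimal sketch with a stub model: `refine_answer` and `fake_llm` are hypothetical names, and the prompt text only paraphrases the messages shown above.

```python
def refine_answer(question, documents, llm):
    """Answer from the first document, then refine with each following one."""
    if not documents:
        raise ValueError("at least one document is required")
    # First call: answer using the first document only.
    answer = llm(f"Context:\n{documents[0]}\n\nQuestion: {question}\nAnswer:")
    # Following calls: pass the previous answer plus one new document.
    for doc in documents[1:]:
        answer = llm(
            f"Question: {question}\n"
            f"Existing answer: {answer}\n"
            "We have the opportunity to refine the existing answer "
            "(only if needed) with some more context below.\n"
            f"{doc}\n"
            "Given the new context, refine the original answer. "
            "If the context isn't useful, return the original answer."
        )
    return answer


calls = []  # records every prompt sent to the stub


def fake_llm(prompt):
    """Stub model: remembers the prompt and returns a numbered placeholder."""
    calls.append(prompt)
    return f"answer {len(calls)}"


final = refine_answer("What is a deep work", ["doc 1", "doc 2", "doc 3"], fake_llm)
print(len(calls), final)
```

The calls are sequential by construction: each prompt needs the answer produced by the previous one, so they cannot be sent in parallel the way the map step of map_reduce can.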
[5]: from IPython import display
display.Image(path_image)
#source from LangChain documentation
[5]:
retriever=vectordb.as_retriever(),
chain_type="refine"
)
Answer:
Deep work, as described by Cal Newport in his book "Deep Work," is a concept
that emphasizes the importance of focused attention and eliminating distractions
to produce high-quality and creative work. It encourages individuals to work
smarter rather than harder by prioritizing deep, concentrated work over shallow,
easily interruptible tasks. Newport provides practical tips to boost focus and
productivity, such as making deep work a routine, scheduling dedicated time for
it, finding a distraction-free environment, and practicing digital minimalism.
By incorporating deep work into their routine and creating a dedicated space,
individuals can enhance their ability to produce meaningful work and maximize
their output.
Refine gives a better answer than map_reduce here. This is because each call incorporates the answer
built from the previous context, which carries information along the chain.
[32]: qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_rerank"
)
/usr/local/lib/python3.10/dist-packages/langchain/chains/llm.py:344:
UserWarning: The apply_and_parse method is deprecated, instead pass an output
parser directly to LLMChain.
warnings.warn(
Answer:
Deep Work is a guide that helps individuals regain control of their time,
eliminate distractions, and improve their overall focus. It emphasizes the
importance of clear focus and careful attention in producing the best and most
creative work. Deep Work suggests that by practicing deep work and incorporating
restorative rest, individuals can enhance their ability to do meaningful work.
The book emphasizes that focus, not time, is the key to accomplishing important
tasks.
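For reference, map_rerank asks the model to answer from each document separately and to attach a confidence score, then keeps the highest-scoring answer. The sketch below illustrates that selection step only. `map_rerank_answer` and `fake_scored_llm` are hypothetical names, and the word-overlap score is a toy stand-in for the score a real model would report.

```python
def map_rerank_answer(question, documents, scored_llm):
    """Get one (answer, score) pair per document and keep the best-scored answer."""
    candidates = [scored_llm(question, doc) for doc in documents]
    best_answer, _best_score = max(candidates, key=lambda pair: pair[1])
    return best_answer


def fake_scored_llm(question, doc):
    """Stub model: the score is the number of document words found in the question."""
    question_words = set(question.lower().split())
    score = sum(1 for word in doc.lower().split() if word in question_words)
    return (f"Based on: {doc}", score)


toy_docs = [
    "Work smarter, not harder.",
    "Deep work is focused work.",
    "Rest is restorative.",
]
best = map_rerank_answer("What is deep work", toy_docs, fake_scored_llm)
print(best)
```

Only one document ends up contributing to the final answer, which may explain why the answer above stays so close to the wording of a single chunk.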
Here are the different results:
map_reduce: base retriever:
Deep work refers to a state of focused and uninterrupted concentration on a cognitively demanding
task. It involves working on a task without any distractions or interruptions, allowing for maximum
productivity and high-quality output. Deep work requires a state of flow, where the individual is
fully immersed in the task at hand and able to work at their highest level of cognitive ability. This
type of work is often associated with creativity, problem-solving, and producing high-value work.
map_reduce: MMR:
Deep work refers to the ability to focus without distraction on a cognitively demanding task. It
is a state of flow where one can fully engage in meaningful work, free from interruptions and
distractions. Deep work requires intense concentration and can lead to high-quality outputs and
significant progress in one’s work.
refine:
Deep work, as described by Cal Newport in his book “Deep Work,” is a concept that emphasizes the
importance of focused attention and eliminating distractions to produce high-quality and creative
work. It encourages individuals to work smarter rather than harder by prioritizing deep, concentrated
work over shallow, easily interruptible tasks. Newport provides practical tips to boost focus
and productivity, such as making deep work a routine, scheduling dedicated time for it, finding
a distraction-free environment, and practicing digital minimalism. By incorporating deep work
into their routine and creating a dedicated space, individuals can enhance their ability to produce
meaningful work and maximize their output.
map_rerank:
Deep Work is a guide that helps individuals regain control of their time, eliminate distractions, and
improve their overall focus. It emphasizes the importance of clear focus and careful attention in
producing the best and most creative work. Deep Work suggests that by practicing deep work and
incorporating restorative rest, individuals can enhance their ability to do meaningful work. The
book emphasizes that focus, not time, is the key to accomplishing important tasks.
template = """Use the provided context to respond to the question posed at the end.
If you're unsure of the answer, please feel free to acknowledge that you don't know rather than attempting to provide a fabricated response.
Answer:
Deep work refers to a state of focused and uninterrupted work that allows for
maximum productivity and creativity. It involves eliminating distractions and
dedicating substantial time and effort to tasks that require deep concentration
and attention.
You can see that the answer is more concise than the other examples.