Enhancing RAG Systems with LangChain

Improving Real-World RAG Systems

Key Challenges & Practical Solutions

Dipanjan (DJ) Sarkar


Head of Community & Principal AI Scientist at Analytics Vidhya
Published Author, Google Developer Expert & Cloud Champion Innovator
Slides & Code
[Link]
Understanding RAG Systems
What is a RAG System?

[Architecture diagram: a user sends a query to the RAG system, which retrieves context from vector stores built over APIs, raw files, and databases, and returns a response.]
RAG System Architecture - Data Indexing
RAG System Architecture - Search and Generation
RAG System Challenges & Practical Solutions
Key Failure or Pain Points in a RAG System

Source: Seven Failure Points When Engineering a Retrieval Augmented Generation System
Problem: Missing Content

• Missing Content means the relevant context to answer the question is not present in the database
• Leads to the model giving a wrong answer and hallucinating
• End users end up frustrated with irrelevant or wrong responses

Solutions for Missing Content

• Better data cleaning using tools like [Link] to ensure we extract good-quality data
• Better prompting to constrain the model to NOT answer the question if the context is irrelevant
• Agentic RAG with search tools to get live information for questions with no context data
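The better-prompting solution can be sketched as a guardrail baked into the prompt template itself. A minimal illustration (the prompt wording and helper name are illustrative, not from the slides):

```python
# Hedged sketch: constrain the model to refuse when the context is irrelevant.
# The exact wording below is illustrative, not an official template.
GROUNDED_QA_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply exactly with:
"I don't know based on the provided context."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    # Fill the template before sending it to the LLM of your choice
    return GROUNDED_QA_PROMPT.format(context=context, question=question)

prompt = build_prompt("LangChain is a framework for LLM apps.",
                      "Who founded LangChain?")
```

The refusal instruction gives the model an explicit escape hatch, which reduces hallucinated answers when the retrieved context is off-topic.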

Hands-on Demo
• Better Data Cleaning
• Better Prompting
• Agentic RAG with Tools
• Get the notebook from HERE

Problem: Missed Top Ranked

• Missed Top Ranked means context documents don't appear in the top retrieval results
• Leads to the model not being able to answer the question
• Documents to answer the question are present but fail to get retrieved due to a poor retrieval strategy

Problem: Not in Context

• Not in Context means documents with the answer are present during initial retrieval but did not make it into the final context for generating an answer
• Bad retrieval, reranking, and consolidation strategies lead to missing out on the right documents in context

Problem: Not Extracted

• Not Extracted means the LLM struggles to extract the correct answer from the provided context, even when the context contains the answer
• This occurs when there is too much unnecessary information, noise, or contradicting information in the context

Problem: Incorrect Specificity

• The output response is too vague and not detailed or specific enough
• Vague or generic queries might lead to not getting the right context and response
• Wrong chunking or bad retrieval can also lead to this problem

Solutions for Missed Top Ranked, Not in Context & Incorrect Specificity
• Use Better Chunking Strategies
• Hyperparameter Tuning - Chunking & Retrieval
• Use Better Embedder Models
• Use Advanced Retrieval Strategies
• Use Context Compression Strategies
• Use Better Reranker Models
Experiment with Various Chunking Strategies

Splitter Type — Description

RecursiveCharacterTextSplitter — Recursively splits text into larger chunks based on several defined characters. Tries to keep related pieces of text next to each other. LangChain's recommended way to start splitting text.
CharacterTextSplitter — Splits text based on a user-defined character. One of the simpler text splitters.
tiktoken — Splits text based on tokens using trained LLM tokenizers, like GPT-4's.
spaCy — Splits text using the tokenizer from the popular NLP library spaCy.
SentenceTransformers — Splits text based on tokens using trained open LLM tokenizers available from the popular sentence-transformers library.
unstructured — The unstructured library allows various splitting and chunking strategies, including splitting text based on key sections and titles.
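To make the recursive idea concrete, here is a minimal pure-Python sketch in the spirit of LangChain's RecursiveCharacterTextSplitter (a simplified illustration, not the real API):

```python
# Simplified sketch of recursive character splitting: try coarse separators
# first, and fall back to finer ones only when a piece is still too long.
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ")):
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    for sep in separators:
        if sep in text:
            pieces = text.split(sep)
            chunks, current = [], ""
            for piece in pieces:
                candidate = (current + sep + piece) if current else piece
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    if len(piece) > chunk_size:
                        # Piece itself is too long: recurse with finer separators
                        chunks.extend(recursive_split(piece, chunk_size, separators))
                        current = ""
                    else:
                        current = piece
            if current:
                chunks.append(current)
            return chunks
    # No separator found at all: hard-split by characters
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc = ("RAG systems retrieve context.\n\nThen an LLM generates an answer. "
       "Chunking controls what gets indexed.")
chunks = recursive_split(doc, chunk_size=60)
```

Paragraph breaks are respected where possible, so related sentences tend to stay in the same chunk, which is exactly why this splitter is the recommended default.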
Hyperparameter Tuning - Chunking & Retrieval

[Pipeline diagram: documents are chunked with chunk size C, indexed in a vector DB, retrieved with Top_K = K and similarity threshold Sim_thresh = S, and the retrieved context plus question are sent to an LLM to generate an answer, which is scored with eval metrics.]

Search space:
• C values: 500, 1,000, 2,000
• K values: 5, 8, 10
• S values: 0.2, 0.3, 0.5
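The sweep over this search space can be sketched as a simple grid search; `run_rag_eval` here is a hypothetical placeholder for your real index-retrieve-generate-evaluate loop, not a real library function:

```python
# Grid search over the (C, K, S) space from the slide.
from itertools import product

chunk_sizes = [500, 1_000, 2_000]   # C: chunk size
top_ks      = [5, 8, 10]            # K: top-K retrieved chunks
sim_threshs = [0.2, 0.3, 0.5]       # S: similarity threshold

def run_rag_eval(C, K, S):
    # Placeholder metric for illustration only; replace with a real
    # evaluation of your pipeline (e.g. answer correctness on a test set)
    return 1.0 / (abs(C - 1_000) / 1_000 + abs(K - 8) + abs(S - 0.3) + 1)

results = {
    (C, K, S): run_rag_eval(C, K, S)
    for C, K, S in product(chunk_sizes, top_ks, sim_threshs)
}
best_config = max(results, key=results.get)
```

With 3 values per hyperparameter this is only 27 runs, which is why a full grid is still practical here; larger spaces would call for random or Bayesian search.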
Better Embedder Models - MTEB Leaderboard
Better Embedder Models - Experiment Yourself

[Diagram: an embedding model maps text such as "hello world" to a vector, e.g. -0.027 -0.001 -0.020 ... -0.023]

• Newer embedder models will be trained on more data and are often better
• Don't just go by benchmarks; use and experiment on your own data
• Do not use commercial models if data privacy is important
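The vectors an embedder produces are compared with cosine similarity. A toy sketch (the vectors below are made up for illustration, not real model outputs):

```python
# Cosine similarity between toy "embedding" vectors: semantically similar
# texts should score higher than unrelated ones.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings (real models use hundreds of dims)
emb = {
    "hello world": [-0.027, -0.001, -0.020, -0.023],
    "hi there":    [-0.025, -0.002, -0.019, -0.021],
    "tax filing":  [ 0.031,  0.040, -0.002,  0.011],
}
sim_close = cosine(emb["hello world"], emb["hi there"])
sim_far   = cosine(emb["hello world"], emb["tax filing"])
```

This is the comparison a vector store runs at query time, which is why the quality of the embedder directly bounds retrieval quality.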


Advanced Retrieval Strategies

• Semantic Similarity Thresholding

• Multi-query Retrieval

• Hybrid Search (Keyword + Semantic)

• Reranking

• Chained Retrieval
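Hybrid search, the third strategy above, blends a keyword score with a semantic score. A toy sketch (the word-overlap and hard-coded dense scores are stand-ins; in practice you would combine BM25 with real vector similarity):

```python
# Hybrid search sketch: blend keyword and semantic relevance per document.
docs = [
    "LangChain supports retrievers and vector stores",
    "Reranking improves retrieval quality",
    "Bananas are rich in potassium",
]

def keyword_score(query, doc):
    # Toy keyword relevance: fraction of query words present in the doc
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def semantic_score(query, doc):
    # Hypothetical dense similarity; replace with real embedding similarity
    fake = {0: 0.82, 1: 0.44, 2: 0.05}
    return fake[docs.index(doc)]

def hybrid_search(query, alpha=0.5):
    # alpha weights keyword vs. semantic relevance
    scored = [(alpha * keyword_score(query, d) +
               (1 - alpha) * semantic_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

ranking = hybrid_search("vector stores in LangChain")
```

Keyword matching rescues exact-term queries (IDs, names) that pure semantic search can miss, while the semantic score handles paraphrases.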
Better Reranker Models

• Rerankers are fine-tuned cross-encoder transformer models
• These models take in a (Query, Document) pair and return a relevance score
• Models fine-tuned on more pairs and released more recently will usually be better
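The reranking step itself is simple once you have a pair scorer. In this sketch, `cross_encoder_score` is a toy word-overlap stand-in for a real cross-encoder (e.g. a sentence-transformers CrossEncoder model):

```python
# Reranking sketch: score each (query, document) pair, reorder the initial
# retrieval by that score, and keep only the top N for the final context.
def cross_encoder_score(query, doc):
    # Toy relevance: fraction of query words appearing in the document.
    # A real cross-encoder jointly encodes the pair with a transformer.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query, retrieved, top_n=2):
    scored = sorted(retrieved,
                    key=lambda d: cross_encoder_score(query, d),
                    reverse=True)
    return scored[:top_n]

retrieved = [
    "Chunking splits documents before indexing",
    "Rerankers score query document pairs",
    "Rerankers are cross-encoder models that score pairs",
]
top = rerank("how do rerankers score pairs", retrieved)
```

Because the cross-encoder sees query and document together, it is more accurate than embedding similarity alone, but also slower, which is why it is applied only to the small retrieved set rather than the whole corpus.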
Context Compression Strategies

• LLM prompt-based Context Compression
  • Extractor: filters out content from a context document not related to the query
  • Filter: filters out whole context documents not related to the query
• Microsoft LLMLingua Prompt Compression
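The Filter strategy can be sketched as follows; `is_relevant` is a toy keyword-overlap stand-in for the LLM relevance call (in LangChain this role is played by an LLM-backed document filter):

```python
# Context compression sketch (Filter variant): drop whole context documents
# judged irrelevant to the query before they reach the generation prompt.
def is_relevant(query, doc):
    # Toy judgment: any query keyword appears in the document.
    # In practice, an LLM makes this yes/no relevance call.
    return bool(set(query.lower().split()) & set(doc.lower().split()))

def compress_context(query, docs):
    return [d for d in docs if is_relevant(query, d)]

docs = [
    "embedding models map text to vectors",
    "the cafeteria opens at nine",
    "vectors power semantic retrieval",
]
kept = compress_context("how do embedding vectors work", docs)
```

Removing irrelevant documents shrinks the prompt, cuts cost, and directly addresses the Not Extracted failure, since less noise means easier answer extraction.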
Solutions for Missed Top Ranked, Not in Context, Not Extracted & Incorrect Specificity

Hands-on Demo
• Effect of Embedder Models
• Advanced Retrieval Strategies
• Chained Retrieval with Rerankers
• Context Compression Strategies
• Get the notebook from HERE

Problem: Wrong Format

• The output response is in the wrong format
• This happens when you tell the LLM to return the response in a specific format, e.g. JSON, and it fails to do so

Solutions for Wrong Format

• Powerful LLMs have native support for response formats, e.g. OpenAI supports JSON outputs
• Better Prompting and Output Parsers
• Structured Output Frameworks

Solutions for Wrong Format - Native LLM Support
Solutions for Wrong Format - Output Parsers & Better Prompting

• LangChain allows converting the raw LLM response into a more consumable format by using Output Parsers
• A variety of parsers exist, including:
  • String parser
  • CSV parser
  • Pydantic parser
  • JSON parser
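To see what a JSON output parser does under the hood, here is a stripped-down sketch: it removes markdown fences a model sometimes wraps around its answer, then parses and returns the payload (LangChain's parser does this more robustly):

```python
# Sketch of JSON output parsing for raw LLM text responses.
import json

def parse_json_response(raw: str) -> dict:
    text = raw.strip()
    if text.startswith("```"):
        # Drop ```json ... ``` fences the LLM sometimes adds
        text = text.strip("`")
        text = text.split("\n", 1)[1] if text.startswith("json") else text
    return json.loads(text)

# Example raw model output, fences and all
llm_output = '```json\n{"answer": "Paris", "confidence": 0.92}\n```'
parsed = parse_json_response(llm_output)
```

Parsing to a dict (or a Pydantic model, for schema validation) lets downstream code consume the response safely instead of string-matching the raw text.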
Solutions for Wrong Format - Structured Output Frameworks
Solutions for Wrong Format

Hands-on Demo
• Native LLM Support
• Output Parsers
• Get the notebook from HERE

Problem: Incomplete

• Incomplete means the generated response is incomplete
• This could be because of poorly worded questions, lack of the right context retrieved, or bad reasoning

Solutions for Incomplete

• Use better LLMs like GPT-4o, Claude 3.5, or Gemini 1.5
• Build agentic systems with tool use if necessary
• Use advanced prompting techniques like Chain-of-Thought and Self-Consistency
• Rewrite the user query and improve retrieval with HyDE

HyDE - Hypothetical Document Embedding
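The HyDE idea: instead of embedding the raw query, ask an LLM for a hypothetical answer passage and embed that for retrieval, since a fake answer is closer in embedding space to real answer documents than the question is. A sketch where both `generate_hypothetical_doc` and `embed` are illustrative stubs:

```python
# HyDE sketch: embed a hypothetical answer document, not the raw query.
def generate_hypothetical_doc(query: str) -> str:
    # Stand-in for an LLM call that writes a plausible answer passage
    return f"A passage answering the question: {query}. It explains the details."

def embed(text: str) -> list:
    # Stand-in for a real embedding model (toy vowel-count vector)
    return [text.count(c) for c in "aeiou"]

def hyde_retrieval_vector(query: str) -> list:
    hypothetical = generate_hypothetical_doc(query)
    return embed(hypothetical)  # search the vector DB with this vector

vec = hyde_retrieval_vector("what is retrieval augmented generation?")
```

The rest of the pipeline is unchanged: the vector DB is queried with this vector, and the retrieved (real) documents are passed to the LLM as usual.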
Other Practical Solutions from Recent Research Papers which Actually Work!
RAG vs. Long Context LLMs

• Long Context LLMs often outperform RAG but are very expensive in terms of compute and cost
• Hybrid approach: use an LLM to reflect and see if the RAG answer is good enough, or route the query to a Long Context LLM
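The hybrid routing idea can be sketched as a try-cheap-first control flow; all three helper functions below are hypothetical stubs for your own components:

```python
# Self-router sketch: try RAG first, let an LLM judge the draft answer,
# and fall back to an expensive long-context LLM only when needed.
def rag_answer(query):
    # Stand-in for the full retrieve-then-generate RAG pipeline
    return "RAG draft answer"

def llm_judges_answer_sufficient(query, answer):
    # Stand-in for an LLM reflection call returning True/False;
    # the toy criterion here just flags obvious drafts
    return "draft" not in answer

def long_context_answer(query):
    # Stand-in for stuffing the whole corpus into a long-context LLM
    return "Long-context LLM answer over the full corpus"

def self_router(query):
    answer = rag_answer(query)
    if llm_judges_answer_sufficient(query, answer):
        return answer, "rag"
    return long_context_answer(query), "long_context"

answer, route = self_router("summarize the entire report")
```

Most queries take the cheap RAG path; only those the judge rejects pay the long-context cost, which is the whole economic argument for the hybrid.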
RAG vs Long Context LLMs - Self-Router RAG
Agentic Corrective RAG
• Step 1: Retrieve context documents from the vector database for the input query
• Step 2: Use an LLM to check if the retrieved documents are relevant to the input question
• Step 3: If all documents are relevant (Correct), no specific action is needed
• Step 4: If some or all documents are not relevant (Ambiguous or Incorrect), rephrase the query and search the web to get relevant context information
• Step 5: Send the rephrased query and context documents or information to the LLM for response generation
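The five steps above can be sketched as a control flow; `grade_document`, `rewrite_query`, and `web_search` are hypothetical stubs for an LLM grader, an LLM query rewriter, and a search tool:

```python
# Corrective RAG sketch: grade retrieved docs, and fall back to query
# rewriting plus web search when grading finds irrelevant documents.
def grade_document(question, doc):
    # Stand-in for an LLM relevance grader (toy keyword check)
    return "relevant" if "rag" in doc.lower() else "irrelevant"

def rewrite_query(question):
    # Stand-in for an LLM query rewriter
    return question + " (rephrased for web search)"

def web_search(query):
    # Stand-in for a search tool such as a web search API
    return ["web result about RAG pipelines"]

def corrective_rag(question, retrieved_docs):
    grades = [grade_document(question, d) for d in retrieved_docs]  # Step 2
    if all(g == "relevant" for g in grades):                        # Step 3
        return question, retrieved_docs
    query = rewrite_query(question)                                 # Step 4
    relevant = [d for d, g in zip(retrieved_docs, grades) if g == "relevant"]
    return query, relevant + web_search(query)                      # Step 5

query, context = corrective_rag(
    "how does corrective RAG work?",
    ["RAG retrieves context first", "unrelated cooking recipe"],
)
```

The grading step is what makes this "agentic": the system inspects its own retrieval and takes a corrective action instead of generating from bad context.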

Source: Corrective Retrieval Augmented Generation; [Link]


Agentic Corrective RAG

Source: [Link]
Agentic Self-Reflection RAG

Source: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection; [Link]
Retrieval Augmented Fine-tuning (RAFT)

Source: RAFT: Adapting Language Model to Domain Specific RAG; [Link]


Recent LLMs to Check Hallucinations

• GPT-4o from OpenAI

• Lynx from PatronusAI

Source: Lynx: An Open Source Hallucination Evaluation Model; [Link]


Key Takeaways

• RAG is still very much a retrieval problem
• Build an evaluation dataset and always evaluate your RAG system
• Explore various chunking and retrieval strategies; don't stick to default settings
• Even with Long Context LLMs, RAG isn't going anywhere (for now)
• Agentic RAG systems and domain-specific fine-tuned RAG systems are the future
