Building Agentic RAG System using LlamaIndex
We are building an Agentic RAG system using LlamaIndex: a system that allows an autonomous agent to retrieve relevant information from a set of documents and generate accurate responses. It combines the retrieval capabilities of LlamaIndex with the reasoning and decision-making capabilities of agents. Here we will use:
- Agentic RAG: A system where the agent can autonomously decide how to retrieve and generate answers using multiple sources/tools.
- LlamaIndex: A Python framework for building vector-based knowledge indices, allowing LLMs to retrieve relevant information from documents.
Working of Agentic RAG System
Let's see how our system works:
Workflow of Agentic RAG System
1. User Query: Everything starts with a user question. The query flows into the central component, i.e., the Agent.
2. The Agent: The Agent acts as the "brain" of the system. Its job is to analyze the query and decide which specialized tool should handle different parts of the request. Instead of a fixed path, the agent makes dynamic decisions based on what the query needs.
3. Decision-Making and Tools: Depending on the query, the agent can choose between several tools:
- DocumentRetriever Tool: Finds and fetches relevant documents for context.
- Calculator Tool: Handles mathematical or computational questions.
- Wikipedia Tool: Searches for factual knowledge directly from Wikipedia.
The agent can also call tools multiple times or use a combination of them, depending on the task complexity.
4. LlamaIndex Query Engine: Some tools, like the DocumentRetriever or Calculator Tool, feed their results into the LlamaIndex Query Engine (a specialized search and synthesis engine). LlamaIndex processes and combines information from those tools to create a detailed and accurate answer.
5. Final Output: Once the agent is satisfied with the results, it sends the answer back to the user.
Note: Instead of following a fixed pipeline, this system lets the agent make smart, context-aware decisions about which tools or data sources to use and when to use them, mimicking reasoning and planning. This is what makes it an Agentic system.
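To make the idea concrete, here is a minimal conceptual sketch of such a decision loop in plain Python. This is illustrative only: llm.decide is a hypothetical stand-in for the LLM's reasoning step, not a real LangChain or LlamaIndex API (the real wiring is shown in the steps below).
Python
# Conceptual sketch only -- `llm.decide` is hypothetical, not a real API.
def agentic_answer(query: str, tools: dict, llm) -> str:
    observations = []
    while True:
        # The LLM inspects the query, the tool list and prior observations,
        # then either picks a tool to call or decides it can answer.
        step = llm.decide(query, list(tools), observations)
        if step.action == "finish":
            return step.answer
        result = tools[step.action](step.tool_input)  # call the chosen tool
        observations.append((step.action, result))    # feed the result back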
Step-by-Step Implementation
Let's build our Agentic RAG system using LlamaIndex:
Step 1: Install Dependencies
We will install the required packages and libraries for our system:
- llama-index: For document retrieval and embeddings.
- langchain: For agent and tool management.
- langchain_community: Required for ChatOpenAI in LangChain 0.3.x.
- openai: For LLM API access.
- wikipedia: Optional tool for agent to search Wikipedia.
Python
!pip install llama-index==0.9.41 langchain==0.3.27 langchain_community openai==1.101.0 wikipedia
Step 2: Upload Documents and OpenAI API Key
We will upload some documents and files which our model can use. The files we are using here can be downloaded from here.
- Creates a docs/ folder to store our knowledge documents.
- Users can upload .txt files.
- Example content can include notes, articles or any text relevant to queries.
To know how to extract OpenAI API key refer to: How to find and Use API Key of OpenAI.
Python
import os
from google.colab import files

# Create a folder to hold the knowledge documents
os.makedirs("docs", exist_ok=True)

# Upload .txt files from your machine (Colab-specific helper)
uploaded = files.upload()
for filename in uploaded.keys():
    os.rename(filename, f"docs/{filename}")
print("Uploaded files:", os.listdir("docs"))

# Set the OpenAI API key (replace with your own key)
os.environ["OPENAI_API_KEY"] = "your_key_here"
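Hardcoding the key is fine for a throwaway notebook, but if you plan to share the notebook, an optional variant using Python's standard getpass module avoids leaving the key in the source:
Python
from getpass import getpass

# Prompt for the key at runtime instead of hardcoding it
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")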
Step 3: Import Libraries
We will import the required libraries for our system:
- SimpleDirectoryReader: Load documents.
- GPTVectorStoreIndex: Create vector-based index for retrieval.
- LLMPredictor & ServiceContext: Wrap LLM for LlamaIndex.
- ChatOpenAI: OpenAI GPT model for text generation.
- Tool, initialize_agent, AgentType: Build agentic reasoning system.
- ConversationBufferMemory: Maintain past conversation context for agent.
- wikipedia: Tool for retrieving general knowledge.
Python
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, LLMPredictor, ServiceContext
from langchain_community.chat_models import ChatOpenAI
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory
import wikipedia
Step 4: Build the LlamaIndex Retrieval System
We will build the LlamaIndex retrieval system, in which:
- SimpleDirectoryReader: Reads all uploaded documents.
- LLMPredictor: Wraps GPT-3.5-turbo to work with LlamaIndex.
- GPTVectorStoreIndex: Converts documents into embeddings stored in a vector store.
- query_engine: Returns top 3 most relevant documents for any query.
Python
# Load all documents from the docs/ folder
documents = SimpleDirectoryReader("docs/").load_data()

# Wrap the OpenAI chat model so LlamaIndex can use it
llm_predictor = LLMPredictor(llm=ChatOpenAI(
    temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Build the vector index and expose it as a query engine (top 3 matches)
index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context)
query_engine = index.as_query_engine(
    retriever_mode="default", similarity_top_k=3)
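Optionally, we can sanity-check the retrieval pipeline on its own before adding the agent. The question below is just a placeholder; replace it with something your uploaded documents actually cover:
Python
# Query the index directly, bypassing the agent
test_response = query_engine.query("Summarize the uploaded documents.")
print(test_response)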
Step 5: Define Tools for the Agent
We will define the tools that are available for the agent to use:
- DocumentRetriever: Uses LlamaIndex to fetch relevant docs.
- Calculator: Handles numeric queries.
- Wikipedia: Fetches general knowledge not in uploaded docs.
- Tools: Each tool is callable by the agent automatically.
Python
def retrieve_docs(query: str) -> str:
    # Query the LlamaIndex engine built in Step 4
    response = query_engine.query(query)
    return str(response)

def calculator(query: str) -> str:
    try:
        # NOTE: eval() is fine for a demo but unsafe for untrusted input
        return str(eval(query))
    except Exception:
        return "Cannot calculate that."

def wiki_search(query: str) -> str:
    try:
        return wikipedia.summary(query, sentences=2)
    except Exception:
        return "No Wikipedia info found."

tools = [
    Tool(name="DocumentRetriever", func=retrieve_docs,
         description="Retrieve answers from uploaded documents."),
    Tool(name="Calculator", func=calculator,
         description="Solve mathematical expressions."),
    Tool(name="Wikipedia", func=wiki_search,
         description="Get Wikipedia summaries.")
]
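Each tool function can also be called directly, which is a quick way to confirm it works before handing it to the agent:
Python
# Quick manual checks of the tool functions
print(calculator("12 * 7"))                          # expected: 84
print(wiki_search("Python (programming language)"))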
Step 6: Initialize Agent with Memory
We will initialize ConversationBufferMemory, which gives the agent short-term, in-session memory:
- ConversationBufferMemory: Keeps track of past queries and responses.
- AgentType.ZERO_SHOT_REACT_DESCRIPTION: A ReAct-style agent that picks tools based solely on their descriptions, with no task-specific training or examples.
- verbose=True: Shows reasoning steps in the output.
Python
memory = ConversationBufferMemory(memory_key="chat_history")

agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)
Step 7: Run the System
We will run our system, in which:
- Users can interact with the agent.
- Agent decides which tool to use, retrieves relevant info and generates answers.
- Supports document queries, math calculations and Wikipedia search.
Python
print("Agentic RAG is ready! Type 'exit' to stop.")
while True:
query = input("You: ")
if query.lower() in ["exit", "quit"]:
break
response = agent.run(query)
print("Agent:", response)
Output:
Complete source code can be downloaded from here.
Advantages
Let's look at the advantages of our system:
- Autonomous Reasoning: Agent decides which tool to use for each query.
- Accurate Responses: LlamaIndex retrieves relevant documents before generating answers.
- Multi-Tool Support: Can handle document retrieval, calculations and Wikipedia queries.
- Context-Aware: Conversation memory allows follow-up questions.
- Scalable & Modular: Tools and knowledge sources can be added or updated easily.
- User-Friendly: Generates natural language answers interactively.
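As a quick illustration of that modularity, adding a new capability only takes one function and one more Tool entry. The unit-conversion helper below is a hypothetical example, not part of the system built above:
Python
def km_to_miles(query: str) -> str:
    # Hypothetical example tool: convert kilometres to miles
    try:
        return f"{float(query) * 0.621371:.2f} miles"
    except ValueError:
        return "Please pass a plain number of kilometres."

tools.append(Tool(name="UnitConverter", func=km_to_miles,
                  description="Convert a number of kilometres to miles."))
# Re-run initialize_agent(...) so the agent can see the new tool.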