Building Agentic RAG System using LlamaIndex
We are building an Agentic RAG system using LlamaIndex: a system that allows an autonomous agent to retrieve relevant information from a set of documents and generate accurate responses. It combines the retrieval capabilities of LlamaIndex with the reasoning and decision-making capabilities of agents. Here we will use:
- Agentic RAG: A system where the agent can autonomously decide how to retrieve and generate answers using multiple sources/tools.
- LlamaIndex: A Python framework for building vector-based knowledge indices, allowing LLMs to retrieve relevant information from documents.
Working of Agentic RAG System
Let's see how our system works:
Workflow of Agentic RAG System
1. User Query: Everything starts with a user question. The query flows into the central component, i.e., the Agent.
2. The Agent: The Agent acts as the "brain" of the system. Its job is to analyze the query and decide which specialized tool should handle different parts of the request. Instead of a fixed path, the agent makes dynamic decisions based on what the query needs.
3. Decision-Making and Tools: Depending on the query, the agent can choose between several tools:
- DocumentRetriever Tool: Finds and fetches relevant documents for context.
- Calculator Tool: Handles mathematical or computational questions.
- Wikipedia Tool: Searches for factual knowledge directly from Wikipedia.
The agent can also call tools multiple times or use a combination of them, depending on the task complexity.
4. LlamaIndex Query Engine: Some tools, like the DocumentRetriever or Calculator Tool, feed their results into the LlamaIndex Query Engine (a specialized search and synthesis engine). LlamaIndex processes and combines information from those tools to create a detailed and accurate answer.
5. Final Output: Once the agent is satisfied with the results, it sends the answer back to the user.
Note: Instead of following a fixed pipeline, this system lets the agent make smart, context-aware decisions about which tools or data sources to use and when to use them, mimicking reasoning and planning. This is what makes it an Agentic system.
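To make the idea concrete, here is a minimal conceptual sketch of such a decision loop in plain Python. This is illustrative only: llm.decide is a hypothetical stand-in for the LLM's reasoning step, not a real LangChain or LlamaIndex API (the real wiring is shown in the steps below).
Python
# Conceptual sketch only -- `llm.decide` is hypothetical, not a real API.
def agentic_answer(query: str, tools: dict, llm) -> str:
    observations = []
    while True:
        # The LLM inspects the query, the tool list and prior observations,
        # then either picks a tool to call or decides it can answer.
        step = llm.decide(query, list(tools), observations)
        if step.action == "finish":
            return step.answer
        result = tools[step.action](step.tool_input)  # call the chosen tool
        observations.append((step.action, result))    # feed the result back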
Step-by-Step Implementation
Let's build our Agentic RAG system using LlamaIndex:
Step 1: Install Dependencies
We will install the required packages and libraries for our system:
- llama-index: For document retrieval and embeddings.
- langchain: For agent and tool management.
- langchain_community: Required for ChatOpenAI in LangChain 0.3.x.
- openai: For LLM API access.
- wikipedia: Optional tool for agent to search Wikipedia.
Python
!pip install llama-index==0.9.41 langchain==0.3.27 langchain_community openai==1.101.0 wikipedia
Step 2: Upload Documents and OpenAI API Key
We will upload some documents and files which our model can use. The files we are using here can be downloaded from here.
- Creates a docs/ folder to store our knowledge documents.
- Users can upload .txt files.
- Example content can include notes, articles or any text relevant to queries.
To know how to extract OpenAI API key refer to: How to find and Use API Key of OpenAI.
Python
import os
from google.colab import files

# Create a folder to hold the knowledge documents
os.makedirs("docs", exist_ok=True)

# Upload .txt files from your machine (Colab-specific helper)
uploaded = files.upload()
for filename in uploaded.keys():
    os.rename(filename, f"docs/{filename}")
print("Uploaded files:", os.listdir("docs"))

# Set the OpenAI API key (replace with your own key)
os.environ["OPENAI_API_KEY"] = "your_key_here"
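Hardcoding the key is fine for a throwaway notebook, but if you plan to share the notebook, an optional variant using Python's standard getpass module avoids leaving the key in the source:
Python
from getpass import getpass

# Prompt for the key at runtime instead of hardcoding it
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")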
Step 3: Import Libraries
We will import the required libraries for our system:
- SimpleDirectoryReader: Load documents.
- GPTVectorStoreIndex: Create vector-based index for retrieval.
- LLMPredictor & ServiceContext: Wrap LLM for LlamaIndex.
- ChatOpenAI: OpenAI GPT model for text generation.
- Tool, initialize_agent, AgentType: Build agentic reasoning system.
- ConversationBufferMemory: Maintain past conversation context for agent.
- wikipedia: Tool for retrieving general knowledge.
Python
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, LLMPredictor, ServiceContext
from langchain_community.chat_models import ChatOpenAI
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory
import wikipedia
Step 4: Build the LlamaIndex Retrieval System
We will build the LlamaIndex retrieval system, in which:
- SimpleDirectoryReader: Reads all uploaded documents.
- LLMPredictor: Wraps GPT-3.5-turbo to work with LlamaIndex.
- GPTVectorStoreIndex: Converts documents into embeddings stored in a vector store.
- query_engine: Returns top 3 most relevant documents for any query.
Python
# Load all documents from the docs/ folder
documents = SimpleDirectoryReader("docs/").load_data()

# Wrap the OpenAI chat model so LlamaIndex can use it
llm_predictor = LLMPredictor(llm=ChatOpenAI(
    temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Build the vector index and expose it as a query engine (top 3 matches)
index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context)
query_engine = index.as_query_engine(
    retriever_mode="default", similarity_top_k=3)
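Optionally, we can sanity-check the retrieval pipeline on its own before adding the agent. The question below is just a placeholder; replace it with something your uploaded documents actually cover:
Python
# Query the index directly, bypassing the agent
test_response = query_engine.query("Summarize the uploaded documents.")
print(test_response)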
Step 5: Define Tools for the Agent
We will define the tools that are available for the agent to use:
- DocumentRetriever: Uses LlamaIndex to fetch relevant docs.
- Calculator: Handles numeric queries.
- Wikipedia: Fetches general knowledge not in uploaded docs.
- Tools: Each tool is callable by the agent automatically.
Python
def retrieve_docs(query: str) -> str:
    # Query the LlamaIndex engine built in Step 4
    response = query_engine.query(query)
    return str(response)

def calculator(query: str) -> str:
    try:
        # NOTE: eval() is fine for a demo but unsafe for untrusted input
        return str(eval(query))
    except Exception:
        return "Cannot calculate that."

def wiki_search(query: str) -> str:
    try:
        return wikipedia.summary(query, sentences=2)
    except Exception:
        return "No Wikipedia info found."

tools = [
    Tool(name="DocumentRetriever", func=retrieve_docs,
         description="Retrieve answers from uploaded documents."),
    Tool(name="Calculator", func=calculator,
         description="Solve mathematical expressions."),
    Tool(name="Wikipedia", func=wiki_search,
         description="Get Wikipedia summaries.")
]
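Each tool function can also be called directly, which is a quick way to confirm it works before handing it to the agent:
Python
# Quick manual checks of the tool functions
print(calculator("12 * 7"))                          # expected: 84
print(wiki_search("Python (programming language)"))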
Step 6: Initialize Agent with Memory
We will initialize ConversationBufferMemory, which gives the agent short-term, in-session memory:
- ConversationBufferMemory: Keeps track of past queries and responses.
- AgentType.ZERO_SHOT_REACT_DESCRIPTION: A ReAct-style agent that picks tools based solely on their descriptions, with no task-specific training or examples.
- verbose=True: Shows reasoning steps in the output.
Python
memory = ConversationBufferMemory(memory_key="chat_history")

agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)
Step 7: Run the System
We will run our system, in which:
- Users can interact with the agent.
- Agent decides which tool to use, retrieves relevant info and generates answers.
- Supports document queries, math calculations and Wikipedia search.
Python
print("Agentic RAG is ready! Type 'exit' to stop.")
while True:
query = input("You: ")
if query.lower() in ["exit", "quit"]:
break
response = agent.run(query)
print("Agent:", response)
Output:
Complete source code can be downloaded from here.
Advantages
Let's look at the advantages of our system:
- Autonomous Reasoning: Agent decides which tool to use for each query.
- Accurate Responses: LlamaIndex retrieves relevant documents before generating answers.
- Multi-Tool Support: Can handle document retrieval, calculations and Wikipedia queries.
- Context-Aware: Conversation memory allows follow-up questions.
- Scalable & Modular: Tools and knowledge sources can be added or updated easily.
- User-Friendly: Generates natural language answers interactively.
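As a quick illustration of that modularity, adding a new capability only takes one function and one more Tool entry. The unit-conversion helper below is a hypothetical example, not part of the system built above:
Python
def km_to_miles(query: str) -> str:
    # Hypothetical example tool: convert kilometres to miles
    try:
        return f"{float(query) * 0.621371:.2f} miles"
    except ValueError:
        return "Please pass a plain number of kilometres."

tools.append(Tool(name="UnitConverter", func=km_to_miles,
                  description="Convert a number of kilometres to miles."))
# Re-run initialize_agent(...) so the agent can see the new tool.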