
Knowledge Graphs for RAG

Last Updated : 10 Nov, 2025

A Knowledge Graph in RAG (Retrieval-Augmented Generation) is a structured representation of information where entities (nodes) and their relationships (edges) are explicitly modeled. It allows a RAG system to retrieve relevant knowledge, understand context, and perform inferential reasoning, enabling the language model to generate more accurate, coherent, and explainable responses.
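
To make this concrete, the sketch below represents a tiny knowledge graph as plain (head, relation, tail) triples and walks one hop outward from an entity. It uses only standard Python, and the relation names are made up for illustration; a real RAG system performs the same kind of traversal over a graph database.

Python
# A toy knowledge graph as (head, relation, tail) triples -- illustrative only.
triples = [
    ("Elizabeth I", "MEMBER_OF", "House of Tudor"),
    ("Elizabeth I", "CHILD_OF", "Henry VIII"),
    ("Henry VIII", "MEMBER_OF", "House of Tudor"),
]

def neighbors(entity):
    """Return every (relation, other_node) reachable in one hop from entity."""
    outgoing = [(r, t) for h, r, t in triples if h == entity]
    incoming = [(r, h) for h, r, t in triples if t == entity]
    return outgoing + incoming

# Traversal supplies structured, citable context for the language model.
print(neighbors("Elizabeth I"))
# [('MEMBER_OF', 'House of Tudor'), ('CHILD_OF', 'Henry VIII')]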


RAG applications rely on retrieving relevant information to improve the quality of generated responses. Knowledge graphs bring several advantages to this process.

  1. Structured Knowledge Representation : Entities and relationships are explicitly modeled, making it easier to retrieve relevant information.
  2. Contextual Understanding : Knowledge Graphs capture relationships between entities, providing deeper context for responses.
  3. Inferential Reasoning : Traversing Knowledge Graphs enables RAG systems to infer information not explicitly stated.
  4. Knowledge Integration : It can combine information from multiple sources into a unified, structured format.
  5. Explainability and Transparency : The reasoning behind AI responses is clear, traceable to the graph structure.

How Knowledge Graphs Work in RAG Applications

  1. Data Collection and Preprocessing : Data is collected from various sources, cleaned, and processed to identify key entities and relationships for graph creation.
  2. Knowledge Graph Construction : Entities turn into nodes and relationships into edges, each with unique IDs and properties to enable efficient querying and reasoning.
  3. Storage in a Graph Database : The graph is stored in databases like Neo4j, enabling fast searches and traversals, with indexes speeding up keyword and property-based queries.
  4. Querying and Traversal : Queries explore the graph by following relationships, helping the system uncover hidden connections and infer new knowledge (a small Cypher sketch follows this list).
  5. Integration with RAG : The knowledge graph gives structured context to the language model, which can be combined with embeddings or other data to improve answer accuracy.
  6. Response Generation : Language model uses the retrieved knowledge to generate clear, fact-based responses, with the graph ensuring accuracy, context, and explainability.
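
As a rough illustration of steps 3 and 4, the snippet below shows the kind of one-hop Cypher traversal a graph database such as Neo4j runs to follow relationships outward from an entity. The node property name (id) and the example entity are assumptions for illustration; the actual schema comes from the construction step.

Python
# Illustrative only: a one-hop traversal in Cypher. With a Neo4jGraph connection
# (created in the implementation below) it could be executed via graph.query(...).
example_traversal = """
MATCH (e {id: 'Elizabeth I'})-[r]->(neighbor)
RETURN e.id, type(r), neighbor.id
LIMIT 10
"""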

Step-By-Step Implementation

Here we load Wikipedia documents, split them into chunks, convert them into a knowledge graph in Neo4j and then build a retrieval-augmented generation pipeline that queries both structured (graph) and unstructured (vector embedding) data to answer questions.

Step 1: Environment Setup

  • Set Neo4j connection details as environment variables.
  • Ensures LangChain’s Neo4jGraph and Neo4jVector can connect automatically.
Python
import os
from google.colab import userdata

GOOGLE_API_KEY = userdata.get("YOUR_API_KEY")  # name of the secret stored in Colab

os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
os.environ["NEO4J_URI"] = "YOUR_NEO4J_URI"
os.environ["NEO4J_USERNAME"] = "YOUR_NEO4J_USERNAME"
os.environ["NEO4J_PASSWORD"] = "YOUR_NEO4J_PASSWORD"

Step 2: Load and Split Documents

  • Load Wikipedia articles as raw text.
  • Split text into chunks to improve graph representation and embedding quality.
  • Each chunk can later become a Neo4j node or vector.
Python
from langchain.document_loaders import WikipediaLoader
from langchain.text_splitter import TokenTextSplitter

raw_docs = WikipediaLoader(query="Elizabeth I").load()

splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = splitter.split_documents(raw_docs[:3])

Step 3: Initialize LLM and Graph Transformer

  • Use ChatGoogleGenerativeAI for all LLM operations.
  • Convert document chunks into graph-compatible documents using LLMGraphTransformer.
  • Graph documents will later be added to Neo4j.
Python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs import Neo4jGraph

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash", 
    temperature=0
)

llm_transformer = LLMGraphTransformer(llm=llm)
graph_docs = llm_transformer.convert_to_graph_documents(documents)

graph = Neo4jGraph()
graph.add_graph_documents(graph_docs, baseEntityLabel=True, include_source=True)
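
Before moving on, it can help to sanity-check what the transformer extracted; a GraphDocument exposes the nodes and relationships it found (the exact contents depend on the LLM run):

Python
# Peek at the first graph document: entities become nodes, relations become edges.
print(graph_docs[0].nodes[:5])
print(graph_docs[0].relationships[:5])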

Step 4: Create Vector Index

  • Build a hybrid semantic search index over the Neo4j graph using embeddings.
  • This allows retrieval from both graph nodes and semantic similarity.
  • Use GoogleGenerativeAIEmbeddings instead of OpenAI embeddings.
Python
from langchain_community.vectorstores import Neo4jVector
from langchain_google_genai import GoogleGenerativeAIEmbeddings

vector_index = Neo4jVector.from_existing_graph(
    GoogleGenerativeAIEmbeddings(model="models/embedding-001"),  # embedding model must be specified
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)
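
A quick way to verify the index is to run a similarity search against it (the returned chunks depend on the Wikipedia pages loaded earlier):

Python
# Should return Document chunks about Elizabeth I, ranked by hybrid relevance.
for doc in vector_index.similarity_search("Who was Elizabeth I?", k=2):
    print(doc.page_content[:120])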

Step 5: Entity Extraction

  • Define structured output model to extract persons and organizations.
  • Use ChatGoogleGenerativeAI via a prompt chain for extraction.
  • Ensures consistent format for entities to use in structured queries.
Python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate

class Entities(BaseModel):
    names: list[str] = Field(..., description="Person/organization entities in text")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are extracting organization and person entities from the text."),
    ("human", "Use the given format to extract info: {question}"),
])

entity_chain = prompt | llm.with_structured_output(Entities)
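
Invoking the chain on a question returns the detected entity names; the exact list depends on the model, but for the running example it should look roughly like this:

Python
# Expected to return something like ['Elizabeth I']
print(entity_chain.invoke({"question": "Which house did Elizabeth I belong to?"}).names)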

Step 6: Structured Retrieval

  • Generate full-text search queries safely by removing special characters.
  • Retrieve entities and their relationships from Neo4j graph using fulltext.queryNodes.
  • Each matched entity is expanded with its relationships other than MENTIONS (the !MENTIONS pattern) to add graph context.
Python
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars

def generate_full_text_query(input: str) -> str:
    words = [el for el in remove_lucene_chars(input).split() if el]
    return " AND ".join(f"{w}~2" for w in words)

def structured_retriever(question: str) -> str:
    result = ""
    entities = entity_chain.invoke({"question": question}).names
    for entity in entities:
        resp = graph.query("""
            CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
            YIELD node, score
            CALL {
              WITH node
              MATCH (node)-[r:!MENTIONS]->(neighbor)
              RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
              UNION ALL
              WITH node
              MATCH (node)<-[r:!MENTIONS]-(neighbor)
              RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
            }
            RETURN output LIMIT 50
        """, {"query": generate_full_text_query(entity)})
        result += "\n".join([el['output'] for el in resp])
    return result
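
Two notes on this step. First, generate_full_text_query turns a question's keywords into a fuzzy Lucene query; given the function above, generate_full_text_query("Elizabeth I") produces "Elizabeth~2 AND I~2". Second, structured_retriever assumes a full-text index named entity already exists over the entity nodes; if it has not been created yet, one way to create it (assuming the __Entity__ base label that add_graph_documents adds when baseEntityLabel=True) is:

Python
# Create the full-text index queried above; IF NOT EXISTS makes it safe to re-run.
graph.query(
    "CREATE FULLTEXT INDEX entity IF NOT EXISTS FOR (e:__Entity__) ON EACH [e.id]"
)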

Step 7: Combined Retriever

  • Combine structured graph results with unstructured vector search results.
  • Provides complete context for the RAG model.
  • Keeps results in a clear format for LLM consumption.
Python
def retriever(question: str):
    structured = structured_retriever(question)
    unstructured = [d.page_content for d in vector_index.similarity_search(question)]
    return f"Structured data:\n{structured}\nUnstructured data:\n{'#Document '.join(unstructured)}"

Step 8: Condense Follow-Up Questions

  • Check if chat history exists.
  • Convert follow-up questions into standalone questions using LLM.
  • Ensures context is preserved for multi-turn conversations.
Python
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnableBranch, RunnablePassthrough
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.output_parsers import StrOutputParser

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(
    """Given chat history and follow-up question, rewrite it as a standalone question.
    Chat History: {chat_history}
    Follow Up Input: {question}
    Standalone question:"""
)

def _format_chat_history(chat_history):
    buffer = []
    for human, ai in chat_history:
        buffer.append(HumanMessage(content=human))
        buffer.append(AIMessage(content=ai))
    return buffer

_search_query = RunnableBranch(
    (RunnableLambda(lambda x: bool(x.get("chat_history"))),
     RunnablePassthrough.assign(chat_history=lambda x: _format_chat_history(x["chat_history"]))
     | CONDENSE_QUESTION_PROMPT | llm | StrOutputParser()),
    RunnableLambda(lambda x: x["question"])
)
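
Informally, the branch behaves as sketched below; the rewritten wording depends on the LLM:

Python
# No chat history: the question passes through unchanged.
# _search_query.invoke({"question": "When was she born?"})
#   -> "When was she born?"
#
# With history, the follow-up is condensed into a standalone question, e.g.
# _search_query.invoke({
#     "question": "When was she born?",
#     "chat_history": [("Which house did Elizabeth I belong to?", "House Of Tudor")],
# })
#   -> something like "When was Elizabeth I born?"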

Step 9: RAG QA Chain

  • Retrieve context using _search_query and combined retriever.
  • Generate concise answer with ChatGoogleGenerativeAI.
  • Supports multi-turn conversations with condensed questions.
Python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel

template = ChatPromptTemplate.from_template(
    """Answer the question based only on the following context:
    {context}
    Question: {question}
    Use natural language and be concise.
    Answer:"""
)

chain = (
    RunnableParallel({"context": _search_query | retriever, "question": RunnablePassthrough()})
    | template
    | llm
    | StrOutputParser()
)

# Example queries
print(chain.invoke({"question": "Which house did Elizabeth I belong to?"}))
print(chain.invoke({
    "question": "When was she born?", 
    "chat_history": [("Which house did Elizabeth I belong to?", "House Of Tudor")]
}))

Output:


You can download full code from here.

Applications

  • Question Answering Systems : Retrieve precise answers by traversing entities and relationships.
  • Context-Aware Summarization : Generate summaries that incorporate connected knowledge, not just isolated facts.
  • Inferential Reasoning : Deduce implicit information by analyzing relationships between nodes.
  • Explainable AI : Provide transparent reasoning paths for AI-generated responses.
  • Multi-Source Knowledge Integration : Combine information from different datasets into a unified graph.
  • Domain Specific Assistants : Support specialized applications like finance, healthcare, or legal systems by modeling entities and relationships relevant to the domain.

Challenges and Limitations

Integrating Knowledge Graphs into RAG also comes with challenges:

  • Scalability: Managing large graphs can be resource-intensive.
  • Entity Disambiguation: Similar entity names cause confusion.
  • Graph Maintenance: Updating relationships in real time is complex.
  • Integration Complexity: Combining multiple databases and APIs can slow development.
  • LLM Alignment: Ensuring LLM correctly interprets graph structure remains an active research area.

Difference Between RAG and Knowledge Graph RAG

Here we compare Traditional RAG with Knowledge Graph RAG.

Parameters | RAG (Retrieval-Augmented Generation) | Knowledge Graph Enhanced RAG
---|---|---
Core Approach | Retrieves information from unstructured text and generates responses | Combines Knowledge Graphs with RAG to enable structured retrieval and reasoning over entities and relationships
Knowledge Type | Works with unstructured documents or text chunks | Uses both structured and unstructured information
Retrieval Method | Performs vector similarity search using embeddings | Uses hybrid retrieval with graph queries and vector search
Data Representation | Stores text in chunks without explicit relationships | Represents data as nodes and edges
Context Understanding | Limited to text similarity and surface-level meaning | Captures deep semantic and relational context between entities
Reasoning Capability | Retrieves facts but cannot infer new relationships | Enables multi-hop reasoning and logical inference across connected nodes
