LlamaIndex Documentation
Jerry Liu
LlamaIndex
LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM’s with external data.
• Github: https://github.com/jerryjliu/llama_index
• PyPi:
– LlamaIndex: https://pypi.org/project/llama-index/
– GPT Index (duplicate): https://pypi.org/project/gpt-index/
• Twitter: https://twitter.com/gpt_index
• Discord: https://discord.gg/dGcwcsnxhU
1 Ecosystem
• LlamaHub: https://llamahub.ai
• LlamaLab: https://github.com/run-llama/llama-lab
1.1 Overview
2 Context
• LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
• How do we best augment LLMs with our own private data?
• One paradigm that has emerged is in-context learning (the other is finetuning), where we insert context into the
input prompt. That way, we take advantage of the LLM’s reasoning capabilities to generate a response.
To perform this data augmentation for LLMs in a performant, efficient, and cheap manner, we need to solve two components:
• Data Ingestion
• Data Indexing
3 Proposed Solution
That's where LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:
• Offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
• Provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning:
– Storing context in an easy-to-access format for prompt insertion.
– Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
– Dealing with text splitting.
• Provides users an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented
output.
• Offers you a comprehensive toolset trading off cost and performance.
By default, we use the OpenAI GPT-3 text-davinci-003 model. In order to use this, you must have an OPENAI_API_KEY set up. You can register an API key by logging into OpenAI's page and creating a new API token.
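For example, one simple way to make the key available before building an index (the key value below is just a placeholder; avoid hardcoding real keys):

import os

# NOTE: placeholder value; supply your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"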
You can customize the underlying LLM in the Custom LLMs How-To (courtesy of Langchain). You may need additional
environment keys + tokens setup depending on the LLM provider.
Here is a starter example for using LlamaIndex. Make sure you’ve followed the installation steps first.
3.2.1 Download
LlamaIndex examples can be found in the examples folder of the LlamaIndex repository. We first want to download
this examples folder. An easy way to do this is to just clone the repo:
$ git clone https://github.com/jerryjliu/llama_index.git
$ cd llama_index
$ ls
LICENSE data_requirements.txt tests/
MANIFEST.in examples/ pyproject.toml
Makefile experimental/ requirements.txt
README.md llama_index/ setup.py
$ cd examples/paul_graham_essay
This contains LlamaIndex examples around Paul Graham’s essay, “What I Worked On”. A comprehensive set of
examples are already provided in TestEssay.ipynb. For the purposes of this tutorial, we can focus on a simple
example of getting LlamaIndex up and running.
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
This builds an index over the documents in the data folder (which in this case just consists of the essay text). We then run the following:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
You should get back a response similar to the following: The author wrote short stories and tried to
program on an IBM 1401.
In a Jupyter notebook, you can view info and/or debugging logging using the following snippet:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
You can set the level to DEBUG for verbose output, or use level=logging.INFO for less.
By default, data is stored in memory. To persist the index to disk:
index.storage_context.persist()
That’s it! For more information on LlamaIndex features, please check out the numerous “Guides” to the left. If you are
interested in further exploring how LlamaIndex works, check out our Primer Guide.
Additionally, if you would like to play around with Example Notebooks, check out this link.
At its core, LlamaIndex contains a toolkit designed to easily connect LLM’s with your external data. LlamaIndex helps
to provide the following:
• A set of data structures that allow you to index your data for various LLM tasks, and remove concerns over
prompt size limitations.
• Data connectors to your common data sources (Google Docs, Slack, etc.).
• Cost transparency + tools that reduce cost while increasing performance.
Each data structure offers distinct use cases and a variety of customizable parameters. These indices can then be queried
in a general purpose manner, in order to achieve any task that you would typically achieve with an LLM:
• Question-Answering
• Summarization
• Text Generation (Stories, TODO’s, emails, etc.)
• and more!
The guides below are intended to help you get the most out of LlamaIndex. They give a high-level overview of the following:
1. The general usage pattern of LlamaIndex.
2. Mapping Use Cases to LlamaIndex Data Structures
3. How Each Index Works
1. Load in Documents
The first step is to load in data. This data is represented in the form of Document objects. We provide a variety of data
loaders which will load in Documents through the load_data function, e.g.:
documents = SimpleDirectoryReader('data').load_data()
You can also choose to construct documents manually. LlamaIndex exposes the Document struct.
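As a minimal sketch (the raw text strings here stand in for whatever data you already have on hand):

from llama_index import Document

# assume you already have raw text strings from some data source
text_list = ["first chunk of text", "second chunk of text"]
documents = [Document(t) for t in text_list]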
A Document represents a lightweight container around the data source. You can now choose to proceed with one of
the following steps:
1. Feed the Document object directly into the index (see section 3).
2. First convert the Document into Node objects (see section 2).
2. Parse the Documents into Nodes
The next step is to parse these Document objects into Node objects. Nodes represent “chunks” of source Documents,
whether that is a text chunk, an image, or more. They also contain metadata and relationship information with other
nodes and index structures.
Nodes are a first-class citizen in LlamaIndex. You can choose to define Nodes and all of their attributes directly. You may also choose to “parse” source Documents into Nodes through our NodeParser classes.
For instance, you can do
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
You can also choose to construct Node objects manually and skip the first section; a sketch of this is shown below.
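As a rough sketch (the import path and relationship fields below reflect the API at the time of writing and should be treated as assumptions; check the reference docs for your version):

from llama_index.data_structs.node import Node, DocumentRelationship

node1 = Node(text="<text_chunk>", doc_id="<node_id_1>")
node2 = Node(text="<text_chunk>", doc_id="<node_id_2>")

# record that node2 follows node1 in the source document
node1.relationships[DocumentRelationship.NEXT] = node2.get_doc_id()
node2.relationships[DocumentRelationship.PREVIOUS] = node1.get_doc_id()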
3. Index Construction
We can now build an index over these Document objects. The simplest high-level abstraction is to load-in the Document
objects during index initialization (this is relevant if you came directly from step 1 and skipped step 2).
index = GPTVectorStoreIndex.from_documents(documents)
You can also choose to build an index over a set of Node objects directly (this is a continuation of step 2).
index = GPTVectorStoreIndex(nodes)
Depending on which index you use, LlamaIndex may make LLM calls in order to build the index.
If you have multiple Node objects defined, and wish to share these Node objects across multiple index structures, you
can do that. Simply instantiate a StorageContext object, add the Node objects to the underlying DocumentStore, and
pass the StorageContext around.
from llama_index import GPTListIndex, GPTVectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

# both indices share the same underlying docstore
index1 = GPTListIndex(nodes, storage_context=storage_context)
index2 = GPTVectorStoreIndex(nodes, storage_context=storage_context)
NOTE: If the storage_context argument isn’t specified, then it is implicitly created for each index during index
construction. You can access the docstore associated with a given index through index.storage_context.
You can also take advantage of the insert capability of indices to insert Document objects one at a time instead of
during index construction.
from llama_index import GPTVectorStoreIndex

index = GPTVectorStoreIndex([])
for doc in documents:
    index.insert(doc)
If you want to insert nodes directly, you can use the insert_nodes function instead.
# nodes: Sequence[Node]
index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)
See the Update Index How-To for details and an example notebook.
Customizing LLM’s
By default, we use OpenAI’s text-davinci-003 model. You may choose to use another LLM when constructing an
index.
from llama_index import LLMPredictor, GPTVectorStoreIndex, PromptHelper, ServiceContext
from langchain import OpenAI

...

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context
)
Customizing Prompts
Depending on the index used, we use default prompt templates for constructing the index (and also for insertion/querying).
See Custom Prompts How-To for more details on how to customize your prompt.
Customizing embeddings
For embedding-based indices, you can choose to pass in a custom embedding model. See Custom Embeddings How-To
for more details.
Cost Predictor
Creating an index, inserting to an index, and querying an index may use tokens. We can track token usage through the
outputs of these operations. When running operations, the token usage will be printed. You can also fetch the token
usage through index.llm_predictor.last_token_usage. See Cost Predictor How-To for more details.
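For example, a quick way to inspect the most recent usage (a small sketch built on the attribute mentioned above):

index = GPTVectorStoreIndex.from_documents(documents)

# token count consumed by the most recent LLM calls made through this index's predictor
print(index.llm_predictor.last_token_usage)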
You can save the index to disk and load it again later:
# save index
index.storage_context.persist(persist_dir="<persist_dir>")

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# load index
index = load_index_from_storage(storage_context)
NOTE: If you had initialized the index with a custom ServiceContext object, you will also need to pass in the same
ServiceContext during load_index_from_storage.
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
...
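A sketch of the reload might look like this (llm_predictor is assumed to be the same predictor used when the index was built, and service_context is assumed to be accepted as a keyword by load_index_from_storage, per the note above):

from llama_index import ServiceContext, StorageContext, load_index_from_storage

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
storage_context = StorageContext.from_defaults(persist_dir="<persist_dir>")

# pass the same ServiceContext that was used at construction time
index = load_index_from_storage(storage_context, service_context=service_context)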
You can build indices on top of other indices! Composability gives you greater power in indexing your heterogeneous
sources of data. For a discussion on relevant use cases, see our Query Use Cases. For technical details and examples,
see our Composability How-To.
After building the index, you can now query it with a QueryEngine. Note that a “query” is simply an input to an LLM
- this means that you can use the index for question-answering, but you can also do more than that!
High-level API
To start, you can query an index with the default QueryEngine (i.e., using default configs), as follows:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Low-level API
We also support a low-level composition API that gives you more granular control over the query logic. Below we
highlight a few of the possible customizations.
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

# build index
index = GPTVectorStoreIndex.from_documents(documents)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# assemble query engine
query_engine = RetrieverQueryEngine(retriever)

# query
response = query_engine.query("What did the author do growing up?")
print(response)
You may also add your own retrieval, response synthesis, and overall query logic, by implementing the corresponding
interfaces.
For a full list of implemented components and the supported configurations, please see the detailed reference docs.
In the following, we discuss some commonly used configurations in detail.
Configuring retriever
An index can have a variety of index-specific retrieval modes. For instance, a list index supports the default
ListIndexRetriever that retrieves all nodes, and ListIndexEmbeddingRetriever that retrieves the top-k nodes
by embedding similarity.
For convenience, you can also use the following shorthand:
# ListIndexRetriever
retriever = index.as_retriever(retriever_mode='default')
# ListIndexEmbeddingRetriever
retriever = index.as_retriever(retriever_mode='embedding')
After choosing your desired retriever, you can construct your query engine:
query_engine = RetrieverQueryEngine(retriever)
response = query_engine.query("What did the author do growing up?")
The full list of retrievers for each index (and their shorthand) is documented in the Query Reference.
After a retriever fetches relevant nodes, a ResponseSynthesizer synthesizes the final response by combining the
information.
You can configure it via the response_mode argument:
index = GPTListIndex.from_documents(documents)
retriever = index.as_retriever()
# default
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='default')
response = query_engine.query("What did the author do growing up?")
# compact
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='compact')
response = query_engine.query("What did the author do growing up?")
# tree summarize
query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='tree_summarize')
response = query_engine.query("What did the author do growing up?")
We also support advanced Node filtering and augmentation that can further improve the relevancy of the retrieved Node
objects. This can help reduce the time/number of LLM calls/cost or improve response quality.
For example:
• KeywordNodePostprocessor: filters nodes by required_keywords and exclude_keywords.
• SimilarityPostprocessor: filters nodes by setting a threshold on the similarity score (thus only supported
by embedding-based retrievers)
• PrevNextNodePostprocessor: augments retrieved Node objects with additional relevant context based on
Node relationships.
The full list of node postprocessors is documented in the Node Postprocessor Reference.
To configure the desired node postprocessors:
node_postprocessors = [
    KeywordNodePostprocessor(
        required_keywords=["<required_keyword>"],   # illustrative placeholder keywords
        exclude_keywords=["<excluded_keyword>"],
    )
]
query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=node_postprocessors
)
response = query_engine.query("What did the author do growing up?")
The object returned is a Response object. The object contains both the response text as well as the “sources” of the
response:
response = query_engine.query("<query_str>")
# get response
# response.response
str(response)
# get sources
response.source_nodes
# formatted sources
response.get_formatted_sources()
This guide describes how each index works with diagrams. We also visually highlight our “Response Synthesis” modes.
Some terminology:
• Node: Corresponds to a chunk of text from a Document. LlamaIndex takes in Document objects and internally
parses/chunks them into Node objects.
• Response Synthesis: Our module which synthesizes a response given the retrieved Node. You can see how to
specify different response modes here. See below for an illustration of how each response mode works.
List Index
Querying
During query time, if no other query parameters are specified, LlamaIndex simply loads all Nodes in the list into our
Response Synthesis module.
The list index also offers numerous ways of querying, from an embedding-based query that fetches the top-k neighbors, to adding a keyword filter.
Vector Store Index
The vector store index stores each Node and a corresponding embedding in a Vector Store.
Querying
Querying a vector store index involves fetching the top-k most similar Nodes, and passing those into our Response
Synthesis module.
Tree Index
The tree index builds a hierarchical tree from a set of Nodes (which become leaf nodes in this tree).
Querying
Querying a tree index involves traversing from root nodes down to leaf nodes. By default, (child_branch_factor=1),
a query chooses one child node given a parent node. If child_branch_factor=2, a query chooses two child nodes
per level.
Keyword Table Index
The keyword table index extracts keywords from each Node and builds a mapping from each keyword to the corresponding Nodes of that keyword.
Querying
During query time, we extract relevant keywords from the query, and match those with pre-extracted Node keywords
to fetch the corresponding Nodes. The extracted Nodes are passed to our Response Synthesis module.
Response Synthesis
LlamaIndex offers different methods of synthesizing a response. The way to toggle this can be found in our Usage
Pattern Guide. Below, we visually highlight how each response mode works.
Create and Refine
Create and refine is an iterative way of generating a response. We first use the context in the first node, along with the
query, to generate an initial answer. We then pass this answer, the query, and the context of the second node as input
into a “refine prompt” to generate a refined answer. We refine through N-1 nodes, where N is the total number of nodes.
Tree Summarize
Tree summarize is another way of generating a response. We essentially build a tree index over the set of candidate nodes, with a summary prompt seeded with the query. The tree is built in a bottom-up fashion, and in the end the root node is returned as the response.
3.4 Tutorials
This section contains a list of in-depth tutorials on how to best utilize different capabilities of LlamaIndex within your
end-user application.
They include a broad range of LlamaIndex concepts:
• Semantic search
• Structured data support
• Composability/Query Transformation
They also showcase a variety of application settings in which LlamaIndex can be used, from a simple Jupyter notebook to a chatbot to a full-stack web application.
LlamaIndex is an interface between your data and LLMs; it offers the toolkit for you to set up a query interface around your data for any downstream task, whether it's question-answering, summarization, or more.
In this tutorial, we show you how to build a context augmented chatbot. We use Langchain for the underlying
Agent/Chatbot abstractions, and we use LlamaIndex for the data retrieval/lookup/querying! The result is a chatbot
agent that has access to a rich set of “data interface” Tools that LlamaIndex provides to answer queries over your data.
Note: This is a continuation of some initial work building a query interface over SEC 10-K filings - check it out here.
Context
In this tutorial, we build a “10-K Chatbot” by downloading the raw UBER 10-K HTML filings from Dropbox. The user can choose to ask questions regarding the 10-K filings.
Ingest Data
# NOTE: the code examples assume you're operating within a Jupyter notebook.
# download files
!mkdir data
!wget "https://www.dropbox.com/s/948jr9cfs7fgj99/UBER.zip?dl=1" -O data/UBER.zip
!unzip data/UBER.zip -d data
We use the Unstructured library to parse the HTML files into formatted text. We have a direct integration with Unstructured through LlamaHub - this allows us to convert any text into a Document format that LlamaIndex can ingest.
loader = UnstructuredReader()
doc_set = {}
all_docs = []
# the four fiscal years covered by the downloaded filings
years = [2022, 2021, 2020, 2019]
for year in years:
    year_docs = loader.load_data(
        file=Path(f'./data/UBER/UBER_{year}.html'), split_documents=False
    )
    doc_set[year] = year_docs
    all_docs.extend(year_docs)
We first set up a vector index for each year. Each vector index allows us to ask questions about the 10-K filing of a given year.
We build each index and save it to disk.
# initialize simple vector indices + global vector index
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
index_set = {}
for year in years:
    storage_context = StorageContext.from_defaults()
    cur_index = GPTVectorStoreIndex.from_documents(
        doc_set[year],
        service_context=service_context,
        storage_context=storage_context,
    )
    index_set[year] = cur_index
    storage_context.persist(persist_dir=f'./storage/{year}')
Since we have access to documents from 4 years, we may not only want to ask questions regarding the 10-K document of a given year, but also ask questions that require analysis over all 10-K filings.
To address this, we compose a “graph” which consists of a list index defined over the 4 vector indices. Querying this
graph would first retrieve information from each vector index, and combine information together via the list index.
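As a rough sketch, composing such a graph might look like the following (ComposableGraph and the one-line index summaries are assumptions here; check the composability reference for the exact API in your version):

from llama_index import GPTListIndex, StorageContext
from llama_index.indices.composability import ComposableGraph

storage_context = StorageContext.from_defaults()

# one-line summaries describing each per-year index (illustrative placeholders)
index_summaries = {year: f"UBER 10-K filing for the {year} fiscal year" for year in years}

graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index_set[y] for y in years],
    index_summaries=[index_summaries[y] for y in years],
    service_context=service_context,
    storage_context=storage_context,
)
root_id = graph.root_id

# [optional] persist the composed graph for later reloading
storage_context.persist(persist_dir='./storage/root')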
from llama_index import GPTListIndex, LLMPredictor, ServiceContext, load_graph_from_storage

# [optional] load from disk, so you don't need to build graph from scratch
graph = load_graph_from_storage(
    root_id=root_id,
    service_context=service_context,
    storage_context=storage_context,
)
We use Langchain to setup the outer chatbot agent, which has access to a set of Tools. LlamaIndex provides some
wrappers around indices and graphs so that they can be easily used within a Tool interface.
# do imports
from langchain.agents import Tool
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent
We want to define a separate Tool for each index (corresponding to a given year), as well as the graph. We can define
all tools under a central LlamaToolkit interface.
Below, we define an IndexToolConfig for our graph. Note that we also import a DecomposeQueryTransform module for use within each vector index within the graph - this allows us to “decompose” the overall query into a query that can be answered from each subindex (see example below).
# define a decompose transform
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
    llm_predictor, verbose=True
)

# define custom query engines, wrapping each vector index's query engine
# with the decompose transform described above
custom_query_engines = {}
for index in index_set.values():
    query_engine = index.as_query_engine()
# tool config
graph_config = IndexToolConfig(
    query_engine=query_engine,
    name=f"Graph Index",
    description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber.",
    tool_kwargs={"return_direct": True}
)
Besides this tool config for the graph, we also define an IndexToolConfig corresponding to each index:
# define toolkit
index_configs = []
for y in range(2019, 2023):
    query_engine = index_set[y].as_query_engine(
        similarity_top_k=3,
    )
    tool_config = IndexToolConfig(
        query_engine=query_engine,
        name=f"Vector Index {y}",
        description=f"useful for when you want to answer queries about the {y} SEC 10-K for Uber",
        tool_kwargs={"return_direct": True}
    )
    index_configs.append(tool_config)

toolkit = LlamaToolkit(
    index_configs=index_configs + [graph_config],
)
Finally, we call create_llama_chat_agent to create our Langchain chatbot agent, which has access to the 5 Tools
we defined above:
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI(temperature=0)
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
)
agent_chain.run(input="hi, i am bob")
If we test it with a query regarding the 10-K of a given year, the agent will use the relevant vector index Tool.
agent_chain.run(input="What were some of the biggest risk factors in 2020 for Uber?")
Observation:
Risk Factors
The COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business, financial condition, and results of operations.
...
'\n\nRisk Factors\n\nThe COVID-19 pandemic and the impact of actions to mitigate the pandemic has adversely affected and continues to adversely affect our business, ...
Finally, if we test it with a query to compare/contrast risk factors across years, the agent will use the graph index Tool.
cross_query_str = (
    "Compare/contrast the risk factors described in the Uber 10-K across years. Give answer in bullet points."
)
agent_chain.run(input=cross_query_str)
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2022 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2021 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2020 fiscal year?
> Current query: Compare/contrast the risk factors described in the Uber 10-K across years.
> New query: What are the risk factors described in the Uber 10-K for the 2019 fiscal year?
Observation:
In 2020, the risk factors included the timing of widespread adoption of vaccines against the virus, additional actions that may be taken by governmental authorities, the ...
Now that we have the chatbot set up, it only takes a few more steps to set up a basic interactive loop to converse with our SEC-augmented chatbot!
while True:
    text_input = input("User: ")
    response = agent_chain.run(input=text_input)
    print(f'Agent: {response}')
User: What were some of the legal proceedings against Uber in 2022?
Agent:
In 2022, legal proceedings against Uber include a motion to compel arbitration, an appeal of a ruling that Proposition 22 is unconstitutional, a complaint alleging that drivers are employees and entitled to protections under the wage and labor laws, ... employment violations in New York, fraud related to certain deductions, class actions in Australia alleging that Uber entities conspired to injure the group members during the period 2014 to 2017 by either directly breaching transport legislation or ... and claims of lost income and decreased value of certain taxis. Additionally, Uber is ... drivers as independent contractors and from violating various wage and hour laws.
User:
LlamaIndex is a python library, which means that integrating it with a full-stack web application will be a little different
than what you might be used to.
This guide seeks to walk through the steps needed to create a basic API service written in python, and how this interacts
with a TypeScript+React frontend.
All code examples here are available from the llama_index_starter_pack in the flask_react folder.
The main technologies used in this guide are as follows:
• python3.11
• llama_index
• flask
• typescript
• react
Flask Backend
For this guide, our backend will use a Flask API server to communicate with our frontend code. If you prefer, you can
also easily translate this to a FastAPI server, or any other python server library of your choice.
Setting up a server using Flask is easy. You import the package, create the app object, and then create your endpoints.
Let’s create a basic skeleton for the server first:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello World!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5601)
flask_demo.py
If you run this file (python flask_demo.py), it will launch a server on port 5601. If you visit http://localhost:5601/, you will see the “Hello World!” text rendered in your browser. Nice!
The next step is deciding what functions we want to include in our server, and to start using LlamaIndex.
To keep things simple, the most basic operation we can provide is querying an existing index. Using the Paul Graham essay from LlamaIndex, create a documents folder and download+place the essay text file inside of it.
import os
from llama_index import (
    SimpleDirectoryReader,
    GPTVectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

# NOTE: for local testing only, do NOT deploy with your key hardcoded
os.environ['OPENAI_API_KEY'] = "your key here"

index = None
index_dir = "./saved_index"  # assumed path where the index is persisted between runs

def initialize_index():
    global index
    storage_context = StorageContext.from_defaults()
    if os.path.exists(index_dir):
        index = load_index_from_storage(storage_context)
    else:
        documents = SimpleDirectoryReader("./documents").load_data()
        index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)
        storage_context.persist(index_dir)
This function will initialize our index. If we call this just before starting the Flask server in the main function, then our index will be ready for user queries!
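For instance, the bottom of flask_demo.py might look something like this sketch (the port matches the skeleton above):

if __name__ == "__main__":
    # build or load the index before accepting any requests
    initialize_index()
    app.run(host="0.0.0.0", port=5601)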
Our query endpoint will accept GET requests with the query text as a parameter. Here’s what the full endpoint function
will look like:
@app.route("/query", methods=["GET"])
def query_index():
global index
query_text = request.args.get("text", None)
if query_text is None:
return "No text found, please include a ?text=blah parameter in the URL", 400
query_engine = index.as_query_engine()
response = query_engine.query(query_text)
return str(response), 200
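To try it out, you can hit the endpoint with any HTTP client; for example, a small sketch using the requests library (assuming the server is running locally on port 5601):

import requests

resp = requests.get(
    "http://localhost:5601/query",
    params={"text": "What did the author do growing up?"},
)
print(resp.text)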
Things are looking pretty cool, but how can we take this a step further? What if we want to allow users to build their
own indexes by uploading their own documents? Have no fear, Flask can handle it all :muscle:.
To let users upload documents, we have to take some extra precautions. Instead of querying an existing index, the
index will become mutable. If you have many users adding to the same index, we need to think about how to handle
concurrency. Our Flask server is threaded, which means multiple users can ping the server with requests which will be
handled at the same time.
One option might be to create an index for each user or group, and store and fetch things from S3. But for this example,
we will assume there is one locally stored index that users are interacting with.
To handle concurrent uploads and ensure sequential inserts into the index, we can use python's BaseManager (from the multiprocessing.managers module) to provide sequential access to the index using a separate server and locks. This sounds scary, but it's not so bad! We will just move all our index operations (initializing, querying, inserting) into the BaseManager “index_server”, which will be called from our Flask server.
Here’s a basic example of what our index_server.py will look like after we’ve moved our code:
import os
from multiprocessing import Lock
from multiprocessing.managers import BaseManager
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, Document
# NOTE: for local testing only, do NOT deploy with your key hardcoded
os.environ['OPENAI_API_KEY'] = "your key here"
index = None
lock = Lock()
def initialize_index():
global index
with lock:
# same as before ...
...
def query_index(query_text):
global index
query_engine = index.as_query_engine()
response = query_engine.query(query_text)
return str(response)
if __name__ == "__main__":
# init the global index
(continues on next page)
# setup server
# NOTE: you might want to handle the password in a less hardcoded way
manager = BaseManager(('', 5602), b'password')
manager.register('query_index', query_index)
server = manager.get_server()
print("starting server...")
server.serve_forever()
index_server.py
So, we've moved our functions, introduced the Lock object which ensures sequential access to the global index, registered our single function in the server, and started the server on port 5602 with the password password.
Then, we can adjust our flask code as follows:
@app.route("/query", methods=["GET"])
def query_index():
global index
query_text = request.args.get("text", None)
if query_text is None:
return "No text found, please include a ?text=blah parameter in the URL", 400
response = manager.query_index(query_text)._getvalue()
return str(response), 200
@app.route("/")
def home():
return "Hello World!"
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5601)
flask_demo.py
The two main changes are connecting to our existing BaseManager server and registering the functions, as well as
calling the function through the manager in the /query endpoint.
One special thing to note is that BaseManager servers don't return objects quite as we expect. To resolve the return value into its original object, we call the _getvalue() function.
If we allow users to upload their own documents, we should probably remove the Paul Graham essay from the documents
folder, so let’s do that first. Then, let’s add an endpoint to upload files! First, let’s define our Flask endpoint function:
...
manager.register('insert_into_index')
...

@app.route("/uploadFile", methods=["POST"])
def upload_file():
    global manager
    if 'file' not in request.files:
        return "Please send a POST request with a file", 400

    filepath = None
    try:
        uploaded_file = request.files["file"]
        filename = secure_filename(uploaded_file.filename)
        filepath = os.path.join('documents', os.path.basename(filename))
        uploaded_file.save(filepath)

        # hand the saved file off to the index server; a second POST argument can
        # optionally supply the filename as the doc_id (field name here is illustrative)
        if request.form.get("filename_as_doc_id", None) is not None:
            manager.insert_into_index(filepath, doc_id=filename)
        else:
            manager.insert_into_index(filepath)
    except Exception as e:
        return "Error: {}".format(str(e)), 500

    return "File inserted!", 200
Not too bad! You will notice that we write the file to disk. We could skip this if we only accept basic file formats like txt files, but by writing it to disk we can take advantage of LlamaIndex's SimpleDirectoryReader to take care of a bunch of more complex file formats. Optionally, we also use a second POST argument to either use the filename as a doc_id or let LlamaIndex generate one for us. This will make more sense once we implement the frontend.
With these more complicated requests, I also suggest using a tool like Postman. Examples of using postman to test our
endpoints are in the repository for this project.
Lastly, you’ll notice we added a new function to the manager. Let’s implement that inside index_server.py:
def insert_into_index(doc_text, doc_id=None):
    global index
    document = SimpleDirectoryReader(input_files=[doc_text]).load_data()[0]
    if doc_id is not None:
        document.doc_id = doc_id

    with lock:
        index.insert(document)
        index.storage_context.persist()

...
Easy! If we launch both the index_server.py and then the flask_demo.py python files, we have a Flask API server
that can handle multiple requests to insert documents into a vector index and respond to user queries!
To support some functionality in the frontend, I’ve adjusted what some responses look like from the Flask API, as well
as added some functionality to keep track of which documents are stored in the index (LlamaIndex doesn’t currently
support this in a user-friendly way, but we can augment it ourselves!). Lastly, I had to add CORS support to the server
using the Flask-cors python package.
Check out the complete flask_demo.py and index_server.py scripts in the repository for the final minor changes, the requirements.txt file, and a sample Dockerfile to help with deployment.
React Frontend
Generally, React and TypeScript are among the most popular libraries and languages for writing webapps today. This guide will assume you are familiar with how these tools work, because otherwise this guide will triple in length :smile:.
In the repository, the frontend code is organized inside of the react_frontend folder.
The most relevant part of the frontend will be the src/apis folder. This is where we make calls to the Flask server,
supporting the following queries:
• /query – make a query to the existing index
• /uploadFile – upload a file to the flask server for insertion into the index
• /getDocuments – list the current document titles and a portion of their texts
Using these three queries, we can build a robust frontend that allows users to upload and keep track of their files, query
the index, and view the query response and information about which text nodes were used to form the response.
fetchDocuments.tsx
This file contains the function to, you guessed it, fetch the list of current documents in the index. The code is as follows:
if (!response.ok) {
  return [];
}
As you can see, we make a query to the Flask server (here, it assumes running on localhost). Notice that we need to
include the mode: 'cors' option, as we are making an external request.
Then, we check if the response was ok, and if so, get the response json and return it. Here, the response json is a list of
Document objects that are defined in the same file.
queryIndex.tsx
This file sends the user query to the flask server, and gets the response back, as well as details about which nodes in
our index provided the response.
export type ResponseSources = {
  text: string;
  doc_id: string;
  start: number;
  end: number;
  similarity: number;
};
return queryResponse;
};
This is similar to the fetchDocuments.tsx file, with the main difference being we include the query text as a parameter
in the URL. Then, we check if the response is ok and return it with the appropriate typescript type.
insertDocument.tsx
Probably the most complex API call is uploading a document. The function here accepts a file object and constructs a
POST request using FormData.
The actual response text is not used in the app, but it could be utilized to provide some user feedback on whether the file failed to upload or not.
const insertDocument = async (file: File) => {
  const formData = new FormData();
And that pretty much wraps up the frontend portion! The rest of the react frontend code is some pretty basic react components, and my best attempt to make it look at least a little nice :smile:.
I encourage you to read the rest of the codebase and submit any PRs for improvements!
Conclusion
This guide has covered a ton of information. We went from a basic “Hello World” Flask server written in python, to a
fully functioning LlamaIndex powered backend and how to connect that to a frontend application.
As you can see, we can easily augment and wrap the services provided by LlamaIndex (like the little external document
tracker) to help provide a good user experience on the frontend.
You could take this and add many features (multi-index/user support, saving objects into S3, adding a Pinecone vector
server, etc.). And when you build an app after reading this, be sure to share the final result in the Discord! Good Luck!
:muscle:
This guide seeks to walk you through using LlamaIndex with a production-ready web app starter template called Delphic. All code examples here are available from the Delphic repo.
Architectural Overview
Delphic leverages the LlamaIndex python library to let users create their own document collections they can then query in a responsive frontend.
We chose a stack that provides a responsive, robust mix of technologies that can (1) orchestrate complex python processing tasks while providing (2) a modern, responsive frontend and (3) a secure backend to build additional functionality upon.
The core libraries are:
1. Django
2. Django Channels
3. Django Ninja
4. Redis
5. Celery
6. LlamaIndex
7. Langchain
8. React
9. Docker & Docker Compose
Thanks to this modern stack built on the super stable Django web framework, the starter Delphic app boasts a streamlined developer experience, built-in authentication and user management, asynchronous vector store processing, and websocket-based query connections for a responsive UI. In addition, our frontend is built with TypeScript and is based on MUI React for a responsive and modern user interface.
System Requirements
Celery doesn’t work on Windows. It may be deployable with Windows Subsystem for Linux, but configuring that is
beyond the scope of this tutorial. For this reason, we recommend you only follow this tutorial if you’re running Linux or
OSX. You will need Docker and Docker Compose installed to deploy the application. Local development will require
node version manager (nvm).
Django Backend
The Delphic application has a structured backend directory organization that follows common Django project conventions. From the repo root, in the ./delphic subfolder, the main folders are:
1. contrib: This directory contains custom modifications or additions to Django’s built-in contrib apps.
2. indexes: This directory contains the core functionality related to document indexing and LLM integration. It
includes:
• admin.py: Django admin configuration for the app
• apps.py: Application configuration
• models.py: Contains the app’s database models
• migrations: Directory containing database schema migrations for the app
• signals.py: Defines any signals for the app
Database Models
The Delphic application has two core models: Document and Collection. These models represent the central entities
the application deals with when indexing and querying documents using LLMs. They're defined in ./delphic/indexes/models.py.
1. Collection:
• api_key: A foreign key that links a collection to an API key. This helps associate jobs with the source API key.
• title: A character field that provides a title for the collection.
• description: A text field that provides a description of the collection.
• status: A character field that stores the processing status of the collection, utilizing the CollectionStatus
enumeration.
• created: A datetime field that records when the collection was created.
• modified: A datetime field that records the last modification time of the collection.
• model: A file field that stores the model associated with the collection.
• processing: A boolean field that indicates if the collection is currently being processed.
2. Document:
• collection: A foreign key that links a document to a collection. This represents the relationship between
documents and collections.
• file: A file field that stores the uploaded document file.
• description: A text field that provides a description of the document.
• created: A datetime field that records when the document was created.
• modified: A datetime field that records the last modification time of the document.
These models provide a solid foundation for collections of documents and the indexes created from them with LlamaIndex.
Django Ninja is a web framework for building APIs with Django and Python 3.7+ type hints. It provides a simple,
intuitive, and expressive way of defining API endpoints, leveraging Python’s type hints to automatically generate input
validation, serialization, and documentation.
In the Delphic repo, the ./config/api/endpoints.py file contains the API routes and logic for the API endpoints.
Now, let’s briefly address the purpose of each endpoint in the endpoints.py file:
1. /heartbeat: A simple GET endpoint to check if the API is up and running. Returns True if the API is accessible. This is helpful for Kubernetes setups that expect to be able to query your container to ensure it's up and running.
2. /collections/create: A POST endpoint to create a new Collection. Accepts form parameters such as
title, description, and a list of files. Creates a new Collection and Document instances for each file,
and schedules a Celery task to create an index.
@collections_router.post("/create")
async def create_collection(request,
title: str = Form(...),
description: str = Form(...),
files: list[UploadedFile] = File(...), ):
key = None if getattr(request, "auth", None) is None else request.auth
if key is not None:
key = await key
collection_instance = Collection(
api_key=key,
title=title,
description=description,
status=CollectionStatusEnum.QUEUED,
)
await sync_to_async(collection_instance.save)()
create_index.si(collection_instance.id).apply_async()
3. /collections/query: A POST endpoint to query a document collection using the LLM. Accepts a JSON payload containing collection_id and query_str, and returns a response generated by querying the collection. We don't actually use this endpoint in our chat GUI (we use a websocket - see below), but you could build an app that integrates with this REST endpoint to query a specific collection.
@collections_router.post("/query",
response=CollectionQueryOutput,
summary="Ask a question of a document collection", )
def query_collection_view(request: HttpRequest, query_input: CollectionQueryInput):
collection_id = query_input.collection_id
query_str = query_input.query_str
response = query_collection(collection_id, query_str)
return {"response": response}
4. /collections/available: A GET endpoint that returns a list of all collections created with the user’s API
key. The output is serialized using the CollectionModelSchema.
@collections_router.get("/available",
response=list[CollectionModelSchema],
summary="Get a list of all of the collections created with my␣
˓→api_key", )
collections = Collection.objects.filter(api_key=key)
return [
{
...
}
async for collection in collections
]
Intro to Websockets
WebSockets are a communication protocol that enables bidirectional and full-duplex communication between a client
and a server over a single, long-lived connection. The WebSocket protocol is designed to work over the same ports
as HTTP and HTTPS (ports 80 and 443, respectively) and uses a similar handshake process to establish a connection.
Once the connection is established, data can be sent in both directions as “frames” without the need to reestablish the
connection each time, unlike traditional HTTP requests.
There are several reasons to use WebSockets, particularly when working with code that takes a long time to load into
memory but is quick to run once loaded:
1. Performance: WebSockets eliminate the overhead associated with opening and closing multiple connections for
each request, reducing latency.
2. Efficiency: WebSockets allow for real-time communication without the need for polling, resulting in more effi-
cient use of resources and better responsiveness.
3. Scalability: WebSockets can handle a large number of simultaneous connections, making it ideal for applications
that require high concurrency.
In the case of the Delphic application, using WebSockets makes sense as the LLMs can be expensive to load into
memory. By establishing a WebSocket connection, the LLM can remain loaded in memory, allowing subsequent
requests to be processed quickly without the need to reload the model each time.
The ASGI configuration file ./config/asgi.py defines how the application should handle incoming connections,
using the Django Channels ProtocolTypeRouter to route connections based on their protocol type. In this case, we
have two protocol types: “http” and “websocket”.
The “http” protocol type uses the standard Django ASGI application to handle HTTP requests, while the “websocket”
protocol type uses a custom TokenAuthMiddleware to authenticate WebSocket connections. The URLRouter within
the TokenAuthMiddleware defines a URL pattern for the CollectionQueryConsumer, which is responsible for
handling WebSocket connections related to querying document collections.
application = ProtocolTypeRouter(
    {
        "http": get_asgi_application(),
        "websocket": TokenAuthMiddleware(
            URLRouter(
                [
                    re_path(
                        r"ws/collections/(?P<collection_id>\w+)/query/$",
                        CollectionQueryConsumer.as_asgi(),
                    ),
                ]
            )
        ),
    }
)
This configuration allows clients to establish WebSocket connections with the Delphic application to efficiently query
document collections using the LLMs, without the need to reload the models for each request.
Websocket Handler
The connect method is responsible for establishing the connection, extracting the collection ID from the connection
path, loading the collection model, and accepting the connection.
The disconnect method is empty in this case, as there are no additional actions to be taken when the WebSocket is
closed.
The receive method is responsible for processing incoming messages from the WebSocket. It takes the incoming
message, decodes it, and then queries the loaded collection model using the provided query. The response is then
formatted as a markdown string and sent back to the client over the WebSocket connection.
query_engine = self.index.as_query_engine()
response = query_engine.query(modified_query_str)
formatted_response = f"{markdown_response}{markdown_sources}"
To load the collection model, the load_collection_model function is used, which can be found in delphic/utils/
collections.py. This function retrieves the collection object with the given collection ID, checks if a JSON file for
the collection model exists, and if not, creates one. Then, it sets up the LLMPredictor and ServiceContext before
loading the GPTVectorStoreIndex using the cache file.
    Args:
        collection_id (Union[str, int]): The ID of the Collection model instance.

    Returns:
        GPTVectorStoreIndex: The loaded index.

    This function performs the following steps:
    1. Retrieve the Collection object with the given collection_id.
    2. Check if a JSON cache file for the collection model exists
       ('/cache/model_{collection_id}.json'); if not, create one.
    3. Set up the LLMPredictor and ServiceContext.
    4. Call GPTVectorStoreIndex.load_from_disk with the cache_file_path.
    """
    # Retrieve the Collection object
    collection = await Collection.objects.aget(id=collection_id)
    logger.info(f"load_collection_model() - loaded collection {collection_id}")

    # only proceed when the collection has a saved model file; the step that writes
    # the model out to cache_file_path is described above (guard shown here is assumed)
    if collection.model.name:
        # define LLM
        logger.info(
            f"load_collection_model() - Setup service context with tokens {settings.MAX_TOKENS} and "
            f"model {settings.MODEL_NAME}"
        )
        llm_predictor = LLMPredictor(
            llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
        )
        service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

        # Call GPTVectorStoreIndex.load_from_disk
        logger.info("load_collection_model() - Load llama index")
        index = GPTVectorStoreIndex.load_from_disk(
            cache_file_path, service_context=service_context
        )
        logger.info(
            "load_collection_model() - Llamaindex loaded and ready for query..."
        )
    else:
        logger.error(
            f"load_collection_model() - collection {collection_id} has no model!"
        )
return index
React Frontend
Overview
We chose to use TypeScript, React and Material-UI (MUI) for the Delphic project’s frontend for a couple reasons. First,
as the most popular component library (MUI) for the most popular frontend framework (React), this choice makes this
project accessible to a huge community of developers. Second, React is, at this point, a stable and generally well-liked
framework that delivers valuable abstractions in the form of its virtual DOM while still being relatively stable and, in
our opinion, pretty easy to learn, again making it accessible.
The frontend can be found in the /frontend directory of the repo, with the React-related components being in /frontend/src. You'll notice there is a Dockerfile in the frontend directory and several folders and files related to configuring our frontend web server, nginx.
The /frontend/src/App.tsx file serves as the entry point of the application. It defines the main components, such
as the login form, the drawer layout, and the collection create modal. The main components are conditionally rendered
based on whether the user is logged in and has an authentication token.
The DrawerLayout2 component is defined in the DrawerLayout2.tsx file. This component manages the layout of the application and provides the navigation and main content areas.
Since the application is relatively simple, we can get away with not using a complex state management solution like
Redux and just use React’s useState hooks.
The collections available to the logged-in user are retrieved and displayed in the DrawerLayout2 component. The
process can be broken down into the following steps:
1. Initializing state variables:
Here, we initialize two state variables: collections to store the list of collections and loading to track whether the
collections are being fetched.
2. Collections are fetched for the logged-in user with the fetchCollections() function:
const fetchCollections = async () => {
  try {
    const accessToken = localStorage.getItem("accessToken");
    if (accessToken) {
The fetchCollections function retrieves the collections for the logged-in user by calling the getMyCollections
API function with the user’s access token. It then updates the collections state with the retrieved data and sets the
loading state to false to indicate that fetching is complete.
Displaying Collections
You’ll notice that the disabled property of a collection’s ListItemButton is set based on whether the collection’s
status is not CollectionStatus.COMPLETE or the collection does not have a model (!collection.has_model). If
either of these conditions is true, the button is disabled, preventing users from selecting an incomplete or model-less
collection. Where the CollectionStatus is RUNNING, we also show a loading wheel over the button.
In a separate useEffect hook, we check if any collection in the collections state has a status of
CollectionStatus.RUNNING or CollectionStatus.QUEUED. If so, we set up an interval to repeatedly call the
fetchCollections function every 15 seconds (15,000 milliseconds) to update the collection statuses. This way, the
application periodically checks for completed collections, and the UI is updated accordingly when the processing is
done.
useEffect(() => {
  let interval: NodeJS.Timeout;
  if (
    collections.some(
      (collection) =>
        collection.status === CollectionStatus.RUNNING ||
        collection.status === CollectionStatus.QUEUED
    )
  ) {
    interval = setInterval(() => {
      fetchCollections();
    }, 15000);
  }
  return () => clearInterval(interval);
}, [collections]);
The ChatView component in frontend/src/chat/ChatView.tsx is responsible for handling and displaying a chat
interface for a user to interact with a collection. The component establishes a WebSocket connection to communicate
in real-time with the server, sending and receiving messages.
Key features of the ChatView component include:
1. Establishing and managing the WebSocket connection with the server.
2. Displaying messages from the user and the server in a chat-like format.
3. Handling user input to send messages to the server.
4. Updating the messages state and UI based on received messages from the server.
5. Displaying connection status and errors, such as loading messages, connecting to the server, or encountering
errors while loading a collection.
Together, all of this allows users to interact with their selected collection with a very smooth, low-latency experience.
The WebSocket connection in the ChatView component is used to establish real-time communication between the
client and the server. The WebSocket connection is set up and managed in the ChatView component as follows:
First, we want to initialize the WebSocket reference:
const websocket = useRef<WebSocket | null>(null);
A websocket reference is created using useRef, which holds the WebSocket object that will be used for communi-
cation. useRef is a hook in React that allows you to create a mutable reference object that persists across renders. It
is particularly useful when you need to hold a reference to a mutable object, such as a WebSocket connection, without
causing unnecessary re-renders.
In the ChatView component, the WebSocket connection needs to be established and maintained throughout the lifetime
of the component, and it should not trigger a re-render when the connection state changes. By using useRef, you ensure
that the WebSocket connection is kept as a reference, and the component only re-renders when there are actual state
changes, such as updating messages or displaying errors.
The setupWebsocket function is responsible for establishing the WebSocket connection and setting up event handlers
to handle different WebSocket events.
Overall, the setupWebsocket function looks like this:
);
return () => {
websocket.current?.close();
};
};
Notice in a bunch of places we trigger updates to the GUI based on the information from the web socket client.
When the component first opens and we try to establish a connection, the onopen listener is triggered. In the callback,
the component updates the states to reflect that the connection is established, any previous errors are cleared, and no
messages are awaiting responses:
onmessage is triggered when a new message is received from the server through the WebSocket connection. In the callback, the received data is parsed and the messages state is updated with the new message from the server:
  if (data.response) {
    // Update the messages state with the new message from the server
    setMessages((prevMessages) => [
      ...prevMessages,
      {
        sender_id: "server",
        message: data.response,
        timestamp: new Date().toLocaleTimeString(),
      },
    ]);
  }
};
onclose is triggered when the WebSocket connection is closed. In the callback, the component checks for a specific close code (4000) to display a warning toast and update the component states accordingly. It also logs the close event:
Finally, onerror is triggered when an error occurs with the WebSocket connection. In the callback, the component
updates the states to reflect the error and logs the error event:
In the ChatView component, the layout is determined using CSS styling and Material-UI components. The main
layout consists of a container with a flex display and a column-oriented flexDirection. This ensures that the
content within the container is arranged vertically.
There are three primary sections within the layout:
1. The chat messages area: This section takes up most of the available space and displays a list of messages ex-
changed between the user and the server. It has an overflow-y set to ‘auto’, which allows scrolling when the
content overflows the available space. The messages are rendered using the ChatMessage component for each
message and a ChatMessageLoading component to show the loading state while waiting for a server response.
2. The divider: A Material-UI Divider component is used to separate the chat messages area from the input area,
creating a clear visual distinction between the two sections.
3. The input area: This section is located at the bottom and allows the user to type and send messages. It contains a
TextField component from Material-UI, which is set to accept multiline input with a maximum of 2 rows. The
input area also includes a Button component to send the message. The user can either click the “Send” button
or press “Enter” on their keyboard to send the message.
The user inputs accepted in the ChatView component are text messages that the user types in the TextField. The
component processes these text inputs and sends them to the server through the WebSocket connection.
Deployment
Prerequisites
To deploy the app, you're going to need Docker and Docker Compose installed. If you're on Ubuntu or another common Linux distribution, DigitalOcean has a great Docker tutorial and another great tutorial for Docker Compose you can follow. If those don't work for you, try the official Docker documentation.
The project is based on django-cookiecutter, and it's pretty easy to get it deployed on a VM and configured to serve HTTPS traffic for a specific domain. The configuration is somewhat involved, however, not because of this project, but because setting up certificates, DNS, and so on is a fairly involved topic in its own right.
For the purposes of this guide, let’s just get running locally. Perhaps we’ll release a guide on production deployment.
In the meantime, check out the Django Cookiecutter project docs for starters.
This guide assumes your goal is to get the application up and running for use. If you want to develop, most likely you won't want to launch the compose stack with the --profiles fullstack flag and will instead want to launch the React frontend using the node development server.
To deploy, first clone the repo and change into the project directory, then set up the environment files:
cd delphic
mkdir -p ./.envs/.local/
cp -a ./docs/sample_envs/local/.frontend ./frontend
cp -a ./docs/sample_envs/local/.django ./.envs/.local
cp -a ./docs/sample_envs/local/.postgres ./.envs/.local
Edit the .django and .postgres configuration files to include your OpenAI API key and set a unique password for
your database user. You can also set the response token limit in the .django file or switch which OpenAI model you
want to use. GPT-4 is supported, assuming you're authorized to access it.
Build the docker compose stack with the --profiles fullstack flag:
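Assuming the local.yml compose file used elsewhere in this guide (note that recent Docker Compose releases spell the flag --profile), the build command looks something like:

sudo docker-compose -f local.yml --profile fullstack build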
The fullstack flag instructs compose to build a docker container from the frontend folder, and this will be launched along with all of the needed backend containers. It takes a long time to build a production React container, however, so we don't recommend you develop this way. Follow the instructions in the project readme.md for development environment setup.
Finally, bring up the application:
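Again assuming the local.yml compose file, something like:

sudo docker-compose -f local.yml --profile fullstack up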
Now, visit localhost:3000 in your browser to see the frontend, and use the Delphic application locally.
Setup Users
In order to actually use the application you currently need a login (we intend to eventually make it possible to share certain models with unauthenticated users). You can use either a superuser or a non-superuser account. In either case, someone needs to first create a superuser using the console:
Why set up a Django superuser? A Django superuser has all the permissions in the application and can manage
all aspects of the system, including creating, modifying, and deleting users, collections, and other data. Setting up a
superuser allows you to fully control and manage the application.
How to create a Django superuser:
1. Run the following command to create a superuser:
sudo docker-compose -f local.yml run django python manage.py createsuperuser
2. You will be prompted to provide a username, email address, and password for the superuser. Enter the required
information.
How to create additional users using Django admin:
1. Start your Delphic application locally following the deployment instructions.
2. Visit the Django admin interface by navigating to http://localhost:8000/admin in your browser.
3. Log in with the superuser credentials you created earlier.
4. Click on “Users” under the “Authentication and Authorization” section.
5. Click on the “Add user +” button in the top right corner.
6. Enter the required information for the new user, such as username and password. Click “Save” to create the user.
7. To grant the new user additional permissions or make them a superuser, click on their username in the user list,
scroll down to the “Permissions” section, and configure their permissions accordingly. Save your changes.
A Guide to LlamaIndex + Structured Data
A lot of modern data systems depend on structured data, such as a Postgres DB or a Snowflake data warehouse. LlamaIndex provides a lot of advanced features, powered by LLMs, to both create structured data from unstructured data, as well as analyze this structured data through augmented text-to-SQL capabilities.
This guide helps walk through each of these capabilities. Specifically, we cover the following topics:
• Inferring Structured Datapoints: Converting unstructured data to structured data.
• Text-to-SQL (basic): How to query a set of tables using natural language.
• Injecting Context: How to inject context for each table into the text-to-SQL prompt. The context can be manu-
ally added, or it can be derived from unstructured documents.
• Storing Table Context within an Index: By default, we directly insert the context into the prompt. Sometimes
this is not feasible if the context is large. Here we show how you can actually use a LlamaIndex data structure to
contain the table context!
We will walk through a toy example table which contains city/population/country information.
Setup
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, select, column
engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData(bind=engine)
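The guide then defines the toy table itself. A minimal sketch of such a city_stats table (column names are illustrative, matching the city/population/country information described above):

# create a toy city_stats table (column names are illustrative)
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
metadata_obj.create_all()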
Finally, we can wrap the SQLAlchemy engine with our SQLDatabase wrapper; this allows the db to be used within
LlamaIndex:
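A sketch of the wrapper call (the include_tables argument is optional and shown here as an assumption):

from llama_index import SQLDatabase

sql_database = SQLDatabase(engine, include_tables=["city_stats"])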
If the db is already populated with data, we can instantiate the SQL index with a blank documents list. Otherwise see
the below section.
index = GPTSQLStructStoreIndex(
[],
sql_database=sql_database,
table_name="city_stats",
)
Inferring Structured Datapoints
LlamaIndex offers the capability to convert unstructured datapoints to structured data. In this section, we show how we can populate the city_stats table by ingesting Wikipedia articles about each city.
First, we use the Wikipedia reader from LlamaHub to load some pages regarding the relevant data.
WikipediaReader = download_loader("WikipediaReader")
wiki_docs = WikipediaReader().load_data(pages=['Toronto', 'Berlin', 'Tokyo'])
When we build the SQL index, we can specify these docs as the first input; these documents will be converted to
structured datapoints and inserted into the db:
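A sketch of that construction, mirroring the constructor call shown earlier:

index = GPTSQLStructStoreIndex(
    wiki_docs,
    sql_database=sql_database,
    table_name="city_stats",
)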
You can take a look at the current table to verify that the datapoints have been inserted!
Text-to-SQL (basic)
LlamaIndex offers “text-to-SQL” capabilities, both at a very basic level and also at a more advanced level. In this
section, we show how to make use of these text-to-SQL capabilities at a basic level.
A simple example is shown here:
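A hedged sketch of such a query (the natural-language question is illustrative):

query_engine = index.as_query_engine()
response = query_engine.query("Which city has the highest population?")
print(response)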
You can access the underlying derived SQL query through response.extra_info['sql_query']. It should look
something like this:
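For the illustrative question above, the derived SQL might resemble the following (exact output will vary):

print(response.extra_info["sql_query"])
# e.g. SELECT city_name FROM city_stats ORDER BY population DESC LIMIT 1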
Injecting Context
By default, the text-to-SQL prompt just injects the table schema information into the prompt. However, oftentimes
you may want to add your own context as well. This section shows you how you can add context, either manually, or
extracted through documents.
We offer you a context builder class to better manage the context within your SQL tables:
SQLContextContainerBuilder. This class takes in the SQLDatabase object, and a few other optional pa-
rameters, and builds a SQLContextContainer object that you can then pass to the index during construction +
query-time.
You can add context manually to the context builder. The code snippet below shows you how:
# manually set the table context
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
    "The user will query with codewords, where 'foo' corresponds to population and 'bar'"
    "corresponds to city."
)
table_context_dict = {"city_stats": city_stats_text}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()
You can also choose to extract context from a set of unstructured Documents. To do this, you
can call SQLContextContainerBuilder.from_documents. We use the TableContextPrompt and the
RefineTableContextPrompt (see the reference docs).
# this is a dummy document that we will extract context from
# in GPTSQLContextContainerBuilder
city_stats_text = (
    "This table gives information regarding the population and country of a given city.\n"
)
context_documents_dict = {"city_stats": [Document(city_stats_text)]}
context_builder = SQLContextContainerBuilder.from_documents(
context_documents_dict,
sql_database
)
context_container = context_builder.build_context_container()
Storing Table Context within an Index
A database collection can have many tables, and if each table has many columns + a description associated with it, then the total context can be quite large.
Luckily, you can choose to use a LlamaIndex data structure to store this table context! Then when the SQL index is
queried, we can use this “side” index to retrieve the proper context that can be fed into the text-to-SQL prompt.
Here we make use of the derive_index_from_context function within SQLContextContainerBuilder to create a new index. You have flexibility in choosing which index class to specify + which arguments to pass in. We then use a helper method called query_index_for_context, which is a simple wrapper around the query call that applies a query template and stores the retrieved context on the generated context container.
You can then build the context container, and pass it to the index during query-time!
from llama_index import GPTSQLStructStoreIndex, SQLDatabase, GPTVectorStoreIndex
from llama_index.indices.struct_store import SQLContextContainerBuilder

sql_database = SQLDatabase(engine)
# build a vector index from the table schema information
context_builder = SQLContextContainerBuilder(sql_database)
table_schema_index = context_builder.derive_index_from_context(
    GPTVectorStoreIndex,
    store_index=True
)
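At query time, a hedged sketch of using the schema index to fetch the relevant context and building the container (the question string is illustrative, and store_context_str is an assumed parameter name):

query_str = "Which city has the highest population?"
context_builder.query_index_for_context(table_schema_index, query_str, store_context_str=True)
context_container = context_builder.build_context_container()
# the container can then be passed to the SQL index's query engine, as described above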
Concluding Thoughts
This is it for now! We’re constantly looking for ways to improve our structured data support. If you have any questions
let us know in our Discord.
Llama Index has many use cases (semantic search, summarization, etc.) that are well documented. However, this
doesn’t mean we can’t apply Llama Index to very specific use cases!
In this tutorial, we will go through the design process of using Llama Index to extract terms and definitions from text,
while allowing users to query those terms later. Using Streamlit, we can provide an easy-to-build frontend for running
and testing all of this, and quickly iterate with our design.
This tutorial assumes you have Python 3.9+ and the following packages installed:
• llama-index
• streamlit
At the base level, our objective is to take text from a document, extract terms and definitions, and then provide a way
for users to query that knowledge base of terms and definitions. The tutorial will go over features from both Llama
Index and Streamlit, and hopefully provide some interesting solutions for common problems that come up.
The final version of this tutorial can be found here and a live hosted demo is available on Huggingface Spaces.
Uploading Text
Step one is giving users a way to upload documents. Let’s write some code using Streamlit to provide the interface for
this! Use the following code and launch the app with streamlit run app.py.
import streamlit as st
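The starter snippet is abbreviated here to its import; a minimal sketch of such a first version (the title and placeholder logic are illustrative, not the tutorial's exact code) could look like:

import streamlit as st

st.title("Llama Index Term Extractor")

document_text = st.text_area("Enter raw text")
if st.button("Extract Terms and Definitions") and document_text:
    with st.spinner("Extracting..."):
        extracted_terms = document_text  # placeholder until extraction is wired up below
    st.write(extracted_terms)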
Super simple, right? But you'll notice that the app doesn't do anything useful yet. To use llama_index, we also need to
setup our OpenAI LLM. There are a bunch of possible settings for the LLM, so we can let the user figure out what’s
best. We should also let the user set the prompt that will extract the terms (which will also help us debug what works
best).
LLM Settings
This next step introduces some tabs to our app, to separate it into different panes that provide different features. Let’s
create a tab for LLM settings and for uploading text:
import os
import streamlit as st
DEFAULT_TERM_STR = (
    "Make a list of terms and definitions that are defined in the context, "
    "with one pair on each line. "
    "If a term is missing its definition, use your best judgment. "
    "Write each line as follows:\nTerm: <term> Definition: <definition>"
)
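# (assumed) create the two tabs referenced below; the four-tab version later in the
# tutorial suggests labels like these
setup_tab, upload_tab = st.tabs(["Setup", "Upload/Extract Terms"])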
with setup_tab:
st.subheader("LLM Setup")
api_key = st.text_input("Enter your OpenAI API key here", type="password")
llm_name = st.selectbox('Which LLM?', ["text-davinci-003", "gpt-3.5-turbo", "gpt-4"])
    model_temperature = st.slider("LLM Temperature", min_value=0.0, max_value=1.0, step=0.1)
with upload_tab:
st.subheader("Extract and Query Definitions")
document_text = st.text_area("Or enter raw text")
if st.button("Extract Terms and Definitions") and document_text:
        with st.spinner("Extracting..."):
            ...  # term extraction is wired up in the next section
Now our app has two tabs, which really helps with the organization. You'll also notice I added a default prompt to extract terms. You can change this later once you try extracting some terms; it's just the prompt I arrived at after experimenting a bit.
Speaking of extracting terms, it’s time to add some functions to do just that!
Now that we are able to define LLM settings and upload text, we can try using Llama Index to extract the terms from
text for us!
We can add the following functions to both initialize our LLM, as well as use it to extract terms from the input text.
def get_llm(llm_name, model_temperature, api_key, max_tokens=256):
    os.environ["OPENAI_API_KEY"] = api_key
    if llm_name == "text-davinci-003":
        return OpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)
    else:
        return ChatOpenAI(temperature=model_temperature, model_name=llm_name, max_tokens=max_tokens)

def extract_terms(documents, term_extract_str, llm_name, model_temperature, api_key):
    llm = get_llm(llm_name, model_temperature, api_key, max_tokens=1024)
    service_context = ServiceContext.from_defaults(
        llm_predictor=LLMPredictor(llm=llm),
        prompt_helper=PromptHelper(max_input_size=4096, max_chunk_overlap=20, num_output=1024),
        chunk_size_limit=1024,
    )
    temp_index = GPTListIndex.from_documents(documents, service_context=service_context)
    query_engine = temp_index.as_query_engine(response_mode="tree_summarize")
    terms_definitions = str(query_engine.query(term_extract_str))
    # keep only lines that contain both labels, then parse them into a dict (as described below)
    terms_definitions = [x for x in terms_definitions.split("\n") if x and "Term:" in x and "Definition:" in x]
    terms_to_definition = {
        x.split("Definition:")[0].split("Term:")[-1].strip(): x.split("Definition:")[-1].strip()
        for x in terms_definitions
    }
    return terms_to_definition
Now, using the new functions, we can finally extract our terms!
...
with upload_tab:
st.subheader("Extract and Query Definitions")
document_text = st.text_area("Or enter raw text")
if st.button("Extract Terms and Definitions") and document_text:
with st.spinner("Extracting..."):
extracted_terms = extract_terms([Document(document_text)],
term_extract_str, llm_name,
model_temperature, api_key)
st.write(extracted_terms)
There’s a lot going on now, let’s take a moment to go over what is happening.
get_llm() is instantiating the LLM based on the user configuration from the setup tab. Based on the model name, we
need to use the appropriate class (OpenAI vs. ChatOpenAI).
extract_terms() is where all the good stuff happens. First, we call get_llm() with max_tokens=1024, since we
don’t want to limit the model too much when it is extracting our terms and definitions (the default is 256 if not set).
Then, we define our ServiceContext object, aligning num_output with our max_tokens value, as well as setting
the chunk size to be no larger than the output. When documents are indexed by Llama Index, they are broken into
chunks (also called nodes) if they are large, and chunk_size_limit sets the maximum size for these chunks.
Next, we create a temporary list index and pass in our service context. A list index will read every single piece of text
in our index, which is perfect for extracting terms. Finally, we use our pre-defined query text to extract terms, using
response_mode="tree_summarize". This response mode will generate a tree of summaries from the bottom up,
where each parent summarizes its children. Finally, the top of the tree is returned, which will contain all our extracted
terms and definitions.
Lastly, we do some minor post processing. We assume the model followed instructions and put a term/definition pair
on each line. If a line is missing the Term: or Definition: labels, we skip it. Then, we convert this to a dictionary
for easy storage!
Now that we can extract terms, we need to put them somewhere so that we can query for them later. A
GPTVectorStoreIndex should be a perfect choice for now! But in addition, our app should also keep track of which
terms are inserted into the index so that we can inspect them later. Using st.session_state, we can store the current
list of terms in a session dict, unique to each user!
First things first though, let’s add a feature to initialize a global vector index and another function to insert the extracted
terms.
...
if 'all_terms' not in st.session_state:
st.session_state['all_terms'] = DEFAULT_TERMS
...
def insert_terms(terms_to_definition):
for term, definition in terms_to_definition.items():
doc = Document(f"Term: {term}\nDefinition: {definition}")
st.session_state['llama_index'].insert(doc)
@st.cache_resource
def initialize_index(llm_name, model_temperature, api_key):
    """Create the GPTVectorStoreIndex object."""
    llm = get_llm(llm_name, model_temperature, api_key)
    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
    # start with an empty index that we can insert extracted terms into later
    index = GPTVectorStoreIndex([], service_context=service_context)
    return index
...
with upload_tab:
st.subheader("Extract and Query Definitions")
if st.button("Initialize Index and Reset Terms"):
        st.session_state['llama_index'] = initialize_index(llm_name, model_temperature, api_key)
st.session_state['all_terms'] = {}
if "llama_index" in st.session_state:
        st.markdown("Either upload an image/screenshot of a document, or enter the text manually.")
st.session_state['terms'] = {}
terms_docs = {}
with st.spinner("Extracting..."):
            terms_docs.update(extract_terms([Document(document_text)], term_extract_str, llm_name, model_temperature, api_key))
st.session_state['terms'].update(terms_docs)
if st.button("Insert terms?"):
with st.spinner("Inserting terms"):
insert_terms(st.session_state['terms'])
st.session_state['all_terms'].update(st.session_state['terms'])
st.session_state['terms'] = {}
st.experimental_rerun()
Now you are really starting to leverage the power of streamlit! Let’s start with the code under the upload tab. We added
a button to initialize the vector index, and we store it in the global streamlit state dictionary, as well as resetting the
currently extracted terms. Then, after extracting terms from the input text, we store the extracted terms in the global
state again and give the user a chance to review them before inserting. If the insert button is pressed, then we call our
insert terms function, update our global tracking of inserted terms, and remove the most recently extracted terms from
the session state.
With the terms and definitions extracted and saved, how can we use them? And how will the user even remember what's previously been saved? We can simply add some more tabs to the app to handle these features.
...
setup_tab, terms_tab, upload_tab, query_tab = st.tabs(
["Setup", "All Terms", "Upload/Extract Terms", "Query Terms"]
)
...
with terms_tab:
st.subheader("Current Extracted Terms and Definitions")
st.json(st.session_state["all_terms"])
...
with query_tab:
st.subheader("Query for Terms/Definitions!")
    st.markdown(
        (
            "The LLM will attempt to answer your query, and augment its answers using "
            "the terms/definitions you've inserted. "
            "If a term is not in the index, it will answer using its internal knowledge."
        )
    )
if st.button("Initialize Index and Reset Terms", key="init_index_2"):
st.session_state["llama_index"] = initialize_index(
llm_name, model_temperature, api_key
)
st.session_state["all_terms"] = {}
if "llama_index" in st.session_state:
query_text = st.text_input("Ask about a term or definition:")
if query_text:
            query_text = query_text + "\nIf you can't find the answer, answer the query with the best of your knowledge."
Well, actually I hope you’ve been testing as we went. But now, let’s try one complete test.
1. Refresh the app
2. Enter your LLM settings
3. Head over to the query tab
4. Ask the following: What is a bunnyhug?
5. The app should give some nonsense response. If you didn’t know, a bunnyhug is another word for a hoodie, used
by people from the Canadian Prairies!
6. Let’s add this definition to the app. Open the upload tab and enter the following text: A bunnyhug is a
common term used to describe a hoodie. This term is used by people from the Canadian
Prairies.
7. Click the extract button. After a few moments, the app should display the correctly extracted term/definition.
Click the insert term button to save it!
8. If we open the terms tab, the term and definition we just extracted should be displayed
9. Go back to the query tab and try asking what a bunnyhug is. Now, the answer should be correct!
With our base app working, it might feel like a lot of work to build up a useful index. What if we gave the user some
kind of starting point to show off the app’s query capabilities? We can do just that! First, let’s make a small change to
our app so that we save the index to disk after every upload:
def insert_terms(terms_to_definition):
for term, definition in terms_to_definition.items():
doc = Document(f"Term: {term}\nDefinition: {definition}")
st.session_state['llama_index'].insert(doc)
# TEMPORARY - save to disk
st.session_state['llama_index'].storage_context.persist()
Now, we need some document to extract from! The repository for this project used the Wikipedia page on New York
City, and you can find the text here.
If you paste the text into the upload tab and run it (it may take some time), we can insert the extracted terms. Make
sure to also copy the text for the extracted terms into a notepad or similar before inserting into the index! We will need
them in a second.
After inserting, remove the line of code we used to save the index to disk. With a starting index now saved, we can
modify our initialize_index function to look like this:
@st.cache_resource
def initialize_index(llm_name, model_temperature, api_key):
    """Load the GPTVectorStoreIndex from storage."""
    llm = get_llm(llm_name, model_temperature, api_key)
    service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))
    # assumes the default persist directory ("./storage") written by persist() above
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)
    return index
Did you remember to save that giant list of extracted terms in a notepad? Now when our app initializes, we want to
pass in the default terms that are in the index to our global terms state:
...
if "all_terms" not in st.session_state:
st.session_state["all_terms"] = DEFAULT_TERMS
...
Repeat the above anywhere we were previously resetting the all_terms values.
If you play around with the app a bit now, you might notice that it stopped following our prompt! Remember, we added
to our query_str variable that if the term/definition could not be found, answer to the best of its knowledge. But now
if you try asking about random terms (like bunnyhug!), it may or may not follow those instructions.
This is due to the concept of “refining” answers in Llama Index. Since we are querying across the top 5 matching
results, sometimes all the results do not fit in a single prompt! OpenAI models typically have a max input size of 4097
tokens. So, Llama Index accounts for this by breaking up the matching results into chunks that will fit into the prompt.
After Llama Index gets an initial answer from the first API call, it sends the next chunk to the API, along with the
previous answer, and asks the model to refine that answer.
So, the refine process seems to be messing with our results! Rather than appending extra instructions to the query_str,
remove that, and Llama Index will let us provide our own custom prompts! Let’s create those now, using the default
prompts and chat specific prompts as a guide. Using a new file constants.py, let’s create some new query templates:
from llama_index import QuestionAnswerPrompt, RefinePrompt
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

# Text QA templates
DEFAULT_TEXT_QA_PROMPT_TMPL = (
"Context information is below. \n"
"---------------------\n"
"{context_str}"
"\n---------------------\n"
"Given the context information answer the following question "
"(if you don't know the answer, use the best of your knowledge): {query_str}\n"
)
TEXT_QA_TEMPLATE = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
# Refine templates
DEFAULT_REFINE_PROMPT_TMPL = (
"The original question is as follows: {query_str}\n"
"We have provided an existing answer: {existing_answer}\n"
"We have the opportunity to refine the existing answer "
"(only if needed) with some more context below.\n"
"------------\n"
"{context_msg}\n"
    "------------\n"
    "Given the new context and using the best of your knowledge, improve the existing answer. "
"If you can't improve the existing answer, just repeat it again."
)
DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
CHAT_REFINE_PROMPT_TMPL_MSGS = [
HumanMessagePromptTemplate.from_template("{query_str}"),
AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "We have the opportunity to refine the above answer "
        "(only if needed) with some more context below.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context and using the best of your knowledge, improve the existing answer. "
"If you can't improve the existing answer, just repeat it again."
),
]
CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
That seems like a lot of code, but it’s not too bad! If you looked at the default prompts, you might have noticed that
there are default prompts, and prompts specific to chat models. Continuing that trend, we do the same for our custom
prompts. Then, using a prompt selector, we can combine both prompts into a single object. If the LLM being used is
a chat model (ChatGPT, GPT-4), then the chat prompts are used. Otherwise, use the normal prompt templates.
Another thing to note is that we only defined one QA template. In a chat model, this will be converted to a single
“human” message.
So, now we can import these prompts into our app and use them during the query.
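A hedged sketch of wiring the custom prompts into the query (the keyword arguments mirror the query-engine parameters used elsewhere in these docs, and the top-5 retrieval reflects the description above; whether you pass the plain or chat refine prompt depends on the selector you build):

from constants import TEXT_QA_TEMPLATE, CHAT_REFINE_PROMPT

query_engine = st.session_state["llama_index"].as_query_engine(
    similarity_top_k=5,
    text_qa_template=TEXT_QA_TEMPLATE,
    refine_template=CHAT_REFINE_PROMPT,
)
response = query_engine.query(query_text)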
If you experiment a bit more with queries, hopefully you notice that the responses follow our instructions a little better
now!
Llama Index also supports images! Using Llama Index, we can upload images of documents (papers, letters, etc.), and
Llama Index handles extracting the text. We can leverage this to also allow users to upload images of their documents
and extract terms and definitions from them.
If you get an import error about PIL, install it using pip install Pillow first.
@st.cache_resource
def get_file_extractor():
image_parser = ImageParser(keep_image=True, parse_text=True)
file_extractor = DEFAULT_FILE_EXTRACTOR
file_extractor.update(
{
".jpg": image_parser,
".png": image_parser,
".jpeg": image_parser,
}
)
return file_extractor
file_extractor = get_file_extractor()
...
with upload_tab:
st.subheader("Extract and Query Definitions")
if st.button("Initialize Index and Reset Terms", key="init_index_1"):
st.session_state["llama_index"] = initialize_index(
llm_name, model_temperature, api_key
)
st.session_state["all_terms"] = DEFAULT_TERMS
if "llama_index" in st.session_state:
        st.markdown(
            "Either upload an image/screenshot of a document, or enter the text manually."
        )
uploaded_file = st.file_uploader(
"Upload an image/screenshot of a document:", type=["png", "jpg", "jpeg"]
)
document_text = st.text_area("Or enter raw text")
if st.button("Insert terms?"):
with st.spinner("Inserting terms"):
insert_terms(st.session_state["terms"])
st.session_state["all_terms"].update(st.session_state["terms"])
st.session_state["terms"] = {}
st.experimental_rerun()
Here, we added the option to upload a file using Streamlit. Then the image is opened and saved to disk (this seems
hacky but it keeps things simple). Then we pass the image path to the reader, extract the documents/text, and remove
our temp image file.
Now that we have the documents, we can call extract_terms() the same as before.
Conclusion/TLDR
In this tutorial, we covered a ton of information, while solving some common issues and problems along the way:
• Using different indexes for different use cases (List vs. Vector index)
• Storing global state values with Streamlit’s session_state concept
• Customizing internal prompts with Llama Index
• Reading text from images with Llama Index
The final version of this tutorial can be found here and a live hosted demo is available on Huggingface Spaces.
Setup
In this example, we will analyze Wikipedia articles of different cities: Boston, Seattle, San Francisco, and more.
The below code snippet downloads the relevant data into files.
data_path = Path('data')
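# (sketch) the download loop is abbreviated in this excerpt; one way to fetch plain-text
# extracts from the Wikipedia API and write them to data/ is shown below.
# The city list is illustrative, and Path is assumed to have been imported above.
import requests

wiki_titles = ["Boston", "Seattle", "San Francisco", "Toronto", "Houston"]
for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    data_path.mkdir(exist_ok=True)
    with open(data_path / f"{title}.txt", "w") as fp:
        fp.write(wiki_text)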
We will now define a set of indexes and graphs over your data. You can think of each index/graph as a lightweight
structure that solves a distinct use case.
We will first define a vector index over the documents of each city.
Querying a vector index lets us easily perform semantic search over a given city’s documents.
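A hedged sketch of building a per-city vector index and running one such query (file paths follow the download step above; the question is illustrative):

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

vector_indices = {}
for wiki_title in wiki_titles:
    docs = SimpleDirectoryReader(input_files=[f"data/{wiki_title}.txt"]).load_data()
    vector_indices[wiki_title] = GPTVectorStoreIndex.from_documents(docs)

response = vector_indices["Toronto"].as_query_engine().query("What are the sports teams in Toronto?")
print(str(response))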
Example response:
The sports teams in Toronto are the Toronto Maple Leafs (NHL), Toronto Blue Jays (MLB), Toronto Raptors (NBA), Toronto Argonauts (CFL), Toronto FC (MLS), Toronto Rock (NLL), and more.
We will now define a composed graph in order to run compare/contrast queries (see use cases doc). This graph
contains a keyword table composed on top of existing vector indexes.
To do this, we first want to set the “summary text” for each vector index.
index_summaries = {}
for wiki_title in wiki_titles:
# set summary text for city
index_summaries[wiki_title] = (
f"This content contains Wikipedia articles about {wiki_title}. "
f"Use this index if you need to lookup specific facts about {wiki_title}.\n"
"Do not use this index if you want to analyze multiple cities."
)
Next, we compose a keyword table on top of these vector indexes, with these indexes and summaries, in order to build
the graph.
graph = ComposableGraph.from_indices(
GPTSimpleKeywordTableIndex,
[index for _, index in vector_indices.items()],
[summary for _, summary in index_summaries.items()],
max_keywords_per_chunk=50
)
Querying this graph (with a query transform module), allows us to easily compare/contrast between different cities. An
example is shown below.
# define decompose_transform
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(
llm_predictor_chatgpt, verbose=True
)
Now that we’ve defined the set of indexes/graphs, we want to build an outer abstraction layer that provides a unified
query interface to our data structures. This means that during query-time, we can query this outer abstraction layer and
trust that the right index/graph will be used for the job.
There are a few ways to do this, both within our framework as well as outside of it!
• Build a router query engine on top of your existing indexes/graphs
• Define each index/graph as a Tool within an agent framework (e.g. LangChain).
For the purposes of this tutorial, we follow the former approach. If you want to take a look at how the latter approach
works, take a look at our example tutorial here.
Let’s take a look at an example of building a router query engine to automatically “route” any query to the set of
indexes/graphs that you have defined under the hood.
First, we define the query engines for the set of indexes/graph that we want to route our query to. We also give each a
description (about what data it holds and what it’s useful for) to help the router choose between them depending on the
specific query.
query_engine_tools = []

# one tool per city vector index (the loop is shown here in abbreviated form)
for index, summary in zip(vector_indices.values(), index_summaries.values()):
    query_engine = index.as_query_engine(service_context=service_context)
    vector_tool = QueryEngineTool.from_defaults(query_engine, description=summary)
    query_engine_tools.append(vector_tool)

# graph_tool is a QueryEngineTool wrapping the composed graph's query engine, defined the same way
query_engine_tools.append(graph_tool)
Now, we can define the routing logic and overall router query engine. Here, we use the LLMSingleSelector, which
uses the LLM to choose an underlying query engine to route the query to.
router_query_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(service_context=service_context),
query_engine_tools=query_engine_tools
)
The advantage of a unified query interface is that it can now handle different types of queries.
It can now handle queries about specific cities (by routing to the specific city vector index), and also compare/contrast
different cities.
Let’s take a look at a few examples!
Asking a Compare/Contrast Question
This “outer” abstraction is able to handle different queries by routing to the right underlying abstractions.
3.5 Notebooks
We offer a wide variety of example notebooks. They are referenced throughout the documentation.
Example notebooks are found here.
At a high-level, LlamaIndex gives you the ability to query your data for any downstream LLM use case, whether it’s
question-answering, summarization, or a component in a chatbot.
This section describes the different ways you can query your data with LlamaIndex, roughly in order of simplest (top-k
semantic search), to more advanced capabilities.
3.6.1 Semantic Search
The most basic example usage of LlamaIndex is through semantic search. We provide a simple in-memory vector store for you to get started, but you can also choose to use any one of our vector store integrations:
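A minimal example, mirroring the quickstart pattern used later in this documentation (the query string is illustrative):

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)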
Relevant Resources:
• Quickstart
• Example notebook
3.6.2 Summarization
A summarization query requires the LLM to iterate through many if not most documents in order to synthesize an
answer. For instance, a summarization query could look like one of the following:
• “What is a summary of this collection of text?”
• “Give me a summary of person X’s experience with the company.”
In general, a list index would be suited for this use case. A list index by default goes through all the data.
Empirically, setting response_mode="tree_summarize" also leads to better summarization results.
index = GPTListIndex.from_documents(documents)
query_engine = index.as_query_engine(
response_mode="tree_summarize"
)
response = query_engine.query("<summarization_query>")
LlamaIndex supports queries over structured data, whether that’s a Pandas DataFrame or a SQL Database.
Here are some relevant resources:
• Guide on Text-to-SQL
• SQL Demo Notebook 1
• SQL Demo Notebook 2 (Context)
• SQL Demo Notebook 3 (Big tables)
• Pandas Demo Notebook.
LlamaIndex supports synthesizing across heterogeneous data sources. This can be done by composing a graph over
your existing data. Specifically, compose a list index over your subindices. A list index inherently combines information
for each node; therefore it can synthesize information across your heterogeneous data sources.
index1 = GPTVectorStoreIndex.from_documents(notion_docs)
index2 = GPTVectorStoreIndex.from_documents(slack_docs)

# compose a list index (graph) over the two subindices; summary strings are illustrative
graph = ComposableGraph.from_indices(GPTListIndex, [index1, index2], index_summaries=["<summary1>", "<summary2>"])

query_engine = graph.as_query_engine()
response = query_engine.query("<query_str>")
LlamaIndex also supports routing over heterogeneous data sources with RouterQueryEngine - for instance, if you
want to “route” a query to an underlying Document or a sub-index.
To do this, first build the sub-indices over different data sources. Then construct the corresponding query engines, and
give each query engine a description to obtain a QueryEngineTool.
...
# define sub-indices
index1 = GPTVectorStoreIndex.from_documents(notion_docs)
index2 = GPTVectorStoreIndex.from_documents(slack_docs)
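# construct query engines + tools over the sub-indices (descriptions are illustrative)
tool1 = QueryEngineTool.from_defaults(
    query_engine=index1.as_query_engine(),
    description="Use this query engine for questions about Notion documents.",
)
tool2 = QueryEngineTool.from_defaults(
    query_engine=index2.as_query_engine(),
    description="Use this query engine for questions about Slack messages.",
)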
Then, we define a RouterQueryEngine over them. By default, this uses an LLMSingleSelector as the router, which uses the LLM to choose the best sub-index to route the query to, given the descriptions.
query_engine = RouterQueryEngine.from_defaults(
query_engine_tools=[tool1, tool2]
)
response = query_engine.query(
"In Notion, give me a summary of the product roadmap."
)
LlamaIndex can support compare/contrast queries as well. It can do this in the following fashion:
• Composing a graph over your data
• Adding in query transformations.
You can perform compare/contrast queries by just composing a graph over your data.
Here are some relevant resources:
• Composability
• SEC 10-k Analysis Example notebook.
You can also perform compare/contrast queries with a query transformation module.
This module will help break down a complex query into a simpler one over your existing index structure.
Here are some relevant resources:
• Query Transformations
• City Analysis Example Notebook
LlamaIndex can also support multi-step queries. Given a complex query, break it down into subquestions.
For instance, given a question “Who was in the first batch of the accelerator program the author started?”, the module
will first decompose the query into a simpler initial question “What was the accelerator program the author started?”,
query the index, and then ask followup questions.
Here are some relevant resources:
• Query Transformations
• Multi-Step Query Decomposition Notebook
LlamaIndex modules provide plug and play data loaders, data structures, and query interfaces. They can be used in
your downstream LLM Application. Some of these applications are described below.
3.7.1 Chatbots
Chatbots are an incredibly popular use case for LLMs. LlamaIndex gives you the tools to build knowledge-augmented
chatbots and agents.
Relevant Resources:
• Building a Chatbot
• Using with a LangChain Agent
LlamaIndex can be integrated into a downstream full-stack web application. It can be used in a backend server (such
as Flask), packaged into a Docker container, and/or directly used in a framework such as Streamlit.
We provide tutorials and resources to help you get started in this area.
Relevant Resources:
• Fullstack Application Guide
• LlamaIndex Starter Pack
Our data connectors are offered through LlamaHub. LlamaHub is an open-source repository containing data loaders
that you can easily plug and play into any LlamaIndex application.
from llama_index import GPTVectorStoreIndex, download_loader

GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_engine.query('Where did the author go to school?')
At the core of LlamaIndex is a set of index data structures. You can choose to use them on their own, or you can choose
to compose a graph over these data structures.
In the following sections, we detail how each index structure works, as well as some of the key capabilities our in-
dices/graphs provide.
Insertion
You can “insert” a new Document into any index data structure, after building the index initially. The underlying
mechanism behind insertion depends on the index structure. For instance, for the list index, a new Document is inserted
as additional node(s) in the list. For the vector store index, a new Document (and embedding) is inserted into the
underlying document/embedding store.
An example notebook showcasing our insert capabilities is given here. In this notebook we showcase how to construct
an empty index, manually create Document objects, and add those to our index data structures.
An example code snippet is given below:
index = GPTListIndex([])
embed_model = OpenAIEmbedding()
doc_chunks = []
for i, text in enumerate(text_chunks):
doc = Document(text, doc_id=f"doc_id_{i}")
doc_chunks.append(doc)
# insert
for doc_chunk in doc_chunks:
index.insert(doc_chunk)
Deletion
You can “delete” a Document from most index data structures by specifying a document_id. (NOTE: the tree index
currently does not support deletion). All nodes corresponding to the document will be deleted.
NOTE: In order to delete a Document, that Document must have a doc_id specified when first loaded into the index.
index.delete("doc_id_0")
Update
If a Document is already present within an index, you can “update” a Document with the same doc_id (for instance,
if the information in the Document has changed).
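A hedged sketch of an update, reusing the doc_chunks created in the insertion example above (the new text is illustrative):

# the Document keeps its original doc_id, so the index can replace the old version
doc_chunks[0].text = "Brand new document text"
index.update(doc_chunks[0])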
3.9.2 Composability
LlamaIndex offers composability of your indices, meaning that you can build indices on top of other indices. This
allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.
Composability allows you to define lower-level indices for each document, and higher-order indices over a collection
of documents. To see how this works, imagine defining 1) a tree index for the text within each document, and 2) a list
index over each tree index (one document) within your collection.
Defining Subindices
To see how this works, imagine you have 3 documents: doc1, doc2, and doc3.
doc1 = SimpleDirectoryReader('data1').load_data()
doc2 = SimpleDirectoryReader('data2').load_data()
doc3 = SimpleDirectoryReader('data3').load_data()
Now let’s define a tree index for each document. In Python, we have:
index1 = GPTTreeIndex.from_documents(doc1)
index2 = GPTTreeIndex.from_documents(doc2)
index3 = GPTTreeIndex.from_documents(doc3)
You then need to explicitly define summary text for each subindex. This allows
the subindices to be used as Documents for higher-level indices.
index1_summary = "<summary1>"
index2_summary = "<summary2>"
index3_summary = "<summary3>"
You may choose to manually specify the summary text, or use LlamaIndex itself to generate a summary, for instance
with the following:
summary = index1.query(
"What is a summary of this document?", retriever_mode="all_leaf"
)
index1_summary = str(summary)
If specified, this summary text for each subindex can be used to refine the answer during query-time.
We can then create a graph with a list index on top of these 3 tree indices. We can query, save, and load the graph to/from disk as with any other index.
graph = ComposableGraph.from_indices(
GPTListIndex,
[index1, index2, index3],
index_summaries=[index1_summary, index2_summary, index3_summary],
)
During a query, we would start with the top-level list index. Each node in the list corresponds to an underlying tree index.
The query will be executed recursively, starting from the root index, then the sub-indices. The default query engine
for each index is called under the hood (i.e. index.as_query_engine()), unless otherwise configured by passing
custom_query_engines to the ComposableGraphQueryEngine. Below we show an example that configures the tree index retrievers to use child_branch_factor=2 (instead of the default child_branch_factor=1).
More detail on how to configure ComposableGraphQueryEngine can be found here.
Note that specifying a custom retriever for an index by id might require you to inspect e.g. index1.index_struct.index_id. Alternatively, you can explicitly set it as follows:
index1.index_struct.index_id = "<index_id_1>"
index2.index_struct.index_id = "<index_id_2>"
index3.index_struct.index_id = "<index_id_3>"
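A hedged sketch of the custom_query_engines configuration promised above (the query string is illustrative):

custom_query_engines = {
    index.index_struct.index_id: index.as_query_engine(child_branch_factor=2)
    for index in [index1, index2, index3]
}
query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
response = query_engine.query("Where did the author grow up?")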
So within a node, instead of fetching the text, we would recursively query the stored tree index to retrieve our answer.
NOTE: You can stack indices as many times as you want, depending on the hierarchies of your knowledge base!
We can take a look at a code example below as well. We first build two tree indices, one over the Wikipedia NYC page,
and the other over Paul Graham’s essay. We then define a keyword extractor index over the two tree indices.
Here is an example notebook.
Progressive disclosure of complexity is a design philosophy that aims to strike a balance between the needs of begin-
ners and experts. The idea is that you should give users the simplest and most straightforward interface or experience
possible when they first encounter a system or product, but then gradually reveal more complexity and advanced features
as users become more familiar with the system. This can help prevent users from feeling overwhelmed or intimidated
by a system that seems too complex, while still giving experienced users the tools they need to accomplish advanced
tasks.
In the case of LlamaIndex, we’ve tried to balance simplicity and complexity by providing a high-level API that’s easy to
use out of the box, but also a low-level composition API that gives experienced users the control they need to customize
the system to their needs. By doing this, we hope to make LlamaIndex accessible to beginners while still providing the
flexibility and power that experienced users need.
3.10.2 Resources
• The basic query interface over an index is found in our usage pattern guide. The guide details how to specify
parameters for a retriever/synthesizer/query engine over a single index structure.
• A more advanced query interface is found in our composability guide. The guide describes how to specify a
graph over multiple index structures.
• We also provide a guide to some of our more advanced components, which can be added to a retriever or a query
engine. See our Query Transformations and Node Postprocessor modules.
Query Transformations
LlamaIndex allows you to perform query transformations over your index structures. Query transformations are mod-
ules that will convert a query into another query. They can be single-step, where the transformation is run once before the query is executed against an index.
They can also be multi-step, as in:
1. The query is transformed, executed against an index,
2. The response is retrieved.
3. Subsequent queries are transformed/executed in a sequential fashion.
We list some of our query transformations in more detail below.
Use Cases
HyDE is a technique where given a natural language query, a hypothetical document/answer is generated first. This
hypothetical document is then used for embedding lookup rather than the raw query.
To use HyDE, an example code snippet is shown below.
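A hedged sketch (import paths reflect the 0.6-era API and may differ in newer releases; the query and the index it runs against are illustrative):

from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

query_engine = index.as_query_engine()
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, query_transform=hyde)
response = hyde_query_engine.query("What did Paul Graham do after going to RISD?")
print(response)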
Some recent approaches (e.g. self-ask, ReAct) have suggested that LLMs perform better at answering complex ques-
tions when they break the question into smaller steps. We have found that this is true for queries that require knowledge
augmentation as well.
If your query is complex, different parts of your knowledge base may answer different “subqueries” around the overall
query.
Our single-step query decomposition feature transforms a complicated question into a simpler one over the data col-
lection to help provide a sub-answer to the original question.
This is especially helpful over a composed graph. Within a composed graph, a query can be routed to multiple
subindexes, each representing a subset of the overall knowledge corpus. Query decomposition allows us to transform
the query into a more suitable question over any given index.
An example is shown below.
# configure retrievers
vector_query_engine = vector_index.as_query_engine()
vector_query_engine = TransformQueryEngine(
    vector_query_engine,
    query_transform=decompose_transform,
    transform_extra_info={'index_summary': vector_index.index_struct.summary},
)
custom_query_engines = {
vector_index.index_id: vector_query_engine
}
# query
query_str = (
"Compare and contrast the airports in Seattle, Houston, and Toronto. "
)
query_engine = graph.as_query_engine(custom_query_engines=custom_query_engines)
response = query_engine.query(query_str)
Multi-step query transformations are a generalization on top of existing single-step query transformation approaches.
Given an initial, complex query, the query is transformed and executed against an index. The response is retrieved from
the query. Given the response (along with prior responses) and the query, followup questions may be asked against the
index as well. This technique allows a query to be run against a single knowledge source until that query has satisfied
all questions.
An example is shown below.
from llama_index.indices.query.query_transform.base import StepDecomposeQueryTransform

# llm_predictor as defined in the earlier examples
step_decompose_transform = StepDecomposeQueryTransform(llm_predictor, verbose=True)
query_engine = index.as_query_engine()
query_engine = MultiStepQueryEngine(query_engine, query_transform=step_decompose_transform)
response = query_engine.query(
"Who was in the first batch of the accelerator program the author started?",
)
print(str(response))
Node Postprocessor
By default, when a query is executed on an index or a composed graph, LlamaIndex performs the following steps:
1. Retrieval step: Retrieve a set of nodes from the index given the query. For instance, with a vector index, this
would be top-k relevant nodes; with a list index this would be all nodes.
2. Synthesis step: Synthesize a response over the set of nodes.
LlamaIndex provides a set of “postprocessor” modules that can augment the retrieval process in (1). The process is
very simple. After the retrieval step, we can analyze the initial set of nodes and add a “processing” step to refine this
set of nodes, whether that's by filtering out irrelevant nodes, adding more nodes, and so on.
This is a simple but powerful step. This allows us to perform tasks like keyword filtering, as well as temporal reasoning
over your data.
We first describe the high-level API interface, then provide some example modules, and finally discuss usage.
We are also very open to contributions! Take a look at our contribution guide if you are interested in contributing a
Postprocessor.
API Interface
The base class is BaseNodePostprocessor, and the API interface is very simple:
class BaseNodePostprocessor:
"""Node postprocessor."""
@abstractmethod
def postprocess_nodes(
self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
) -> List[NodeWithScore]:
"""Postprocess nodes."""
It takes in a list of Node objects, and outputs another list of Node objects.
The full API reference can be found here.
Example Usage
Index querying
query_engine = index.as_query_engine(
similarity_top_k=3,
    # any postprocessor can be supplied here; FixedRecencyPostprocessor is illustrative
    node_postprocessors=[
        FixedRecencyPostprocessor(service_context=service_context)
    ],
)
response = query_engine.query("<query_str>")
The module can also be used on its own as part of a broader flow. For instance, here’s an example where you choose
to manually postprocess an initial set of source nodes.
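A hedged sketch of that standalone usage (the import path and cutoff value reflect the 0.6-era API, and initial_nodes stands for a previously retrieved list of NodeWithScore objects):

from llama_index.indices.postprocessor import SimilarityPostprocessor

postprocessor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = postprocessor.postprocess_nodes(initial_nodes)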
Example Modules
Default Postprocessors
These postprocessors are simple modules that are already included by default.
KeywordNodePostprocessor
A simple postprocessor module where you are able to specify required_keywords or exclude_keywords. This
will filter out nodes that don’t have required keywords, or contain excluded keywords.
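A hedged usage sketch (keyword values are illustrative; initial_nodes is a previously retrieved node list):

postprocessor = KeywordNodePostprocessor(
    required_keywords=["Seattle"],
    exclude_keywords=["Toronto"],
)
filtered_nodes = postprocessor.postprocess_nodes(initial_nodes)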
SimilarityPostprocessor
A simple postprocessor module where you are able to specify a similarity_cutoff; nodes with a similarity score below the cutoff are filtered out.
Previous/Next Postprocessors
These postprocessors are able to exploit temporal relationships between nodes (e.g. prev/next relationships) in order to
retrieve additional context, in the event that the existing context may not directly answer the question. They augment
the set of retrieved nodes with context either in the future or the past (or both).
The most basic version is PrevNextNodePostprocessor, which takes a fixed num_nodes as well as mode specifying
“previous”, “next”, or “both”.
We also have AutoPrevNextNodePostprocessor, which is able to infer the previous/next direction automatically.
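A hedged sketch of the basic version (the docstore argument and node count are illustrative):

postprocessor = PrevNextNodePostprocessor(
    docstore=index.docstore,
    num_nodes=1,
    mode="next",
)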
Recency Postprocessors
These postprocessors are able to ensure that only the most recent data is used as context, and that out of date context
information is filtered out.
Imagine that you have three versions of a document, with slight changes between versions. For instance, this document
may be describing patient history. If you ask a question over this data, you would want to make sure that you’re
referencing the latest document, and that out of date information is not passed in.
We support recency filtering through the following modules.
FixedRecencyPostProcessor: sorts retrieved nodes by date in reverse order, and takes a fixed top-k set of nodes.
EmbeddingRecencyPostprocessor: sorts retrieved nodes by date in reverse order, and then looks at subsequent
nodes and filters out nodes that have high embedding similarity with the current node. This allows us to maintain
recent Nodes that have “distinct” context, but filter out overlapping Nodes that are outdated and overlap with more
recent context.
TimeWeightedPostprocessor: adds time-weighting to retrieved nodes, using the formula (1-time_decay) **
hours_passed. The recency score is added to any score that the node already contains.
3.11 Customization
The goal of LlamaIndex is to provide a toolkit of data structures that can organize external information in a manner
that is easily compatible with the prompt limitations of an LLM. Therefore LLMs are always used to construct the final
answer. Depending on the type of index being used, LLMs may also be used during index construction, insertion, and
query traversal.
LlamaIndex uses Langchain’s LLM and LLMChain module to define the underlying abstraction. We introduce a wrap-
per class, LLMPredictor, for integration into LlamaIndex.
We also introduce a PromptHelper class, to allow the user to explicitly set certain constraint parameters, such as
maximum input size (default is 4096 for davinci models), number of generated output tokens, maximum chunk overlap,
and more.
By default, we use OpenAI’s text-davinci-003 model. But you may choose to customize the underlying LLM being
used.
Below we show a few examples of LLM customization. This includes
• changing the underlying LLM
• changing the number of output tokens (for OpenAI, Cohere, or AI21)
• having more fine-grained control over all parameters for any LLM, from input size to chunk overlap
An example snippet of customizing the LLM being used is shown below. In this example, we use text-davinci-002 instead of text-davinci-003. Available models include text-davinci-003, text-curie-001, text-babbage-001, text-ada-001, code-davinci-002, and code-cushman-001. Note that you may plug in any LLM shown on Langchain's LLM page.
from llama_index import GPTKeywordTableIndex, SimpleDirectoryReader, LLMPredictor, ServiceContext
from langchain import OpenAI

documents = SimpleDirectoryReader('data').load_data()
# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
# build index
index = GPTKeywordTableIndex.from_documents(documents, service_context=service_context)
Example: Changing the number of output tokens (for OpenAI, Cohere, AI21)
The number of output tokens is usually set to some low number by default (for instance, with OpenAI the default is
256).
For OpenAI, Cohere, AI21, you just need to set the max_tokens parameter (or maxTokens for AI21). We will handle
text chunking/calculations under the hood.
documents = SimpleDirectoryReader('data').load_data()
# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002", max_tokens=512))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
# build index
index = GPTKeywordTableIndex.from_documents(documents, service_context=service_context)
If you are using other LLM classes from langchain, please see below.
To have fine-grained control over all parameters, you will need to define a custom PromptHelper.
documents = SimpleDirectoryReader('data').load_data()

# define the prompt helper: max input size, number of output tokens, max chunk overlap
max_input_size = 4096
num_output = 256
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002", max_tokens=num_output))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# build index
index = GPTKeywordTableIndex.from_documents(documents, service_context=service_context)
To use a custom LLM model, you only need to implement the LLM class from Langchain. You will be responsible for
passing the text to the model and returning the newly generated tokens.
Here is a small example using a locally running model and Huggingface's pipeline abstraction:
import torch
from typing import Any, List, Mapping, Optional
from transformers import pipeline
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex, PromptHelper
class CustomLLM(LLM):
    model_name = "facebook/opt-iml-max-30b"
    pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=256)[0]["generated_text"]
        # only return the newly generated tokens, as described above
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
Using this method, you can use any LLM. Maybe you have one running locally, or running on your own server. As
long as the class is implemented and the generated tokens are returned, it should work out. Note that we need to use
the prompt helper to customize the prompt sizes, since every model has a slightly different context length.
Note that you may have to adjust the internal prompts to get good performance. Even then, you should be using a
sufficiently large LLM to ensure it’s capable of handling the complex queries that LlamaIndex uses internally, so your
mileage may vary.
A list of all default internal prompts is available here, and chat-specific prompts are listed here. You can also implement
your own custom prompts, as described here.
Prompting is the fundamental input that gives LLMs their expressive power. LlamaIndex uses prompts to build the
index, do insertion, perform traversal during querying, and to synthesize the final answer.
LlamaIndex uses a finite set of prompt types, described here. All index classes, along with their associated queries,
utilize a subset of these prompts. The user may provide their own prompt. If the user does not provide their own
prompt, default prompts are used.
NOTE: The majority of custom prompts are typically passed in during query-time, not during index construc-
tion. For instance, both the QuestionAnswerPrompt and RefinePrompt are used during query-time to syn-
thesize an answer. Some indices do use prompts during index construction to build the index; for instance,
GPTTreeIndex uses a SummaryPrompt to hierarchically summarize the nodes, and GPTKeywordTableIndex uses a
KeywordExtractPrompt to extract keywords. Some indices do allow QuestionAnswerPrompt and RefinePrompt
to be passed in during index construction, but that usage is deprecated.
An API reference of all query classes and index classes (used for index construction) are found below. The definition
of each query class and index class contains optional prompts that the user may pass in.
• Queries
• Indices
Example
# load documents and define a custom QuestionAnswerPrompt
documents = SimpleDirectoryReader('data').load_data()
# the template must contain {context_str} and {query_str}
QA_PROMPT_TMPL = (
    "Context information is below.\n---------------------\n{context_str}\n---------------------\n"
    "Given the context information, answer the question: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    text_qa_template=QA_PROMPT
)
response = query_engine.query("What did the author do growing up?")
print(response)
Check out the reference documentation for a full set of all prompts.
You can pass in user-specified embeddings when constructing an index. This gives you control over specifying embeddings per Document, instead of having us determine embeddings for your text (see below).
Simply specify the embedding field when creating a Document:
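The code for this step did not survive extraction; a minimal sketch (the literal vector is a placeholder, and passing the embedding via the Document constructor keyword is an assumption):

from llama_index import Document

# supply your own embedding for this document's text
doc = Document(
    text="This is a document.",
    embedding=[0.1, 0.2, 0.3],  # placeholder values; use your real embedding here
)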
Please see the corresponding section in our Vector Stores guide for more details.
LlamaIndex provides embedding support to our tree and list indices. In addition to each node storing text, each node
can optionally store an embedding. During query-time, we can use embeddings to do max-similarity retrieval of nodes
before calling the LLM to synthesize an answer. Since similarity lookup using embeddings (e.g. using cosine similarity)
does not require a LLM call, embeddings serve as a cheaper lookup mechanism instead of using LLMs to traverse nodes.
Since we offer embedding support during query-time for our list and tree indices, embeddings are lazily generated and
then cached (if retriever_mode="embedding" is specified during query(...)), and not during index construction.
This design choice prevents the need to generate embeddings for all text chunks during index construction.
NOTE: Our vector-store based indices generate embeddings during index construction.
Embedding Lookups
Custom Embeddings
LlamaIndex allows you to define custom embedding modules. By default, we use text-embedding-ada-002 from
OpenAI.
You can also choose to plug in embeddings from Langchain’s embeddings module. We introduce a wrapper class,
LangchainEmbedding, for integration into LlamaIndex.
An example snippet is shown below (to use Hugging Face embeddings) on the GPTListIndex:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext

# load in a Hugging Face embedding model from Langchain
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# build a list index (embeddings are generated lazily at query time)
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
new_index = GPTListIndex.from_documents(documents)

# build a vector store index (embeddings are generated at construction time)
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
new_index = GPTVectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
)
By default, LlamaIndex hides away the complexities and lets you query your data in under 5 lines of code:
documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")
Under the hood, LlamaIndex also supports a swappable storage layer that allows you to customize where ingested
documents (i.e., Node objects), embedding vectors, and index metadata are stored.
Low-Level API
# build the index directly from documents (equivalent to the high-level call above)...
index = GPTVectorStoreIndex.from_documents(documents)
# ...or build it from parsed nodes on top of an explicit storage context
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
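The nodes and storage_context above are assumed to be created beforehand; a minimal sketch of that setup (the SimpleNodeParser and StorageContext.from_defaults names follow the API of this release and are assumptions here):

from llama_index import StorageContext
from llama_index.node_parser import SimpleNodeParser

# parse documents into Node objects
nodes = SimpleNodeParser().get_nodes_from_documents(documents)

# default storage context: in-memory docstore, index store, and vector store
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)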
You can customize the underlying storage with a one-line change to instantiate different document stores, index stores,
and vector stores. See Document Stores, Vector Stores, Index Stores guides for more details.
LlamaIndex provides a variety of tools for analysis and optimization of your indices and queries. Some of our tools
involve the analysis/ optimization of token usage and cost.
We also offer a Playground module, giving you a visual means of analyzing the token usage of various index structures
+ performance.
Each call to an LLM will cost some amount of money - for instance, OpenAI’s Davinci costs $0.02 / 1k tokens. The
cost of building an index and querying depends on
• the type of LLM used
• the type of data structure used
• parameters used during building
• parameters used during querying
The cost of building and querying each index is a TODO in the reference documentation. In the meantime, we provide
the following information:
1. A high-level overview of the cost structure of the indices.
2. A token predictor that you can use directly within LlamaIndex!
The following indices don’t require LLM calls at all during building (0 cost):
• GPTListIndex
• GPTSimpleKeywordTableIndex - uses a regex keyword extractor to extract keywords from each document
• GPTRAKEKeywordTableIndex - uses a RAKE keyword extractor to extract keywords from each document
Query Time
There will always be >= 1 LLM call during query time, in order to synthesize the final answer. Some indices contain
cost tradeoffs between index building and querying. GPTListIndex, for instance, is free to build, but running a query
over a list index (without filtering or embedding lookups), will call the LLM 𝑁 times.
Here are some notes regarding each of the indices:
• GPTListIndex: by default requires 𝑁 LLM calls, where N is the number of nodes.
• GPTTreeIndex: by default requires log(𝑁 ) LLM calls, where N is the number of leaf nodes.
– Setting child_branch_factor=2 will be more expensive than the default child_branch_factor=1
(polynomial vs logarithmic), because we traverse 2 children instead of just 1 for each parent node.
• GPTKeywordTableIndex: by default requires an LLM call to extract query keywords.
– Can do index.as_retriever(retriever_mode="simple") or index.as_retriever(retriever_mode="rake") to also use regex/RAKE keyword extractors on your query text.
LlamaIndex offers token predictors to predict token usage of LLM and embedding calls. This allows you to estimate
your costs during 1) index construction, and 2) index querying, before any respective LLM calls are made.
Using MockLLMPredictor
To predict token usage of LLM calls, import and instantiate the MockLLMPredictor with the following:
llm_predictor = MockLLMPredictor(max_tokens=256)
You can then use this predictor during both index construction and querying. Examples are given below.
Index Construction
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
# the "mock" llm predictor is our token counter
llm_predictor = MockLLMPredictor(max_tokens=256)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
# pass the "mock" llm_predictor into GPTTreeIndex during index construction
index = GPTTreeIndex.from_documents(documents, service_context=service_context)
Index Querying
query_engine = index.as_query_engine(
service_context=service_context
)
response = query_engine.query("What did the author do growing up?")
Using MockEmbedding
You may also predict the token usage of embedding calls with MockEmbedding. You can use it in tandem with
MockLLMPredictor.
# specify a "mock" embedding model (and reuse the "mock" LLM predictor from above)
embed_model = MockEmbedding(embed_dim=1536)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(
    service_context=service_context
)
response = query_engine.query(
    "What did the author do after his time at Y Combinator?",
)
3.12.2 Playground
The Playground module in LlamaIndex is a way to automatically test your data (i.e. documents) across a diverse
combination of indices, models, embeddings, modes, etc. to decide which ones are best for your purposes. More
options will continue to be added.
For each combination, you’ll be able to compare the results for any query and compare the answers, latency, tokens
used, and so on.
You may initialize a Playground with a list of pre-built indices, or initialize one from a list of Documents using the
preset indices.
Sample Code
from llama_index import download_loader, GPTTreeIndex, GPTVectorStoreIndex
from llama_index.playground import Playground

# load data
WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin'])

# build a few candidate indices to compare
indices = [GPTVectorStoreIndex.from_documents(documents), GPTTreeIndex.from_documents(documents)]

# initialize playground
playground = Playground(indices=indices)

# playground compare
playground.compare("What is the population of Berlin?")
API Reference
Example Notebook
3.12.3 Optimizers
print("With optimization")
start_time = time.time()
query_engine = index.as_query_engine(
optimizer=SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
)
res = query_engine.query("What is the population of Berlin?")
end_time = time.time()
print("Total time elapsed: {}".format(end_time - start_time))
print("Answer: {}".format(res))
Output:
Without optimization
INFO:root:> [query] Total LLM token usage: 3545 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens
Total time elapsed: 2.8928110599517822
Answer:
The population of Berlin in 1949 was approximately 2.2 million inhabitants. After the fall of the Berlin Wall in 1989, the population of Berlin increased to approximately 3.
With optimization
INFO:root:> [optimize] Total embedding token usage: 7 tokens
INFO:root:> [query] Total LLM token usage: 1779 tokens
INFO:root:> [query] Total embedding token usage: 7 tokens
Total time elapsed: 2.346346139907837
Answer:
The population of Berlin is around 4.5 million.
API Reference
3.13.1 Guardrails
Guardrails is an open-source Python package for specification/validation/correction of output schemas. See below for
a code example.
from llama_index.llm_predictor import StructuredLLMPredictor

# specify StructuredLLMPredictor
# this is a special LLMPredictor that allows for structured outputs
llm_predictor = StructuredLLMPredictor()

# define a Guardrails rail spec (the opening lines of this spec were lost in extraction
# and are reconstructed here; the schema fields below are from the original snippet)
rail_spec = ("""
<rail version="0.1">

<output>
    <list name="points" description="Bullet points regarding events in the author's life.">
        <object>
            <string name="explanation" format="one-line" on-fail-one-line="noop" />
            <string name="explanation2" format="one-line" on-fail-one-line="noop" />
        </object>
    </list>
</output>

<prompt>
@xml_prefix_prompt

{output_schema}

@json_suffix_prompt_v2_wo_none
</prompt>
</rail>
""")
Output:
3.13.2 Langchain
Langchain also offers output parsing modules that you can use within LlamaIndex.
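The extracted example below references llm_predictor, qa_prompt, and refine_prompt without defining them; a minimal sketch of that setup, assuming Langchain's ResponseSchema/StructuredOutputParser and LlamaIndex's LangchainOutputParser wrapper (the import paths and the output_parser keyword reflect this release and are assumptions):

from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from llama_index.output_parsers import LangchainOutputParser
from llama_index.llm_predictor import StructuredLLMPredictor
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_REFINE_PROMPT_TMPL

llm_predictor = StructuredLLMPredictor()

# define the desired output schema via Langchain
response_schemas = [
    ResponseSchema(name="Education", description="Describes the author's educational experience/background."),
    ResponseSchema(name="Work", description="Describes the author's work experience/background."),
]
lc_output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
output_parser = LangchainOutputParser(lc_output_parser)

# wrap the default QA/refine templates with format instructions from the parser
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)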
# query index
query_engine = index.as_query_engine(
service_context=ServiceContext.from_defaults(
llm_predictor=llm_predictor
),
text_qa_template=qa_prompt,
refine_template=refine_prompt,
)
response = query_engine.query(
"What are a few things the author did growing up?",
)
print(str(response))
Output:
{'Education': 'Before college, the author wrote short stories and experimented with programming on an IBM 1401.', 'Work': 'The author worked on writing and programming outside of school.'}
3.14 Evaluation
LlamaIndex offers a few key modules for evaluating the quality of both Document retrieval and response synthesis.
Here are some key questions for each component:
• Document retrieval: Are the sources relevant to the query?
• Response synthesis: Does the response match the retrieved context? Does it also match the query?
This guide describes how the evaluation components within LlamaIndex work. Note that our current evaluation modules do not require ground-truth labels. Evaluation can be done with some combination of the query, context, and response, combined with LLM calls.
Each call to query_engine.query returns both the synthesized response and the source documents.
We can evaluate the response against the retrieved sources - without taking into account the query!
This allows you to measure hallucination: if the response does not match the retrieved sources, the model may be “hallucinating” an answer, since it is not rooting the answer in the context provided to it in the prompt.
There are two sub-modes of evaluation here. We can either get a binary “YES”/”NO” response on whether the response matches any source context, or get a list of responses across sources to see which sources match.
Binary Evaluation
This mode of evaluation will return “YES”/”NO” if the synthesized response matches any source context.
# build index
...
# define evaluator
evaluator = ResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American␣
˓→Revolution?")
eval_result = evaluator.evaluate(response)
print(str(eval_result))
Diagram
Sources Evaluation
This mode of evaluation will return “YES”/”NO” for every source node.
# build index
...
# define evaluator
evaluator = ResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American␣
˓→Revolution?")
eval_result = evaluator.evaluate_source_nodes(response)
print(str(eval_result))
You’ll get back a list of “YES”/”NO”, corresponding to each source node in response.source_nodes.
Notebook
This is similar to the above section, except now we also take into account the query. The goal is to determine if the
response + source context answers the query.
As with the above, there are two submodes of evaluation.
• We can either get a binary response “YES”/”NO” on whether the response matches the query, and whether any
source node also matches the query.
• We can also ignore the synthesized response, and check every source node to see if it matches the query.
Binary Evaluation
This mode of evaluation will return “YES”/”NO” if the synthesized response matches the query + any source context.
# build index
...
# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American␣
˓→Revolution?")
eval_result = evaluator.evaluate(response)
print(str(eval_result))
Diagram
Sources Evaluation
This mode of evaluation will look at each source node, and see if each source node contains an answer to the query.
# build index
...
# define evaluator
evaluator = QueryResponseEvaluator(service_context=service_context)
# query index
query_engine = vector_index.as_query_engine()
response = query_engine.query("What battles took place in New York City in the American␣
˓→Revolution?")
eval_result = evaluator.evaluate_source_nodes(response)
print(str(eval_result))
Diagram
Notebook
3.15 Integrations
LlamaIndex provides a diverse range of integrations with other toolsets and storage providers.
Some of these integrations are provided in more detailed guides below.
LlamaIndex offers multiple integration points with vector stores / vector databases:
1. LlamaIndex can load data from vector stores, similar to any other data connector. This data can then be used
within LlamaIndex data structures.
2. LlamaIndex can use a vector store itself as an index. Like any other index, this index can store documents and
be used to answer queries.
LlamaIndex supports loading data from the following sources. See Data Connectors for more details and API docu-
mentation.
• Chroma (ChromaReader) Installation
• DeepLake (DeepLakeReader) Installation
• Qdrant (QdrantReader) Installation Python Client
• Weaviate (WeaviateReader). Installation. Python Client.
• Pinecone (PineconeReader). Installation/Quickstart.
• Faiss (FaissReader). Installation.
• Milvus (MilvusReader). Installation
• Zilliz (MilvusReader). Quickstart
• MyScale (MyScaleReader). Quickstart. Installation/Python Client.
Chroma stores both documents and vectors. This is an example of how to use Chroma:
query_engine = index.as_query_engine()
response = query_engine.query("<query_text>")
display(Markdown(f"<b>{response}</b>"))
Qdrant also stores both documents and vectors. This is an example of how to use Qdrant:
reader = QdrantReader(host="localhost")
NOTE: Since Weaviate can store a hybrid of document and vector objects, the user may either choose to explicitly
specify class_name and properties in order to query documents, or they may choose to specify a raw GraphQL
query. See below for usage.
NOTE: Both Pinecone and Faiss data loaders assume that the respective data sources only store vectors; text content
is stored elsewhere. Therefore, both data loaders require that the user specifies an id_to_text_map in the load_data
call.
For instance, this is an example usage of the Pinecone data loader PineconeReader:
# create the reader (shown for completeness; the environment value is illustrative)
reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")

id_to_text_map = {
    "id1": "text blob 1",
    "id2": "text blob 2",
}

query_vector = [n1, n2, n3]  # placeholder query embedding

documents = reader.load_data(
    index_name="quickstart", id_to_text_map=id_to_text_map, top_k=3,
    vector=query_vector, separate_documents=True,
)
LlamaIndex also supports different vector stores as the storage backend for GPTVectorStoreIndex.
A detailed API reference is found here.
Similar to any other index within LlamaIndex (tree, keyword table, list), GPTVectorStoreIndex can be constructed
upon any collection of documents. We use the vector store within the index to store embeddings for the input text
chunks.
Once constructed, the index can be used for querying.
Default Vector Store Index Construction/Querying
By default, GPTVectorStoreIndex uses an in-memory SimpleVectorStore that’s initialized as part of the default storage context.
# build index from documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
# query index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
Below we show more examples of how to construct various vector stores we support.
DeepLake
import os
import getpass
from llama_index.vector_stores import DeepLakeVectorStore
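A minimal construction sketch (the dataset_path/overwrite keywords and the ACTIVELOOP_TOKEN variable are assumptions based on examples from around this release):

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, StorageContext

# authenticate with Activeloop, then point the store at a dataset path
os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop token: ")
vector_store = DeepLakeVectorStore(dataset_path="<dataset_path>", overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)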
Faiss
import faiss
from llama_index.vector_stores import FaissVectorStore
...
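A minimal construction sketch (the faiss_index keyword is an assumption based on examples from around this release):

d = 1536  # embedding dimensionality (1536 for text-embedding-ada-002)
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)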
Weaviate
import weaviate
from llama_index.vector_stores import WeaviateVectorStore
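A minimal construction sketch (the weaviate_client keyword is an assumption based on examples from around this release):

# connect to a running Weaviate instance and wrap it
client = weaviate.Client("http://localhost:8080")
vector_store = WeaviateVectorStore(weaviate_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)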
Pinecone
import pinecone
from llama_index.vector_stores import PineconeVectorStore
# can define filters specific to this vector index (so you can
# reuse pinecone indexes)
metadata_filters = {"title": "paul_graham_essay"}
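A minimal construction sketch (the pinecone_index and metadata_filters keywords are assumptions based on examples from around this release):

# initialize the Pinecone client and wrap an existing Pinecone index
pinecone.init(api_key="<api_key>", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index, metadata_filters=metadata_filters)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)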
Qdrant
import qdrant_client
from llama_index.vector_stores import QdrantVectorStore
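A minimal construction sketch (the client/collection_name keywords are assumptions based on examples from around this release):

# connect to a running Qdrant instance and wrap a collection
client = qdrant_client.QdrantClient(host="localhost", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="paul_graham")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)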
Chroma
import chromadb
from llama_index.vector_stores import ChromaVectorStore
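A minimal construction sketch (the chroma_collection keyword is an assumption based on examples from around this release):

# create an in-memory Chroma collection and wrap it
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)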
Milvus
• Milvus Index offers the ability to store both Documents and their embeddings. Documents are limited to the predefined Document attributes and do not include extra_info.
import pymilvus
from llama_index.vector_stores import MilvusVectorStore
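A minimal construction sketch (the host/port/overwrite keywords are assumptions based on examples from around this release):

# connect to a running Milvus instance and wrap it
vector_store = MilvusVectorStore(host="localhost", port=19530, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)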
Note: MilvusVectorStore depends on the pymilvus library. Use pip install pymilvus if not already in-
stalled. If you get stuck at building wheel for grpcio, check if you are using python 3.11 (there’s a known issue:
https://github.com/milvus-io/pymilvus/issues/1308) and try downgrading.
Zilliz
• Zilliz Cloud (hosted version of Milvus) uses the Milvus Index with some extra arguments.
import pymilvus
from llama_index.vector_stores import MilvusVectorStore
Note: MilvusVectorStore depends on the pymilvus library. Use pip install pymilvus if not already in-
stalled. If you get stuck at building wheel for grpcio, check if you are using python 3.11 (there’s a known issue:
https://github.com/milvus-io/pymilvus/issues/1308) and try downgrading.
MyScale
import clickhouse_connect
from llama_index.vector_stores import MyScaleVectorStore
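A minimal construction sketch (both the get_client arguments and the myscale_client keyword are assumptions based on examples from around this release):

# connect to a MyScale cluster via clickhouse-connect and wrap it
client = clickhouse_connect.get_client(host="<cluster_host>", port=8443, username="<user>", password="<password>")
vector_store = MyScaleVectorStore(myscale_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)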
NOTE: This is a work-in-progress, stay tuned for more exciting updates on this front!
The OpenAI ChatGPT Retrieval Plugin offers a centralized API specification for any document storage system to
interact with ChatGPT. Since this can be deployed on any service, this means that more and more document retrieval
services will implement this spec; this allows them to not only interact with ChatGPT, but also interact with any LLM
toolkit that may use a retrieval service.
LlamaIndex provides a variety of integrations with the ChatGPT Retrieval Plugin.
The ChatGPT Retrieval Plugin defines an /upsert endpoint for users to load documents. This offers a natural inte-
gration point with LlamaHub, which offers over 65 data loaders from various API’s and document formats.
Here is a sample code snippet of showing how to load a document from LlamaHub into the JSON format that /upsert
expects:
from llama_index import download_loader, Document
from typing import Dict, List
import json

# (function wrapper reconstructed; the extracted snippet only preserved the payload dict)
def convert_documents_into_json(documents: List[Document]) -> str:
    result_json = []
    for doc in documents:
        cur_dict = {
            "text": doc.get_text(),
            "id": doc.get_doc_id(),
            # optionally fill in the remaining plugin metadata fields:
            # "source": ...,
            # "source_id": ...,
            # "url": url,
            # "created_at": ...,
            # "author": "Paul Graham",
        }
        result_json.append(cur_dict)
    return json.dumps(result_json)

# load documents with a LlamaHub loader, then convert them with the helper above
The ChatGPT Retrieval Plugin Index allows you to easily build a vector index over any documents, with storage backed
by a document store implementing the ChatGPT endpoint.
Note: this index is a vector index, allowing top-k retrieval.
Example code:
# load documents
documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()
# build index
bearer_token = os.getenv("BEARER_TOKEN")
# initialize without metadata filter
index = ChatGPTRetrievalPluginIndex(
documents,
endpoint_url="http://localhost:8000",
bearer_token=bearer_token,
)
# query index
query_engine = index.as_query_engine(
similarity_top_k=3,
response_mode="compact",
)
response = query_engine.query("What did the author do growing up?")
LlamaIndex provides both Tool abstractions for a Langchain agent as well as a memory module.
The API reference of the Tool abstractions + memory modules are here.
LlamaIndex provides Tool abstractions so that you can use LlamaIndex along with a Langchain agent.
For instance, you can choose to create a “Tool” from a QueryEngine directly as follows:
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

tool_config = IndexToolConfig(
    query_engine=query_engine,
    name=f"Vector Index",
    description=f"useful for when you want to answer queries about X",
    tool_kwargs={"return_direct": True}
)
tool = LlamaIndexTool.from_tool_config(tool_config)
from llama_index.langchain_helpers.agents import LlamaToolkit

# index_configs is a list of IndexToolConfig objects, e.g. [tool_config] from above
toolkit = LlamaToolkit(
    index_configs=index_configs,
)
Such a toolkit can be used to create a downstream Langchain-based chat agent through our create_llama_agent and
create_llama_chat_agent commands:
agent_chain = create_llama_chat_agent(
toolkit,
llm,
memory=memory,
verbose=True
)
We provide another demo notebook showing how you can build a chat agent with the following components.
• Using LlamaIndex as a generic callable tool with a Langchain agent
• Using LlamaIndex as a memory module; this allows you to insert arbitrary amounts of conversation history with
a Langchain chatbot!
Please see the notebook here.
3.16 Storage
LlamaIndex provides a high-level interface for ingesting, indexing, and querying your external data. By default, LlamaIndex hides away the complexities and lets you query your data in under 5 lines of code.
Under the hood, LlamaIndex also supports swappable storage components that allow you to customize:
• Document stores: where ingested documents (i.e., Node objects) are stored,
• Index stores: where index metadata are stored,
• Vector stores: where embedding vectors are stored.
The Document/Index stores rely on a common Key-Value store abstraction, which is also detailed below.
Persisting Data
By default, LlamaIndex stores data in-memory, and this data can be explicitly persisted if desired:
storage_context.persist(persist_dir="<persist_dir>")
This will persist data to disk, under the specified persist_dir (or ./storage by default).
Users can also configure alternative storage backends (e.g. MongoDB) that persist data by default. In this case, calling storage_context.persist() will do nothing.
Loading Data
To load data, simply re-create the storage context using the same configuration (e.g. pass in the same persist_dir or vector store client).
storage_context = StorageContext.from_defaults(
docstore=SimpleDocumentStore.from_persist_dir(persist_dir="<persist_dir>"),
vector_store=SimpleVectorStore.from_persist_dir(persist_dir="<persist_dir>"),
index_store=SimpleIndexStore.from_persist_dir(persist_dir="<persist_dir>"),
)
We can then load specific indices from the StorageContext through some convenience functions below.
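For example, a persisted index can be reloaded roughly as follows (a sketch assuming the load_index_from_storage convenience function from this release):

from llama_index import load_index_from_storage

# re-create the index from the persisted storage context
index = load_index_from_storage(storage_context)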
Document stores contain ingested document chunks, which we call Node objects.
See the API Reference for more details.
By default, the SimpleDocumentStore stores Node objects in-memory. They can be persisted to (and loaded from)
disk by calling docstore.persist() (and SimpleDocumentStore.from_persist_path(...) respectively).
We support MongoDB as an alternative document store backend that persists data as Node objects are ingested.
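A minimal sketch of plugging it in (the from_uri constructor and module paths are assumptions based on examples from around this release):

from llama_index import StorageContext
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.node_parser import SimpleNodeParser

# parse documents into nodes and register them with a MongoDB-backed docstore
nodes = SimpleNodeParser().get_nodes_from_documents(documents)
docstore = MongoDocumentStore.from_uri(uri="<mongodb+srv://...>")
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)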
# build index
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
Under the hood, MongoDocumentStore connects to a fixed MongoDB database and initializes new collections (or
loads existing collections) for your nodes.
Note: You can configure the db_name and namespace when instantiating MongoDocumentStore, other-
wise they default to db_name="db_docstore" and namespace="docstore".
Note that it’s not necessary to call storage_context.persist() (or docstore.persist()) when using a MongoDocumentStore, since data is persisted by default.
You can easily reconnect to your MongoDB collection and reload the index by re-initializing a MongoDocumentStore
with an existing db_name and collection_name.
Index stores contain lightweight index metadata (i.e. additional state information created when building an index).
See the API Reference for more details.
By default, LlamaIndex uses a simple index store backed by an in-memory key-value store. They can be persisted to
(and loaded from) disk by calling index_store.persist() (and SimpleIndexStore.from_persist_path(...)
respectively).
Similarly to document stores, we can also use MongoDB as the storage backend of the index store.
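A minimal sketch (again, the from_uri constructor and module path are assumptions based on examples from around this release):

from llama_index import StorageContext
from llama_index.storage.index_store import MongoIndexStore

# create (or re-connect to) a MongoDB-backed index store
index_store = MongoIndexStore.from_uri(uri="<mongodb+srv://...>")
storage_context = StorageContext.from_defaults(index_store=index_store)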
# build index
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
Under the hood, MongoIndexStore connects to a fixed MongoDB database and initializes new collections (or loads
existing collections) for your index metadata.
Note: You can configure the db_name and namespace when instantiating MongoIndexStore, otherwise
they default to db_name="db_docstore" and namespace="docstore".
Note that it’s not necessary to call storage_context.persist() (or index_store.persist()) when using a MongoIndexStore, since data is persisted by default.
You can easily reconnect to your MongoDB collection and reload the index by re-initializing a MongoIndexStore
with an existing db_name and collection_name.
Vector stores contain embedding vectors of ingested document chunks (and sometimes the document chunks as well).
By default, LlamaIndex uses a simple in-memory vector store that’s great for quick experimentation. They
can be persisted to (and loaded from) disk by calling vector_store.persist() (and SimpleVectorStore.
from_persist_path(...) respectively).
We also integrate with a wide range of vector store implementations. They mainly differ in 2 aspects:
1. in-memory vs. hosted
2. stores only vector embeddings vs. also stores documents
• Faiss
• Chroma
• Pinecone
• Weaviate
• Milvus/Zilliz
• Qdrant
• Opensearch
• DeepLake
• MyScale
Others
• ChatGPTRetrievalPlugin
For more details, see Vector Store Integrations.
Key-Value stores are the underlying storage abstractions that power our Document Stores and Index Stores.
We provide the following key-value stores:
• Simple Key-Value Store: An in-memory KV store. The user can choose to call persist on this kv store to
persist data to disk.
• MongoDB Key-Value Store: A MongoDB KV store.
See the API Reference for more details.
Note: At the moment, these storage abstractions are not externally facing.
3.17 Indices
This doc shows the overarching classes used to represent an index. These classes allow for index creation, insertion, and querying. We first show the different index subclasses. We then show the base class that all indices inherit from, which contains parameters and methods common to all indices.
Parameters
index_id (str) – Index id to set.
update(document: Document, **update_kwargs: Any) → None
Update a document.
This is equivalent to deleting the document and then inserting it again.
Parameters
• document (Union[BaseDocument, BaseGPTIndex]) – document to update
• insert_kwargs (Dict) – kwargs to pass to insert
• delete_kwargs (Dict) – kwargs to pass to delete
class llama_index.indices.list.ListIndexEmbeddingRetriever(index: GPTListIndex, similarity_top_k:
Optional[int] = 1, **kwargs: Any)
Embedding based retriever for ListIndex.
Generates embeddings in a lazy fashion for all nodes that are traversed.
Parameters
• index (GPTListIndex) – The index to retrieve from.
• similarity_top_k (Optional[int]) – The number of top nodes to return.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
class llama_index.indices.list.ListIndexRetriever(index: GPTListIndex, **kwargs: Any)
Simple retriever for ListIndex that returns all nodes.
Parameters
index (GPTListIndex) – The index to retrieve from.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
class llama_index.indices.keyword_table.KeywordTableRAKERetriever(index: BaseGPTKeywordTableIndex, keyword_extract_template: Optional[KeywordExtractPrompt] = None, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, **kwargs: Any)
Keyword Table Index RAKE Retriever.
Extracts keywords using RAKE keyword extractor. Set when retriever_mode=”rake”.
See BaseGPTKeywordTableQuery for arguments.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
class llama_index.indices.keyword_table.KeywordTableSimpleRetriever(index: BaseGPTKeywordTableIndex, keyword_extract_template: Optional[KeywordExtractPrompt] = None, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, **kwargs: Any)
Keyword Table Index Simple Retriever.
Extracts keywords using simple regex-based keyword extractor. Set when retriever_mode=”simple”.
See BaseGPTKeywordTableQuery for arguments.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
This allows users to save LLM and Embedding model calls, while only updating documents that have any
changes in text or extra_info. It will also insert any documents that previously were not stored.
set_index_id(index_id: str) → None
Set the index id.
NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call
add_index_struct on the index_store to update the index store.
Parameters
index_id (str) – Index id to set.
update(document: Document, **update_kwargs: Any) → None
Update a document.
This is equivalent to deleting the document and then inserting it again.
Parameters
• document (Union[BaseDocument, BaseGPTIndex]) – document to update
• insert_kwargs (Dict) – kwargs to pass to insert
• delete_kwargs (Dict) – kwargs to pass to delete
class llama_index.indices.struct_store.GPTSQLStructStoreQueryEngine(index: GPTSQLStructStoreIndex, sql_context_container: Optional[SQLContextContainerBuilder] = None, **kwargs: Any)
GPT SQL query engine over a structured database.
Runs raw SQL over a GPTSQLStructStoreIndex. No LLM calls are made here. NOTE: this query cannot work
with composed indices - if the index contains subindices, those subindices will not be queried.
class llama_index.indices.struct_store.SQLContextContainerBuilder(sql_database: SQLDatabase, context_dict: Optional[Dict[str, str]] = None, context_str: Optional[str] = None)
SQLContextContainerBuilder.
Build a SQLContextContainer that can be passed to the SQL index during index construction or during query-
time.
NOTE: if context_str is specified, that will be used as context instead of context_dict
Parameters
• sql_database (SQLDatabase) – SQL database
• context_dict (Optional[Dict[str, str]]) – context dict
build_context_container(ignore_db_schema: bool = False) → SQLContextContainer
Build index structure.
derive_index_from_context(index_cls: Type[BaseGPTIndex], ignore_db_schema: bool = False,
**index_kwargs: Any) → BaseGPTIndex
Derive index from context.
Parameters
• keywords (List[str]) – Keywords to index the node.
• node (Node) – Node to be indexed.
class llama_index.indices.knowledge_graph.KGTableRetriever(index: GPTKnowledgeGraphIndex, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, include_text: bool = True, retriever_mode: Optional[KGRetrieverMode] = KGRetrieverMode.KEYWORD, similarity_top_k: int = 2, **kwargs: Any)
Base GPT KG Table Index Query.
Arguments are shared among subclasses.
Parameters
• query_keyword_extract_template (Optional[QueryKGExtractPrompt]) – A
Query KG Extraction Prompt (see Prompt Templates).
• refine_template (Optional[RefinePrompt]) – A Refinement Prompt (see Prompt
Templates).
• text_qa_template (Optional[QuestionAnswerPrompt]) – A Question Answering
Prompt (see Prompt Templates).
• max_keywords_per_query (int) – Maximum number of keywords to extract from query.
• num_chunks_per_query (int) – Maximum number of text chunks to query.
• include_text (bool) – Use the document text source from each relevant triplet during
queries.
• retriever_mode (KGRetrieverMode) – Specifies whether to use keywords, embeddings, or both to find relevant triplets. Should be one of “keyword”, “embedding”, or “hybrid”.
• similarity_top_k (int) – The number of top embeddings to use (if embeddings are used).
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
Parameters
index_id (str) – Index id to set.
update(document: Document, **update_kwargs: Any) → None
Update a document.
This is equivalent to deleting the document and then inserting it again.
Parameters
• document (Union[BaseDocument, BaseGPTIndex]) – document to update
• insert_kwargs (Dict) – kwargs to pass to insert
• delete_kwargs (Dict) – kwargs to pass to delete
This doc shows the classes that are used to query indices.
Retrievers
Index Retrievers
List Retriever
class llama_index.indices.keyword_table.retrievers.KeywordTableGPTRetriever(index: BaseGPTKeywordTableIndex, keyword_extract_template: Optional[KeywordExtractPrompt] = None, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, **kwargs: Any)
Keyword Table Index GPT Retriever.
Extracts keywords using GPT. Set when using retriever_mode=”default”.
See BaseGPTKeywordTableQuery for arguments.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
class llama_index.indices.keyword_table.retrievers.KeywordTableRAKERetriever(index: BaseGPTKeywordTableIndex, keyword_extract_template: Optional[KeywordExtractPrompt] = None, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, **kwargs: Any)
Keyword Table Index RAKE Retriever.
Extracts keywords using RAKE keyword extractor. Set when retriever_mode=”rake”.
See BaseGPTKeywordTableQuery for arguments.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
class llama_index.indices.keyword_table.retrievers.KeywordTableSimpleRetriever(index: BaseGPTKeywordTableIndex, keyword_extract_template: Optional[KeywordExtractPrompt] = None, query_keyword_extract_template: Optional[QueryKeywordExtractPrompt] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, **kwargs: Any)
Keyword Table Index Simple Retriever.
Extracts keywords using simple regex-based keyword extractor. Set when retriever_mode=”simple”.
See BaseGPTKeywordTableQuery for arguments.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
Tree Retrievers
Summarize query.
class llama_index.indices.tree.all_leaf_retriever.TreeAllLeafRetriever(index: Any)
GPT all leaf retriever.
This class builds a query-specific tree from leaf nodes to return a response. Using this query mode means that
the tree index doesn’t need to be built when initialized, since we rebuild the tree for each query.
Parameters
text_qa_template (Optional[QuestionAnswerPrompt]) – Question-Answer Prompt (see
Prompt Templates).
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
Leaf query mechanism.
class llama_index.indices.tree.select_leaf_retriever.TreeSelectLeafRetriever(index: GPTTreeIndex, query_template: Optional[TreeSelectPrompt] = None, text_qa_template: Optional[QuestionAnswerPrompt] = None, refine_template: Optional[RefinePrompt] = None, query_template_multiple: Optional[TreeSelectMultiplePrompt] = None, child_branch_factor: int = 1, verbose: bool = False, **kwargs: Any)
Tree select leaf retriever.
This class traverses the index graph and searches for a leaf node that can best answer the query.
Parameters
• query_template (Optional[TreeSelectPrompt]) – Tree Select Query Prompt (see
Prompt Templates).
• query_template_multiple (Optional[TreeSelectMultiplePrompt]) – Tree Select
Query Prompt (Multiple) (see Prompt Templates).
• child_branch_factor (int) – Number of child nodes to consider at each level. If
child_branch_factor is 1, then the query will only choose one child node to traverse for any
given parent node. If child_branch_factor is 2, then the query will choose two child nodes.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
llama_index.indices.tree.select_leaf_retriever.get_text_from_node(node: Node, level: Optional[int] = None, verbose: bool = False) → str
Get text from node.
Query Tree using embedding similarity between query and node text.
class llama_index.indices.tree.select_leaf_embedding_retriever.TreeSelectLeafEmbeddingRetriever(index: GPTTreeIndex, query_template: Optional[TreeSelectPrompt] = None, text_qa_template: Optional[QuestionAnswerPrompt] = None, refine_template: Optional[RefinePrompt] = None, query_template_multiple: Optional[TreeSelectMultiplePrompt] = None, child_branch_factor: int = 1, verbose: bool = False, **kwargs: Any)
Tree select leaf embedding retriever.
This class traverses the index graph using the embedding similarity between the query and the node text.
Parameters
• query_template (Optional[TreeSelectPrompt]) – Tree Select Query Prompt (see
Prompt Templates).
• query_template_multiple (Optional[TreeSelectMultiplePrompt]) – Tree Select
Query Prompt (Multiple) (see Prompt Templates).
• text_qa_template (Optional[QuestionAnswerPrompt]) – Question-Answer Prompt
(see Prompt Templates).
• refine_template (Optional[RefinePrompt]) – Refinement Prompt (see Prompt Tem-
plates).
• child_branch_factor (int) – Number of child nodes to consider at each level. If
child_branch_factor is 1, then the query will only choose one child node to traverse for any
given parent node. If child_branch_factor is 2, then the query will choose two child nodes.
Additional Retrievers
Here we show additional retriever classes; these classes can augment existing retrievers with new capabilities (e.g.
query transforms).
Transform Retriever
Base Retriever
Here we show the base retriever class, which contains the retrieve method which is shared amongst all retrievers.
class llama_index.indices.base_retriever.BaseRetriever
Base retriever.
retrieve(str_or_query_bundle: Union[str, QueryBundle]) → List[NodeWithScore]
Retrieve nodes given query.
Parameters
str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
Response Synthesizer
Query Engines
class llama_index.query_engine.multistep_query_engine.MultiStepQueryEngine(query_engine: BaseQueryEngine, query_transform: StepDecomposeQueryTransform, response_synthesizer: Optional[ResponseSynthesizer] = None, num_steps: Optional[int] = 3, early_stopping: bool = True, index_summary: str = 'None', stop_fn: Optional[Callable[[Dict], bool]] = None)
Multi-step query engine.
This query engine can operate over an existing base query engine, along with the multi-step query transform.
Parameters
• query_engine (BaseQueryEngine) – A BaseQueryEngine object.
• query_transform (StepDecomposeQueryTransform) – A StepDecomposeQueryTrans-
form object.
• response_synthesizer (Optional[ResponseSynthesizer]) – A ResponseSynthe-
sizer object.
• num_steps (Optional[int]) – Number of steps to run the multi-step query.
• early_stopping (bool) – Whether to stop early if the stop function returns True.
• index_summary (str) – A string summary of the index.
• stop_fn (Optional[Callable[[Dict], bool]]) – A stop function that takes in a dic-
tionary of information and returns a boolean.
llama_index.query_engine.multistep_query_engine.default_stop_fn(stop_dict: Dict) → bool
Stop function for multi-step query combiner.
class llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine(retriever: BaseRetriever, response_synthesizer: Optional[ResponseSynthesizer] = None, callback_manager: Optional[CallbackManager] = None)
Retriever query engine.
Parameters
• retriever (BaseRetriever) – A retriever object.
• response_synthesizer (Optional[ResponseSynthesizer]) – A ResponseSynthe-
sizer object.
classmethod from_args(retriever: BaseRetriever, service_context: Optional[ServiceContext] = None, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None, verbose: bool = False, response_mode: ResponseMode = ResponseMode.COMPACT, text_qa_template: Optional[QuestionAnswerPrompt] = None, refine_template: Optional[RefinePrompt] = None, simple_template: Optional[SimpleInputPrompt] = None, response_kwargs: Optional[Dict] = None, use_async: bool = False, streaming: bool = False, optimizer: Optional[BaseTokenUsageOptimizer] = None, **kwargs: Any) → RetrieverQueryEngine
Initialize a RetrieverQueryEngine object.
Parameters
• retriever (BaseRetriever) – A retriever object.
• service_context (Optional[ServiceContext]) – A ServiceContext object.
• node_postprocessors (Optional[List[BaseNodePostprocessor]]) – A list of
node postprocessors.
• verbose (bool) – Whether to print out debug info.
• response_mode (ResponseMode) – A ResponseMode object.
• text_qa_template (Optional[QuestionAnswerPrompt]) – A QuestionAnswer-
Prompt object.
• refine_template (Optional[RefinePrompt]) – A RefinePrompt object.
• simple_template (Optional[SimpleInputPrompt]) – A SimpleInputPrompt object.
• response_kwargs (Optional[Dict]) – A dict of response kwargs.
• use_async (bool) – Whether to use async.
• streaming (bool) – Whether to use streaming.
• optimizer (Optional[BaseTokenUsageOptimizer]) – A BaseTokenUsageOptimizer
object.
class llama_index.query_engine.transform_query_engine.TransformQueryEngine(query_engine: BaseQueryEngine, query_transform: BaseQueryTransform, transform_extra_info: Optional[dict] = None)
Transform query engine.
Applies a query transform to a query bundle before passing it to a query engine.
Parameters
• query_engine (BaseQueryEngine) – A query engine object.
• query_transform (BaseQueryTransform) – A query transform object.
• transform_extra_info (Optional[dict]) – Extra info to pass to the query transform.
class llama_index.query_engine.router_query_engine.RetrieverRouterQueryEngine(retriever: BaseRetriever, node_to_query_engine_fn: Callable)
Retriever-based router query engine.
Use a retriever to select a set of Nodes. Each node will be converted into a ToolMetadata object, and also used
to retrieve a query engine, to form a QueryEngineTool.
NOTE: this is a beta feature. We are figuring out the right interface between the retriever and query engine.
Parameters
• selector (BaseSelector) – A selector that chooses one out of many options based on
each candidate’s metadata and query.
• query_engine_tools (Sequence[QueryEngineTool]) – A sequence of candidate query
engines. They must be wrapped as tools to expose metadata to the selector.
class llama_index.query_engine.router_query_engine.RouterQueryEngine(selector: BaseSelector, query_engine_tools: Sequence[QueryEngineTool])
Router query engine.
Selects one out of several candidate query engines to execute a query.
Parameters
• selector (BaseSelector) – A selector that chooses one out of many options based on
each candidate’s metadata and query.
• query_engine_tools (Sequence[QueryEngineTool]) – A sequence of candidate query
engines. They must be wrapped as tools to expose metadata to the selector.
llama_index.query_engine.router_query_engine.default_node_to_metadata_fn(node: Node) → ToolMetadata
Default node to metadata function.
We use the node’s text as the Tool description.
We also show query engine classes specific to our structured indices.
class llama_index.indices.struct_store.pandas_query.GPTNLPandasQueryEngine(index: GPTPandasIndex, instruction_str: Optional[str] = None, output_processor: Optional[Callable] = None, pandas_prompt: Optional[PandasPrompt] = None, output_kwargs: Optional[dict] = None, head: int = 5, verbose: bool = False, **kwargs: Any)
GPT Pandas query.
Convert natural language to Pandas python code.
Parameters
• df (pd.DataFrame) – Pandas dataframe to use.
• instruction_str (Optional[str]) – Instruction string to use.
• output_processor (Optional[Callable[[str], str]]) – Output processor. A
callable that takes in the output string, pandas DataFrame, and any output kwargs and re-
turns a string.
• pandas_prompt (Optional[PandasPrompt]) – Pandas prompt to use.
• head (int) – Number of rows to show in the table context.
llama_index.indices.struct_store.pandas_query.default_output_processor(output: str, df: DataFrame, **output_kwargs: Any) → str
Process outputs in a default manner.
Query Bundle
Query Schema.
This schema is used under the hood for all queries, but is primarily exposed for recursive queries over composable
indices.
class llama_index.indices.query.schema.QueryBundle(query_str: str, custom_embedding_strs: Optional[List[str]] = None, embedding: Optional[List[float]] = None)
Query bundle.
This dataclass contains the original query string and associated transformations.
Parameters
• query_str (str) – the original user-specified query string. This is currently used by all non
embedding-based queries.
• embedding_strs (list[str]) – list of strings used for embedding the query. This is cur-
rently used by all embedding-based queries.
• embedding (list[float]) – the stored embedding for the query.
property embedding_strs: List[str]
Use custom embedding strs if specified, otherwise use query str.
Query Transform
Query Transforms.
class llama_index.indices.query.query_transform.DecomposeQueryTransform(llm_predictor: Optional[LLMPredictor] = None, decompose_query_prompt: Optional[DecomposeQueryTransformPrompt] = None, verbose: bool = False)
Decompose query transform.
Decomposes query into a subquery given the current index struct. Performs a single step transformation.
Parameters
llm_predictor (Optional[LLMPredictor]) – LLM for generating hypothetical documents
run(query_bundle_or_str: Union[str, QueryBundle], extra_info: Optional[Dict] = None) → QueryBundle
Run query transform.
class llama_index.indices.query.query_transform.HyDEQueryTransform(llm_predictor: Optional[LLMPredictor] = None, hyde_prompt: Optional[Prompt] = None, include_original: bool = True)
Hypothetical Document Embeddings (HyDE) query transform.
It uses an LLM to generate hypothetical answer(s) to a given query, and uses the resulting documents as embedding strings.
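A short usage sketch (the wiring through TransformQueryEngine mirrors the class documented earlier in this reference; the query text is illustrative):

from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine.transform_query_engine import TransformQueryEngine

# wrap an existing query engine so each query is first expanded into a hypothetical answer
hyde = HyDEQueryTransform(include_original=True)
query_engine = index.as_query_engine()
hyde_query_engine = TransformQueryEngine(query_engine, query_transform=hyde)
response = hyde_query_engine.query("What did the author do growing up?")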
3.19 Node
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
postprocess_nodes(nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None) →
List[NodeWithScore]
Postprocess nodes.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
class llama_index.indices.postprocessor.FixedRecencyPostprocessor(*, service_context: ServiceContext, top_k: int = 1, date_key: str = 'date', in_extra_info: bool = True)
Recency post-processor.
This post-processor does the following steps:
• Decides if we need to use the post-processor given the query (is it temporal-related?)
• If yes, sorts nodes by date.
• Take the first k nodes (by default 1), and use that to synthesize an answer.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
postprocess_nodes(nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None) →
List[NodeWithScore]
Postprocess nodes.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
class llama_index.indices.postprocessor.KeywordNodePostprocessor(*, required_keywords: List[str] = None, exclude_keywords: List[str] = None)
Keyword-based Node processor.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over include
• update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
mask_pii(text: str) → Tuple[str, Dict]
Mask PII in text.
postprocess_nodes(nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None) →
List[NodeWithScore]
Postprocess nodes.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
class llama_index.indices.postprocessor.PrevNextNodePostprocessor(*, docstore: BaseDocumentStore, num_nodes: int = 1, mode: str = 'next')
Previous/Next Node post-processor.
Allows users to fetch additional nodes from the document store, based on the relationships of the nodes.
NOTE: this is a beta feature.
Parameters
• docstore (BaseDocumentStore) – The document store.
• num_nodes (int) – The number of nodes to return (default: 1)
• mode (str) – The mode of the post-processor. Can be “previous”, “next”, or “both”.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over include
• update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
postprocess_nodes(nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None) →
List[NodeWithScore]
Postprocess nodes.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
class llama_index.indices.postprocessor.TimeWeightedPostprocessor(*, time_decay: float = 0.99,
last_accessed_key: str =
'__last_accessed__',
time_access_refresh: bool =
True, now: Optional[float] =
None, top_k: int = 1)
Time-weighted post-processor.
Reranks a set of nodes based on their recency.
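A minimal sketch of reranking by recency; retrieved_nodes is a placeholder for a list of NodeWithScore objects obtained from a retriever.

from llama_index.indices.postprocessor import TimeWeightedPostprocessor

postprocessor = TimeWeightedPostprocessor(
    time_decay=0.99,  # recency decay factor
    top_k=1,          # keep only the top node after time weighting
)
fresh_nodes = postprocessor.postprocess_nodes(retrieved_nodes)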
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
postprocess_nodes(nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None) →
List[NodeWithScore]
Postprocess nodes.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
LlamaIndex offers core abstractions around storage of Nodes, indices, and vectors. A key abstraction is the Storage-
Context - this contains the underlying BaseDocumentStore (for nodes), BaseIndexStore (for indices), and VectorStore
(for vectors).
The Document/Node and index stores rely on a common KVStore abstraction, which is also detailed below.
We show the API references for the Storage Classes, loading indices from the Storage Context, and the Storage Context
class itself below.
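As a hedged sketch of how the pieces fit together (StorageContext.from_defaults is assumed here and is not documented in this extract; see the Storage Context reference for the exact API):

from llama_index.storage.storage_context import StorageContext

# Assumed convenience constructor: builds in-memory document, index,
# and vector stores with default settings.
storage_context = StorageContext.from_defaults()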
class llama_index.storage.docstore.BaseDocumentStore
Vector stores.
class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token:
Optional[str] = None, retries:
Optional[Retry] = None, batch_size:
int = 100, **kwargs: Any)
ChatGPT Retrieval Plugin Client.
In this client, we make use of the endpoints defined by ChatGPT.
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
property client: None
Get client.
delete(doc_id: str, **delete_kwargs: Any) → None
Delete the entities in the dataset.
Parameters
id (Optional[str], optional) – The id to delete.
query(query: VectorStoreQuery) → VectorStoreQueryResult
Query index for top k most similar nodes.
Parameters
• query_embedding (List[float]) – query embedding
• similarity_top_k (int) – top k most similar nodes
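A minimal construction sketch; the endpoint URL and bearer token are placeholders for a running ChatGPT Retrieval Plugin server.

from llama_index.vector_stores import ChatGPTRetrievalPluginClient

client = ChatGPTRetrievalPluginClient(
    endpoint_url="http://localhost:8000",  # placeholder plugin endpoint
    bearer_token="<bearer-token>",         # placeholder auth token
    batch_size=100,
)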
class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)
Faiss Vector Store.
Embeddings are stored within a Faiss index.
During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.
Parameters
faiss_index (faiss.Index) – Faiss index instance
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
NOTE: in the Faiss vector store, we do not store text in Faiss.
Parameters
embedding_results (List[NodeWithEmbedding]) – list of embedding results
property client: Any
Return the faiss index.
delete(doc_id: str, **delete_kwargs: Any) → None
Delete a document.
Parameters
doc_id (str) – document id
persist(persist_path: str) → None
Save to file.
This method saves the vector store to disk.
Parameters
save_path (str) – The save_path of the file.
query(query: VectorStoreQuery) → VectorStoreQueryResult
Query index for top k most similar nodes.
Parameters
• query_embedding (List[float]) – query embedding
• similarity_top_k (int) – top k most similar nodes
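A minimal sketch, assuming the faiss package is installed; the dimension of 1536 is only an example value (it matches OpenAI text embeddings).

import faiss

from llama_index.vector_stores import FaissVectorStore

dim = 1536
faiss_index = faiss.IndexFlatL2(dim)               # flat L2 index over raw vectors
vector_store = FaissVectorStore(faiss_index=faiss_index)
vector_store.persist("./faiss_index")              # write the index to disk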
class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)
LanceDB vector store.
Parameters
• uri (str, required) – Location where LanceDB will store its files.
• table_name (str, optional) – The table name where the embeddings will be stored.
Defaults to “vectors”.
• nprobes (int, optional) – The number of probes used. A higher number makes search
more accurate but also slower. Defaults to 20.
• refine_factor – (int, optional): Refine the results by reading extra elements and re-
ranking them in memory. Defaults to None
Raises
ImportError – Unable to import lancedb.
Returns
VectorStore that supports creating LanceDB datasets and
querying it.
Return type
LanceDBVectorStore
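A minimal construction sketch, assuming the lancedb package is installed; the uri is a placeholder local path.

from llama_index.vector_stores import LanceDBVectorStore

vector_store = LanceDBVectorStore(uri="/tmp/lancedb", table_name="vectors")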
class llama_index.vector_stores.MyScaleVectorStore
MyScale vector store.
Parameters
• index_type (str, optional) – The type of the MyScale vector index. Defaults to “IVFFLAT”.
• metric (str, optional) – The metric type of the MyScale vector index. Defaults to
“cosine”.
• batch_size (int, optional) – the size of documents to insert. Defaults to 32.
• index_params (dict, optional) – The index parameters for MyScale. Defaults to None.
• search_params (dict, optional) – The search parameters for a MyScale query. De-
faults to None.
• service_context (ServiceContext, optional) – Vector store service context. De-
faults to None
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
Parameters
embedding_results (List[NodeWithEmbedding]) – list of embedding results
property client: Any
Get client.
delete(doc_id: str, **delete_kwargs: Any) → None
Delete a document.
Parameters
doc_id (str) – document id
drop() → None
Drop MyScale Index and table
query(query: VectorStoreQuery) → VectorStoreQueryResult
Query index for top k most similar nodes.
Parameters
query (VectorStoreQuery) – query
class llama_index.vector_stores.OpensearchVectorClient(endpoint: str, index: str, dim: int,
embedding_field: str = 'embedding',
text_field: str = 'content', method:
Optional[dict] = None, auth: Optional[dict]
= None)
Object encapsulating an Opensearch index that has vector search enabled.
If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1)
not exist yet or 2) be created due to previous usage of this class.
Parameters
• endpoint (str) – URL (http/https) of elasticsearch endpoint
• index (str) – Name of the elasticsearch index
• dim (int) – Dimension of the vector
• embedding_field (str) – Name of the field in the index to store embedding array in.
• text_field (str) – Name of the field to grab text from
• method (Optional[dict]) – Opensearch “method” JSON obj for configuring the KNN
index. This includes engine, metric, and other config params. Defaults to: {“name”: “hnsw”,
“space_type”: “l2”, “engine”: “faiss”, “parameters”: {“ef_construction”: 256, “m”: 48}}
delete_doc_id(doc_id: str) → None
Delete a document.
Parameters
doc_id (str) – document id
do_approx_knn(query_embedding: List[float], k: int) → VectorStoreQueryResult
Do approximate knn.
index_results(results: List[NodeWithEmbedding]) → List[str]
Store results in the index.
class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)
Elasticsearch/Opensearch vector store.
Parameters
client (OpensearchVectorClient) – Vector index client to use for data insertion/querying.
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
Parameters
embedding_results (List[NodeWithEmbedding]) – list of embedding results
property client: Any
Get client.
delete(doc_id: str, **delete_kwargs: Any) → None
Delete a document.
Parameters
doc_id (str) – document id
query(query: VectorStoreQuery) → VectorStoreQueryResult
Query index for top k most similar nodes.
Parameters
• query_embedding (List[float]) – query embedding
• similarity_top_k (int) – top k most similar nodes
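A minimal sketch wiring the client and store together; the endpoint, index name, and dimension are example values.

from llama_index.vector_stores import OpensearchVectorClient, OpensearchVectorStore

client = OpensearchVectorClient(
    endpoint="http://localhost:9200",  # OpenSearch endpoint (example)
    index="llama-index-demo",          # created on init if it does not exist
    dim=1536,                          # must match the embedding dimension
)
vector_store = OpensearchVectorStore(client)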
class llama_index.vector_stores.PineconeVectorStore(pinecone_index: Optional[Any] = None,
index_name: Optional[str] = None,
environment: Optional[str] = None, namespace:
Optional[str] = None, metadata_filters:
Optional[Dict[str, Any]] = None,
pinecone_kwargs: Optional[Dict] = None,
insert_kwargs: Optional[Dict] = None,
query_kwargs: Optional[Dict] = None,
delete_kwargs: Optional[Dict] = None,
add_sparse_vector: bool = False, tokenizer:
Optional[Callable] = None, **kwargs: Any)
Pinecone Vector Store.
In this vector store, embeddings and docs are stored within a Pinecone index.
During query time, the index uses Pinecone to query for the top k most similar nodes.
Parameters
• pinecone_index (Optional[pinecone.Index]) – Pinecone index instance
• pinecone_kwargs (Optional[Dict]) – kwargs to pass to Pinecone index. NOTE: dep-
recated. If specified, then insert_kwargs, query_kwargs, and delete_kwargs cannot be spec-
ified.
• insert_kwargs (Optional[Dict]) – insert kwargs during upsert call.
• query_kwargs (Optional[Dict]) – query kwargs during query call.
• delete_kwargs (Optional[Dict]) – delete kwargs during delete call.
• add_sparse_vector (bool) – whether to add sparse vector to index.
• tokenizer (Optional[Callable]) – tokenizer to use to generate sparse vectors.
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
Parameters
embedding_results (List[NodeWithEmbedding]) – list of embedding results
property client: Any
Return Pinecone client.
delete(doc_id: str, **delete_kwargs: Any) → None
Delete a document.
Parameters
doc_id (str) – document id
query(query: VectorStoreQuery) → VectorStoreQueryResult
Query index for top k most similar nodes.
Parameters
• query_embedding (List[float]) – query embedding
• similarity_top_k (int) – top k most similar nodes
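A minimal construction sketch; the index name, environment, and namespace are placeholders, and the Pinecone API key must be configured separately.

from llama_index.vector_stores import PineconeVectorStore

vector_store = PineconeVectorStore(
    index_name="quickstart",      # name of an existing Pinecone index (placeholder)
    environment="us-west1-gcp",   # Pinecone environment (placeholder)
    namespace="demo",             # optional namespace
)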
class llama_index.vector_stores.QdrantVectorStore(collection_name: str, client: Optional[Any] =
None, **kwargs: Any)
Qdrant Vector Store.
In this vector store, embeddings and docs are stored within a Qdrant collection.
During query time, the index uses Qdrant to query for the top k most similar nodes.
Parameters
• collection_name – (str): name of the Qdrant collection
• client (Optional[Any]) – QdrantClient instance from qdrant-client package
add(embedding_results: List[NodeWithEmbedding]) → List[str]
Add embedding results to index.
Parameters
embedding_results (List[NodeWithEmbedding]) – list of embedding results
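A minimal sketch, assuming the qdrant-client package is installed; the in-memory mode is used here purely for illustration.

import qdrant_client

from llama_index.vector_stores import QdrantVectorStore

client = qdrant_client.QdrantClient(location=":memory:")   # throwaway local instance
vector_store = QdrantVectorStore(collection_name="demo", client=client)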
3.21.4 KV Storage
to_dict() → dict
Save the store as dict.
3.22 Composability
Below we show the API reference for composable data structures. This contains both the ComposableGraph class as
well as any builder classes that generate ComposableGraph objects.
Init composability.
class llama_index.composability.ComposableGraph(all_indices: Dict[str, BaseGPTIndex], root_id: str)
Composable graph.
classmethod from_indices(root_index_cls: Type[BaseGPTIndex], children_indices:
Sequence[BaseGPTIndex], index_summaries: Optional[Sequence[str]] =
None, **kwargs: Any) → ComposableGraph
Create composable graph using this index class as the root.
get_index(index_struct_id: Optional[str] = None) → BaseGPTIndex
Get index from index struct id.
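A minimal sketch of composing two indices under a list-index root; index1 and index2 are placeholders for indices built elsewhere, and the choice of GPTListIndex as the root class is only an example.

from llama_index.composability import ComposableGraph
from llama_index.indices.list import GPTListIndex

graph = ComposableGraph.from_indices(
    GPTListIndex,                   # root index class
    [index1, index2],               # children indices (placeholders)
    index_summaries=["summary of index 1", "summary of index 2"],
)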
class llama_index.composability.QASummaryQueryEngineBuilder(storage_context:
Optional[StorageContext] = None,
service_context:
Optional[ServiceContext] = None,
summary_text: str = 'Use this index
for summarization queries', qa_text:
str = 'Use this index for queries that
require retrieval of specific context
from documents.')
Joint QA Summary graph builder.
Can build a graph that provides a unified query interface for both QA and summarization tasks.
NOTE: this is a beta feature. The API may change in the future.
NOTE: Our data connectors are now offered through LlamaHub . LlamaHub is an open-source repository containing
data loaders that you can easily plug and play into any LlamaIndex application.
The following data connectors are still available in the core repo.
Data Connectors for LlamaIndex.
This module contains the data connectors for LlamaIndex. Each connector inherits from a BaseReader class, connects
to a data source, and loads Document objects from that data source.
You may also choose to construct Document objects manually, for instance in our Insert How-To Guide. See below for
the API definition of a Document - the bare minimum is a text property.
class llama_index.readers.BeautifulSoupWebReader(website_extractor: Optional[Dict[str, Callable]] =
None)
BeautifulSoup web page reader.
Reads pages from the web. Requires the bs4 and urllib packages.
Parameters
website_extractor (Optional[Dict[str, Callable]]) – A mapping of website hostname (e.g. google.com) to a function that specifies how to extract text from the BeautifulSoup obj. See DEFAULT_WEBSITE_EXTRACTOR.
load_data(urls: List[str], custom_hostname: Optional[str] = None) → List[Document]
Load data from the urls.
Parameters
• urls (List[str]) – List of URLs to scrape.
• custom_hostname (Optional[str]) – Force a certain hostname in the case a website is
displayed under custom URLs (e.g. Substack blogs)
Returns
List of documents.
Return type
List[Document]
load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
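A minimal usage sketch; the URL is a placeholder and the bs4 package must be installed.

from llama_index.readers import BeautifulSoupWebReader

reader = BeautifulSoupWebReader()
documents = reader.load_data(urls=["https://example.com"])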
class llama_index.readers.Document
Generic interface for a data document.
get_doc_hash() → str
Get doc_hash.
get_doc_id() → str
Get doc_id.
get_embedding() → List[float]
Get embedding.
Errors if embedding is None.
get_text() → str
Get text.
classmethod get_type() → str
Get Document type.
classmethod get_types() → List[str]
Get Document type.
property is_doc_id_none: bool
Check if doc_id is None.
property is_text_none: bool
Check if text is None.
to_langchain_format() → Document
Convert struct to LangChain document format.
class llama_index.readers.ElasticsearchReader(endpoint: str, index: str, httpx_client_args:
Optional[dict] = None)
Read documents from an Elasticsearch/Opensearch index.
These documents can then be used in a downstream Llama Index data structure.
Parameters
• endpoint (str) – URL (http/https) of cluster
• index (str) – Name of the index (required)
• httpx_client_args (dict) – Optional additional args to pass to the httpx.Client
load_data(field: str, query: Optional[dict] = None, embedding_field: Optional[str] = None) →
List[Document]
Read data from the Elasticsearch index.
Parameters
• field (str) – Field in the document to retrieve text from
• query (Optional[dict]) – Elasticsearch JSON query DSL object. For example:
{“query”: {“match”: {“message”: {“query”: “this is a test”}}}}
• embedding_field (Optional[str]) – If there are embeddings stored in this index, this
field can be used to set the embedding field on the returned Document list.
Returns
A list of documents.
Return type
List[Document]
load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
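A minimal usage sketch; the endpoint, index name, field, and query are example values.

from llama_index.readers import ElasticsearchReader

reader = ElasticsearchReader(endpoint="http://localhost:9200", index="my-index")
documents = reader.load_data(
    field="message",                      # document field that holds the text
    query={"query": {"match_all": {}}},   # optional Elasticsearch query DSL
)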
class llama_index.readers.MilvusReader(host: str = 'localhost', port: int = 19530, user: str = '', password:
str = '', use_secure: bool = False)
Milvus reader.
load_data(query_vector: List[float], collection_name: str, expr: Optional[Any] = None, search_params:
Optional[dict] = None, limit: int = 10) → List[Document]
Load data from Milvus.
Parameters
• collection_name (str) – Name of the Milvus collection.
• query_vector (List[float]) – Query vector.
• limit (int) – Number of results to return.
Returns
A list of documents.
Return type
List[Document]
load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
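A minimal usage sketch; the collection name is a placeholder and the dummy query vector must match the collection's dimension.

from llama_index.readers import MilvusReader

reader = MilvusReader(host="localhost", port=19530)
documents = reader.load_data(
    query_vector=[0.0] * 768,        # dummy vector; use a real query embedding
    collection_name="my_collection",
    limit=10,
)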
class llama_index.readers.MyScaleReader(myscale_host: str, username: str, password: str, myscale_port:
Optional[int] = 8443, database: str = 'default', table: str =
'llama_index', index_type: str = 'IVFLAT', metric: str = 'cosine',
batch_size: int = 32, index_params: Optional[dict] = None,
search_params: Optional[dict] = None, **kwargs: Any)
MyScale reader.
Parameters
• myscale_host (str) – A URL to connect to the MyScale backend.
• username (str) – Username to log in with.
• password (str) – Password to log in with.
• myscale_port (int) – URL port to connect with HTTP. Defaults to 8443.
• database (str) – Database name to find the table. Defaults to ‘default’.
• table (str) – Table name to operate on. Defaults to ‘llama_index’.
• index_type (str) – Index type string. Defaults to “IVFLAT”.
• metric (str) – Metric to compute distance, supported are (‘l2’, ‘cosine’, ‘ip’). Defaults to
‘cosine’
• batch_size (int, optional) – the size of documents to insert. Defaults to 32.
• index_params (dict, optional) – The index parameters for MyScale. Defaults to None.
• search_params (dict, optional) – The search parameters for a MyScale query. De-
faults to None.
load_data(query_vector: List[float], where_str: Optional[str] = None, limit: int = 10) → List[Document]
Load data from MyScale.
Parameters
• query_vector (List[float]) – Query vector.
• where_str (Optional[str], optional) – where condition string. Defaults to None.
• limit (int) – Number of results to return.
Returns
A list of documents.
Return type
List[Document]
load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
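A minimal usage sketch; the host, credentials, and query vector are placeholders.

from llama_index.readers import MyScaleReader

reader = MyScaleReader(
    myscale_host="your-cluster.myscale.com",  # placeholder host
    username="user",                          # placeholder credentials
    password="password",
)
documents = reader.load_data(query_vector=[0.0] * 768, limit=5)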
class llama_index.readers.NotionPageReader(integration_token: Optional[str] = None)
Notion Page reader.
Reads a set of Notion pages.
Parameters
integration_token (str) – Notion integration token.
load_data(page_ids: List[str] = [], database_id: Optional[str] = None) → List[Document]
Load data from the given list of Notion page ids and/or database.
Parameters
page_ids (List[str]) – List of page ids to load.
Returns
List of documents.
Return type
List[Document]
load_langchain_documents(**load_kwargs: Any) → List[Document]
Load data in LangChain document format.
query_database(database_id: str, query_dict: Dict[str, Any] = {}) → List[str]
Get all the pages from a Notion database.
read_page(page_id: str) → str
Read a page.
search(query: str) → List[str]
Search Notion page given a text query.
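A minimal usage sketch; the integration token and page id are placeholders.

from llama_index.readers import NotionPageReader

reader = NotionPageReader(integration_token="secret-token")  # placeholder token
documents = reader.load_data(page_ids=["<page-id>"])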
class llama_index.readers.ObsidianReader(input_dir: str)
Utilities for loading data from an Obsidian Vault.
Parameters
input_dir (str) – Path to the vault.
Returns
A list of documents.
Return type
List[Document]
Note: Requires install of steamship package and an active Steamship API Key. To get a Steamship API Key,
visit: https://steamship.com/account/api. Once you have an API Key, expose it via an environment variable
Note: The collection of Files from both query and file_handles will be combined. There is no (current)
support for deconflicting the collections (meaning that if a file appears both in the result set of the query
and as a handle in file_handles, it will be loaded twice).
Example
documents = StringIterableReader().load_data(
texts=["I went to the store", "I bought an apple"])
index = GPTTreeIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_engine.query("what did I buy?")
3.24.2 Prompts
Parameters
• template (str) – Template for the prompt.
• **prompt_kwargs – Keyword arguments for the prompt.
format(llm: Optional[BaseLanguageModel] = None, **kwargs: Any) → str
Format the prompt.
classmethod from_langchain_prompt(prompt: BasePromptTemplate, **kwargs: Any) → PMT
Load prompt from LangChain prompt.
classmethod from_langchain_prompt_selector(prompt_selector: ConditionalPromptSelector,
**kwargs: Any) → PMT
Load prompt from LangChain prompt.
classmethod from_prompt(prompt: Prompt, llm: Optional[BaseLanguageModel] = None) → PMT
Create a prompt from an existing prompt.
Use case: If the existing prompt is already partially filled, and the remaining fields satisfy the requirements
of the prompt class, then we can create a new prompt from the existing partially filled prompt.
get_langchain_prompt(llm: Optional[BaseLanguageModel] = None) → BasePromptTemplate
Get langchain prompt.
partial_format(**kwargs: Any) → PMT
Format the prompt partially.
Return an instance of itself.
class llama_index.prompts.prompts.RefineTableContextPrompt(template: Optional[str] = None,
langchain_prompt:
Optional[BasePromptTemplate] =
None, langchain_prompt_selector:
Optional[ConditionalPromptSelector]
= None, stop_token: Optional[str] =
None, output_parser:
Optional[BaseOutputParser] = None,
**prompt_kwargs: Any)
Refine Table context prompt.
Prompt to refine a table context given a table schema schema, as well as unstructured text context context_msg,
and a task query_str. This includes both a high-level description of the table as well as a description of each
column in the table.
Parameters
• template (str) – Template for the prompt.
• **prompt_kwargs – Keyword arguments for the prompt.
format(llm: Optional[BaseLanguageModel] = None, **kwargs: Any) → str
Format the prompt.
classmethod from_langchain_prompt(prompt: BasePromptTemplate, **kwargs: Any) → PMT
Load prompt from LangChain prompt.
classmethod from_langchain_prompt_selector(prompt_selector: ConditionalPromptSelector,
**kwargs: Any) → PMT
Load prompt from LangChain prompt.
Parameters
• template (str) – Template for the prompt.
• **prompt_kwargs – Keyword arguments for the prompt.
Prompt class.
class llama_index.prompts.Prompt(template: Optional[str] = None, langchain_prompt:
Optional[BasePromptTemplate] = None, langchain_prompt_selector:
Optional[ConditionalPromptSelector] = None, stop_token: Optional[str]
= None, output_parser: Optional[BaseOutputParser] = None,
**prompt_kwargs: Any)
Prompt class for LlamaIndex.
Wrapper around langchain’s prompt class. Adds ability to:
• enforce certain prompt types
• partially fill values
• define stop token
format(llm: Optional[BaseLanguageModel] = None, **kwargs: Any) → str
Format the prompt.
classmethod from_langchain_prompt(prompt: BasePromptTemplate, **kwargs: Any) → PMT
Load prompt from LangChain prompt.
classmethod from_langchain_prompt_selector(prompt_selector: ConditionalPromptSelector,
**kwargs: Any) → PMT
Load prompt from LangChain prompt.
classmethod from_prompt(prompt: Prompt, llm: Optional[BaseLanguageModel] = None) → PMT
Create a prompt from an existing prompt.
Use case: If the existing prompt is already partially filled, and the remaining fields satisfy the requirements
of the prompt class, then we can create a new prompt from the existing partially filled prompt.
get_langchain_prompt(llm: Optional[BaseLanguageModel] = None) → BasePromptTemplate
Get langchain prompt.
partial_format(**kwargs: Any) → PMT
Format the prompt partially.
Return an instance of itself.
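A minimal sketch of defining and filling a custom prompt in two steps; the template text and field names are examples, not library defaults.

from llama_index.prompts import Prompt

prompt = Prompt("Context: {context_str}\nAnswer the question: {query_str}\n")
partial = prompt.partial_format(context_str="LlamaIndex connects LLMs with external data.")
text = partial.format(query_str="What does LlamaIndex do?")
print(text)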
The service context container is a utility container for LlamaIndex index and query classes. The container contains
the following objects that are commonly used for configuring every index and query, such as the LLMPredictor (for
configuring the LLM), the PromptHelper (for configuring input size/chunk size), the BaseEmbedding (for configuring
the embedding model), and more.
3.25.1 Embeddings
3.25.2 LLMPredictor
Our LLMPredictor is a wrapper around Langchain’s LLMChain that allows easy integration into LlamaIndex.
Wrapper functions around an LLM chain.
Our MockLLMPredictor is used for token prediction. See Cost Analysis How-To for more information.
Mock chain wrapper.
class llama_index.token_counter.mock_chain_wrapper.MockLLMPredictor(max_tokens: int = 256, llm:
Optional[BaseLLM] =
None)
Mock LLM Predictor.
async apredict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Async predict the answer to a query.
Parameters
prompt (Prompt) – Prompt to use for prediction.
Returns
Tuple of the predicted answer and the formatted prompt.
Return type
Tuple[str, str]
get_llm_metadata() → LLMMetadata
Get LLM metadata.
property last_token_usage: int
Get the last token usage.
property llm: BaseLanguageModel
Get LLM.
predict(prompt: Prompt, **prompt_args: Any) → Tuple[str, str]
Predict the answer to a query.
Parameters
prompt (Prompt) – Prompt to use for prediction.
Returns
Tuple of the predicted answer and the formatted prompt.
Return type
Tuple[str, str]
stream(prompt: Prompt, **prompt_args: Any) → Tuple[Generator, str]
Stream the answer to a query.
NOTE: this is a beta feature. Will try to build or use better abstractions about response handling.
Parameters
prompt (Prompt) – Prompt to use for prediction.
Returns
The predicted answer.
Return type
str
property total_tokens_used: int
Get the total tokens used so far.
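A minimal sketch of using the mock predictor to estimate token usage without calling a real LLM; the prompt template is an example value.

from llama_index.prompts import Prompt
from llama_index.token_counter.mock_chain_wrapper import MockLLMPredictor

predictor = MockLLMPredictor(max_tokens=256)
prompt = Prompt("Summarize the following text: {text}")
answer, formatted_prompt = predictor.predict(prompt, text="LlamaIndex connects LLMs with external data.")
print(predictor.total_tokens_used)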
3.25.3 PromptHelper
General prompt helper that can help deal with token limitations.
The helper can split text. It can also concatenate text from Node structs while keeping token limitations in mind.
class llama_index.indices.prompt_helper.PromptHelper(max_input_size: int, num_output: int,
max_chunk_overlap: int, embedding_limit:
Optional[int] = None, chunk_size_limit:
Optional[int] = None, tokenizer:
Optional[Callable[[str], List]] = None,
separator: str = ' ')
Prompt helper.
This utility helps us fill in the prompt, split the text, and fill in context information according to necessary token
limitations.
Parameters
• max_input_size (int) – Maximum input size for the LLM.
• num_output (int) – Number of output tokens reserved for the LLM.
• max_chunk_overlap (int) – Maximum chunk overlap for the LLM.
• embedding_limit (Optional[int]) – Maximum number of embeddings to use.
• chunk_size_limit (Optional[int]) – Maximum chunk size to use.
• tokenizer (Optional[Callable[[str], List]]) – Tokenizer to use.
compact_text_chunks(prompt: Prompt, text_chunks: Sequence[str]) → List[str]
Compact text chunks.
This will combine text chunks into consolidated chunks that more fully “pack” the prompt template given
the max_input_size.
classmethod from_llm_predictor(llm_predictor: LLMPredictor, max_chunk_overlap: Optional[int] =
None, embedding_limit: Optional[int] = None, chunk_size_limit:
Optional[int] = None, tokenizer: Optional[Callable[[str], List]] =
None) → PromptHelper
Create from llm predictor.
This will autofill values like max_input_size and num_output.
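A minimal construction sketch; the numbers mirror common settings for a 4096-token model and are example values.

from llama_index.indices.prompt_helper import PromptHelper

prompt_helper = PromptHelper(
    max_input_size=4096,    # LLM context window
    num_output=256,         # tokens reserved for the LLM's answer
    max_chunk_overlap=20,   # overlap between adjacent text chunks
)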
Init params.
class llama_index.logger.LlamaLogger
Logger class.
add_log(log: Dict) → None
Add log.
get_logs() → List[Dict]
Get logs.
get_metadata() → Dict
Get metadata.
reset() → None
Reset logs.
set_metadata(metadata: Dict) → None
Set metadata.
unset_metadata(metadata_keys: Set) → None
Unset metadata.
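A minimal usage sketch of the logger's documented methods.

from llama_index.logger import LlamaLogger

logger = LlamaLogger()
logger.set_metadata({"run": "demo"})
logger.add_log({"event": "query", "text": "What did the author work on?"})
print(logger.get_logs())
logger.reset()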
3.26 Optimizers
Optimization.
class llama_index.optimization.SentenceEmbeddingOptimizer(embed_model:
Optional[BaseEmbedding] = None,
percentile_cutoff: Optional[float] =
None, threshold_cutoff: Optional[float]
= None, tokenizer_fn:
Optional[Callable[[str], List[str]]] =
None)
Optimization of a text chunk given the query by shortening the input text.
optimize(query_bundle: QueryBundle, text: str) → str
Optimize a text chunk given the query by shortening the input text.
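A minimal sketch; percentile_cutoff=0.5 (keep roughly the top half of sentences) is an example value, and an embedding model must be available for real use.

from llama_index.indices.query.schema import QueryBundle
from llama_index.optimization import SentenceEmbeddingOptimizer

optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
shortened_text = optimizer.optimize(
    QueryBundle("What did the author work on?"),
    "A long retrieved chunk with many sentences. Only a few are relevant.",
)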
3.27 Callbacks
get_llm_inputs_outputs() → List[List[CBEvent]]
Get the exact LLM inputs and outputs.
on_event_end(event_type: CBEventType, payload: Optional[Dict[str, Any]] = None, event_id: str = '',
**kwargs: Any) → None
Store event end data by event type.
Parameters
• event_type (CBEventType) – event type to store.
• payload (Optional[Dict[str, Any]]) – payload to store.
• event_id (str) – event id to store.
on_event_start(event_type: CBEventType, payload: Optional[Dict[str, Any]] = None, event_id: str = '',
**kwargs: Any) → str
Store event start data by event type.
Parameters
• event_type (CBEventType) – event type to store.
• payload (Optional[Dict[str, Any]]) – payload to store.
• event_id (str) – event id to store.
Our structured indices are documented in Structured Store Index. Below, we provide a reference of the classes that are
used to configure our structured indices.
SQL wrapper around SQLDatabase in langchain.
class llama_index.langchain_helpers.sql_wrapper.SQLDatabase(engine: Engine, schema:
Optional[str] = None, metadata:
Optional[MetaData] = None,
ignore_tables: Optional[List[str]] =
None, include_tables:
Optional[List[str]] = None,
sample_rows_in_table_info: int = 3,
indexes_in_table_info: bool = False,
custom_table_info: Optional[dict] =
None, view_support: bool = False)
SQL Database.
Wrapper around SQLDatabase object from langchain. Offers some helper utilities for insertion and querying.
See the langchain documentation for more details.
Parameters
• *args – Arguments to pass to langchain SQLDatabase.
• **kwargs – Keyword arguments to pass to langchain SQLDatabase.
property dialect: str
Return string representation of dialect to use.
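A minimal sketch, using an in-memory SQLite database purely as an example.

from sqlalchemy import create_engine

from llama_index.langchain_helpers.sql_wrapper import SQLDatabase

engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(engine)
print(sql_database.dialect)   # e.g. "sqlite"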
3.29 Response
Response schema.
class llama_index.response.schema.Response(response: ~typing.Optional[str], source_nodes: ~typ-
ing.List[~llama_index.data_structs.node.NodeWithScore] =
<factory>, extra_info: ~typing.Optional[~typing.Dict[str,
~typing.Any]] = None)
Response object.
Returned if streaming=False.
response
The response text.
Type
Optional[str]
get_formatted_sources(length: int = 100) → str
Get formatted sources text.
class llama_index.response.schema.StreamingResponse(response_gen:
~typing.Optional[~typing.Generator],
source_nodes: ~typ-
ing.List[~llama_index.data_structs.node.NodeWithScore]
= <factory>, extra_info:
~typing.Optional[~typing.Dict[str,
~typing.Any]] = None, response_txt:
~typing.Optional[str] = None)
StreamingResponse object.
Returned if streaming=True.
response_gen
The response generator.
Type
Optional[Generator]
get_formatted_sources(length: int = 100) → str
Get formatted sources text.
get_response() → Response
Get a standard response object.
print_response_stream() → None
Print the response stream.
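A minimal usage sketch; query_engine stands in for an engine obtained elsewhere (e.g. via index.as_query_engine()).

response = query_engine.query("What did the author do growing up?")
print(response.response)                           # the response text
print(response.get_formatted_sources(length=100))  # formatted source nodes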
3.30 Playground
Node parsers.
class llama_index.node_parser.NodeParser
Base interface for node parser.
abstract get_nodes_from_documents(documents: Sequence[Document]) → List[Node]
Parse documents into nodes.
Parameters
documents (Sequence[Document]) – documents to parse
class llama_index.node_parser.SimpleNodeParser(text_splitter: Optional[TextSplitter] = None,
include_extra_info: bool = True,
include_prev_next_rel: bool = True)
Simple node parser.
Splits a document into Nodes using a TextSplitter.
Parameters
• text_splitter (Optional[TextSplitter]) – text splitter
• include_extra_info (bool) – whether to include extra info in nodes
• include_prev_next_rel (bool) – whether to include prev/next relationships
get_nodes_from_documents(documents: Sequence[Document]) → List[Node]
Parse document into nodes.
Parameters
• documents (Sequence[Document]) – documents to parse
• include_extra_info (bool) – whether to include extra info in nodes
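A minimal sketch; StringIterableReader (shown in the readers example above) is assumed to live in llama_index.readers.

from llama_index.node_parser import SimpleNodeParser
from llama_index.readers import StringIterableReader

documents = StringIterableReader().load_data(
    texts=["I went to the store", "I bought an apple"])
parser = SimpleNodeParser()               # uses a default text splitter
nodes = parser.get_nodes_from_documents(documents)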
We offer a wide variety of example notebooks. They are referenced throughout the documentation.
Example notebooks are found here.
class llama_index.langchain_helpers.agents.LlamaIndexTool
Tool for querying a LlamaIndex.
args_schema: Optional[Type[BaseModel]]
Pydantic model class to validate and parse the tool’s input arguments.
async arun(tool_input: Union[str, Dict], verbose: Optional[bool] = None, start_color: Optional[str] =
'green', color: Optional[str] = 'green', callbacks: Optional[Union[List[BaseCallbackHandler],
BaseCallbackManager]] = None, **kwargs: Any) → Any
Run the tool asynchronously.
callback_manager: Optional[BaseCallbackManager]
Deprecated. Please use callbacks instead.
callbacks: Callbacks
Callbacks to be called during tool execution.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
description: str
Used to tell the model how/when/why to use the tool.
You can provide few-shot examples as a part of the description.
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
classmethod from_tool_config(tool_config: IndexToolConfig) → LlamaIndexTool
Create a tool from a tool config.
property is_single_input: bool
Whether the tool only accepts a single input.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
name: str
The unique name of the tool that clearly communicates its purpose.
classmethod raise_deprecation(values: Dict) → Dict
Raise deprecation warning if callback_manager is used.
return_direct: bool
Whether to return the tool’s output directly. Setting this to True means that after the tool is called, the AgentExecutor will stop looping.
run(tool_input: Union[str, Dict], verbose: Optional[bool] = None, start_color: Optional[str] = 'green', color:
Optional[str] = 'green', callbacks: Optional[Union[List[BaseCallbackHandler],
BaseCallbackManager]] = None, **kwargs: Any) → Any
Run the tool.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
verbose: bool
Whether to log the tool’s progress.
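A minimal sketch of exposing a query engine as a LangChain tool; the IndexToolConfig fields shown (query_engine, name, description) are assumptions based on typical usage, and query_engine is a placeholder built elsewhere.

from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

tool_config = IndexToolConfig(
    query_engine=query_engine,   # e.g. index.as_query_engine() (placeholder)
    name="Essay Index",
    description="Useful for answering questions about the essay.",
)
tool = LlamaIndexTool.from_tool_config(tool_config)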
class llama_index.langchain_helpers.agents.LlamaToolkit(*, index_configs: List[IndexToolConfig] =
None)
Toolkit for interacting with Llama indices.
class Config
Configuration for this pydantic object.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
class Config
Configuration for this pydantic object.
clear() → None
Clear memory contents.
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values
are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it
adds all passed values
copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None,
deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
Parameters
• include – fields to include in new model
• exclude – fields to exclude from new model, as with values this takes precedence over
include
• update – values to change/add in the new model. Note: the data is not validated before
creating the new model: you should trust this data
• deep – set to True to make a deep copy of the model
Returns
new model instance
dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False) → DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude:
Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults:
Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none:
bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True,
**dumps_kwargs: Any) → unicode
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
load_memory_variables(inputs: Dict[str, Any]) → Dict[str, str]
Return key-value pairs given the text input to the chain.
property memory_variables: List[str]
Return memory variables.
save_context(inputs: Dict[str, Any], outputs: Dict[str, str]) → None
Save the context of this model run to memory.
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
Here is a sample of some of the incredible applications and tools built on top of LlamaIndex!
Hosted API service. Includes a “Dense Data Retrieval” API built on top of LlamaIndex where users can upload their
documents and query them. [Website]
3.34.2 Algovera
Build AI workflows using building blocks. Many workflows built on top of LlamaIndex.
[Website].
Interface that allows users to upload long docs and chat with the bot. [Tweet thread]
3.34.4 AgentHQ
3.34.5 PapersGPT
Feed any of the following content into GPT to give it deep customized knowledge:
• Scientific Papers
• Substack Articles
• Podcasts
• Github Repos and more.
[Tweet thread] [Website]
VideoQues: A tool that answers your queries on YouTube videos. [LinkedIn post here].
DocsQues: A tool that answers your questions on longer documents (including .pdfs!) [LinkedIn post here].
3.34.7 PaperBrain
3.34.8 CACTUS
A chatbot that can answer questions over a directory of Obsidian notes. [Tweet thread].
Ask questions about the Real Housewives of Beverly Hills. [Tweet thread] [Website]
3.34.11 Mynd
A journaling app that uses AI to uncover insights and patterns over time. [Website]
3.34.13 AnySummary
3.34.14 Blackmaria
Python Module Index

llama_index.callbacks
llama_index.composability
llama_index.data_structs.node
llama_index.embeddings.langchain
llama_index.embeddings.openai
llama_index.indices.base
llama_index.indices.base_retriever
llama_index.indices.common.struct_store.base
llama_index.indices.empty
llama_index.indices.empty.retrievers
llama_index.indices.keyword_table
llama_index.indices.keyword_table.retrievers
llama_index.indices.knowledge_graph
llama_index.indices.knowledge_graph.retrievers
llama_index.indices.list
llama_index.indices.list.retrievers
llama_index.indices.loading
llama_index.indices.postprocessor
llama_index.indices.prompt_helper
llama_index.indices.query.query_transform
llama_index.indices.query.response_synthesis
llama_index.indices.query.schema
llama_index.indices.service_context
llama_index.indices.struct_store
llama_index.indices.struct_store.container_builder
llama_index.indices.struct_store.pandas_query
llama_index.indices.struct_store.sql_query
llama_index.indices.tree
llama_index.indices.tree.all_leaf_retriever
llama_index.indices.tree.select_leaf_embedding_retriever
llama_index.indices.tree.select_leaf_retriever
llama_index.indices.vector_store.base
llama_index.indices.vector_store.retrievers
llama_index.langchain_helpers.agents
llama_index.langchain_helpers.chain_wrapper
llama_index.langchain_helpers.memory_wrapper
llama_index.langchain_helpers.sql_wrapper
llama_index.logger
llama_index.node_parser
llama_index.optimization
llama_index.playground.base
llama_index.prompts
llama_index.prompts.prompts
llama_index.query_engine.graph_query_engine
llama_index.query_engine.multistep_query_engine
llama_index.query_engine.retriever_query_engine
llama_index.query_engine.router_query_engine
llama_index.query_engine.transform_query_engine
llama_index.readers
llama_index.response.schema
llama_index.retrievers.transform_retriever
llama_index.storage.docstore
llama_index.storage.index_store
llama_index.storage.kvstore
llama_index.storage.storage_context
llama_index.token_counter.mock_chain_wrapper
llama_index.vector_stores
LlamaIndex
A 246
add_node()
add() (llama_index.vector_stores.ChatGPTRetrievalPluginClient (llama_index.indices.knowledge_graph.GPTKnowledgeGraphI
method), 198 method), 154
add() (llama_index.vector_stores.ChromaVectorStore aget_embedding() (in module
method), 198 llama_index.embeddings.openai), 242
add() (llama_index.vector_stores.DeepLakeVectorStore aget_embeddings() (in module
method), 199 llama_index.embeddings.openai), 242
add() (llama_index.vector_stores.FaissVectorStore aget_queued_text_embeddings()
method), 200 (llama_index.embeddings.langchain.LangchainEmbedding
add() (llama_index.vector_stores.LanceDBVectorStore method), 243
method), 201 aget_queued_text_embeddings()
add() (llama_index.vector_stores.MetalVectorStore (llama_index.embeddings.openai.OpenAIEmbedding
method), 201 method), 241
add() (llama_index.vector_stores.MilvusVectorStore apredict() (llama_index.token_counter.mock_chain_wrapper.MockLLMP
method), 202 method), 244
add() (llama_index.vector_stores.MyScaleVectorStore args_schema (llama_index.langchain_helpers.agents.LlamaIndexTool
method), 204 attribute), 256
add() (llama_index.vector_stores.OpensearchVectorStore arun() (llama_index.langchain_helpers.agents.LlamaIndexTool
method), 205 method), 257
add() (llama_index.vector_stores.PineconeVectorStore AutoPrevNextNodePostprocessor (class in
method), 206 llama_index.indices.postprocessor), 178
add() (llama_index.vector_stores.QdrantVectorStore AutoPrevNextNodePostprocessor.Config (class in
method), 206 llama_index.indices.postprocessor), 180
add() (llama_index.vector_stores.SimpleVectorStore
method), 207 B
add() (llama_index.vector_stores.WeaviateVectorStore BaseDocumentStore (class in
method), 207 llama_index.storage.docstore), 190
BaseGPTIndex (class in llama_index.indices.base), 158
add_documents() (llama_index.storage.docstore.KVDocumentStore
method), 191 BaseKeywordTableRetriever (class in
llama_index.indices.keyword_table.retrievers),
add_documents() (llama_index.storage.docstore.MongoDocumentStore
method), 192 162
add_documents() (llama_index.storage.docstore.SimpleDocumentStore
BaseRetriever (class in
method), 194 llama_index.indices.base_retriever), 168
add_handler() (llama_index.callbacks.CallbackManager BaseStructDatapointExtractor (class in
method), 248 llama_index.indices.common.struct_store.base),
252
add_index_struct() (llama_index.storage.index_store.KVIndexStore
method), 195 BeautifulSoupWebReader (class in
llama_index.readers), 212
add_index_struct() (llama_index.storage.index_store.MongoIndexStore
method), 196 build_all_context_from_documents()
(llama_index.indices.common.struct_store.base.SQLDocumentCo
add_index_struct() (llama_index.storage.index_store.SimpleIndexStore
method), 197 method), 253
add_log() (llama_index.logger.LlamaLogger method), build_context_container()
269
LlamaIndex
(llama_index.indices.struct_store.container_builder.SQLContextContainerBuilder
client (llama_index.vector_stores.PineconeVectorStore
method), 251 property), 206
build_context_container() client (llama_index.vector_stores.QdrantVectorStore
(llama_index.indices.struct_store.SQLContextContainerBuilder
property), 206
method), 153 client (llama_index.vector_stores.SimpleVectorStore
build_from_documents() property), 207
(llama_index.composability.QASummaryQueryEngineBuilder
client (llama_index.vector_stores.WeaviateVectorStore
method), 212 property), 208
build_table_context_from_documents() CohereRerank (class in
(llama_index.indices.common.struct_store.base.SQLDocumentContextBuilder
llama_index.indices.postprocessor), 181
method), 253 compact_text_chunks()
(llama_index.indices.prompt_helper.PromptHelper
C method), 245
compare() (llama_index.playground.base.Playground
callback_manager (llama_index.langchain_helpers.agents.LlamaIndexTool
attribute), 257 method), 254
CallbackManager (class in llama_index.callbacks), 248 ComposableGraph (class in llama_index.composability),
callbacks (llama_index.langchain_helpers.agents.LlamaIndexTool 211
attribute), 257 ComposableGraphQueryEngine (class in
CBEvent (class in llama_index.callbacks), 248 llama_index.query_engine.graph_query_engine),
CBEventType (class in llama_index.callbacks), 248 169
ChatGPTRetrievalPluginClient (class in construct() (llama_index.indices.postprocessor.AutoPrevNextNodePostpr
llama_index.vector_stores), 197 class method), 180
ChatGPTRetrievalPluginReader (class in construct() (llama_index.indices.postprocessor.EmbeddingRecencyPostp
llama_index.readers), 212 class method), 181
CHILD (llama_index.data_structs.node.DocumentRelationship construct() (llama_index.indices.postprocessor.FixedRecencyPostprocess
attribute), 176 class method), 182
child_node_ids (llama_index.data_structs.node.Node construct() (llama_index.indices.postprocessor.KeywordNodePostproces
property), 177 class method), 183
ChromaReader (class in llama_index.readers), 213 construct() (llama_index.indices.postprocessor.NERPIINodePostprocess
ChromaVectorStore (class in class method), 184
llama_index.vector_stores), 198 construct() (llama_index.indices.postprocessor.PIINodePostprocessor
class method), 186
clear() (llama_index.langchain_helpers.memory_wrapper.GPTIndexChatMemory
method), 261 construct() (llama_index.indices.postprocessor.PrevNextNodePostproces
clear() (llama_index.langchain_helpers.memory_wrapper.GPTIndexMemoryclass method), 187
method), 262 construct() (llama_index.indices.postprocessor.SimilarityPostprocessor
client (llama_index.vector_stores.ChatGPTRetrievalPluginClient class method), 188
property), 198 construct() (llama_index.indices.postprocessor.TimeWeightedPostproces
client (llama_index.vector_stores.ChromaVectorStore class method), 189
property), 198 construct() (llama_index.langchain_helpers.agents.IndexToolConfig
client (llama_index.vector_stores.DeepLakeVectorStore class method), 255
property), 200 construct() (llama_index.langchain_helpers.agents.LlamaIndexTool
client (llama_index.vector_stores.FaissVectorStore class method), 257
property), 200 construct() (llama_index.langchain_helpers.agents.LlamaToolkit
client (llama_index.vector_stores.LanceDBVectorStore class method), 258
property), 201 construct() (llama_index.langchain_helpers.memory_wrapper.GPTIndex
client (llama_index.vector_stores.MetalVectorStore class method), 261
property), 201 construct() (llama_index.langchain_helpers.memory_wrapper.GPTIndex
client (llama_index.vector_stores.MilvusVectorStore class method), 262
property), 203 copy() (llama_index.indices.postprocessor.AutoPrevNextNodePostprocesso
client (llama_index.vector_stores.MyScaleVectorStore method), 180
property), 204 copy() (llama_index.indices.postprocessor.EmbeddingRecencyPostprocess
client (llama_index.vector_stores.OpensearchVectorStore method), 181
property), 205 copy() (llama_index.indices.postprocessor.FixedRecencyPostprocessor
method), 182
270 Index
LlamaIndex
copy() (llama_index.indices.postprocessor.KeywordNodePostprocessor
delete() (llama_index.indices.tree.GPTTreeIndex
method), 183 method), 146
copy() (llama_index.indices.postprocessor.NERPIINodePostprocessor
delete() (llama_index.storage.kvstore.MongoDBKVStore
method), 184 method), 208
copy() (llama_index.indices.postprocessor.PIINodePostprocessor
delete() (llama_index.storage.kvstore.SimpleKVStore
method), 186 method), 209
copy() (llama_index.indices.postprocessor.PrevNextNodePostprocessor
delete() (llama_index.vector_stores.ChatGPTRetrievalPluginClient
method), 188 method), 198
copy() (llama_index.indices.postprocessor.SimilarityPostprocessor
delete() (llama_index.vector_stores.ChromaVectorStore
method), 188 method), 198
copy() (llama_index.indices.postprocessor.TimeWeightedPostprocessor
delete() (llama_index.vector_stores.DeepLakeVectorStore
method), 189 method), 200
copy() (llama_index.langchain_helpers.agents.IndexToolConfig
delete() (llama_index.vector_stores.FaissVectorStore
method), 256 method), 200
copy() (llama_index.langchain_helpers.agents.LlamaIndexTool
delete() (llama_index.vector_stores.LanceDBVectorStore
method), 257 method), 201
copy() (llama_index.langchain_helpers.agents.LlamaToolkitdelete() (llama_index.vector_stores.MetalVectorStore
method), 258 method), 201
copy() (llama_index.langchain_helpers.memory_wrapper.GPTIndexChatMemory
delete() (llama_index.vector_stores.MilvusVectorStore
method), 261 method), 203
copy() (llama_index.langchain_helpers.memory_wrapper.GPTIndexMemory
delete() (llama_index.vector_stores.MyScaleVectorStore
method), 262 method), 204
create_documents() (llama_index.readers.ChromaReader delete() (llama_index.vector_stores.OpensearchVectorStore
method), 213 method), 205
create_llama_agent() (in module delete() (llama_index.vector_stores.PineconeVectorStore
llama_index.langchain_helpers.agents), 259 method), 206
create_llama_chat_agent() (in module delete() (llama_index.vector_stores.QdrantVectorStore
llama_index.langchain_helpers.agents), 259 method), 207
delete() (llama_index.vector_stores.SimpleVectorStore
D method), 207
DecomposeQueryTransform (class in delete() (llama_index.vector_stores.WeaviateVectorStore
llama_index.indices.query.query_transform), method), 208
175 delete_doc_id() (llama_index.vector_stores.OpensearchVectorClient
DeepLakeReader (class in llama_index.readers), 213 method), 205
DeepLakeVectorStore (class in delete_document() (llama_index.storage.docstore.BaseDocumentStore
llama_index.vector_stores), 198 method), 190
default_node_to_metadata_fn() (in module delete_document() (llama_index.storage.docstore.KVDocumentStore
llama_index.query_engine.router_query_engine), method), 191
173 delete_document() (llama_index.storage.docstore.MongoDocumentStore
default_output_processor() (in module method), 192
llama_index.indices.struct_store.pandas_query), delete_document() (llama_index.storage.docstore.SimpleDocumentStore
174 method), 194
default_stop_fn() (in module delete_index_struct()
llama_index.query_engine.multistep_query_engine), (llama_index.storage.index_store.KVIndexStore
170 method), 195
delete() (llama_index.indices.base.BaseGPTIndex delete_index_struct()
method), 158 (llama_index.storage.index_store.MongoIndexStore
delete() (llama_index.indices.keyword_table.GPTKeywordTableIndex method), 196
method), 141 delete_index_struct()
(llama_index.storage.index_store.SimpleIndexStore
delete() (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex
method), 142 method), 197
delete() (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex
derive_index_from_context()
method), 143 (llama_index.indices.struct_store.container_builder.SQLContextC
method), 251
Index 271
LlamaIndex
272 Index
LlamaIndex
Index 273
LlamaIndex
from_langchain_prompt() (llama_index.prompts.prompts.TableContextPrompt class method), 236
from_langchain_prompt() (llama_index.prompts.prompts.TextToSQLPrompt class method), 237
from_langchain_prompt() (llama_index.prompts.prompts.TreeInsertPrompt class method), 238
from_langchain_prompt() (llama_index.prompts.prompts.TreeSelectMultiplePrompt class method), 239
from_langchain_prompt() (llama_index.prompts.prompts.TreeSelectPrompt class method), 239
from_langchain_prompt_selector() (llama_index.prompts.Prompt class method), 240
from_langchain_prompt_selector() (llama_index.prompts.prompts.KeywordExtractPrompt class method), 229
from_langchain_prompt_selector() (llama_index.prompts.prompts.KnowledgeGraphPrompt class method), 230
from_langchain_prompt_selector() (llama_index.prompts.prompts.PandasPrompt class method), 230
from_langchain_prompt_selector() (llama_index.prompts.prompts.QueryKeywordExtractPrompt class method), 231
from_langchain_prompt_selector() (llama_index.prompts.prompts.QuestionAnswerPrompt class method), 232
from_langchain_prompt_selector() (llama_index.prompts.prompts.RefinePrompt class method), 233
from_langchain_prompt_selector() (llama_index.prompts.prompts.RefineTableContextPrompt class method), 233
from_langchain_prompt_selector() (llama_index.prompts.prompts.SchemaExtractPrompt class method), 234
from_langchain_prompt_selector() (llama_index.prompts.prompts.SimpleInputPrompt class method), 235
from_langchain_prompt_selector() (llama_index.prompts.prompts.SummaryPrompt class method), 236
from_langchain_prompt_selector() (llama_index.prompts.prompts.TableContextPrompt class method), 236
from_langchain_prompt_selector() (llama_index.prompts.prompts.TextToSQLPrompt class method), 237
from_langchain_prompt_selector() (llama_index.prompts.prompts.TreeInsertPrompt class method), 238
from_langchain_prompt_selector() (llama_index.prompts.prompts.TreeSelectMultiplePrompt class method), 239
from_langchain_prompt_selector() (llama_index.prompts.prompts.TreeSelectPrompt class method), 239
from_llm_predictor() (llama_index.indices.prompt_helper.PromptHelper class method), 245
from_persist_dir() (llama_index.storage.docstore.SimpleDocumentStore class method), 194
from_persist_dir() (llama_index.storage.index_store.SimpleIndexStore class method), 197
from_persist_path() (llama_index.storage.docstore.SimpleDocumentStore class method), 194
from_persist_path() (llama_index.storage.index_store.SimpleIndexStore class method), 197
from_persist_path() (llama_index.storage.kvstore.SimpleKVStore class method), 209
from_persist_path() (llama_index.vector_stores.SimpleVectorStore class method), 207
from_prompt() (llama_index.prompts.Prompt class method), 240
from_prompt() (llama_index.prompts.prompts.KeywordExtractPrompt class method), 229
from_prompt() (llama_index.prompts.prompts.KnowledgeGraphPrompt class method), 230
from_prompt() (llama_index.prompts.prompts.PandasPrompt class method), 231
from_prompt() (llama_index.prompts.prompts.QueryKeywordExtractPrompt class method), 231
from_prompt() (llama_index.prompts.prompts.QuestionAnswerPrompt class method), 232
from_prompt() (llama_index.prompts.prompts.RefinePrompt class method), 233
from_prompt() (llama_index.prompts.prompts.RefineTableContextPrompt class method), 233
from_prompt() (llama_index.prompts.prompts.SchemaExtractPrompt class method), 234
from_prompt() (llama_index.prompts.prompts.SimpleInputPrompt class method), 235
from_prompt() (llama_index.prompts.prompts.SummaryPrompt class method), 236
from_prompt() (llama_index.prompts.prompts.TableContextPrompt class method), 236
from_prompt() (llama_index.prompts.prompts.TextToSQLPrompt class method), 237
from_prompt() (llama_index.prompts.prompts.TreeInsertPrompt class method), 238
get_text_embedding() (llama_index.embeddings.langchain.LangchainEmbedding method), 243
get_text_embedding() (llama_index.embeddings.openai.OpenAIEmbedding method), 242
get_text_from_node() (in module llama_index.indices.tree.select_leaf_retriever), 165
get_text_from_nodes() (llama_index.indices.prompt_helper.PromptHelper method), 246
get_text_splitter_given_prompt() (llama_index.indices.prompt_helper.PromptHelper method), 246
get_tools() (llama_index.langchain_helpers.agents.LlamaToolkit method), 259
get_type() (llama_index.data_structs.node.Node class method), 177
get_type() (llama_index.readers.Document class method), 215
get_types() (llama_index.data_structs.node.Node class method), 177
get_types() (llama_index.readers.Document class method), 215
get_usable_table_names() (llama_index.langchain_helpers.sql_wrapper.SQLDatabase method), 250
GithubRepositoryReader (class in llama_index.readers), 216
GoogleDocsReader (class in llama_index.readers), 217
GPTEmptyIndex (class in llama_index.indices.empty), 157
GPTIndexChatMemory (class in llama_index.langchain_helpers.memory_wrapper), 260
GPTIndexChatMemory.Config (class in llama_index.langchain_helpers.memory_wrapper), 260
GPTIndexMemory (class in llama_index.langchain_helpers.memory_wrapper), 261
GPTIndexMemory.Config (class in llama_index.langchain_helpers.memory_wrapper), 262
GPTKeywordTableIndex (class in llama_index.indices.keyword_table), 140
GPTKnowledgeGraphIndex (class in llama_index.indices.knowledge_graph), 154
GPTListIndex (class in llama_index.indices.list), 139
GPTNLPandasQueryEngine (class in llama_index.indices.struct_store), 150
GPTNLPandasQueryEngine (class in llama_index.indices.struct_store.pandas_query), 173
GPTNLStructStoreQueryEngine (class in llama_index.indices.struct_store), 150
GPTNLStructStoreQueryEngine (class in llama_index.indices.struct_store.sql_query), 173
GPTPandasIndex (class in llama_index.indices.struct_store), 151
GPTRAKEKeywordTableIndex (class in llama_index.indices.keyword_table), 142
GPTSimpleKeywordTableIndex (class in llama_index.indices.keyword_table), 143
GPTSQLStructStoreIndex (class in llama_index.indices.struct_store), 152
GPTSQLStructStoreQueryEngine (class in llama_index.indices.struct_store), 153
GPTSQLStructStoreQueryEngine (class in llama_index.indices.struct_store.sql_query), 173
GPTTreeIndex (class in llama_index.indices.tree), 146
GPTVectorStoreIndex (class in llama_index.indices.vector_store.base), 149
H
HYBRID (llama_index.indices.knowledge_graph.retrievers.KGRetrieverMode attribute), 160
HyDEQueryTransform (class in llama_index.indices.query.query_transform), 175
I
index_id (llama_index.indices.base.BaseGPTIndex property), 158
index_id (llama_index.indices.empty.GPTEmptyIndex property), 157
index_id (llama_index.indices.keyword_table.GPTKeywordTableIndex property), 141
index_id (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex property), 142
index_id (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex property), 143
index_id (llama_index.indices.knowledge_graph.GPTKnowledgeGraphIndex property), 155
index_id (llama_index.indices.list.GPTListIndex property), 139
index_id (llama_index.indices.struct_store.GPTPandasIndex property), 151
index_id (llama_index.indices.struct_store.GPTSQLStructStoreIndex property), 152
index_id (llama_index.indices.tree.GPTTreeIndex property), 146
index_id (llama_index.indices.vector_store.base.GPTVectorStoreIndex property), 149
index_results() (llama_index.vector_stores.OpensearchVectorClient method), 205
index_struct (llama_index.indices.base.BaseGPTIndex property), 158
index_struct (llama_index.indices.keyword_table.GPTKeywordTableIndex property), 141
index_struct (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex property), 142
index_struct (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex property), 144
index_struct (llama_index.indices.tree.GPTTreeIndex property), 146
index_struct_cls (llama_index.indices.keyword_table.GPTKeywordTableIndex attribute), 141
index_struct_cls (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex attribute), 142
index_struct_cls (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex attribute), 144
index_struct_cls (llama_index.indices.tree.GPTTreeIndex attribute), 146
index_structs() (llama_index.storage.index_store.KVIndexStore method), 196
index_structs() (llama_index.storage.index_store.MongoIndexStore method), 196
index_structs() (llama_index.storage.index_store.SimpleIndexStore method), 197
IndexToolConfig (class in llama_index.langchain_helpers.agents), 255
IndexToolConfig.Config (class in llama_index.langchain_helpers.agents), 255
indices (llama_index.playground.base.Playground property), 254
insert() (llama_index.indices.base.BaseGPTIndex method), 158
insert() (llama_index.indices.empty.GPTEmptyIndex method), 157
insert() (llama_index.indices.keyword_table.GPTKeywordTableIndex method), 141
insert() (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex method), 142
insert() (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex method), 144
insert() (llama_index.indices.knowledge_graph.GPTKnowledgeGraphIndex method), 155
insert() (llama_index.indices.list.GPTListIndex method), 139
insert() (llama_index.indices.struct_store.GPTPandasIndex method), 151
insert() (llama_index.indices.struct_store.GPTSQLStructStoreIndex method), 152
insert() (llama_index.indices.tree.GPTTreeIndex method), 146
insert() (llama_index.indices.vector_store.base.GPTVectorStoreIndex method), 149
insert_datapoint_from_nodes() (llama_index.indices.common.struct_store.base.BaseStructDatapo method), 252
insert_into_table() (llama_index.langchain_helpers.sql_wrapper.SQLDatabase method), 250
is_doc_id_none (llama_index.data_structs.node.Node property), 177
is_doc_id_none (llama_index.readers.Document property), 215
is_single_input (llama_index.langchain_helpers.agents.LlamaIndexTool property), 257
is_text_none (llama_index.data_structs.node.Node property), 177
is_text_none (llama_index.readers.Document property), 215
J
json() (llama_index.indices.postprocessor.AutoPrevNextNodePostprocessor method), 180
json() (llama_index.indices.postprocessor.EmbeddingRecencyPostprocessor method), 182
json() (llama_index.indices.postprocessor.FixedRecencyPostprocessor method), 183
json() (llama_index.indices.postprocessor.KeywordNodePostprocessor method), 184
json() (llama_index.indices.postprocessor.NERPIINodePostprocessor method), 185
json() (llama_index.indices.postprocessor.PIINodePostprocessor method), 187
json() (llama_index.indices.postprocessor.PrevNextNodePostprocessor method), 188
json() (llama_index.indices.postprocessor.SimilarityPostprocessor method), 189
json() (llama_index.indices.postprocessor.TimeWeightedPostprocessor method), 190
json() (llama_index.langchain_helpers.agents.IndexToolConfig method), 256
json() (llama_index.langchain_helpers.agents.LlamaIndexTool method), 257
json() (llama_index.langchain_helpers.agents.LlamaToolkit method), 259
json() (llama_index.langchain_helpers.memory_wrapper.GPTIndexChatMemory method), 261
json() (llama_index.langchain_helpers.memory_wrapper.GPTIndexMemory method), 262
JSONReader (class in llama_index.readers), 217
K
KEYWORD (llama_index.indices.knowledge_graph.retrievers.KGRetrieverMode attribute), 160
KeywordExtractPrompt (class in llama_index.prompts.prompts), 229
partial_format() (llama_index.prompts.prompts.TreeSelectPrompt method), 240
pass_response_to_webhook() (llama_index.readers.MakeWrapper method), 218
persist() (llama_index.storage.docstore.SimpleDocumentStore method), 195
persist() (llama_index.storage.index_store.KVIndexStore method), 196
persist() (llama_index.storage.index_store.MongoIndexStore method), 197
persist() (llama_index.storage.index_store.SimpleIndexStore method), 197
persist() (llama_index.storage.kvstore.SimpleKVStore method), 209
persist() (llama_index.storage.storage_context.StorageContext method), 211
persist() (llama_index.vector_stores.FaissVectorStore method), 200
persist() (llama_index.vector_stores.SimpleVectorStore method), 207
PIINodePostprocessor (class in llama_index.indices.postprocessor), 185
PineconeReader (class in llama_index.readers), 221
PineconeVectorStore (class in llama_index.vector_stores), 205
Playground (class in llama_index.playground.base), 254
postprocess_nodes() (llama_index.indices.postprocessor.AutoPrevNextNodePostprocessor method), 180
postprocess_nodes() (llama_index.indices.postprocessor.CohereRerank method), 181
postprocess_nodes() (llama_index.indices.postprocessor.EmbeddingRecencyPostprocessor method), 182
postprocess_nodes() (llama_index.indices.postprocessor.FixedRecencyPostprocessor method), 183
postprocess_nodes() (llama_index.indices.postprocessor.KeywordNodePostprocessor method), 184
postprocess_nodes() (llama_index.indices.postprocessor.NERPIINodePostprocessor method), 185
postprocess_nodes() (llama_index.indices.postprocessor.PIINodePostprocessor method), 187
postprocess_nodes() (llama_index.indices.postprocessor.PrevNextNodePostprocessor method), 188
postprocess_nodes() (llama_index.indices.postprocessor.SimilarityPostprocessor method), 189
postprocess_nodes() (llama_index.indices.postprocessor.TimeWeightedPostprocessor method), 190
predict() (llama_index.token_counter.mock_chain_wrapper.MockLLMPredictor method), 244
prev_node_id (llama_index.data_structs.node.Node property), 178
PREVIOUS (llama_index.data_structs.node.DocumentRelationship attribute), 176
PrevNextNodePostprocessor (class in llama_index.indices.postprocessor), 187
print_response_stream() (llama_index.response.schema.StreamingResponse method), 253
Prompt (class in llama_index.prompts), 240
PromptHelper (class in llama_index.indices.prompt_helper), 245
put() (llama_index.storage.kvstore.MongoDBKVStore method), 209
put() (llama_index.storage.kvstore.SimpleKVStore method), 209
Q
QASummaryQueryEngineBuilder (class in llama_index.composability), 211
QdrantReader (class in llama_index.readers), 221
QdrantVectorStore (class in llama_index.vector_stores), 206
query() (llama_index.vector_stores.ChatGPTRetrievalPluginClient method), 198
query() (llama_index.vector_stores.ChromaVectorStore method), 198
query() (llama_index.vector_stores.DeepLakeVectorStore method), 200
query() (llama_index.vector_stores.FaissVectorStore method), 200
query() (llama_index.vector_stores.LanceDBVectorStore method), 201
query() (llama_index.vector_stores.MetalVectorStore method), 202
query() (llama_index.vector_stores.MilvusVectorStore method), 203
query() (llama_index.vector_stores.MyScaleVectorStore method), 204
query() (llama_index.vector_stores.OpensearchVectorStore method), 205
query() (llama_index.vector_stores.PineconeVectorStore method), 206
query() (llama_index.vector_stores.QdrantVectorStore method), 207
query() (llama_index.vector_stores.SimpleVectorStore method), 207
query() (llama_index.vector_stores.WeaviateVectorStore method), 208
T
table_info (llama_index.langchain_helpers.sql_wrapper.SQLDatabase property), 250
TableContextPrompt (class in llama_index.prompts.prompts), 236
TextToSQLPrompt (class in llama_index.prompts.prompts), 237
TimeWeightedPostprocessor (class in llama_index.indices.postprocessor), 189
to_dict() (llama_index.storage.kvstore.SimpleKVStore method), 209
to_langchain_format() (llama_index.readers.Document method), 215
total_tokens_used (llama_index.embeddings.langchain.LangchainEmbedding property), 244
total_tokens_used (llama_index.embeddings.openai.OpenAIEmbedding property), 242
total_tokens_used (llama_index.token_counter.mock_chain_wrapper.MockLLMPredictor property), 245
TrafilaturaWebReader (class in llama_index.readers), 226
U
unset_metadata() (llama_index.logger.LlamaLogger method), 246
update() (llama_index.indices.base.BaseGPTIndex method), 159
update() (llama_index.indices.empty.GPTEmptyIndex method), 157
update() (llama_index.indices.keyword_table.GPTKeywordTableIndex method), 141
update() (llama_index.indices.keyword_table.GPTRAKEKeywordTableIndex method), 143
update() (llama_index.indices.keyword_table.GPTSimpleKeywordTableIndex method), 144
update() (llama_index.indices.knowledge_graph.GPTKnowledgeGraphIndex method), 155
update() (llama_index.indices.list.GPTListIndex method), 140
update() (llama_index.indices.struct_store.GPTPandasIndex method), 151
update() (llama_index.indices.struct_store.GPTSQLStructStoreIndex method), 153
V
VectorIndexRetriever (class in