
Vector Indexing in MongoDB

Last Updated : 07 Oct, 2025

Semantic search is quickly becoming part of everyday life. It goes beyond traditional keyword matching by taking semantics, that is, meaning, into account, which is why search results increasingly match what the user was actually looking for. Vector search is one technical implementation of this: an embedding model maps each piece of data to a point in a high-dimensional space, stored as a vector embedding, and a search then finds the data points closest to the query's point. Being close in space means being close in meaning.
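To make "close in space" concrete, here is a minimal sketch in plain JavaScript. The three-dimensional vectors, titles, and helper names are invented for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```javascript
// Toy "embeddings": hypothetical 3-dimensional vectors, invented for illustration.
const documents = [
  { title: "Space adventure", embedding: [0.9, 0.1, 0.0] },
  { title: "Alien invasion", embedding: [0.8, 0.3, 0.1] },
  { title: "Romantic comedy", embedding: [0.0, 0.2, 0.9] },
];

// Straight-line (Euclidean) distance between two vectors of equal length.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Return the k documents whose embeddings are closest to the query vector.
function nearest(queryVector, docs, k) {
  return docs
    .map((doc) => ({
      title: doc.title,
      distance: euclideanDistance(queryVector, doc.embedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const results = nearest([0.9, 0.15, 0.0], documents, 2);
console.log(results.map((r) => r.title)); // [ 'Space adventure', 'Alien invasion' ]
```

This sketch compares the query against every document, which is only practical for tiny datasets. A vector index exists precisely so that the database does not have to do that.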

One of the benefits of MongoDB’s Atlas Vector Search product is that you can not only store your data in a modern and flexible way, but also store the vector embeddings that represent that data’s semantic meaning and search it, all in the same database!

In this article, we will look at how to get started creating a vector index for your data in MongoDB, and then look into how to improve the efficacy of the index with fine-tuning!

Creating a Vector Index in MongoDB Atlas

Creating a vector index in MongoDB Atlas only takes a few simple steps. However, the collection you want to search must already have a field that holds a vector embedding (an array of numbers) for each document, generated using a service such as Voyage AI or OpenAI.

MongoDB’s sample dataset, available to all Atlas clusters for free, contains a good example of embeddings in action. In the `sample_mflix` database is a collection called `embedded_movies`, which contains a field called `plot_embedding`. In each document, this field holds a vector (an array of numbers) that represents the plot of that movie.
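Before creating the index, it can help to confirm what the embedding field actually looks like, because the number of items in the array is needed later as the number of dimensions. The following is a small sketch: `embeddingDimensions` is a hypothetical helper, the field name `plot_embedding` is the one used in the query later in this article, and the check assumes the embedding is stored as a plain array of numbers.

```javascript
// Hypothetical helper: returns the number of dimensions of the vector stored at
// `path`, or null if the field is missing or is not a plain array of numbers.
function embeddingDimensions(doc, path) {
  const value = doc ? doc[path] : undefined;
  if (!Array.isArray(value) || value.length === 0) return null;
  const allNumbers = value.every((n) => typeof n === "number" && Number.isFinite(n));
  return allNumbers ? value.length : null;
}

// Stand-in document so the helper can be tried without a database connection.
const sampleDoc = { title: "Example", plot_embedding: [0.12, -0.45, 0.78] };
console.log(embeddingDimensions(sampleDoc, "plot_embedding")); // 3

// In mongosh (where `db` is defined), inspect one real document instead.
if (typeof db !== "undefined") {
  const movie = db.getSiblingDB("sample_mflix").embedded_movies.findOne(
    { plot_embedding: { $exists: true } },
    { title: 1, plot_embedding: 1 }
  );
  print(embeddingDimensions(movie, "plot_embedding"));
}
```

Run in plain Node.js, only the stand-in document is checked. Run in `mongosh` against a cluster with the sample data loaded, the last block reports the dimensions of a real embedding.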

For this article, we are going to look at creating a vector index against that field. Let’s begin.

In MongoDB Atlas, navigate to the data explorer. You can access this by clicking “Browse Collections” along the top bar of the cluster box in Atlas.


Navigate to the collection that you wish to add vector search support to and click the "Search Indexes" tab to load the page to manage indexes for that collection.


If you have not created a MongoDB Atlas Search or MongoDB Atlas Vector Search index for the chosen collection before, you will see a big green button you can click to create a new index. Otherwise, click the "Create Search Index" button in the top right corner.


Clicking the button will load a wizard to guide you through creating an index. Since this wizard can create both full-text search indexes for MongoDB Atlas Search and vector indexes for MongoDB Atlas Vector Search, be sure to select the option at the top of the wizard that specifies you want to create a vector index.

Next, give your index a name. This will be referenced in applications that use it so be sure to give it a clear name. You then have to specify the collection you wish to create an index for, but the wizard is clever and will have this correctly pre-selected because you navigated to this page from the selected collection.

You can leave the Visual Editor option selected as this is the easiest way to configure the index. Then click "Next".


Next up is the configuration page. This is where you specify the path (field) that contains the vector embeddings, the number of dimensions (which must equal the number of items in each embedding array), the similarity method to use (the right choice depends on your use case, but the model that generated the embeddings often recommends one), and any filter fields.
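The same settings can also be written as a JSON index definition. The sketch below shows roughly what the Visual Editor choices correspond to. The values are assumptions for illustration: 1536 dimensions and `cosine` similarity for the `plot_embedding` field, and `vector_index` as a placeholder index name. Check the dimensions and recommended similarity against the model that produced your embeddings.

```javascript
// Index definition corresponding to the wizard settings described above.
// ASSUMPTIONS: 1536 dimensions and "cosine" similarity; adjust to your embeddings.
const indexDefinition = {
  fields: [
    {
      type: "vector",          // marks this field as the vector to index
      path: "plot_embedding",  // field that holds the embedding array
      numDimensions: 1536,     // must match the length of each embedding
      similarity: "cosine",    // or "euclidean" / "dotProduct"
    },
  ],
};

console.log(JSON.stringify(indexDefinition, null, 2));

// In mongosh (where `db` is defined), the index can be created without the wizard.
// "vector_index" is a placeholder name.
if (typeof db !== "undefined") {
  db.getSiblingDB("sample_mflix").embedded_movies.createSearchIndex(
    "vector_index",
    "vectorSearch",
    indexDefinition
  );
}
```

Run in plain Node.js, this only prints the definition. Run in `mongosh`, the final block submits it to the cluster, so only do that if you have not already created the index through the wizard.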

Fine Tuning Performance

Two factors that affect performance can be configured as part of this wizard: filter fields and the similarity method. Filter fields reduce the number of documents considered in the comparison by narrowing the search to only those documents that could contain relevant results.
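As a rough sketch of how filter fields fit together, the example below declares `year` as a filter field in the index definition and then uses it in the `filter` option of a `$vectorSearch` stage. The choice of `year`, the helper name `buildFilteredStage`, the dimension count, and the three-number query vector are all placeholders for illustration.

```javascript
// Index definition with one vector field and one filter field.
// `year` is an illustrative choice of filter field.
const filteredIndexDefinition = {
  fields: [
    { type: "vector", path: "plot_embedding", numDimensions: 1536, similarity: "cosine" },
    { type: "filter", path: "year" },
  ],
};

// Hypothetical helper: build a $vectorSearch stage that only considers
// documents matching `filter`.
function buildFilteredStage(indexName, queryVector, filter) {
  return {
    $vectorSearch: {
      index: indexName,
      path: "plot_embedding",
      queryVector,
      filter, // should only reference fields indexed with type "filter"
      numCandidates: 100,
      limit: 5,
    },
  };
}

// Placeholder query vector; a real one has as many items as numDimensions.
const stage = buildFilteredStage(
  "your_index_name",
  [0.12, -0.45, 0.78],
  { year: { $gte: 2000 } }
);
console.log(JSON.stringify(stage, null, 2));
```

With this stage, only movies from the year 2000 onwards are compared against the query vector, so fewer documents need to be considered.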

As you may have already seen from the configuration section of the wizard, there are three similarity methods, which determine how the distance between vectors is measured:

  • Euclidean - Calculates the straight-line distance between the ends of the vectors in geometric space. This is useful for dense data where the magnitude of the values matters, such as image similarity searches.
  • Cosine - Uses the angle between vectors and ignores their magnitude. This is good for less dense data such as text.
  • DotProduct - Uses both the angle between vectors and their magnitude. If the vectors are normalized to unit length, it ranks results the same way as cosine.
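The difference between the three methods is easiest to see on a small example. The sketch below uses plain JavaScript and two made-up vectors that point in the same direction but have different lengths. The function names are illustrative and are not part of any MongoDB API.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Length (magnitude) of a vector.
function magnitude(a) {
  return Math.sqrt(dot(a, a));
}

// Straight-line distance between the ends of the two vectors.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Cosine of the angle between the two vectors (1 means same direction).
function cosine(a, b) {
  return dot(a, b) / (magnitude(a) * magnitude(b));
}

// Scale a vector to unit length.
function normalize(a) {
  const m = magnitude(a);
  return a.map((value) => value / m);
}

const a = [3, 4];
const b = [6, 8]; // same direction as a, twice the length

console.log(euclidean(a, b)); // 5  (the ends of the vectors are 5 apart)
console.log(cosine(a, b));    // 1  (identical direction, length ignored)
console.log(dot(a, b));       // 50 (grows with the length of the vectors)

// After normalizing to unit length, the dot product agrees with cosine
// (up to floating-point rounding).
console.log(dot(normalize(a), normalize(b)));
```

Euclidean distance treats the two vectors as different because their ends are far apart, cosine treats them as identical because they point the same way, and the dot product depends on both direction and length.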

Another key factor to consider is memory, aka RAM. You will get the best performance if the entire vector index fits in memory. The best way to achieve this is to take advantage of dedicated search nodes in MongoDB. Search nodes scale independently of the main database, so you can add search capacity without affecting a database that may already be performing optimally.
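To get a feel for whether an index might fit in memory, a back-of-the-envelope estimate can help. The sketch below assumes each dimension is stored as a 32-bit float (4 bytes) and ignores the overhead of the index structure itself, so treat the result as a rough lower bound rather than what Atlas will report. The helper name and the figure of one million vectors are hypothetical.

```javascript
// Rough estimate of raw vector storage.
// ASSUMPTION: 4 bytes per dimension (32-bit floats); index overhead is ignored.
function estimateVectorBytes(numVectors, numDimensions, bytesPerDimension = 4) {
  return numVectors * numDimensions * bytesPerDimension;
}

// Hypothetical collection: one million documents with 1536-dimensional embeddings.
const bytes = estimateVectorBytes(1000000, 1536);
const gib = bytes / 1024 ** 3;

console.log(`${bytes} bytes`);          // 6144000000 bytes
console.log(`${gib.toFixed(2)} GiB`);   // 5.72 GiB
```

Comparing an estimate like this with the index size metric in Atlas gives a first indication of how much memory the search nodes need.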

MongoDB Atlas provides metrics that you can view to see how the index is performing. Key metrics include index size, latency, and CPU/memory utilization.

Now that we have discussed key performance considerations, it is time to finish creating the vector search index. You will come to a page to review everything that was configured. Select "Create Vector Search Index" and Atlas will create the index on that field for all documents in the collection. This can take a few minutes depending on the number of documents in your collection, but you will receive an email when the index creation is complete so you will know it is ready to be used.
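If you would rather check from the shell than wait for the email, the index status can also be inspected in `mongosh`. The sketch below is an illustration under assumptions: `isIndexQueryable` is a hypothetical helper, `vector_index` is a placeholder name, and the stand-in data only mimics a few of the fields returned by `getSearchIndexes()`.

```javascript
// Hypothetical helper: given the array returned by getSearchIndexes(),
// report whether the named index can already serve queries.
function isIndexQueryable(indexes, indexName) {
  const index = indexes.find((i) => i.name === indexName);
  return Boolean(index && index.queryable);
}

// Stand-in data shaped like getSearchIndexes() output (fields abbreviated).
const exampleIndexes = [
  { name: "vector_index", type: "vectorSearch", status: "BUILDING", queryable: false },
];
console.log(isIndexQueryable(exampleIndexes, "vector_index")); // false

// In mongosh (where `db` is defined), check the real index instead.
if (typeof db !== "undefined") {
  const indexes = db.getSiblingDB("sample_mflix").embedded_movies.getSearchIndexes();
  print(isIndexQueryable(indexes, "vector_index"));
}
```

Once the helper returns `true` for your index, queries against it should succeed.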


Searching with Your New Vector Index

Now that you have created your index, let’s take a brief look at how to use it.

db.embedded_movies.aggregate([
  {
    $vectorSearch: {
      index: "your_index_name",
      path: "plot_embedding",
      queryVector: [0.12, -0.45, 0.78, ...], // Your query embedding
      limit: 5,
      numCandidates: 100
    }
  }
])

This code runs in `mongosh`, the MongoDB shell, but it looks very similar in the language drivers as well.
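Because the pipeline is just an array of stage documents, it can be built once and passed to `aggregate()`. The sketch below wraps the query above in a hypothetical helper, `buildSearchPipeline`, and adds a `$project` stage that returns each result's similarity score through the `vectorSearchScore` metadata. The projected `title` and `plot` fields and the three-number query vector are placeholders for illustration.

```javascript
// Hypothetical helper: build a vector search pipeline that also returns the score.
function buildSearchPipeline(indexName, queryVector, limit, numCandidates) {
  return [
    {
      $vectorSearch: {
        index: indexName,
        path: "plot_embedding",
        queryVector,
        numCandidates,
        limit,
      },
    },
    {
      // Keep a few readable fields and expose the similarity score.
      $project: {
        _id: 0,
        title: 1,
        plot: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Placeholder query vector; a real one must match the indexed dimensions.
const pipeline = buildSearchPipeline("your_index_name", [0.12, -0.45, 0.78], 5, 100);
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh, with a real query embedding in a variable of your own
// (here called queryEmbedding, a hypothetical name):
// db.embedded_movies.aggregate(buildSearchPipeline("your_index_name", queryEmbedding, 5, 100))
```

Seeing the score next to each result makes it easier to judge how closely the returned documents match the query.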

Understanding the Query Parameters

It is worth talking through this code because some of its fields matter when it comes to performance.

queryVector: This is the vector embedding of the query being searched for, for example a search for all movies whose plot involves aliens or a space adventure. It is generated at query time and is not stored in the database. It must be produced by the same embedding model as the stored embeddings, with the same number of dimensions, so that the comparison used to find similar results is meaningful.

limit: This is simply the number of results you want returned.

numCandidates: This is the number of approximate nearest neighbours to consider before the results are reduced to the number given by limit, so it must be at least as large as limit. A higher number can be slightly slower because more neighbours are considered, but it results in a more thorough search, also known as higher recall. A smaller value will be faster but might be less accurate, although the difference is often subtle.
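Recall can be measured rather than guessed. One approach, sketched below, is to compare the IDs returned by the approximate search with the IDs from an exact search for the same query and count how many overlap. The helper `recallAtK` and the result IDs are made up for illustration. Obtaining the exact baseline assumes your cluster supports running `$vectorSearch` with `exact: true` in place of `numCandidates`.

```javascript
// Fraction of the exact top-k results that the approximate search also returned.
function recallAtK(approximateIds, exactIds) {
  if (exactIds.length === 0) return 1; // nothing to find, so nothing was missed
  const found = new Set(approximateIds.map(String));
  const hits = exactIds.filter((id) => found.has(String(id))).length;
  return hits / exactIds.length;
}

// Made-up result IDs for illustration.
const exactTop5 = ["m1", "m2", "m3", "m4", "m5"];
const approxTop5 = ["m1", "m2", "m4", "m5", "m9"];

console.log(recallAtK(approxTop5, exactTop5)); // 0.8
```

Repeating this for a handful of representative queries while raising numCandidates shows where recall stops improving, which is a reasonable point to stop paying for extra latency.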

