RAIRS: Optimizing Redundant Assignment and List Layout for IVF-Based ANN Search

Zehai Yang 0009-0007-1031-8708 [email protected] SKLP, ACS, Institute of Computing Technology, CAS
University of Chinese Academy of Sciences
No. 6 Ke Xue Yuan South Rd, Haidian District, Beijing, China
and Shimin Chen 0009-0000-1043-6236 [email protected] SKLP, ACS, Institute of Computing Technology, CAS
University of Chinese Academy of Sciences
No. 6 Ke Xue Yuan South Rd, Haidian District, Beijing, China
Abstract.

IVF is one of the most widely used ANNS (Approximate Nearest Neighbors Search) methods in vector databases. The idea of redundant assignment is to assign a data vector to more than one IVF list in order to reduce the chance of missing true neighbors in IVF search. However, the naïve strategy, which selects the second IVF list based on the distance between a data vector and the list centroids, performs poorly. Previous work focuses only on the inner product distance, and there is no optimized list selection study for the most popular Euclidean space. Moreover, the IVF search may access the same vector in more than one list, resulting in redundant distance computation and decreasing query throughput.

In this paper, we present RAIRS to address the above two challenges. For the challenge of list selection, we propose an optimized AIR metric for the Euclidean space. AIR takes not only distances but also directions into consideration in order to support queries that are closer to the data vector but farther away from the first chosen list’s centroid. For the challenge of redundant distance computation, we propose SEIL, an optimized list layout that exploits shared cells to reduce repeated distance computations for IVF search. Our experimental results using representative real-world data sets show that RAIRS out-performs existing redundant assignment solutions and achieves up to 1.33x improvement over the best-performing IVF method, IVF-PQ Fast Scan with refinement.

Vector Database; IVF Index; Approximate Nearest Neighbor Search; Redundant Assignment.
Copyright: rights retained. Journal: PACMMOD, Volume 4, Number 1 (SIGMOD), Article 73, February 2026. DOI: 10.1145/3786687. Conference: SIGMOD ’26; May 31–June 5, 2026; Bengaluru, India. CCS Concepts: Information systems → Top-k retrieval in databases; Information systems → Data access methods.

1. Introduction

Vector search is widely used in real-world applications, including recommendations (Schafer et al., 2007), data mining (Cover and Hart, 1967), information retrieval (Liu et al., 2007), and recently large language model (LLM) studies (Wu et al., 2022; Borgeaud et al., 2022). Approximate Nearest Neighbors Search (ANNS) is the key operation in vector databases. The inverted file index (IVF) is one of the most widely adopted ANNS methods. As depicted in Figure 1a, during construction, IVF computes $nlist$ clusters (a.k.a. lists) and assigns each data vector to the list whose centroid is the closest to the vector. For example, $x$ is assigned to $list_{1}$ as $x$ lies closer to $c_{1}$ than to any other centroid. Given a query $q$, IVF searches the list centroids and chooses the top-$nprobe$ nearest lists to $q$, then traverses the chosen lists to compute distances for all vectors in the lists. In this example, IVF chooses the top-2 lists (i.e., $list_{2}$ and $list_{3}$). Unfortunately, it fails to retrieve $x$, $q$’s true nearest neighbor. While increasing $nprobe$ (e.g., from 2 to 3) may help, IVF has to traverse more lists, leading to lower query throughput. In this paper, we investigate an alternative solution, redundant assignment. The idea is to assign a data vector to more than one IVF list. Suppose $x$ is assigned to both $list_{1}$ and $list_{2}$. Then, the traversal of $list_{2}$ can successfully retrieve $x$, thereby reducing the chance of missing true neighbors.
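The scenario above can be reproduced with a toy index. The following is a minimal sketch, not the paper's implementation: the coordinates, the helper names (`nearest_lists`, `build_ivf`, `search`), and the `extra` argument that adds a second assignment are all hypothetical and chosen only so that the query misses its true nearest neighbor under single assignment with $nprobe=2$.

```python
import math

def nearest_lists(centroids, v, n):
    """Return the IDs of the n lists whose centroids are closest to v."""
    return sorted(centroids, key=lambda lid: math.dist(centroids[lid], v))[:n]

def build_ivf(centroids, data, extra=None):
    """Single assignment to the nearest list, plus optional extra assignments.

    extra: hypothetical map {vector ID: additional list ID} (redundant assignment).
    """
    lists = {lid: [] for lid in centroids}
    for vid, v in data.items():
        lists[nearest_lists(centroids, v, 1)[0]].append(vid)
    for vid, lid in (extra or {}).items():
        lists[lid].append(vid)
    return lists

def search(centroids, lists, data, q, nprobe):
    """Traverse the nprobe nearest lists and return the closest vector ID found."""
    cands = {vid for lid in nearest_lists(centroids, q, nprobe) for vid in lists[lid]}
    return min(cands, key=lambda vid: math.dist(data[vid], q))

# Assumed toy geometry: x belongs to list 1, but q's two nearest lists are 2 and 3.
centroids = {1: (0.0, 0.0), 2: (4.0, 1.0), 3: (4.0, -1.5)}
data = {"x": (1.9, 0.0), "y": (4.0, 1.2), "z": (4.0, -1.7)}
q = (2.6, 0.0)

single = build_ivf(centroids, data)
redundant = build_ivf(centroids, data, extra={"x": 2})

found_single = search(centroids, single, data, q, nprobe=2)        # misses x
found_redundant = search(centroids, redundant, data, q, nprobe=2)  # retrieves x
```

With these assumed coordinates, the single-assignment index returns `"y"` even though `"x"` is closer to the query, while the index with the extra assignment returns `"x"` without increasing $nprobe$.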

Figure 1. Redundant assignment for IVF index. (a) Problem of single assignment. (b) NaïveRA works poorly.

Challenges of Redundant Assignment. There are two main challenges for realizing the redundant assignment idea:

  • List selection strategy: The naïve strategy (NaïveRA) is to select the second list for a vector based purely on its distance to the list centroids. However, experimental results show that NaïveRA can hardly improve the ANNS performance. Figure 1b compares NaïveRA with single assignment on the SIFT1M data set for top-10 nearest-neighbor search. The X-axis reports the average distance computing operations (DCO) per query, while the Y-axis shows the percentage of true neighbors missed. We see that the two curves almost overlap, indicating that NaïveRA fails to out-perform the baseline IVF single assignment.

  • Redundant distance computation: With redundant assignment, IVF search may encounter the same vector in more than one chosen list. For example, suppose $x$ is assigned to $list_{1}$ and $list_{2}$. If both lists are chosen in a query, IVF will access $x$ twice, leading to redundant distance computation for $x$. One seemingly simple fix is to deduplicate the vector IDs from all chosen lists before distance computation. However, in advanced IVF methods, such as IVF-PQ Fast Scan (André et al., 2015), lists store the vector IDs in packed blocks to facilitate SIMD acceleration. It is very costly to unpack the blocks to obtain individual IDs for deduplication purposes.

Our Solution: RAIRS. To address these challenges, we propose RAIRS, consisting of the following two optimization techniques:

  • RAIR (Redundant list selection with Amplified Inverse Residual). We propose an optimized AIR metric for secondary list selection in the Euclidean space. After assigning a vector to the first list, whose centroid is the closest, RAIR selects the second list to minimize the AIR metric, i.e., $||r'||^{2}+\lambda r^{T}r'$, where $r$ and $r'$ are the clustering residuals of the first and the second lists, respectively, and $\lambda$ is a constant parameter. This metric considers not only the distance (i.e., the first term $||r'||^{2}$), but also the direction (i.e., the second term $r^{T}r'$). AIR prefers a negative second term; it selects the second list whose centroid lies in the inverse direction from the data vector compared to the first assigned list’s centroid, thereby covering queries that are closer to the vector but farther away from the first assigned list’s centroid. We formally prove the effectiveness of AIR. In addition, we investigate multiple list assignment by extending RAIR to select three or more lists.

  • SEIL (Shared-Cell Enhanced IVF Lists). To alleviate redundant distance computation, we propose an optimized list layout, SEIL. We use $cell_{i,j}$ to denote all vectors that are assigned to both $list_{i}$ and $list_{j}$. Based on real-world data analysis, we observe that there are cells that contain a large number of vectors. In advanced IVF methods, such as IVF-PQ Fast Scan, every 32 vector items are packed into a block for SIMD acceleration. Thus, we call cells larger than a block large cells. We observe that a large fraction of vectors reside in large cells. In light of these observations, we design SEIL so that the blocks of a large cell (e.g., $cell_{i,j}$) are shared by two lists (e.g., $list_{i}$ and $list_{j}$). Such shared blocks enable deduplication at the block level, thereby saving repeated distance computation for shared blocks. Please note that SEIL can be applied not only to RAIR, but also to any redundant assignment strategy.

Contributions. The contributions of this paper are threefold. First, we propose RAIR, a novel redundant assignment strategy targeting the Euclidean space. We formally prove the effectiveness of the AIR assignment metric. Second, we propose SEIL, a novel list layout to reduce repeated distance computation caused by redundant assignment. Finally, we perform an extensive experimental study to evaluate the benefits of RAIRS. Experimental results show that RAIR out-performs existing redundant assignment strategies, and SEIL can effectively reduce redundant distance computation. Compared to IVF-PQ Fast Scan with refinement (IVFPQfs), RAIRS achieves up to 1.33x speedup in query throughput while maintaining similar recalls across the real-world data sets.

Outline. The remainder of the paper is organized as follows. Section 2 reviews relevant background and discusses the challenges of redundant assignment. Section 3 overviews the RAIRS solution, then Sections 4 and 5 present RAIR and SEIL, respectively. After that, Section 6 reports the evaluation results. Section 7 discusses related work. Finally, Section 8 concludes the paper.

2. Background

In this section, we review the background on ANNS, IVF, and redundant assignment.

2.1. ANN Search

Problem Definition. Given a set of $D$-dimensional vectors $X=\{x_{1},x_{2},\ldots,x_{n}\}$, where $x_{i}\in\mathbb{R}^{D}$ for $i=1,2,\ldots,n$, and a query vector $q\in\mathbb{R}^{D}$, the k Nearest Neighbors (kNN) problem finds the top-$K$ vectors that are closest to $q$. In contemporary applications, data sets often reach massive scales with millions to billions of vectors, and the vectors often consist of hundreds to even thousands of dimensions. The curse of dimensionality (Indyk and Motwani, 1998; Weber et al., 1998b) makes it impossible to find the exact nearest neighbors without exhaustive searching, which can be prohibitively costly. Consequently, the focus of industry and academia has shifted towards ANNS, sacrificing accuracy slightly for substantial improvement in processing speed and scalability.

Distance Metrics. The most popular distance metric is the Euclidean distance: $dist(q,x)=\sqrt{\sum_{i=1}^{D}(q^{(i)}-x^{(i)})^{2}}$, where $x,q\in\mathbb{R}^{D}$. Other common distance metrics include inner product, cosine similarity, etc. In this paper, our proposed redundant selection method, RAIR, targets the Euclidean distance, whereas the optimized list layout, SEIL, works with all distance metrics.

2.2. IVF (Inverted File Index)

IVF is one of the most widely used ANNS methods in vector databases. In the following, we describe the best-performing IVF variant in practice, IVF-PQ Fast Scan with refinement (André et al., 2015; Aumüller et al., 2020), which we use as the baseline in our work.

  • Product quantization: PQ (Jegou et al., 2010) is a widely adopted quantization method for accelerating distance computation. It divides the vector dimensions into a number of groups (e.g., with 2 dimensions per group). Every data vector is divided into multiple sub-vectors accordingly. Then, for each dimension group, PQ partitions the sub-vectors into (e.g., 16) clusters (e.g., using K-means). It encodes a vector as its sub-vectors’ cluster IDs (a.k.a. code words).

    IVF-PQ stores the PQ codes along with the vector IDs in the IVF lists. Given a query, IVF-PQ builds LUTs (Look-Up Tables) that contain, for each dimension group, the squared distances between the query sub-vector and all cluster centroids. To estimate $dist(q,x)$, IVF-PQ looks up the LUT for each code word of $x$’s sub-vectors and adds up the squared distances.

  • Refinement: As estimated distances are not accurate, quantization methods, such as PQ, are often combined with refinement to improve the search quality. The idea is to retrieve a larger number of (e.g., $10\times K$ for a top-$K$ query) vectors from the quantization-enhanced IVF index, then compute the accurate distances and re-rank the retrieved vectors to obtain the top-$K$ results.

  • PQ Fast Scan: PQ Fast Scan (André et al., 2015) is among a number of existing techniques (André et al., 2015; Blalock and Guttag, 2017; André et al., 2019) that exploit SIMD acceleration for distance computation. During index construction, PQ Fast Scan organizes the items (i.e., PQ codes and vector IDs) in an IVF list into packed fixed sized (e.g., 32-item) vector blocks to facilitate SIMD accesses. Then, for query processing, it loads the LUTs into the SIMD registers, and uses SIMD instructions to compute the distances for a block of items at a time.
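To make the PQ encoding and LUT-based distance estimation described above concrete, here is a minimal scalar sketch. It is not the SIMD Fast Scan kernel: the function names and the tiny hand-written codebooks are assumptions for illustration, and a real index would train the codebooks (e.g., with K-means) rather than hard-code them.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def pq_encode(x, codebooks):
    """Encode x as one code word (nearest sub-centroid index) per dimension group."""
    sub_dim = len(x) // len(codebooks)
    code = []
    for g, book in enumerate(codebooks):
        sub = x[g * sub_dim:(g + 1) * sub_dim]
        code.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return code

def build_lut(q, codebooks):
    """lut[g][k] = squared distance from q's g-th sub-vector to sub-centroid k."""
    sub_dim = len(q) // len(codebooks)
    return [[sq_dist(q[g * sub_dim:(g + 1) * sub_dim], c) for c in book]
            for g, book in enumerate(codebooks)]

def estimate_sq_dist(lut, code):
    """Add up the table entries selected by the code words."""
    return sum(lut[g][k] for g, k in enumerate(code))

# Assumed toy setup: D = 4, two groups of 2 dimensions, 2 sub-centroids per group.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
x = [0.9, 1.1, 0.1, -0.1]
q = [1.0, 1.0, 0.0, 0.0]

code = pq_encode(x, codebooks)          # one code word per group
lut = build_lut(q, codebooks)           # built once per query
approx = estimate_sq_dist(lut, code)    # estimated squared distance
exact = sq_dist(q, x)                   # what refinement would compute
```

In this toy case the estimate differs from the exact squared distance because $x$ is replaced by its sub-centroids, which is the inaccuracy that the refinement step compensates for.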

2.3. Redundant Assignment

Figure 2. Different redundant assignment strategies.

Redundant assignment allocates each vector item to multiple IVF lists rather than a single list, thereby decreasing the likelihood of overlooking true top-$K$ nearest neighbors that would otherwise be assigned only to a list far from the query vector.

NaïveRA (Naïve Redundant Assignment) relies solely on the distance for list selection. SPANN (Chen et al., 2021) uses NaïveRA to place a subset of vectors in multiple lists in SSD pages. In contrast, we focus on memory-resident environments.

SOAR (Sun et al., 2023) is a redundant assignment strategy for the inner product distance. Given the first list, it selects the second list with the least $||r'||^{2}+\lambda(\frac{r^{T}r'}{||r||})^{2}$, where $r$ and $r'$ are the clustering residuals of the first and the second lists, respectively. The second term is non-negative and is minimized when $r$ and $r'$ are close to orthogonal.

Figure 2 illustrates NaïveRA and SOAR. $x$ is a vector. $c_{1}$, $c_{2}$, $c_{3}$, and $c_{4}$ are four list centroids. Because $c_{1}$ is the nearest centroid, $x$ is first assigned to $list_{1}$. However, for a query $q$, $x$ is $q$’s true nearest neighbor, but $c_{1}$ is far away from $q$. Hence, it is likely that the IVF search for $q$ may miss $x$. We would like to perform redundant assignment for $x$. As shown in Figure 2, NaïveRA simply selects the second-nearest centroid, $c_{2}$. SOAR selects $c_{3}$, whose residual vector $r'=x-c_{3}$ is close to orthogonal to the primary residual vector $r=x-c_{1}$. Unfortunately, neither $c_{2}$ nor $c_{3}$ is ideal for $q$.

In this paper, we propose an AIR strategy optimized for the Euclidean space. To support queries like $q$, which are closer to the vector but farther away from the first assigned list’s centroid, AIR selects the second list so that its residual is in the opposite direction of the primary residual. As shown in Figure 2, $c_{4}$ is selected, which supports $q$ well.

3. RAIRS Overview

We overview the data structure, the two proposed optimizations, and the index operations of RAIRS.

Data Structure. RAIRS inherits the data structure of IVF-PQ with refinement. As illustrated in Figure 3, it comprises three main components: 1) centroids, 2) inverted lists, and 3) vector data. The first two components form the IVF-PQ module, while the refine module keeps the original vector data. Each vector is assigned to up to two lists, as depicted by the blue and red dotted lines. The inverted lists store PQ codes and vector IDs, consuming much less memory space than the original vectors. At query time, ANNS first retrieves a set of candidate vectors through IVF-PQ. Then, it accesses the refine module to compute accurate distances for the candidates, and re-ranks the candidates to obtain the final top-$K$ results.

Figure 3. Overview of the RAIRS index.

RAIR. Given a query $q$, IVF traverses the $nprobe$ lists whose centroids are the closest to $q$. However, IVF would miss $q$’s top-$K$ nearest neighbor $x$ if $x$ is not in the closest $nprobe$ lists. To address this problem, we propose an AIR (Amplified Inverse Residual) metric for optimized redundant assignment in the Euclidean space. RAIR employs the AIR metric to assign each vector to a second IVF list. In this way, it improves the chance of finding true top-$K$ results given the same $nprobe$. From another angle, it can reduce the number of traversed lists for attaining similar search accuracy, thereby improving query throughput. Moreover, we consider the case where the first and the second selected lists are the same, and we generalize RAIR to multiple assignments. (cf. Section 4)

SEIL. We use the expression “$x$ is in $cell_{i,j}$” to mean that a vector $x$ is assigned to $list_{i}$ and $list_{j}$ with redundant assignment. We observe that a subset of the cells contain a large number of vectors; that is, there is strong skew in the number of vectors across the cells. This interesting finding motivates us to optimize the list layout for large cells in order to reduce redundant distance computation.
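The cell-size skew can be measured directly from the assignment pairs. The sketch below is illustrative only: the function name `cell_stats`, the synthetic assignments, and the reading of "larger than a block" as strictly more than 32 items are assumptions, not details taken from the paper.

```python
from collections import Counter

BLOCK_SIZE = 32  # items per packed block in the PQ Fast Scan baseline

def cell_stats(assignments):
    """Group vectors by their (list_i, list_j) pair and report the large cells.

    assignments: iterable of (list_i, list_j) tuples with list_i < list_j,
                 one tuple per vector.
    Returns (cell sizes, large cells, fraction of vectors in large cells).
    """
    sizes = Counter(assignments)
    # Assumption: a cell is "large" when it holds more items than one block.
    large = {cell: n for cell, n in sizes.items() if n > BLOCK_SIZE}
    frac_in_large = sum(large.values()) / sum(sizes.values())
    return sizes, large, frac_in_large

# Synthetic, skewed assignments: one cell dominates.
assignments = [(1, 2)] * 40 + [(1, 3)] * 6 + [(2, 3)] * 4
sizes, large, frac_in_large = cell_stats(assignments)
```

On this synthetic input a single cell holds most of the vectors, which is the kind of skew that makes block sharing for large cells worthwhile.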

In the baseline, each list is divided into packed 32-item blocks to facilitate SIMD computation by PQ Fast Scan. A large $cell_{i,j}$ is stored twice, once in $list_{i}$ and once in $list_{j}$. If the two lists are both traversed in the same query, distance computation will be performed twice for $cell_{i,j}$, which is wasteful. To address this problem, we propose SEIL, which stores the shared blocks of $cell_{i,j}$ only once. In Figure 3, the shared cell blocks are depicted in blue. The gray block entry points to the shared cell block. For a query, SEIL performs distance computation for the shared blocks only once, decreasing redundant distance computation. (cf. Section 5)
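The effect of storing shared blocks once can be sketched as follows. This is a simplified model under stated assumptions, not the SEIL layout itself: blocks are identified by hypothetical string IDs, each list holds references to block IDs, and a Python set stands in for whatever block-level deduplication mechanism the actual design uses.

```python
def scan_lists(lists, blocks, chosen):
    """Scan the chosen lists block by block, skipping blocks already scanned.

    lists:  list ID -> sequence of block IDs (a shared block ID appears in two lists)
    blocks: block ID -> vector IDs packed in that block
    Returns (vector IDs scanned, number of distance computations).
    """
    seen_blocks = set()
    scanned, dco = [], 0
    for list_id in chosen:
        for block_id in lists[list_id]:
            if block_id in seen_blocks:   # shared block already processed
                continue
            seen_blocks.add(block_id)
            scanned.extend(blocks[block_id])
            dco += len(blocks[block_id])  # one distance computation per item
    return scanned, dco

# Hypothetical layout: cell_{1,2} occupies two full 32-item blocks shared by lists 1 and 2.
blocks = {
    "own1": list(range(0, 32)),          # items reachable only through list 1
    "own2": list(range(32, 64)),         # items reachable only through list 2
    "cell_1_2_a": list(range(64, 96)),   # shared block of cell_{1,2}
    "cell_1_2_b": list(range(96, 128)),  # shared block of cell_{1,2}
}
lists = {
    1: ["own1", "cell_1_2_a", "cell_1_2_b"],
    2: ["own2", "cell_1_2_a", "cell_1_2_b"],
}

scanned, dco = scan_lists(lists, blocks, chosen=[1, 2])
# Baseline layout: every list stores its own copy of the shared blocks.
baseline_dco = sum(len(blocks[b]) for lid in (1, 2) for b in lists[lid])
```

Because the check is per block rather than per vector ID, the packed blocks never need to be unpacked, which is the property that motivates deduplicating at block granularity.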

Index Construction. Algorithm 1 shows the procedure for adding a batch of vectors into the RAIRS index. For each vector, the algorithm calls RairAssign to assign the vector to two lists (Line 4) and computes its PQ code (Line 6). The original vector is also appended to the vector data to facilitate accurate distance calculations by the refine module (Line 7). Finally, the algorithm invokes SeilInsert to insert the vector items into the inverted lists with SEIL layout optimization (Line 8). RairAssign and SeilInsert will be detailed in Sections 4.2 and 5.3, respectively.

ANNS Query Processing. Algorithm 2 lists the ANNS procedure to find top-$K$ nearest neighbors for a batch of queries. It first computes the number of candidates ($bigK$) to retrieve from the IVF lists as $K$ multiplied by a pre-defined $K\_FACTOR$ (e.g., 10) (Line 2). For each query vector, the algorithm constructs the PQ distance lookup table ($LUT$) (Line 4) and identifies $nprobe$ lists whose centroids are closest to the query (Line 5). Then, it calls SeilSearch to traverse each relevant list in the SEIL structure, computes approximate distances based on $LUT$ and the PQ codes, and retrieves a set of $bigK$ candidates (Line 6). Finally, the refine module attains the top-$K$ results based on exact distance calculations (Line 7). Section 5.3 describes in detail how SeilSearch efficiently obtains the desired candidates while reducing redundant distance computation.

Algorithm 1 Add a batch of vectors to RAIRS index.

Input: index, vecs, vec_ids


1:function AddVectors(index, vecs, vec_ids)
2:  assignments= []; codes = [];
3:  for (i = 0; i < vecs.len; i ++) do
4:    (listID1, listID2) = RairAssign(index, vecs[i]);
5:    assignments.append( {listID1, listID2, vec_ids[i]} );
6:    codes.append( PQEncoding(vecs[i], index.code_book) );
7:    index.vec_data.append(vecs[i]);   
8:  SeilInsert(index, assignments, codes, vec_ids);
9:  index.ntotal += vecs.len;
Algorithm 2 ANNS with RAIRS index.

Input: index, queries, K, nprobe
Output: results


1:function RairsSearch(index, queries, K, nprobe)
2:  bigK = K * K_FACTOR;
3:  for (i = 0; i < queries.len; i ++) do
4:    LUT = ComputeLookupTable(queries[i], index.code_book);
5:    selected_lists = FindNearestLists(index, queries[i], nprobe);
6:    candidates = SeilSearch(index, LUT, selected_lists, bigK);
7:    results[i] = Refine(index.vec_data, queries[i], candidates, K);   
8:  return results;
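The final step of Algorithm 2 (Line 7) can be illustrated with a small re-ranking sketch. The function name `refine`, the dictionary-based vector store, and the small `K_FACTOR` value are assumptions for illustration; the candidate list stands in for the output of the approximate scan.

```python
import heapq

def refine(vec_data, query, candidates, k):
    """Re-rank candidate IDs by exact squared Euclidean distance; keep the k best."""
    def exact_sq_dist(vid):
        return sum((a - b) ** 2 for a, b in zip(query, vec_data[vid]))
    return heapq.nsmallest(k, candidates, key=exact_sq_dist)

K, K_FACTOR = 1, 3          # K_FACTOR is kept small here; the text uses 10 as an example
big_k = K * K_FACTOR        # number of candidates requested from the approximate scan

vec_data = {7: [0.0, 0.0], 8: [1.0, 1.0], 9: [0.2, 0.1]}
candidates = [8, 7, 9]      # assumed output of the approximate scan, in its own order
top = refine(vec_data, [0.25, 0.1], candidates, K)
```

Requesting `big_k` candidates instead of `K` gives the exact re-ranking room to correct ordering errors introduced by the quantized distances.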

Applicability. In this paper, we assume that the vector data can fit into the main memory. We focus on IVF-PQ Fast Scan with refinement as the baseline to demonstrate the effectiveness of our proposed RAIR and SEIL optimizations.

Please note that RAIR can be applied to any IVF-based indices. The redundant assignment strategy is orthogonal to the quantization method, the storage medium, and the hardware optimization of the indices. Moreover, while SEIL is designed to support packed blocks in PQ Fast Scan, the idea of exploiting shared cells to reduce redundant distance computation can be generally applicable. For example, for a disk-resident IVF-flat index, which stores IVF lists on disk and centroids in memory, we can apply the idea of SEIL by replacing the on-disk lists with shared cells of vectors and recording the shared cell addresses along with the list centroids in memory.

4. Redundant List Selection with AIR

We propose AIR (Amplified Inverse Residual) as an optimized list selection metric in the Euclidean space in Section 4.1. Then, we describe and analyze the RAIR algorithm to assign a vector to two lists in Section 4.2. Finally, we consider the generalization of RAIR to assign a vector to multiple lists in Section 4.3.

4.1. Amplified Inverse Residual

Let $x$ be the data vector to be inserted into the IVF lists and $c$ be the centroid closest to $x$. We consider queries within a maximal distance $l_{m}$ of $x$. Let $Q=\{q:||q-x||\leq l_{m}\}$ be the set of all queries in the hypersphere centered at $x$ with radius $l_{m}$. Suppose $q\in Q$ is a random query vector that is uniformly distributed in $Q$. Figure 4 depicts the geometric relationship of the vectors.

Since $c$ is the closest centroid to $x$, the list associated with centroid $c$ is the first list selected for $x$. In most cases, representing $x$ with $c$ is satisfactory. However, for some query $q$, $c$ may not be an ideal representation for $x$. That is, $c$ lies outside the nearest $nprobe$ lists of $q$, and therefore $x$ is a true top-$K$ nearest neighbor of $q$ but does not appear in the retrieved result with single assignment. In such cases, while $q$ is close to $x$, $q$ is so far away from $c$ that there are $nprobe$ other centroids closer to $q$ than $c$.

Figure 4. Geometric relationship of vectors.

We would like to select the second list for $x$ to accommodate such queries that do not benefit sufficiently from the first selected list with centroid $c$. As shown in Figure 4, let $\alpha=\angle qxc$. $\alpha$ quantifies how unhappy a query $q$ is with $c$ as the first selected centroid for $x$. This is because in the triangle $\triangle qxc$, for fixed edge lengths $||q-x||$ and $||c-x||$, a larger $\alpha$ indicates a longer edge $qc$ (by the law of cosines), whose length is $dist(q,c)=||q-c||$. That is, the larger the $\alpha$, the farther away $q$ is from $c$, and the less likely that $c$ appears in the nearest $nprobe$ lists of $q$.

Building on this intuition, we formulate the following loss function for selecting the second list. Let $c'$ denote the centroid of the second selected list. We ensure that the second centroid $c'$ compensates for $c$ among all queries in $Q$:

$$L(c',c,Q)=E_{q\in Q}\left[\mathrm{ReLU}(-\cos\angle qxc)\cdot(||q-c'||^{2}-||q-x||^{2})\right]$$

In the loss function, the second factor, $||q-c'||^{2}-||q-x||^{2}$, is easy to understand. It expresses the preference for decreasing the squared Euclidean distance from $q$ to $c'$ compared to that from $q$ to $x$. The first factor, $\mathrm{ReLU}(-\cos\alpha)$, serves as a weighting term, indicating how important the second centroid $c'$ is to $q$. When $\alpha\leq\frac{\pi}{2}$, it is likely that $x$ can be visited by $q$ in the list represented by $c$. In such cases, $-\cos\alpha\leq 0$ and $\mathrm{ReLU}$ returns 0. Hence, the contribution of the second factor is ignored, meaning that the second list is not important. On the other hand, if $\alpha$ exceeds $\frac{\pi}{2}$, then $\mathrm{ReLU}(-\cos\alpha)>0$: $q$ is close to $x$ but far away from $c$. We increase the weight for such queries, giving them higher priority during assignment. The larger the $\alpha$, the higher the weight. Then, we select the second list with the least $L(c',c,Q)$ among all lists.

We prove the following theorem to simplify the computation of the loss function. The full proof is provided in the appendix.

Theorem 4.1.

For a set $Q$ of queries that are uniformly distributed in the hypersphere centered at $x$ with radius $l_{m}$,

$$L(c',c,Q)\propto||r'||^{2}+\lambda r^{T}r'$$

where $r=c-x$, $r'=c'-x$, and $\lambda>0$ is a constant factor.
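For intuition, the theorem can be motivated by a short calculation. The following is our own informal sketch under the stated uniformity assumption, not the appendix proof; the symbols $u$, $w$, and $\kappa$ are introduced here only for illustration.

```latex
\begin{aligned}
q &= x+u,\quad ||u||\leq l_{m},\qquad
w(u)=\mathrm{ReLU}(-\cos\angle qxc)=\max\Big(0,\,-\frac{u^{T}r}{||u||\,||r||}\Big),\\
||q-c'||^{2}-||q-x||^{2} &= ||u-r'||^{2}-||u||^{2}=||r'||^{2}-2u^{T}r',\\
L(c',c,Q) &= E[w]\,||r'||^{2}-2\,E[w\,u]^{T}r',\\
E[w\,u] &= -\kappa\,\frac{r}{||r||}\ \text{ for some }\kappa>0
\quad\text{(the ball is symmetric about the axis } r \text{, and } w>0 \text{ only when } u^{T}r<0\text{)},\\
L(c',c,Q) &= E[w]\Big(||r'||^{2}+\lambda\,r^{T}r'\Big),\qquad
\lambda=\frac{2\kappa}{E[w]\,||r||}>0 .
\end{aligned}
```

In this sketch, neither $E[w]$ nor $\lambda$ depends on the candidate centroid $c'$, so ranking candidates by $L$ is equivalent to ranking them by $||r'||^{2}+\lambda r^{T}r'$.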

AIR. Based on Theorem 4.1, we set $||r'||^{2}+\lambda r^{T}r'$ as the metric for selecting the second list. From the formula, we see that the metric does not hinge solely on minimizing $||r'||$, the distance from the second selected centroid $c'$ to $x$. In addition, there is an added penalty term $\lambda r^{T}r'$. When $||r'||$ is fixed, having $r'$ closer to the inverse of $r$ (i.e., $-r$) leads to a negative second term, which reduces the loss function. This indicates a preference for the second residual $r'$ to be in the inverse direction of the first residual $r$. Hence, we call this metric AIR (Amplified Inverse Residual) to capture its preference for inverse residuals.

Interestingly, if we set $\lambda=0$, AIR degenerates to NaïveRA, which selects the second nearest list as the second choice. In our experiments, we set $\lambda=0.5$ by default. We also perform an in-depth study of the impact of $\lambda$ on ANNS performance in Section 6.3.

Table 1. Comparison of redundant assignment strategies.

Strategy   Formula                                            Geometric interpretation
NaïveRA    $||r'||^{2}$                                       2nd nearest neighbor
SOAR       $||r'||^{2}+\lambda(\frac{r^{T}r'}{||r||})^{2}$    Prefer 2nd residual orthogonal to 1st residual
AIR        $||r'||^{2}+\lambda r^{T}r'$                       Prefer 2nd residual opposite to 1st residual

Note: $r$ ($r'$) is the clustering residual of the first (second) selected list.

Comparison of Redundant Assignment Strategies. Table 1 compares NaïveRA, SOAR, and AIR. First, NaïveRA aims to minimize the distance from the vector to the centroid of the second selected list. In comparison, both SOAR and AIR consider not only distances but also the relationship between the first and the second selected lists, as evidenced by the second term of their formulas. Second, SOAR and AIR are significantly different. The second term in SOAR is proportional to the squared projection of the second residual $r'$ on the first residual $r$. Thus, SOAR prefers $r'$ to be orthogonal to $r$, driving the second term close to 0. In contrast, the second term of AIR computes the dot product of the two residuals, and can be negative. AIR prefers $r'$ to be in the inverse direction of $r$. We compare NaïveRA, SOAR, and AIR experimentally in Section 6.
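The three formulas can be evaluated side by side on a toy configuration that resembles Figure 2. The coordinates below are made up for illustration, the helper names are hypothetical, and the same $\lambda=0.5$ is used for both SOAR and AIR only to keep the comparison simple.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def naive_ra(r, rp, lam):
    """Squared distance only."""
    return dot(rp, rp)

def soar(r, rp, lam):
    """Penalize the squared projection of r' onto r (prefers orthogonal residuals)."""
    return dot(rp, rp) + lam * dot(r, rp) ** 2 / dot(r, r)

def air(r, rp, lam):
    """Add the signed dot product (prefers residuals opposite to r)."""
    return dot(rp, rp) + lam * dot(r, rp)

def select(metric, x, first, candidates, lam=0.5):
    """Return the name of the candidate centroid that minimizes the metric."""
    r = sub(first, x)
    return min(candidates, key=lambda name: metric(r, sub(candidates[name], x), lam))

# Assumed geometry: c1 is the primary centroid; c2 is the second nearest,
# c3 is roughly orthogonal to c1 as seen from x, and c4 lies on the opposite side.
x = [0.0, 0.0]
c1 = [1.0, 0.0]
candidates = {"c2": [0.9, 0.8], "c3": [0.0, 1.3], "c4": [-1.2, 0.6]}

picks = {m.__name__: select(m, x, c1, candidates) for m in (naive_ra, soar, air)}
```

In this configuration the three strategies pick three different centroids, matching the geometric interpretations in Table 1: NaïveRA takes the second nearest, SOAR the orthogonal one, and AIR the one on the opposite side of $x$.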

4.2. RAIR Algorithm

Algorithm 3 shows the RairAssign procedure. Given a data vector $v$, the algorithm first obtains $N\_CANDS$ lists whose centroids are closest to $v$ (Line 2). The $cand\_lists$ are sorted in ascending order of the distance between a list’s centroid and $v$. Then, the AIR loss function is computed for all candidate lists (Lines 3–8). After that, the algorithm selects the primary list for $v$ as $cand\_lists[0]$, which is the list whose centroid is closest to $v$ (Line 9). It determines the secondary assignment by selecting the list that minimizes the computed AIR loss function (Line 11).

RAIR and Strict RAIR (SRAIR). In cases where the AIR loss function remains minimal for $v$’s primary list, i.e., for any list,

$$||r'||^{2}+\lambda r^{T}r'\geq(1+\lambda)||r||^{2},$$

there is little benefit in assigning $v$ to any secondary list. Thus, such vectors are stored only in their first-choice list. This strategy not only curtails space overhead by limiting unnecessary redundancy but also reduces unproductive accesses to vectors when querying, thereby potentially improving overall query performance. We call this strategy RAIR.

In addition to RAIR, we also provide a strict version of RAIR, called SRAIR. SRAIR assigns a vector strictly to two lists. That is, it excludes the first selected list and applies AIR to select the second list from the rest of the lists.

The algorithm uses an $is\_strict$ flag to support both RAIR and SRAIR (Line 10). When $is\_strict$ is false, $start=0$ and the second list is selected from all $N\_CANDS$ lists (Lines 10–11). When $is\_strict$ is true, $start=1$, and $cand\_lists[0]$, which is the first selected list, is excluded from consideration (Lines 10–11).

Reducing Computation Cost with Limited Candidate Lists. In practice, we do not compute the loss function across all $nlist$ lists for each vector. Instead, we evaluate only the top $N\_CANDS$ nearest lists, as shown in Algorithm 3. Note that this is important for large data sets, where $nlist$ is large. FindNearestLists can perform an ANNS rather than an exhaustive search, thereby reducing the worst-case $O(nlist\cdot D)$ cost for each vector.

We find that a small $N\_CANDS$ (e.g., 10) is often sufficient for the quality of redundant assignment. For instance, in the SIFT1M (Jegou and Amsaleg, 2010) data set, for over 99.95% of vectors, the minimal loss function is obtained among the top-10 nearest lists when $\lambda=0.5$ and $nlist=1024$. We study the setting of $N\_CANDS$ in depth in Section 6.3.

Algorithm 3 Assign a vector to lists using RAIR.

Input: index, v
Output: listID1, listID2


1:function RairAssign(index, v)
2:  cand_lists = FindNearestLists(index, v, N_CANDS);
3:  centroid0 = index.centroids[cand_lists[0]];
4:  residual0 = centroid0 - v;
5:  for (i = 0; i < N_CANDS; i ++) do
6:    centroid = index.centroids[cand_lists[i]];
7:    residual = centroid - v;
8:    loss[i] = L2sqr(residual) + λ · InnerProd(residual0, residual);
9:  listID1 = cand_lists[0];
10:  start = (is_strict) ? 1 : 0;
11:  listID2 = cand_lists[ argmin(loss[start .. N_CANDS-1]) ];
12:  return listID1 < listID2 ? (listID1, listID2) : (listID2, listID1);
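Algorithm 3 translates almost line by line into the following runnable sketch. It assumes that centroids are stored as plain lists of floats and that FindNearestLists is an exhaustive scan; the function and parameter names (`rair_assign`, `n_cands`, `lam`) are illustrative, and strict mode requires at least two candidates.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def rair_assign(centroids, v, n_cands=10, lam=0.5, is_strict=False):
    """Return (listID1, listID2) in ascending order, following Algorithm 3."""
    # Stand-in for FindNearestLists: exhaustive scan, ascending distance to v.
    def sq_dist_to_v(i):
        d = sub(centroids[i], v)
        return dot(d, d)
    cand_lists = sorted(range(len(centroids)), key=sq_dist_to_v)[:n_cands]

    # AIR loss of every candidate with respect to the primary residual.
    residual0 = sub(centroids[cand_lists[0]], v)
    loss = []
    for list_id in cand_lists:
        residual = sub(centroids[list_id], v)
        loss.append(dot(residual, residual) + lam * dot(residual0, residual))

    start = 1 if is_strict else 0   # SRAIR excludes the primary list
    best = min(range(start, len(cand_lists)), key=loss.__getitem__)
    list_id1, list_id2 = cand_lists[0], cand_lists[best]
    return (list_id1, list_id2) if list_id1 < list_id2 else (list_id2, list_id1)

# A centroid on the opposite side of v wins the secondary slot.
opposite_case = [[1.0, 0.0], [0.9, 0.8], [0.0, 1.3], [-1.2, 0.6]]
pair_a = rair_assign(opposite_case, [0.0, 0.0])

# No other centroid beats the primary list, so RAIR keeps a single list,
# whereas SRAIR still picks the best of the remaining candidates.
one_sided_case = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
pair_b = rair_assign(one_sided_case, [0.0, 0.0])
pair_c = rair_assign(one_sided_case, [0.0, 0.0], is_strict=True)
```

When both returned IDs are equal, the vector is stored only in its primary list, which corresponds to the non-strict RAIR behavior described above.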

Cost Analysis. First, selecting the first list consists of calling FindNearestLists (Line 2) and setting $listID1$ (Line 9) in Algorithm 3. Depending on the implementation, the cost of FindNearestLists is $O(nlist\cdot D)$ if FindNearestLists performs exhaustive search, or can be decreased to $O(sublinear(nlist)\cdot D)$ if FindNearestLists performs ANNS. (For example, if FindNearestLists performs IVF-based ANNS, the complexity can be $O(\sqrt{nlist}\cdot D)$.)

Second, the remaining steps of the algorithm select the second list among $N\_CANDS$ candidate lists using $D$-dimensional vector computations. Therefore, the cost is $O(N\_CANDS\cdot D)$.

Overall, the time complexity can be expressed as $O((nlist+N\_CANDS)\cdot D)$ or $O((sublinear(nlist)+N\_CANDS)\cdot D)$, depending on the implementation of FindNearestLists.

Note that $N\_CANDS$ is often several orders of magnitude smaller than $nlist$. The additional cost of selecting the second list is often much smaller than the cost of selecting the first list. Thus, the cost of list selection in redundant assignment is often close to that of the baseline single assignment.
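As a rough worked example, take the setting mentioned earlier ($nlist=1024$, $N\_CANDS=10$) and assume $D=128$, the dimensionality of SIFT vectors, with an exhaustive FindNearestLists. Constant factors are ignored, so the figures below are only indicative.

```latex
\begin{aligned}
\text{first list (exhaustive):}\quad & nlist\cdot D = 1024\cdot 128 = 131{,}072,\\
\text{second list:}\quad & N\_CANDS\cdot D = 10\cdot 128 = 1{,}280,\\
\text{relative overhead:}\quad & \frac{1{,}280}{131{,}072}\approx 0.98\%.
\end{aligned}
```

Under these assumptions, selecting the second list adds roughly one percent to the per-vector assignment cost.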

4.3. Generalization to Multiple Assignments

In the above, RAIR performs two-assignment, assigning each vector to up to two IVF lists. In this subsection, we generalize RAIR to $m$-assignment, where $m\geq 3$.

Given $(m-1)$ selected lists, we select the $m$-th centroid by considering the losses with regard to all previously selected centroids:

$$L_{m}(c',c_{1},\ldots,c_{m-1},Q)=\mathop{aggr}_{1\leq i\leq m-1}L(c',c_{i},Q)\propto||r'||^{2}+\lambda\mathop{aggr}_{i}\ r_{i}^{T}r'$$

For $\mathop{aggr}$, we evaluate three functions (i.e., $max$, $min$, and $avg$) in Section 6.3. Our results show that $max$ performs the best.

Note that assigning vectors to more than two lists does not necessarily improve ANNS performance. Distributing each vector across additional lists can reduce the required number of traversed lists ($nprobe$) for queries. However, the average IVF list size grows, incurring a larger number of distance computing operations. Our experiments show that two-assignment yields the smallest number of distance computations.
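The generalized metric can be sketched as a greedy loop that picks one list at a time. This is a minimal illustration under assumptions: already selected lists are excluded from later rounds (a strict variant), all centroids are considered as candidates, and the names `multi_assign` and `avg` are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def multi_assign(centroids, v, m, lam=0.5, aggr=max):
    """Greedily select m list IDs; aggr combines r_i^T r' over the lists chosen so far."""
    def sq_dist_to_v(i):
        d = sub(centroids[i], v)
        return dot(d, d)
    order = sorted(range(len(centroids)), key=sq_dist_to_v)

    selected = [order[0]]                      # primary list: nearest centroid
    while len(selected) < m:
        rest = [j for j in order if j not in selected]
        if not rest:                           # fewer than m lists exist
            break
        residuals = [sub(centroids[i], v) for i in selected]

        def loss(j):
            rp = sub(centroids[j], v)
            return dot(rp, rp) + lam * aggr(dot(r, rp) for r in residuals)

        selected.append(min(rest, key=loss))
    return selected

centroids = [[1.0, 0.0], [0.9, 0.8], [0.0, 1.3], [-1.2, 0.6]]
two = multi_assign(centroids, [0.0, 0.0], m=2)
three = multi_assign(centroids, [0.0, 0.0], m=3, aggr=max)
```

Passing `aggr=min` or `aggr=avg` switches the aggregation function without changing the rest of the loop, which makes it easy to compare the three options discussed above on a given data set.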

5. Shared-Cell Enhanced IVF Lists

We discuss the problems of the baseline list layout for supporting redundant assignment in Section 5.1. Next, we propose the SEIL layout optimization in Section 5.2. Finally, we describe the algorithms for the SEIL-enhanced index in Section 5.3.

Figure 5. Characteristics of cells after redundant assignment.

5.1. Challenges of Baseline List Layout

Consider the case where vector $x$ is assigned to $list_{i}$ and $list_{j}$. In the baseline list layout, the vector item (including $x$’s PQ code and its vector ID) is stored in both $list_{i}$ and $list_{j}$. PQ Fast Scan further packs every 32 vector items into a block to facilitate SIMD acceleration in each list. This baseline list layout suffers from two issues: 1) redundant distance computations at query time if both $list_{i}$ and $list_{j}$ are among the $nprobe$ lists chosen for a query, and 2) increased space cost for storing the vector item twice.

The first issue lowers the query throughput. One naïve solution is to collect all vector IDs from the $nprobe$ chosen lists, deduplicate the vector IDs, then perform distance computation. Another solution is to build a hash table to keep track of the vector IDs whose distances have been computed, then check the hash table on the fly to avoid any redundant computation. However, both solutions require retrieving the vector IDs before distance computation. Unfortunately, this requires unpacking the packed blocks, which would break the SIMD computation steps and drastically reduce the benefit of PQ Fast Scan. Moreover, both solutions employ either sorting or hashing based algorithms for all vector items in the $nprobe$ lists, which can lead to non-trivial additional time and space cost.

The second issue is mitigated by duplicating the vector items rather than the $D$-dimensional vectors in the lists. Moreover, if the first and the second assigned lists are the same, RAIR avoids storing the vector item twice. Nevertheless, it is desirable to further reduce the space cost introduced by redundant assignment.

Figure 6. SEIL list layout. (a) Illustration of shared cells. (b) List structures after two batches of insertions.

5.2. SEIL Layout

Characteristics of Cells. We study the characteristics of the cells in Figure 5. Recall that $cell_{i,j}$ contains all vectors that are assigned to both $list_{i}$ and $list_{j}$. Since $cell_{i,j}$ and $cell_{j,i}$ are essentially the same, we ensure $i\leq j$ for $cell_{i,j}$. For a vector that is assigned to only a single list $i$ in RAIR, we set its cell to $cell_{i,i}$. We count the vectors in each cell after redundant assignment. The x-axis shows the cell size (i.e., the number of vectors in a cell) in the logarithmic scale. The y-axis reports the Cumulative Distribution Function (CDF) of how vectors are distributed across cells for the SIFT1M data set. 1.0 corresponds to the sum of all cell sizes.

In Figure 5, we draw a dotted line at cell size = 32. Since a block contains 32 vector items in PQ Fast Scan, we consider a cell to be large if its cell size $\geq$ 32. In the figure, large cells are to the right of the dotted line. From the figure, we observe that 1) large cells contain about 50% of all vectors, and 2) there exist very large cells that contain hundreds to thousands of vectors.

This observation of a high degree of concentration of vectors in large cells motivates us to exploit shared cells. Our idea is to share the vectors of a large $cell_{i,j}$ in both $list_{i}$ and $list_{j}$ as much as possible for reducing redundant distance computations at query time and decreasing the space cost due to redundant assignment.
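The statistic underlying this observation can be computed as in the sketch below. It assumes the assignments are available as (listID1, listID2) pairs, with a singly assigned vector repeating its list ID; the function name `large_cell_fraction` is hypothetical.

```python
from collections import Counter

BLK_SZ = 32  # number of vector items per PQ Fast Scan block, as stated in the text


def large_cell_fraction(assignments: list[tuple[int, int]]) -> float:
    """Fraction of vectors that fall into cells with at least BLK_SZ vectors."""
    # Normalize each pair so that cell (i, j) and cell (j, i) are counted together.
    cells = Counter((min(i, j), max(i, j)) for i, j in assignments)
    total = sum(cells.values())
    in_large = sum(size for size in cells.values() if size >= BLK_SZ)
    return in_large / total if total else 0.0


# Toy input: cell (0,1) has 50 vectors, cell (1,2) has 45, and cell (2,2) has 5.
toy = [(0, 1)] * 40 + [(1, 0)] * 10 + [(2, 2)] * 5 + [(1, 2)] * 45
fraction = large_cell_fraction(toy)
```

For the toy input, 95 of the 100 vectors belong to large cells.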

SEIL List Structure. The SEIL list structure is depicted in Figure 6. Figure 6(a) illustrates the idea of shared cells. $cell_{i,j}$ is drawn at row $i$ column $j$. Vectors in cells are grouped into 32-item blocks. Suppose a cell contains $nitems$ vectors. Then, we generate $\lfloor nitems/32\rfloor$ blocks. The remaining ($nitems\%32$) vectors are appended to blocks in the miscellaneous area. The blocks of $cell_{i,j}$ are stored only once in memory. They are shared by $list_{i}$ and $list_{j}$. For example, $cell_{0,1}$’s blocks are shared by $list_{0}$ and $list_{1}$. The blocks are physically stored in $list_{0}$. $list_{1}$ contains an entry of the form (other list ID, block count, pointer) that references the shared blocks in $list_{0}$.

Figure 6(b) shows the physical list structures. Each list contains an array of reference entries and an array of 32-item blocks. Shared blocks of $cell_{i,j}$ ($i<j$) are physically stored in $list_{i}$, while $list_{j}$ stores the reference entry/entries pointing to the physical blocks. The figure also depicts the structures after two batches of insertions. The entries and blocks of the second batch are highlighted with dotted rounded rectangles. They are appended to the end of the entry arrays and block arrays.

We would like to point out several design considerations. First, a reference entry can point to multiple blocks of the same cell. For a batch of insertions, SEIL stores multiple blocks of the same cell contiguously. A reference entry points to the first block with the other list ID and pointer field. The block count is the number of contiguous blocks in the cell. Second, for a new batch of vectors, a new entry referencing the same other list can be generated. For example, for the second batch, $cell_{0,1}$ sees a new block. Then, the new block is stored in $list_{0}$, and a new reference entry pointing to $list_{0}$’s new block is appended to $list_{1}$. Now, $list_{1}$ contains two reference entries both pointing to $list_{0}$. Third, the last miscellaneous block of a list may contain fewer than 32 vector items, which is depicted as half-filled orange rectangles. All other blocks are full. For a new batch, we fill the last miscellaneous block of the previous batch before generating new miscellaneous blocks. Finally, for a vector item stored in the miscellaneous blocks, we embed its other assigned list ID in the unused high-order bits of the vector ID. Suppose $x$ is one of the remaining ($nitems\%32$) vectors in $cell_{i,j}$. Then, we embed $j$ ($i$) when storing $x$ in a miscellaneous block of $list_{i}$ ($list_{j}$). This simplifies the deduplication of miscellaneous vectors.
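The last design point, embedding the other list ID into the vector ID, can be illustrated with simple bit manipulation. The sketch below assumes 64-bit vector IDs with the top 16 bits reserved for the list ID. This split is a hypothetical choice made for illustration only (the text does not state the bit widths), and the helper names are likewise hypothetical.

```python
LIST_BITS = 16                      # assumed width of the embedded list ID
ID_BITS = 64 - LIST_BITS            # low-order bits that hold the vector ID
ID_MASK = (1 << ID_BITS) - 1


def pack_misc_id(vec_id: int, other_list_id: int) -> int:
    """Store other_list_id in the high-order bits of a 64-bit vector ID."""
    if not 0 <= vec_id <= ID_MASK:
        raise ValueError("vector ID does not fit in the low-order bits")
    if not 0 <= other_list_id < (1 << LIST_BITS):
        raise ValueError("list ID does not fit in the high-order bits")
    return (other_list_id << ID_BITS) | vec_id


def unpack_misc_id(packed: int) -> tuple[int, int]:
    """Return (vec_id, other_list_id) from a packed ID."""
    return packed & ID_MASK, packed >> ID_BITS


packed = pack_misc_id(vec_id=123456789, other_list_id=7)
vec_id, other_list_id = unpack_misc_id(packed)
```

With such an encoding, a miscellaneous item carries enough information to be checked against the set of visited lists without any per-vector bookkeeping.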

SEIL-Optimized Deduplication. For blocks in shared cells, SEIL enables cell-level deduplication to reduce redundant distance computation. The basic idea is to determine if a shared cell has been processed in the same query, and skip the SIMD distance computation for all blocks of the shared cell if the cell has been seen.

To achieve this, we distinguish physical blocks from reference entries. For physically stored blocks of shared cells, we always perform distance computation. For reference entries, we need to determine whether their corresponding blocks have been accessed before in the same query. This is accomplished with a $listVisited$ hash table that keeps track of the visited lists in the query. $listVisited$ is probed with the other list ID of a reference entry. If it exists, then the corresponding blocks have already been processed and thus the reference entry is skipped. Otherwise, distance computation is carried out for the blocks represented by the reference entry.

For vectors in the miscellaneous area, we cannot avoid distance computation because of the packed blocks of PQ Fast Scan. However, it is still necessary to deduplicate the vectors after the distance computation so that the same vector does not appear twice in the returned result. Instead of maintaining detailed records of all miscellaneous vectors accessed, we exploit the above $listVisited$ hash table to simplify the deduplication. Basically, for each miscellaneous vector $x$, we check whether the other list ID embedded in $x$’s vector ID has already been accessed in $listVisited$, and immediately skip the vector if the check returns true.

5.3. SEIL Algorithms

Constructing SEIL-Optimized Lists. Algorithm 4 shows the procedure to insert a batch of vectors into SEIL-optimized lists. The algorithm sorts the assignment items in ascending order so that the items in the same cell are contiguous in the $assigns$ array (Line 8). Then, the algorithm scans the $assigns$ array twice in two for-loops.

The first for-loop computes the number of shared blocks and the number of items in the miscellaneous area for each list (Line 9–15). Each loop iteration processes the items in the same cell. It obtains the cell (Line 10), counts the number of items in the cell (Line 11), computes the number of blocks and the remaining items (Line 12), and updates the relevant statistics (Line 13–15). After that, this information is used to allocate space for the lists (Line 16).

The second for-loop populates the lists with the items (Line 17–32). It obtains $cell$, $nitems$, $nblocks$, and $nmisc$ in the same way as in the first for-loop (Line 18–20), then appends shared blocks (Line 23–25) and miscellaneous items (Line 26–27) in the first list. If the second list is not the same as the first list, the algorithm also appends the miscellaneous items in the second list (Line 31–32), and creates a reference entry to point to the shared blocks of the first list (Line 30). Vectors within the same block have their PQ codes arranged in the PQ Fast Scan format.

Algorithm 4 Insert a batch of vectors to SEIL-optimized lists.

Input: index, assigns, codes, vec_ids


1:function GetCell(assign)
2:  return (assign.listID1, assign.listID2);
3:function CountItemsWSameCell(assigns, i, cell)
4:  for (j = i+1; j < assigns.len; j++) do
5:    if (GetCell(assigns[j]) != cell) then break;
6:  return j - i;
7:function SeilInsert(index, assigns, codes, vec_ids)
8:  Sort assigns in ascending order of {listID1, listID2, vec_id};
9:  for (i = 0; i < assigns.len; i += nitems) do
10:    cell = GetCell(assigns[i]);
11:    nitems = CountItemsWSameCell(assigns, i, cell);
12:    nblocks = nitems / BLK_SZ; nmisc = nitems % BLK_SZ;
13:    list_nb[cell.listID1] += nblocks; list_nm[cell.listID1] += nmisc;
14:    if (cell.listID1 != cell.listID2) then
15:     list_nm[cell.listID2] += nmisc;
16:  Allocate space according to list_nb and list_nm;
17:  for (i = 0; i < assigns.len; i += nitems) do
18:    cell = GetCell(assigns[i]);
19:    nitems = CountItemsWSameCell(assigns, i, cell);
20:    nblocks = nitems / BLK_SZ; nmisc = nitems % BLK_SZ;
21:    list1 = index.lists[cell.listID1];
22:    bptr = list1.getNextBlockPointerInSharedCell();
23:    for (b = 0; b < nblocks; b++) do
24:     begin = i + b*BLK_SZ; end = begin + BLK_SZ - 1;
25:     list1.appendBlockInSharedCell(assigns, begin, end);
26:    for (j = i + nblocks*BLK_SZ; j < i+nitems; j++) do
27:     list1.appendItemInMiscArea(assigns, j);
28:    if (cell.listID1 != cell.listID2) then
29:     list2 = index.lists[cell.listID2];
30:     list2.appendReferenceEntry(cell.listID1, nblocks, bptr);
31:     for (j = i + nblocks*BLK_SZ; j < i+nitems; j++) do
32:      list2.appendItemInMiscArea(assigns, j);
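The counting pass of Algorithm 4 (its first for-loop) can be sketched in Python as follows. This is a simplified illustration under the assumption that each assignment is a (listID1, listID2, vec_id) tuple with listID1 ≤ listID2; `plan_batch` is a hypothetical name, and the sketch only derives the per-list block and miscellaneous-item counts, not the physical layout.

```python
from collections import Counter
from itertools import groupby

BLK_SZ = 32  # items per block


def plan_batch(assigns):
    """Return (list_nb, list_nm): shared-block and misc-item counts per list."""
    list_nb, list_nm = Counter(), Counter()
    ordered = sorted(assigns)  # items of the same cell become contiguous
    for (l1, l2), group in groupby(ordered, key=lambda a: (a[0], a[1])):
        nitems = sum(1 for _ in group)
        nblocks, nmisc = divmod(nitems, BLK_SZ)
        list_nb[l1] += nblocks      # full blocks are stored once, in the first list
        list_nm[l1] += nmisc        # leftover items go to the misc area
        if l1 != l2:
            list_nm[l2] += nmisc    # leftovers are duplicated in the second list
    return list_nb, list_nm


# Toy batch: cell (0,1) has 70 items, cell (0,0) has 5, cell (1,2) has 32.
batch = (
    [(0, 1, v) for v in range(70)]
    + [(0, 0, v) for v in range(70, 75)]
    + [(1, 2, v) for v in range(75, 107)]
)
list_nb, list_nm = plan_batch(batch)
```

For the toy batch, list 0 receives two shared blocks and 11 miscellaneous items, list 1 receives one shared block and 6 miscellaneous items, and list 2 stores no items of its own for this batch (it only references the block held in list 1).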

Searching SEIL-Optimized Lists. Algorithm 5 uses two structures to facilitate the search: $rqueue$ that maintains the top-$bigK$ vectors, and $listVisited$ that keeps track of visited lists for deduplication purposes. The algorithm initializes the two structures (Line 2–3), then goes into a loop to search each selected list (Line 4–18).

For each list, the algorithm processes the reference entries (Line 6–10), shared blocks stored in the current list (Line 11–13), and blocks in the miscellaneous area (Line 14–17). For the reference entries, cell-level deduplication is used to check if all blocks pointed to by an entry can be skipped (Line 7). For the physically stored shared blocks, it is guaranteed that there is no duplication since the associated reference entry will be checked in the other list. For the miscellaneous area, the vectors are deduplicated after distance computation using $listVisited$ (Line 16).

PQ Fast Scan is invoked for each block to efficiently compute the distances with SIMD instructions (Line 9, 12, 15). At the end of each loop iteration, the current list is added to $listVisited$ (Line 18). Finally, the algorithm returns the top-$bigK$ candidates (Line 19).

Algorithm 5 Searching SEIL-optimized lists.

Input: index, LUTs, selected_lists, bigK
Output: candidates


1:function SeilSearch(index, LUT, selected_lists, bigK)
2:  rqueue.init(bigK);
3:  listVisited.init();
4:  for (each listID in selected_lists) do
5:    list = index.lists[listID];
      /* process reference entries */
6:    for (each (otherLID, nblocks, bptr) in list.ref_entries) do
7:     if (! listVisited.exist(otherLID)) then
8:      for (b = 0; b < nblocks; b++) do
9:       results = PQFastScan(LUT, bptr[b]);
10:       rqueue.update(results);
      /* process shared blocks without duplication checking */
11:    for (each block in list.shared_cell_blocks) do
12:     results = PQFastScan(LUT, block);
13:     rqueue.update(results);
      /* process items in miscellaneous area */
14:    for (each block in list.misc_blocks) do
15:     results = PQFastScan(LUT, block);
16:     results = RemoveVectorIfVisited(listVisited, results);
17:     rqueue.update(results);
18:    listVisited.add(listID);
19:  return rqueue.output(bigK);
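The deduplication logic can be exercised on a toy model. In the sketch below, distance computation is replaced by simply collecting vector IDs, blocks are plain Python lists, and miscellaneous items are (vec_id, other_list_id) pairs. `ToyList` and `seil_scan` are hypothetical names. The sketch additionally visits the selected lists in ascending list ID order; this is an assumption made here so that the list physically holding a shared cell is always scanned before the list that references it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyList:
    shared_blocks: list = field(default_factory=list)  # blocks physically stored here
    ref_entries: list = field(default_factory=list)    # (other_list_id, blocks) pairs
    misc_items: list = field(default_factory=list)     # (vec_id, other_list_id) pairs


def seil_scan(lists: dict, selected_lists: list):
    """Return (vector IDs to score, number of shared blocks scanned)."""
    visited, out, nscanned = set(), [], 0
    for lid in sorted(selected_lists):        # assumed ascending visiting order
        cur = lists[lid]
        for other, blocks in cur.ref_entries:
            if other not in visited:          # cell-level deduplication
                for blk in blocks:
                    out.extend(blk)
                    nscanned += 1
        for blk in cur.shared_blocks:         # scanned without any check
            out.extend(blk)
            nscanned += 1
        for vec_id, other in cur.misc_items:  # item-level deduplication
            if other not in visited:
                out.append(vec_id)
        visited.add(lid)
    return out, nscanned


# cell(0,1) holds vectors 1..5: two toy blocks plus leftover vector 5.
blocks_01 = [[1, 2], [3, 4]]
lists = {
    0: ToyList(shared_blocks=blocks_01, misc_items=[(5, 1), (9, 0)]),
    1: ToyList(ref_entries=[(0, blocks_01)], misc_items=[(5, 0), (7, 1)]),
}
both, scanned_both = seil_scan(lists, [1, 0])
only_1, scanned_1 = seil_scan(lists, [1])
```

When both lists are selected, the shared blocks are scanned once and vector 5 is reported once; when only list 1 is selected, the reference entry is followed and the blocks stored in list 0 are scanned on behalf of list 1.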

Implementation: Cache Optimization for Query Batch. ANNS can be invoked for a batch of queries, which is standard in various ANNS benchmarks (Aumüller et al., 2020; Zilliztech, 2023). Let each (query, selected list) be a computation task. One implementation is to group the tasks by queries, and process all the tasks of the same query before moving on to the next query. However, this can be suboptimal. In many cases, a list is visited by multiple queries in a query batch. Since the data accessed by a query is often much larger than the CPU cache, the implementation incurs a lot of CPU cache misses for accessing the same list in multiple queries.

To deal with this problem, our implementation employs an optimization technique to improve the CPU cache performance, which is available in Milvus (Wang et al., 2021) and Faiss (Douze et al., 2024). The idea is to group the tasks by lists, and process all the queries for the same list back to back. In this way, a list stays in the CPU cache for all but the first query searching it. For conciseness of presentation, we have omitted the pseudo-code of this optimization in Algorithm 5.
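The regrouping step can be sketched as follows. This is a minimal illustration with a hypothetical function name; it is not taken from Faiss or Milvus, and it only shows how (query, selected list) tasks are reorganized so that all queries touching a list are processed consecutively.

```python
from collections import defaultdict


def group_tasks_by_list(selected: dict[int, list[int]]) -> dict[int, list[int]]:
    """Map each list ID to the queries that selected it.

    `selected` maps a query ID to the list IDs chosen for that query.
    """
    by_list = defaultdict(list)
    for qid, list_ids in selected.items():
        for lid in list_ids:
            by_list[lid].append(qid)
    return dict(sorted(by_list.items()))


# Three queries, each probing one or two lists.
tasks = group_tasks_by_list({0: [3, 1], 1: [1, 2], 2: [3]})
for lid, qids in tasks.items():
    # A real implementation would scan list `lid` once per query here,
    # back to back, while the list is still resident in the CPU cache.
    pass
```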

Cost Analysis. We consider the cost of SeilInsert in Algorithm 4. Suppose $n$ vectors are to be inserted. For sorting $assigns$, we can employ the bucket sort with $nlist$ buckets in two passes, which takes $O(n)$ time. In the two for-loops, CountItemsWSameCell visits each item in the $assigns$ array, resulting in $O(n)$ cost. Packing the $n$ vector items with their PQ codes and vector IDs into blocks in appendBlockInSharedCell and appendItemInMiscArea takes $O(n\cdot D)$ cost. The rest of the operations are performed for each cell. Since the number of cells is bounded by $n$, their cost is bounded by $O(n)$. As a result, the cost of SeilInsert is $O(n\cdot D)$, which is dominated by packing vector items into blocks. Our SEIL optimization avoids redundant storage of blocks for shared cells, thereby reducing the constant factor of this cost.

Next, we consider the cost of AddVector in Algorithm 1, which performs the complete insertion procedure, involving both RAIR and SEIL. As described in Section 4.2, the cost of RairAssign for each vector is $O((sublinear(nlist)+N\_CANDS)D)$. For $n$ vectors, its cost is $O((sublinear(nlist)+N\_CANDS)nD)$. The remaining operations in the for-loop of AddVector, including PQ encoding for vectors, take $O(n\cdot D)$ time. SeilInsert takes $O(n\cdot D)$ time. Therefore, the overall cost is $O((sublinear(nlist)+N\_CANDS)nD)$.

On the SIFT1M data set, inserting all data vectors to the RAIRS index takes 15.8s, while the single-assignment IVFPQfs baseline requires 14.0s and the HNSW index requires 136.0s. The training time is 13.3s for both RAIRS and IVFPQfs. Therefore, the index construction time is 29.1s and 27.3s for RAIRS and IVFPQfs, respectively. Although RAIRS incurs a 6.6% slowdown relative to IVFPQfs in index construction, it is still much faster than HNSW. Consequently, we consider the additional construction overhead introduced by RAIRS to be within practical bounds. Additional results on insertions are provided in Section 6.2.

For SeilSearch in Algorithm 5, let $n_{vec\_selected}$ be the total number of vectors in the selected lists. Then, the worst-case cost of SeilSearch is $O(n_{vec\_selected}D)$. Suppose $n_{vec\_shared}$ is the number of vectors in the blocks of cells shared by two selected lists. Then, the SEIL-optimized cell-level deduplication reduces the cost to $O((n_{vec\_selected}-n_{vec\_shared})D)$.

6. Evaluation

We start by describing the experimental setup in Section 6.1. Next, we report the overall performance of RAIRS in Section 6.2, followed by in-depth analysis of individual techniques and algorithm parameters in Section 6.3. Finally, we study the flexibility of the SEIL optimized list layout by applying it to SOAR in Section 6.4.

6.1. Experimental Setup

Machine Configuration. We conduct all experiments on a server equipped with an Intel(R) Xeon(R) Platinum 8360Y CPU (2.40GHz, 36 cores, 64KB L1 cache per core, 1.25MB L2 cache per core, 54MB shared L3 cache) and 1TB 3200MT/s DDR4 memory, running CentOS 7.9.2009. The CPU supports both AVX2 and AVX-512 SIMD instructions. (Footnote 1: The IndexIVFPQFastScan class in FAISS v1.8.0 uses AVX2 256-bit SIMD instructions, but the compilation of FAISS exploits AVX-512 instructions to generate libfaiss_avx512.so. Our experiments use libfaiss_avx512.so. In our experiments, downclocking does not occur.) C/C++ code is compiled using GCC 9.3.1 with -O3.

Implementation. We implement the RAIRS index based on Faiss v1.8.0 (Douze et al., 2024) and employ OpenBLAS v0.3.3 for linear algebra support. RAIRS is implemented as a number of subclasses of the IndexIVF class of Faiss. It interacts with Faiss through Faiss’s public APIs. We write approximately 2,200 lines of code.

In the experiments, both RAIR assignment and SEIL query routines are conducted with batches of vectors by default. For index construction with AddVectors, the set of all data vectors is provided in a batch to build the RAIRS index, which performs RAIR assignment and constructs SEIL-optimized lists. For query processing, each thread processes a batch of query vectors in parallel to maximize throughput. We perform a bulk execution of the FindNearestLists function, retrieving the nearest lists for all query vectors in the query batch. Then, we employ the cache optimization for query batches as described in Section 5.3. We scan each list for all associated queries, which allows the current list to remain in the CPU cache, thereby improving overall performance.

To support deletion, the Faiss IVF index maintains a map from each vector ID to the vector’s list ID and in-list position. In RAIRS, we modify the map to include up to two list IDs and in-list positions for each vector ID. Given a vector ID to be deleted, if the in-list position is in a shared block, we set the corresponding ID in the packed block to an invalid ID. If the position is in a miscellaneous block, we replace this entry with the last vector ID in the miscellaneous blocks. Since each vector is assigned to up to two lists, RAIRS modifies up to twice as many entries as the baseline IVFPQfs for a deletion.
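The two deletion cases can be sketched on plain ID arrays. This is a simplified illustration: the sentinel value `INVALID_ID` and both function names are hypothetical, and a real implementation would also move the PQ code of the relocated item and update the ID-to-position map.

```python
INVALID_ID = -1  # assumed sentinel marking a deleted slot in a shared block


def delete_from_shared_block(block_ids: list[int], pos: int) -> None:
    """Shared blocks keep their shape; the slot is only marked as invalid."""
    block_ids[pos] = INVALID_ID


def delete_from_misc(misc_ids: list[int], pos: int) -> None:
    """Misc area stays dense: the last item is moved into the freed slot."""
    misc_ids[pos] = misc_ids[-1]
    misc_ids.pop()


shared = [10, 11, 12, 13]
delete_from_shared_block(shared, 2)

misc = [20, 21, 22, 23]
delete_from_misc(misc, 1)
```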

Table 2. Data sets used in the experiments.
| Data Set | Distance | #Dim | #Item | #Query | Size |
|---|---|---|---|---|---|
| SIFT1M (Jegou and Amsaleg, 2010) | Euclidean | 128 | 1,000,000 | 10,000 | 488MB |
| SIFT1B (Jegou and Amsaleg, 2010) | Euclidean | 128 | 1,000,000,000 | 10,000 | 477GB |
| MSong (Lidy, 2015) | Euclidean | 420 | 994,185 | 1,000 | 1.56GB |
| GIST (Jegou and Amsaleg, 2010) | Euclidean | 960 | 1,000,000 | 1,000 | 3.58GB |
| OpenAI (Zilliztech, 2023) | Euclidean | 1536 | 5,000,000 | 1,000 | 28.61GB |
| T2I (Baranchuk and Babenko, 2021) | Inner product | 200 | 10,000,000 | 100,000 | 7.45GB |

Solutions to Compare. We compare the following popular ANNS indexing methods and redundant assignment strategies:

  • IVF (Sivic and Zisserman, 2003): Class IVFFlat in Faiss 1.8.0. This is the plain IVF index, whose list traversal performs accurate distance computation. We use the same IVF parameters as RAIRS in each data set.

  • HNSW (Malkov and Yashunin, 2020): Class HNSW in Faiss 1.8.0. This is a widely-used graph-based ANNS index. We follow the default settings in its original work (Malkov and Yashunin, 2020) (e.g., $efConstruction=500$) except for $M$. We set $M=32$, which is the default setting of Faiss, because it achieves better performance than $M=16$ of the original work.

  • IVFPQfs (André et al., 2015): The IVFPQFastScan in Faiss 1.8.0 implements IVF-PQ Fast Scan. A refine layer is added to improve the recalls. The main distinctions between IVFPQfs and RAIRS are the RAIR assignment and the SEIL layout. IVFPQfs performs the baseline single assignment. We use the same parameters as RAIRS.

  • NaïveRA (Chen et al., 2021): We implement the naïve strategy of redundant assignment. It uses the IVF-PQ Fast Scan with Refine structure and the same parameters as RAIRS. SEIL is not enabled by default.

  • SOARL2 (Sun et al., 2023): We replace the redundant assignment strategy of NaïveRA with SOAR. Please note that SOAR is originally designed for the inner product distance. Here, we directly apply SOAR to the L2 distance in the Euclidean space.

  • RAIR and SRAIR: RAIR and Strict RAIR (cf. Section 4.2) without the SEIL list layout optimization.

  • RAIRS and SRAIRS: RAIR and Strict RAIR with the SEIL list layout optimization.

Thread-level parallelism is handled by the Faiss library. We ensure that all solutions run with the same number of threads and the same parallelization scheme. Hyper-threading is not used. For every index with the IVF-PQ Fast Scan + Refine structure, we consistently employ the cache optimization for query batches.

Data Sets. We use the following representative real-world data sets covering a diverse range of vector dimensions, vector counts, and application scenarios in the experiments. The features of the data sets are summarized in Table 2.

  • SIFT (Jegou and Amsaleg, 2010): SIFT (Scale-Invariant Feature Transform) descriptors are derived from an image data set. We use two SIFT data sets, i.e., SIFT1M and SIFT1B, which contain 1 million and 1 billion vectors, respectively. In SIFT1B, the original descriptors are stored as 8-bit integers; for consistency with the other data sets, we convert them to 32-bit floats during pre-processing.

  • MSong (Lidy, 2015): The million song data set contains features for a million popular music tracks.

  • GIST (Jegou and Amsaleg, 2010): GIST descriptors are generated from an image data set, which capture the spatial structure of scenes.

  • OpenAI: This OpenAI embedding data set is generated from an open sourced C4 data set from the Common Crawl data. We download it using the command line tool of VectorDBBench (Zilliztech, 2023) with L2 distance as the metric type.

  • T2I (Baranchuk and Babenko, 2021): The Yandex Text-to-Image data set contains image embeddings as data vectors and text embeddings as queries.

Since RAIRS targets the Euclidean space, our main experiments use SIFT, MSong, GIST, and OpenAI. Unlike the other data sets, the distance metric of T2I is inner product. We use T2I to study the applicability of SEIL to the original SOAR in Section 6.4.

Parameter Setting. By default, we set $nlist=1024$ except for data set OpenAI (Zilliztech, 2023) ($nlist=2048$), T2I (Baranchuk and Babenko, 2021) ($nlist=3172$), and SIFT1B (Jegou and Amsaleg, 2010) ($nlist=32768$). These $nlist$ values are close to $O(\sqrt{\#Item})$ of each data set, as suggested in the Faiss library (Douze, 2024). For PQ Encoding, we set the number of sub-groups $M_{PQ}=\tfrac{\#Dim}{2}$, and the bit length of the code word for each sub-group $nbits_{PQ}=4$. In the refine module, we set $K\_FACTOR=10$ for top-1 and top-10 queries. (10 is a common setting used for faiss-ivfpqfs experiments in ANN-Benchmark results (Aumüller et al., 2020).) For top-100 queries, we set $K\_FACTOR=4$ to balance the time for list traversal and vector refinement. For RAIRS, we set $\lambda=0.5$ and $N\_CANDS=10$ based on our experimental study of parameter settings in Section 6.3.
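The relation between these settings and the data set features in Table 2 can be checked with a few lines of arithmetic. The sketch below compares each $nlist$ with $\sqrt{\#Item}$ and derives the PQ code size implied by $M_{PQ}=\#Dim/2$ sub-groups of 4 bits each; the helper name `pq_code_bytes` is hypothetical, and the code size is a derived quantity rather than a number reported in the text.

```python
import math

# name: (#Item, #Dim, nlist used in the experiments)
datasets = {
    "SIFT1M": (1_000_000, 128, 1024),
    "SIFT1B": (1_000_000_000, 128, 32768),
    "MSong": (994_185, 420, 1024),
    "GIST": (1_000_000, 960, 1024),
    "OpenAI": (5_000_000, 1536, 2048),
    "T2I": (10_000_000, 200, 3172),
}


def pq_code_bytes(dim: int, nbits: int = 4) -> int:
    """Bytes per PQ code with M_PQ = dim / 2 sub-groups of `nbits` bits each."""
    m_pq = dim // 2
    return m_pq * nbits // 8


for name, (n_items, dim, nlist) in datasets.items():
    print(f"{name}: sqrt(#Item)={round(math.sqrt(n_items))}, "
          f"nlist={nlist}, PQ code={pq_code_bytes(dim)} bytes")
```

For instance, a 128-dimensional SIFT vector is encoded with 64 sub-groups of 4 bits, i.e., 32 bytes per PQ code.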

At query time, we vary the search parameter (e.g., $nprobe$ in IVF-based indices and $efSearch$ in HNSW) to achieve different trade-offs between query speed and search quality, which contribute to the points of the reported curves in the experimental results.

Figure 7. Overall ANNS performance. (a) Comparison with popular ANNS methods. (b) Varying assignment strategies. (c) DCO of assignment strategies.
Table 3. Percentage of vectors whose 2nd-choice centroid under SOARL2 matches that under AIR.
| | SIFT1M | SIFT1B | MSong | GIST | OpenAI |
|---|---|---|---|---|---|
| SOARL2 | 95.14% | 72.10% | 93.93% | 91.55% | 93.40% |

Performance Metrics. 1) Recall-QPS. Since ANNS inherently involves a trade-off between accuracy and speed, its performance cannot be fully captured by throughput alone. Hence, we report recall-QPS curves, which reflect the balance between retrieval quality and processing efficiency. For top-$K$ queries, the recall $k$@$K$ is the average percentage of true top-$K$ nearest neighbors in the query result. The query throughput is reported as QPS (Queries Per Second). 2) Recall-DCO. An important cost of ANNS query processing is distance computation (Gao and Long, 2023). DCO is the number of Distance Computing Operations per query. We use DCO to understand the effectiveness of redundant assignment strategies. Similar to recall-QPS curves, a recall-DCO curve is constructed by plotting the recall $k$@$K$ against the corresponding DCO while varying the search parameter (e.g., $nprobe$).
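For concreteness, the recall metric with $k=K$ (e.g., 1@1 or 10@10) can be computed as in the sketch below. It assumes that results and ground truth are given as lists of vector IDs per query, ordered by increasing distance; the function name `recall_at_k` is hypothetical.

```python
def recall_at_k(results: list[list[int]], ground_truth: list[list[int]], k: int) -> float:
    """Average fraction of the true top-k neighbors found in the top-k results."""
    total = 0.0
    for res, gt in zip(results, ground_truth):
        total += len(set(res[:k]) & set(gt[:k])) / k
    return total / len(results)


# Two queries with k = 3: the first finds 2 of 3 true neighbors, the second all 3.
recall = recall_at_k(
    results=[[1, 2, 3], [4, 5, 6]],
    ground_truth=[[1, 2, 9], [4, 5, 6]],
    k=3,
)
```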

6.2. Overall Performance

Comparison with Popular ANNS Methods. Figure 7a shows the Recall-QPS curves of IVF, HNSW, IVFPQfs, and RAIRS on five representative real-world data sets. Each row of sub-figures corresponds to a data set. The left column reports results for top-1 queries, while the right column displays performance for top-10 queries. In each plot, the closer a curve is to the top-right corner, the better the performance. For each search parameter and the resulting recall, we run the experiment 10 times and report the average QPS.

As shown in Figure 7a, IVFPQfs and RAIRS achieve significantly higher performance than IVF and HNSW. This performance advantage comes mainly from PQ Fast Scan (André et al., 2015)’s SIMD acceleration, which processes packed blocks in the list traversal. This approach is substantially faster than distance computation for each individual vector. In addition, the refine layer compensates for the decrease of accuracy caused by the PQ encoding.

Among the ANNS methods, our proposed RAIRS achieves the best performance. It combines the RAIR redundant assignment and the SEIL optimized list layout to improve the IVF-based ANNS performance. Compared to the second best performing method (i.e., IVFPQfs), RAIRS achieves up to 1.33x speedup in query throughput with similar recalls across all the real-world data sets.

Comparison of Various Assignment Strategies. Figure 7b shows the Recall-QPS curves of five assignment strategies for top-1 and top-10 queries on the five real-world data sets. The five solutions are all based on IVF-PQ Fast Scan with refinement. From Figure 7b, we make the following observations. (1) Compared to the baseline single assignment (i.e., IVFPQfs), redundant assignment with NaïveRA is not better, especially at high recalls. NaïveRA is actually worse than IVFPQfs for top-10 queries on GIST. This indicates the importance of an optimized list selection strategy. (2) Among all redundant assignment strategies, RAIRS achieves the best performance across all the data sets. At 0.95 recall, RAIRS achieves throughput improvements of 1.07–1.33x, 1.11–1.32x, and 1.01–1.23x compared to IVFPQfs, NaïveRA, and SOARL2, respectively. (3) SRAIRS, which strictly makes two assignments per vector, is comparable to RAIRS on SIFT1M and SIFT1B, but worse than RAIRS on MSong, GIST, and OpenAI. This means that when the first and second chosen lists are the same, it is better to store the vector item only once. (4) RAIRS is significantly better than SOARL2 in most cases because, unlike SOAR, RAIRS is optimized for the Euclidean space. For SIFT1M at recall 1@1, and SIFT1M and OpenAI at recall 10@10, SOARL2 exhibits performance comparable to RAIRS. AIR prefers to select a $c^{\prime}$ such that $r^{\prime}=c^{\prime}-x$ points closer to the opposite direction of $r=c-x$ (i.e., the closer $\theta$ is to 180 degrees, the better). However, depending on the given data set and the vector in consideration, the angle $\theta$ for the actual $c^{\prime}$ may be much smaller than 180 degrees, resulting in assignments similar to those of SOARL2. Table 3 reports the percentage of vectors whose 2nd-choice centroid under SOARL2 matches the 2nd choice under AIR. We see that AIR and SOARL2 choose the same assignment for 72.10%–95.14% of the vectors.
For the above cases where SOARL2 and RAIRS have similar performance (i.e., SIFT1M and OpenAI), there are high percentages of 2nd-choice matches. From another angle, the 4.86%–27.90% of assignments that differ lead to the performance benefit of RAIRS over SOARL2. Generally, the larger the difference, the larger the potential benefit of RAIRS.

Figure 8. Recalls varying $nprobe$ on SIFT1M.
Figure 9. Distribution of recalls and DCOs on SIFT1B.

Understanding Assignment Strategies with DCO. Figure 7c shows the Recall-DCO curves of various assignment strategies. The closer a curve is to the bottom-right corner, the better. The Recall-DCO curves display trends similar to those of the Recall-QPS curves in Figure 7b. Since the solutions are all based on IVF-PQ Fast Scan with refinement, the difference in query throughput is primarily due to the difference in the number of computed distances. At 0.95 recall, RAIRS reduces the DCO to 0.64–0.83x, 0.62–0.78x, and 0.73–0.99x of that of IVFPQfs, NaïveRA, and SOARL2, respectively.

Note that the benefit of RAIRS comes not only from higher recalls but also from lower DCOs, as evidenced by Figure 7c. RAIRS can achieve better recalls with the same DCOs, and the same recalls with lower DCOs (which result from lower $nprobe$ values). Figure 8 plots recalls while varying $nprobe$. Comparing the $nprobe$ values for achieving similar recalls, we see that the $nprobe$ values of SRAIRS and RAIRS are 42.3%–46.5% and 48.1%–53.1% of the setting of the baseline single assignment IVFPQfs, respectively. For RAIRS, since a fraction of vectors are assigned only once, achieving recalls comparable to SRAIRS can require a slightly larger $nprobe$.

Given the search parameter, DCO is deterministic for a given set of queries on a given ANNS index. In contrast, QPS is less stable due to run-time variations. Therefore, for solutions based on IVF-PQ Fast Scan with refinement, we show the ANNS performance in DCO in the rest of the evaluation.

Distribution of Recalls and DCOs. For the SIFT1B + 0.95 recall experiment in Figure 7, we compute the recall and the DCO for each of the 10,000 queries in the top-10 search experiment. We plot the CDF (Cumulative Distribution Function) of recalls and DCOs in Figure 9. From the figure, we see that the recall CDF curves almost overlap, but RAIRS’s DCO curve is clearly to the left of IVFPQfs’s DCO curve. This means that compared to IVFPQfs, RAIRS significantly reduces the DCOs for almost all queries while achieving similar recalls. Moreover, the recall variance is low; over 89% of queries achieve 0.8–1.0 recalls. The p99 DCO of RAIRS is 1.50x the average, so the DCO variance across queries is moderate.
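The per-query statistics above can be reproduced with a few lines of code. The following is a minimal sketch, assuming recall is measured as the fraction of true top-k neighbors returned and that the p99 is taken by the nearest-rank rule; the function names are illustrative and are not taken from the RAIRS implementation.

```python
import math
from typing import List, Sequence, Tuple


def recall_at_k(found_ids: Sequence[int], true_ids: Sequence[int]) -> float:
    """Fraction of the true top-k neighbors that appear in the returned ids."""
    true_set = set(true_ids)
    return len(true_set.intersection(found_ids)) / len(true_set)


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (value, cumulative fraction) pairs, sorted by value."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def percentile_nearest_rank(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    xs = sorted(values)
    rank = max(1, math.ceil(q * len(xs) / 100))
    return xs[rank - 1]


def tail_to_mean_ratio(values: Sequence[float], q: float = 99.0) -> float:
    """Ratio of the q-th percentile to the mean, e.g. p99 DCO over average DCO."""
    return percentile_nearest_rank(values, q) / (sum(values) / len(values))
```

Applied to a list of per-query DCO counts, `tail_to_mean_ratio` yields the kind of p99-to-average ratio quoted above, and `empirical_cdf` yields the points of a CDF curve such as those in Figure 9.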

Figure 10. Performance for top-100 queries.
Figure 11. Performance for one query at a time.

Performance for Top-100 Queries. Figure 10 compares the Recall-DCO curves of IVFPQfs, NaïveRA, SOARL2, and RAIRS for top-100 queries. We see that RAIRS achieves the best performance across all the data sets, which is consistent with the results for top-1 and top-10 queries in Figure 7c.

Performance for One-Query-at-a-Time. In this experiment, we run one query at a time to understand the single-threaded query latency without the cache optimization for query batches. Figure 11 shows the query latency-recall curves for SIFT1M and GIST1M. We see that RAIRS achieves the lowest latency among all single assignment and 2-assignment strategies. The benefit of RAIRS is similar to that seen in the batched-query scenarios.

Figure 12. Performance of insertions and deletions.

Performance of Insertions and Deletions. Figure 12 shows the performance of vector insertions and deletions for RAIRS and IVFPQfs on SIFT1M. For insertions, we construct the index with 800,000 vectors, then perform five insertion batches, each inserting 40,000 new vectors. For deletions, we build the index with the full data set, then execute five batches of deletions, each removing 40,000 vectors. Compared to the baseline IVFPQfs, RAIRS’s insertion and deletion throughput is 12.2% and 4.4% lower, respectively. This is because each vector is assigned to up to two lists, so RAIRS modifies up to twice as many entries as IVFPQfs. Note that ANNS queries are typically much more frequent than vector insertions and deletions. While incurring a slight extra cost for insertions and deletions, RAIRS achieves higher ANNS query performance.

6.3. In-depth Analysis

Figure 13. Ablation study for RAIR and SEIL on SIFT1M. (a) ANNS performance. (b) Memory cost.
Table 4. Memory cost for data sets with Euclidean distance.
| Data Set | IVFPQfs | NaïveRA | NaïveRA+SEIL | RAIR | RAIRS |
|---|---|---|---|---|---|
| SIFT1M | 38.2 MB | 76.4 MB | 52.0 MB | 62.5 MB | 56.25 MB |
| SIFT1B | 37.3 GB | 74.6 GB | 42.9 GB | 69.7 GB | 43.3 GB |
| MSong | 107.2 MB | 214.5 MB | 145.0 MB | 162.8 MB | 152.4 MB |
| GIST | 240.6 MB | 478.5 MB | 325.0 MB | 403.3 MB | 358.7 MB |
| OpenAI | 1.83 GB | 3.66 GB | 2.39 GB | 3.15 GB | 2.53 GB |

Ablation Study for RAIR and SEIL. Figure 13a compares the DCO of NaïveRA, SRAIR, and RAIR with and without the SEIL list layout optimization on the SIFT1M data set. For each compared solution, we report the DCO for the setting where the query recall just exceeds 95%. In the figures, the lower the DCO, the better.

We see that RAIR outperforms NaïveRA with or without SEIL. Without SEIL, RAIR cuts down DCO by 24.0% and 14.9% compared to NaïveRA for top-1 and top-10 queries, respectively. With SEIL, the reduction is 18.8% and 8.0%, respectively.

Moreover, SEIL effectively reduces redundant distance computation. For top-1 and top-10 queries, SEIL reduces DCO by 4.1%–12.0% for NaïveRA, SRAIR, and RAIR.

Finally, comparing RAIR and SRAIR, we see that RAIR achieves lower DCO. When the first and the second assigned lists are the same, RAIR performs single assignment, while SRAIR strictly selects a second list from the rest of the lists to assign the vector. The results in Figure 13a indicate that this strict strategy is sub-optimal.

Figure 14. Multiple assignment on SIFT1M.
Figure 15. Parameter study (SIFT1M). (a) Recall-DCO curve varying $\lambda$. (b) Distribution of rank of list minimizing AIR.

Memory Cost. Figure 13b displays the memory cost of NaïveRA, SRAIR, and RAIR with and without SEIL on SIFT1M. Table 4 reports the memory cost of the baseline IVFPQfs and four redundant assignment solutions for five representative data sets. Since the refine layer is the same, we exclude the refine layer and report only the space of the IVF-PQ module.

We see that NaïveRA doubles the memory space used by the baseline IVFPQfs due to redundant assignment. SEIL stores shared blocks of large cells only once, thereby significantly reducing the memory cost of redundant assignment. Overall, SEIL reduces the memory cost by 6.4%–42.5% for NaïveRA, SRAIR, and RAIR.
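As a quick arithmetic check, the reductions attributable to SEIL can be recomputed from the Table 4 entries. The sketch below is illustrative only: it covers the NaïveRA and RAIR columns (SRAIR is not listed in Table 4), it treats RAIRS as RAIR with SEIL, and the dictionary layout and function names are assumptions made for this example.

```python
# Memory cost copied from Table 4 (MB for SIFT1M, MSong, GIST; GB for SIFT1B, OpenAI).
# Units cancel because each reduction is a ratio within one row.
TABLE4 = {
    "SIFT1M": {"NaiveRA": 76.4, "NaiveRA+SEIL": 52.0, "RAIR": 62.5, "RAIRS": 56.25},
    "SIFT1B": {"NaiveRA": 74.6, "NaiveRA+SEIL": 42.9, "RAIR": 69.7, "RAIRS": 43.3},
    "MSong": {"NaiveRA": 214.5, "NaiveRA+SEIL": 145.0, "RAIR": 162.8, "RAIRS": 152.4},
    "GIST": {"NaiveRA": 478.5, "NaiveRA+SEIL": 325.0, "RAIR": 403.3, "RAIRS": 358.7},
    "OpenAI": {"NaiveRA": 3.66, "NaiveRA+SEIL": 2.39, "RAIR": 3.15, "RAIRS": 2.53},
}


def seil_reduction_pct(without_seil: float, with_seil: float) -> float:
    """Relative memory saving of the SEIL layout, in percent."""
    return 100.0 * (without_seil - with_seil) / without_seil


def all_reductions(table):
    """Map (data set, strategy) to the SEIL memory reduction in percent."""
    out = {}
    for name, row in table.items():
        out[(name, "NaiveRA")] = seil_reduction_pct(row["NaiveRA"], row["NaiveRA+SEIL"])
        out[(name, "RAIR")] = seil_reduction_pct(row["RAIR"], row["RAIRS"])
    return out


reductions = all_reductions(TABLE4)
lowest = min(reductions.values())
highest = max(reductions.values())
print(f"SEIL memory reduction: {lowest:.1f}%-{highest:.1f}%")
```

Under these assumptions, the smallest reduction comes from RAIR on MSong and the largest from NaïveRA on SIFT1B, which agrees with the 6.4%–42.5% range reported above.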

Note that the refine layer stores all the vector data. Compared to the refine layer, the baseline IVF-PQ module takes 6.4%–7.5% space, and the additional space for RAIRS is 1.2%–3.7%. Thus, RAIRS trades off slight extra space for significantly better ANNS performance.

Multiple Assignment. The left part of Figure 14 compares the three aggregation functions (i.e., max, min, and avg) for 3-assignment on SIFT1M. max achieves the lowest DCO. The right part of Figure 14 compares single assignment (i.e., IVFPQfs), 2-assignment, 3-assignment, and 4-assignment. We choose the max aggregation function. SEIL is disabled since it is designed for 2-assignment. We consider the strict strategy (i.e., SRAIR) because RAIR essentially blurs the distinction among multiple assignments. From the figure, we see that SRAIR-2 performs best. This result indicates that more than two assignments are unnecessary.

Parameter Study for $\lambda$. Figure 15a shows recall-DCO curves of RAIRS on SIFT1M while varying $\lambda$ from 0 to 1. As $\lambda$ increases, the curve shifts to the bottom-right, showing improved performance. The curve stops improving when $\lambda$ reaches 0.5. Thus, we set the default value of $\lambda$ to 0.5.

Parameter Study for N_CANDS. We modify Algorithm 3 so that RairAssign retrieves all the lists as candidates in ascending order of the distance between the list centroid and the vector to insert, and computes the list that minimizes the AIR metric. This gives the true rank in the sorted lists that should be assigned by AIR. Figure 15b depicts the Cumulative Distribution Function (CDF) of the true rank for all vectors in SIFT1M. We see that in 99.95% of the cases, the true rank $\leq 10$. This means that considering the top-10 nearest lists achieves the correct assignment for most cases. Hence, we set N_CANDS = 10 by default.
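To make the candidate-restricted selection concrete, the following is a minimal sketch and not the paper's Algorithm 3. It assumes that the AIR score of a candidate centroid $c'$ has the form $\|r'\|^2 + \lambda r^T r'$ obtained in Appendix A, with $r = c - x$ and $r' = c' - x$, and that $\lambda$ is a constant as in the parameter study. The names `air_score`, `rair_assign_sketch`, and `strict` are hypothetical; `strict=True` mimics the SRAIR behavior of always choosing a second list different from the first.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sub(a: Vector, b: Vector) -> List[float]:
    return [ai - bi for ai, bi in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def air_score(x: Vector, first_centroid: Vector, candidate: Vector, lam: float) -> float:
    """Assumed AIR form: ||r'||^2 + lam * <r, r'> with r = c - x and r' = c' - x."""
    r = sub(first_centroid, x)
    r_prime = sub(candidate, x)
    return dot(r_prime, r_prime) + lam * dot(r, r_prime)


def rair_assign_sketch(x: Vector, centroids: Sequence[Vector], n_cands: int = 10,
                       lam: float = 0.5, strict: bool = False) -> Tuple[int, ...]:
    """Return the list ids chosen for x: one id (single assignment) or two ids."""
    # Candidates are the n_cands lists whose centroids are nearest to x.
    by_distance = sorted(
        range(len(centroids)),
        key=lambda i: dot(sub(centroids[i], x), sub(centroids[i], x)),
    )
    cands = by_distance[:n_cands]
    first = cands[0]
    # In the non-strict variant the first list competes too; if it wins,
    # the vector is assigned only once.
    pool = cands[1:] if strict else cands
    if not pool:
        return (first,)
    second = min(pool, key=lambda i: air_score(x, centroids[first], centroids[i], lam))
    return (first,) if second == first else (first, second)
```

With `x = [0, 0]` and centroids `[1, 0]`, `[1.1, 0.1]`, and `[-1.2, 0]`, a purely distance-based choice would pick the second centroid, while this sketch with `lam = 0.5` picks the third, which lies in the opposite direction from the first centroid.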

Figure 16. Varying block size. (Default block size = 32)

Parameter Study for PQ Block Size. To understand the impact of block size on SEIL, we vary the block size, study the change of the number of vectors in misc-blocks, and compute the resulting DCOs for a given nprobe. As shown in Figure 16, as the block size increases, there are more vectors in misc-blocks. This is because the number of large cells (whose size $\geq$ block size) decreases. SEIL performs more redundant DCOs for the vectors in misc-blocks.

Figure 17. Applying SEIL to SOAR on T2I.

6.4. Applying SEIL to SOAR

We apply SEIL to SOAR, which targets the inner product distance. Figure 17 compares the recall-DCO curves of SOAR and SOAR+SEIL on T2I, which uses the inner product distance as the distance metric.

Figure 17 shows that SEIL significantly reduces the DCO of SOAR. This result indicates the applicability of the SEIL list layout optimization to various redundant assignment strategies and distance metrics.

7. Related Work

ANNS Methods. ANNS support has been implemented in both vector databases and general-purpose database systems (Wang et al., 2021; Guo et al., 2022; Yang et al., 2020; Wei et al., 2020; Li et al., 2018a; Zhang et al., 2023b; Wang et al., 2024). Existing ANNS methods are divided into four categories: tree-based (Bentley, 1975; Leibe et al., 2006; Yianilos, 1993; Muja and Lowe, 2009, 2014), hashing-based (Andoni and Indyk, 2008; Bawa et al., 2005; Lv et al., 2007; Wang et al., 2017; Weiss et al., 2008; Wang et al., 2006; Li et al., 2018b), graph-based (Malkov and Yashunin, 2020; Toussaint, 1980; Harwood and Drummond, 2016; Peng et al., 2023; Fu et al., 2019; Subramanya et al., 2019; Ren et al., 2020; Zhao et al., 2020), and cluster-based (Sivic and Zisserman, 2003; Babenko and Lempitsky, 2014b; Gupta et al., 2022; Guo et al., 2020) approaches. IVF-based ANNS methods (Sivic and Zisserman, 2003), which are a representative type of cluster-based approaches, are among the most widely used and highest performing ANNS methods. In this paper, we mainly focus on IVF-based ANNS.

Optimization Approaches for IVF-Based ANNS Methods. There are four main approaches to optimizing IVF-based ANNS: 1) quantization, 2) distance computation optimization, 3) hardware acceleration, and 4) redundant assignment. We review related work on approaches 1–3 in the following. Redundant assignment is the focus of this work. We have discussed and experimentally compared the existing redundant assignment strategies.

Approach 1: Quantization. Quantization techniques are widely used to mitigate storage overhead and accelerate similarity search. Existing quantization methods can be categorized by the granularity of dimensional grouping: single-dimension quantization (including VA_File (Weber et al., 1998a), ITQ (Gong et al., 2012), LVQ (Aguerrebere et al., 2023), and RabitQ (Gao and Long, 2024)), global quantization (including AQ (Babenko and Lempitsky, 2014a) and LSQ (Martinez et al., 2016)), and product quantization (including PQ (Jegou et al., 2010), OPQ (Ge et al., 2013), PolysemousCodes (Douze et al., 2016), NEQ (Dai et al., 2020), DeltaPQ (Wang and Deng, 2020), VAQ (Paparrizos et al., 2022), QAQ (Zhang et al., 2023a), and SparCode (Su et al., 2023)).

Approach 2: Sampling-Based Distance Computation. Given the high dimensionality, distance computation often dominates the ANNS query response time (Gao and Long, 2023). ADSampling (Gao and Long, 2023) leverages the Johnson–Lindenstrauss (JL) lemma and enables early termination by sampling a subset of dimensions during distance comparisons, effectively reducing the number of computed dimensions.

Approach 3: Hardware Acceleration. Various hardware features have been studied in the context of IVF-based ANNS. SPANN (Chen et al., 2021; Xu et al., 2023) exploits SSDs to reduce the in-memory footprint and provide data persistence. GPUs (Johnson et al., 2019) and FPGAs (Zhang et al., 2018; Danopoulos et al., 2019) have been shown to significantly accelerate the ANNS processing. PQ Fast Scan (André et al., 2015), Bolt (Blalock and Guttag, 2017), and Quicker ADC (André et al., 2019) exploit SIMD instructions to accelerate the distance computation.

Relationship to Our Work: In general, our proposed ideas (i.e., RAIR and SEIL) are complementary to the above three optimization approaches. In this paper, we specifically use IVF with PQ quantization, refinement, and PQ fast scan as the baseline, which is one of the best performing IVF-based methods in practice. We have tailored our SEIL optimization to support PQ fast scan.

8. Conclusion

In conclusion, we propose RAIRS, combining two optimization techniques (i.e., RAIR and SEIL), for IVF-based ANNS in this paper. RAIR exploits redundant assignment to improve performance for queries that are poorly served by the first assigned lists. We formally derive the AIR list selection metric. Moreover, SEIL optimizes the list layout and exploits shared cells to reduce redundant distance computation. Our experimental results on representative real-world data sets confirm that RAIR and SEIL can effectively reduce DCO and improve ANNS performance for IVF-based ANN search.

References

  • C. Aguerrebere, I. S. Bhati, M. Hildebrand, M. Tepper, and T. Willke (2023) Similarity search in the blink of an eye with compressed indices. Proceedings of the VLDB Endowment 16 (11), pp. 3433–3446.
  • A. Andoni and P. Indyk (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM 51 (1), pp. 117–122.
  • F. André, A. Kermarrec, and N. Le Scouarnec (2015) Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan. Proceedings of the VLDB Endowment 9 (4), pp. 288–299.
  • F. André, A. Kermarrec, and N. Le Scouarnec (2019) Quicker adc: unlocking the hidden potential of product quantization with simd. IEEE transactions on pattern analysis and machine intelligence 43 (5), pp. 1666–1677.
  • M. Aumüller, E. Bernhardsson, and A. Faithfull (2020) ANN-benchmarks: a benchmarking tool for approximate nearest neighbor algorithms. Information Systems 87, pp. 101374.
  • A. Babenko and V. Lempitsky (2014a) Additive quantization for extreme vector compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 931–938.
  • A. Babenko and V. Lempitsky (2014b) The inverted multi-index. IEEE transactions on pattern analysis and machine intelligence 37 (6), pp. 1247–1260.
  • D. Baranchuk and A. Babenko (2021) External Links: Link
  • M. Bawa, T. Condie, and P. Ganesan (2005) LSH forest: self-tuning indexes for similarity search. In Proceedings of the 14th international conference on World Wide Web, pp. 651–660.
  • J. L. Bentley (1975) Multidimensional binary search trees used for associative searching. Communications of the ACM 18 (9), pp. 509–517.
  • D. W. Blalock and J. V. Guttag (2017) Bolt: accelerated data mining with fast vector compression. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 727–735.
  • S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J. Lespiau, B. Damoc, A. Clark, et al. (2022) Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pp. 2206–2240.
  • Q. Chen, B. Zhao, H. Wang, M. Li, C. Liu, Z. Li, M. Yang, and J. Wang (2021) SPANN: highly-efficient billion-scale approximate nearest neighborhood search. Advances in Neural Information Processing Systems 34, pp. 5199–5212.
  • T. Cover and P. Hart (1967) Nearest neighbor pattern classification. IEEE transactions on information theory 13 (1), pp. 21–27.
  • X. Dai, X. Yan, K. K. Ng, J. Liu, and J. Cheng (2020) Norm-explicit quantization: improving vector quantization for maximum inner product search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 51–58.
  • D. Danopoulos, C. Kachris, and D. Soudris (2019) Fpga acceleration of approximate knn indexing on high-dimensional vectors. In 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), pp. 59–65.
  • M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P. Mazaré, M. Lomeli, L. Hosseini, and H. Jégou (2024) The faiss library. arXiv:2401.08281.
  • M. Douze, H. Jégou, and F. Perronnin (2016) Polysemous codes. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 785–801.
  • M. Douze (2024) External Links: Link
  • C. Fu, C. Xiang, C. Wang, and D. Cai (2019) Fast approximate nearest neighbor search with the navigating spreading-out graph. Proceedings of the VLDB Endowment 12 (5), pp. 461–474.
  • J. Gao and C. Long (2023) High-dimensional approximate nearest neighbor search: with reliable and efficient distance comparison operations. Proceedings of the ACM on Management of Data 1 (2), pp. 1–27.
  • J. Gao and C. Long (2024) RaBitQ: quantizing high-dimensional vectors with a theoretical error bound for approximate nearest neighbor search. Proceedings of the ACM on Management of Data 2 (3), pp. 1–27.
  • T. Ge, K. He, Q. Ke, and J. Sun (2013) Optimized product quantization for approximate nearest neighbor search. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2946–2953.
  • Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin (2012) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE transactions on pattern analysis and machine intelligence 35 (12), pp. 2916–2929.
  • R. Guo, X. Luan, L. Xiang, X. Yan, X. Yi, J. Luo, Q. Cheng, W. Xu, J. Luo, F. Liu, et al. (2022) Manu: a cloud native vector database management system. Proceedings of the VLDB Endowment 15 (12), pp. 3548–3561.
  • R. Guo, P. Sun, E. Lindgren, Q. Geng, D. Simcha, F. Chern, and S. Kumar (2020) Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, pp. 3887–3896.
  • G. Gupta, T. Medini, A. Shrivastava, and A. J. Smola (2022) Bliss: a billion scale index using iterative re-partitioning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 486–495.
  • B. Harwood and T. Drummond (2016) Fanng: fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5713–5722.
  • P. Indyk and R. Motwani (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 604–613.
  • H. Jegou and L. Amsaleg (2010) External Links: Link
  • H. Jegou, M. Douze, and C. Schmid (2010) Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 33 (1), pp. 117–128.
  • J. Johnson, M. Douze, and H. Jégou (2019) Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7 (3), pp. 535–547.
  • B. Leibe, K. Mikolajczyk, and B. Schiele (2006) Efficient clustering and matching for object class recognition. In BMVC, pp. 789–798.
  • J. Li, H. Liu, C. Gui, J. Chen, Z. Ni, N. Wang, and Y. Chen (2018a) The design and implementation of a real time visual search system on jd e-commerce platform. In Proceedings of the 19th International Middleware Conference Industry, pp. 9–16.
  • J. Li, X. Yan, J. Zhang, A. Xu, J. Cheng, J. Liu, K. K. Ng, and T. Cheng (2018b) A general and efficient querying method for learning to hash. In Proceedings of the 2018 International Conference on Management of Data, pp. 1333–1347.
  • T. Lidy (2015) External Links: Link
  • Y. Liu, D. Zhang, G. Lu, and W. Ma (2007) A survey of content-based image retrieval with high-level semantics. Pattern recognition 40 (1), pp. 262–282.
  • Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li (2007) Multi-probe lsh: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, pp. 950–961.
  • Y. A. Malkov and D. A. Yashunin (2020) Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence 42 (4), pp. 824–836.
  • J. Martinez, J. Clement, H. H. Hoos, and J. J. Little (2016) Revisiting additive quantization. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14, pp. 137–153.
  • M. Muja and D. G. Lowe (2009) Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications, Vol. 2, pp. 2.
  • M. Muja and D. G. Lowe (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE transactions on pattern analysis and machine intelligence 36 (11), pp. 2227–2240.
  • J. Paparrizos, I. Edian, C. Liu, A. J. Elmore, and M. J. Franklin (2022) Fast adaptive similarity search through variance-aware quantization. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2969–2983.
  • Y. Peng, B. Choi, T. N. Chan, J. Yang, and J. Xu (2023) Efficient approximate nearest neighbor search in multi-dimensional databases. Proceedings of the ACM on Management of Data 1 (1), pp. 1–27.
  • J. Ren, M. Zhang, and D. Li (2020) HM-ann: efficient billion-point nearest neighbor search on heterogeneous memory. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vol. 33, pp. 10672–10684.
  • J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen (2007) Collaborative filtering recommender systems. In The adaptive web: methods and strategies of web personalization, pp. 291–324.
  • Sivic and Zisserman (2003) Video google: a text retrieval approach to object matching in videos. In Proceedings ninth IEEE international conference on computer vision, pp. 1470–1477.
  • L. Su, F. Yan, J. Zhu, X. Xiao, H. Duan, Z. Zhao, Z. Dong, and R. Tang (2023) Beyond two-tower matching: learning sparse retrievable cross-interactions for recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 548–557.
  • S. J. Subramanya, Devvrit, R. Kadekodi, R. Krishaswamy, and H. V. Simhadri (2019) DiskANN: fast accurate billion-point nearest neighbor search on a single node. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 13766–13776.
  • P. Sun, D. Simcha, D. Dopson, R. Guo, and S. Kumar (2023) SOAR: improved indexing for approximate nearest neighbor search. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 3189–3204.
  • G. T. Toussaint (1980) The relative neighbourhood graph of a finite planar set. Pattern recognition 12 (4), pp. 261–268.
  • J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, et al. (2021) Milvus: a purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data, pp. 2614–2627.
  • J. Wang, T. Zhang, N. Sebe, H. T. Shen, et al. (2017) A survey on learning to hash. IEEE transactions on pattern analysis and machine intelligence 40 (4), pp. 769–790.
  • M. Wang, W. Xu, X. Yi, S. Wu, Z. Peng, X. Ke, Y. Gao, X. Xu, R. Guo, and C. Xie (2024) Starling: an i/o-efficient disk-resident graph index framework for high-dimensional vector similarity search on data segment. Proceedings of the ACM on Management of Data 2 (1), pp. 1–27.
  • R. Wang and D. Deng (2020) DeltaPQ: lossless product quantization code compression for high dimensional similarity search. Proceedings of the VLDB Endowment 13 (13), pp. 3603–3616.
  • X. Wang, L. Zhang, F. Jing, and W. Ma (2006) Annosearch: image auto-annotation by search. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), Vol. 2, pp. 1483–1490.
  • R. Weber, H. Schek, and S. Blott (1998a) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. 98, pp. 194–205.
  • R. Weber, H. Schek, and S. Blott (1998b) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. 98, pp. 194–205.
  • C. Wei, B. Wu, S. Wang, R. Lou, C. Zhan, F. Li, and Y. Cai (2020) AnalyticDB-v: a hybrid analytical engine towards query fusion for structured and unstructured data. Proceedings of the VLDB Endowment 13 (12), pp. 3152–3165.
  • Y. Weiss, A. Torralba, and R. Fergus (2008) Spectral hashing. Advances in neural information processing systems 21.
  • Y. Wu, M. N. Rabe, D. Hutchins, and C. Szegedy (2022) Memorizing transformers. arXiv preprint arXiv:2203.08913.
  • Y. Xu, H. Liang, J. Li, S. Xu, Q. Chen, Q. Zhang, C. Li, Z. Yang, F. Yang, Y. Yang, et al. (2023) SPFresh: incremental in-place update for billion-scale vector search. In Proceedings of the 29th Symposium on Operating Systems Principles, pp. 545–561.
  • W. Yang, T. Li, G. Fang, and H. Wei (2020) Pase: postgresql ultra-high-dimensional approximate nearest neighbor search extension. In Proceedings of the 2020 ACM SIGMOD international conference on management of data, pp. 2241–2253.
  • P. N. Yianilos (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM symposium on Discrete algorithms, pp. 311–321.
  • J. Zhang, J. Li, and S. Khoram (2018) Efficient large-scale approximate nearest neighbor search on opencl fpga. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4924–4932.
  • J. Zhang, D. Lian, H. Zhang, B. Wang, and E. Chen (2023a) Query-aware quantization for maximum inner product search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, pp. 4875–4883.
  • Q. Zhang, S. Xu, Q. Chen, G. Sui, J. Xie, Z. Cai, Y. Chen, Y. He, Y. Yang, F. Yang, et al. (2023b) VBASE: unifying online vector similarity search and relational queries via relaxed monotonicity. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pp. 377–395.
  • W. Zhao, S. Tan, and P. Li (2020) Song: approximate nearest neighbor search on gpu. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1033–1044.
  • Zilliztech (2023) External Links: Link

Appendix A Proof of Theorem 4.1

Proof.

Let $Q_l$ be the set of queries that are uniformly distributed over the hypersphere centered at $x$ with radius $l$.

We may expand the expectation as follows:

\begin{align*}
L(c', c, Q_l) &= E_{\|R\|=l}\left[\mathrm{ReLU}(-\cos\alpha)\cdot\left(\|q-c'\|^2-\|q-x\|^2\right)\right] \\
&= \int_{\frac{\pi}{2}}^{\pi} -\cos\alpha\; E_{\|R\|=l}\left[(R-r')^2-R^2 \,\middle|\, \frac{\langle R,r\rangle}{\|R\|\cdot\|r\|}=\alpha\right] dP\left(\frac{\langle R,r\rangle}{\|R\|\cdot\|r\|}<\alpha\right)
\end{align*}

To evaluate this integral, we decompose $R$ and $r'$ into components parallel and orthogonal to $r$. As shown in Figure 4, we call these components $R_{\parallel}$, $R_{\perp}$, $r'_{\parallel}$, and $r'_{\perp}$, respectively. We also let $\theta=\angle cxc'$ and let $\beta$ be the angle between $R_{\perp}$ and $r'_{\perp}$. Then we simplify the inner expectation given fixed $r$, $r'$, $\theta$, and $\alpha$:

\begin{align*}
& E_{\|R\|=l}\left[(R-r')^2-R^2 \,\middle|\, \frac{\langle R,r\rangle}{\|R\|\cdot\|r\|}=\alpha\right] \\
&= E_{\|R\|=l}\left[(R_{\parallel}-r'_{\parallel})^2+(R_{\perp}-r'_{\perp})^2-R_{\parallel}^2-R_{\perp}^2 \,\middle|\, \|R_{\parallel}\|=l\cos\alpha\right] \\
&= E_{\|R\|=l}\left[r'_{\parallel}(r'_{\parallel}-2R_{\parallel})+r'^{\,2}_{\perp}-2r'_{\perp}R_{\perp} \,\middle|\, \|R_{\parallel}\|=l\cos\alpha\right] \\
&= E_{\|R\|=l}\left[\|r'\|\cos\theta\,(\|r'\|\cos\theta-2l\cos\alpha)+\|r'\|^2\sin^2\theta-2r'_{\perp}R_{\perp} \,\middle|\, \|R_{\parallel}\|=l\cos\alpha\right] \\
&= E_{\beta}\left[\|r'\|^2-2l\|r'\|\cos\theta\cos\alpha-2l\|r'\|\sin\theta\sin\alpha\cos\beta\right] \\
&= \|r'\|\cdot\left(\|r'\|-2l\cos\theta\cos\alpha-2l\sin\theta\sin\alpha\,E[\cos\beta]\right) \\
&= \|r'\|\cdot\left(\|r'\|-2l\cos\theta\cos\alpha\right)
\end{align*}

Meanwhile, $dP\left(\frac{\langle R,r\rangle}{\|R\|\cdot\|r\|}<\alpha\right)$ is proportional to the surface area of a $(D-1)$-dimensional hypersphere of radius $l\sin\alpha$. Thus, we express it as $A\sin^{D-2}\alpha$ for some constant $A$. Our integral then becomes

\begin{align*}
L(c', c, Q_l) &= \int_{\frac{\pi}{2}}^{\pi} -\cos\alpha\,\|r'\|\left(\|r'\|-2l\cos\theta\cos\alpha\right)A\sin^{D-2}\alpha\,d\alpha \\
&= -A\|r'\|^2\int_{\frac{\pi}{2}}^{\pi}\cos\alpha\sin^{D-2}\alpha\,d\alpha + 2Al\|r'\|\cos\theta\int_{\frac{\pi}{2}}^{\pi}\cos^2\alpha\sin^{D-2}\alpha\,d\alpha \\
&= \frac{A\|r'\|^2}{D-1} + Al\|r'\|\cos\theta\int_{0}^{\pi}\left(\sin^{D-2}\alpha-\sin^{D}\alpha\right)d\alpha
\end{align*}

Now we define $I_D=\int_{0}^{\pi}\sin^{D}\alpha\,d\alpha$.

\begin{align*}
I_D &= 2\int_{0}^{\pi/2}\sin^{D}\alpha\,d\alpha \\
&= -2\cos\alpha\sin^{D-1}\alpha\,\bigg|_{0}^{\pi/2} - 2\int_{0}^{\pi/2}\cos\alpha\left(-(D-1)\cos\alpha\sin^{D-2}\alpha\right)d\alpha \\
&= 2(D-1)\int_{0}^{\pi/2}\cos^2\alpha\sin^{D-2}\alpha\,d\alpha \\
&= (D-1)I_{D-2}-(D-1)I_D
\end{align*}

Combining terms, we have $I_D=\frac{D-1}{D}I_{D-2}$. Then the loss satisfies:

\begin{align*}
L(c', c, Q_l) &= \frac{A\|r'\|^2}{D-1} + Al\|r'\|\cos\theta\,(I_{D-2}-I_D) \\
&= \frac{A\|r'\|^2}{D-1} + Al\|r'\|\cos\theta\,\frac{I_D}{D-1} \\
&= \frac{A}{D-1}\left(\|r'\|^2+I_D\,l\cos\theta\,\|r'\|\right) \\
&= \frac{A}{D-1}\left(\|r'\|^2+\frac{I_D\,l}{\|r\|}\,r^T r'\right)
\end{align*}
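The recurrence $I_D=\frac{D-1}{D}I_{D-2}$ and the identity $I_{D-2}-I_D=\frac{I_D}{D-1}$ used in the steps above can be sanity-checked numerically. The following is a small sketch that approximates $I_D$ with a midpoint rule; the function names and the step count are choices made for this example only.

```python
import math


def sin_power_integral(D: int, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of I_D, the integral of sin^D over [0, pi]."""
    h = math.pi / steps
    return h * sum(math.sin((i + 0.5) * h) ** D for i in range(steps))


def recurrence_gap(D: int) -> float:
    """|I_D - (D-1)/D * I_{D-2}|, which should be close to zero."""
    return abs(sin_power_integral(D) - (D - 1) / D * sin_power_integral(D - 2))


def difference_gap(D: int) -> float:
    """|(I_{D-2} - I_D) - I_D / (D-1)|, which should be close to zero."""
    i_d = sin_power_integral(D)
    return abs((sin_power_integral(D - 2) - i_d) - i_d / (D - 1))


for D in (2, 3, 10, 128):
    print(D, recurrence_gap(D), difference_gap(D))
```

Both gaps should be at the level of the numerical integration error for every tested $D$, including a dimension such as 128 that is typical of the evaluated data sets.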

Note that the hypersphere $Q$ consists of many hyperspherical surfaces $Q_l$. We solve for $L(c', c, Q)$:

\begin{align*}
L(c', c, Q) &= \int_{0}^{l_m} L(c', c, Q_l)\cdot\frac{D\,l^{D-1}}{l_m^{D}}\,dl \\
&= \int_{0}^{l_m}\frac{A}{D-1}\left(\|r'\|^2+\frac{I_D\,l}{\|r\|}\,r^T r'\right)\frac{D\,l^{D-1}}{l_m^{D}}\,dl \\
&= \frac{A}{D-1}\left(\|r'\|^2+\int_{0}^{l_m}\frac{D\,I_D\,l^{D}}{l_m^{D}\,\|r\|}\,r^T r'\,dl\right) \\
&= \frac{A}{D-1}\left(\|r'\|^2+\frac{D\,I_D\,l_m}{(D+1)\,\|r\|}\,r^T r'\right) \\
&\propto \|r'\|^2+\lambda\,r^T r'
\end{align*}

where $\lambda=\frac{D\,I_D\,l_m}{(D+1)\,\|r\|}>0$.
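For readers who want the intermediate step, the last two equalities of the derivation of $L(c', c, Q)$ rely on two elementary radial integrals. The worked step below spells them out using only the symbols $D$, $l$, and $l_m$ from the derivation; it is an added illustration and not part of the original proof text.

```latex
\begin{align*}
\int_{0}^{l_m}\frac{D\,l^{D-1}}{l_m^{D}}\,dl
  &= \frac{1}{l_m^{D}}\Big[\,l^{D}\Big]_{0}^{l_m} = 1, \\
\int_{0}^{l_m}\frac{D\,l^{D}}{l_m^{D}}\,dl
  &= \frac{D}{l_m^{D}}\cdot\frac{l_m^{D+1}}{D+1} = \frac{D\,l_m}{D+1}.
\end{align*}
```

The first identity leaves the $\|r'\|^2$ term unchanged, and the second produces the factor $\frac{D\,l_m}{D+1}$ that appears in $\lambda$.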

Figure 18. Detailed geometric relationship of vectors.