
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

Wenqi Fan, Yujuan Ding*, Liangbo Ning, Shijie Wang, Hengyun Li (The Hong Kong Polytechnic University, HK SAR)
Dawei Yin (Baidu Inc, China)
Tat-Seng Chua (National University of Singapore, Singapore)
Qing Li (The Hong Kong Polytechnic University, HK SAR)

ABSTRACT

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the quality of the generated content of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. Furthermore, to deliver deeper insights, we discuss current limitations and several promising directions for future research. Updated information about this survey can be found at https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/.

CCS CONCEPTS

• General and reference → Surveys and overviews; • Computing methodologies → Natural language generation; • Information systems → Retrieval models and ranking.

KEYWORDS

Retrieval Augmented Generation (RAG), Large Language Model (LLM), Pre-training, Fine-tuning, In-context Learning, Prompting

ACM Reference Format:
Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3637528.3671470

* Corresponding author: Yujuan Ding
For the long version of this survey, please refer to https://arxiv.org/abs/2405.06211

1 INTRODUCTION

As one of the most fundamental data mining techniques, retrieval aims to understand the input query and extract relevant information from external data sources [23, 29, 60, 118]. It has found extensive application in various fields [8, 27, 90, 144], such as search, question answering, and recommender systems. For instance, search engines (e.g., Google, Bing, and Baidu) are the most successful applications of retrieval in the industry; they can filter and retrieve the most relevant web pages or documents that match a user's query [19, 144], enabling users to find the desired information effectively. Meanwhile, retrieval models, through effective data maintenance in external databases, can provide faithful and timely external knowledge, thereby serving vital functions in various knowledge-intensive tasks. Due to their powerful capacities, retrieval techniques have been successfully incorporated into advanced generative models in the era of AI-Generated Content (AIGC) [68, 112, 134]. Notably, the integration of retrieval models with language models has given rise to Retrieval-Augmented Generation (RAG) [66], which has emerged as one of the most representative techniques in the field of generative AI, aiming to enhance the quality of the generated text content with retrieved information [6, 66, 68].


Figure 1: Retrieval-Augmented Generation (RAG) meets Large Language Models (LLMs). When the user's query is out-of-scope, e.g., unseen content in the training data or requiring the latest information for the answer, LLMs might show inferior generation performance. With the help of RAG, LLMs can leverage additional relevant information from an external database to enhance their text generation capability.

More specifically, to facilitate the generation task in the NLP area, RAG incorporates information or knowledge from external data sources, which serves as supplementary reference/instruction for the input query or the generated output [56, 87]. In general, RAG first invokes the retriever to search and extract relevant documents from the external database. These documents are then combined with the original query as the context to enhance the answer generation process [50]. In practice, RAG techniques are feasible and efficient to apply in various generation tasks, by simply adapting the retrieval component and requiring minimal or even no additional training [98]. Recent studies have demonstrated the great potential of RAG not only for knowledge-intensive tasks such as open-domain question answering (OpenQA) [6, 44, 92], but also for general language tasks and downstream applications [56, 78, 134, 138].

Recent years have witnessed the rapid development of pre-trained foundation models, particularly Large Language Models (LLMs). These models have demonstrated impressive performance across various tasks [1, 18], such as recommender systems [37], molecule discovery [68], and report generation [26]. Technically, the great success of LLMs can be attributed to their advanced architectures with billion-level parameters pre-trained on huge amounts of training corpora from various sources. These technical improvements lead to the emergence of remarkable capabilities of LLMs [37, 157], particularly in language understanding and generation, in-context learning, and other aspects. For instance, GPT-FAR introduces detailed prompts to teach GPT-4 to perform image tagging, statistical analysis, and text analysis for multi-modal fashion report generation [26]. LLMs also achieve promising performance in recommender systems by understanding users' preferences towards items [37, 127]. Despite these successes, LLMs still suffer from intrinsic limitations [37, 157], such as the lack of domain-specific knowledge, the issue of "hallucination", and the substantial computational resources required for updating the LLMs. These problems are particularly notable in domain-specific fields like medicine and law. For instance, a recent study has demonstrated that legal hallucinations are pervasive and disturbing, with hallucination rates ranging from 69% to 88% in responses to specific legal queries for state-of-the-art LLMs [20]. Moreover, the challenges of tackling the hallucination problem become even harder due to the substantial computational resources required for fine-tuning LLMs with domain-specific or the latest data. This, in turn, significantly hinders the widespread adoption of LLMs in various real-world applications.

To address these limitations, recent efforts have been made to take advantage of RAG to enhance the capabilities of LLMs in various tasks [6, 49, 56, 114], especially those demanding the latest and reliable knowledge, such as question answering (QA), AI4Science, and software engineering. For example, Lozano et al. [80] introduce a scientific QA system based on retrieving scientific literature dynamically. MolReGPT leverages RAG to enhance the in-context learning ability of ChatGPT for molecular discovery [68]. It has also been demonstrated that RAG can effectively reduce hallucinations in conversational tasks [116, 139]. As illustrated in Figure 1, an LLM-based dialog system will not be able to answer well for out-of-scope queries. With the help of RAG to retrieve relevant knowledge from an external database and integrate it into the generation process, the dialog system succeeds in giving correct answers. Given the remarkable progress in advancing LLMs with RAG, there is an imperative need for a systematic review of recent advances in Retrieval-Augmented Large Language Models (RA-LLMs).

This survey provides a comprehensive overview of RA-LLMs by summarizing representative methods from the aspects of architecture, training strategy, and application area, respectively. It first reviews the architecture of existing RA-LLMs from three primary perspectives, retrieval, generation, and augmentation, in Section 2. Training techniques are further summarized in Section 3. Subsequently, various RA-LLM applications are presented in Section 4. In Section 5, key challenges and potential directions for future exploration are further discussed. Due to the page limit, this published version omits some content, including background knowledge of LLMs, details of RA-LLM architectures, visual illustrations, etc. Please refer to the long version for more information [32].

Concurrent to our survey, several related surveys have diverse focuses for RAG and LLMs. For example, Zhao et al. [156] specifically review multi-modal information-based RAG techniques, and Zhao et al. [155] discuss RAG for AIGC. Gao et al. [39] conduct a relatively comprehensive overview of RAG for LLMs. Our survey differs from these surveys in concentrating on technical perspectives and systematically reviewing models according to the architecture and training paradigm in RA-LLMs, as well as application tasks.
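To make the basic workflow described above concrete (retrieve relevant documents, combine them with the original query as context, and let the LLM generate), the following minimal Python sketch wires the three steps together. The toy corpus, the word-overlap scorer, and the call_llm stub are illustrative assumptions rather than components of any specific system surveyed here.

```python
# A minimal sketch of the basic RAG workflow from Figure 1: retrieve documents
# relevant to the query, prepend them to the prompt, and generate an answer.
# The corpus, the word-overlap scorer, and call_llm are illustrative stand-ins.

def word_overlap(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words appearing in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents under the toy relevance score."""
    return sorted(corpus, key=lambda d: word_overlap(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Input-layer integration: concatenate retrieved context with the query."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Additional information:\n{context}\n\nQuestion: {query}\nAnswer:"

def call_llm(prompt: str) -> str:
    """Placeholder for a black-box LLM API; replace with a real client call."""
    return "<generated answer conditioned on the retrieved context>"

corpus = [
    "Spain won the Women's World Cup 2023, beating England in the final.",
    "The 2022 FIFA World Cup was held in Qatar.",
]
query = "Which country won the Women's World Cup 2023?"
answer = call_llm(build_prompt(query, retrieve(query, corpus)))
print(answer)
```

The following sections examine how each of these pieces (the retriever, the integration of the retrieved text, and the generator) can be designed and trained.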


2 RETRIEVAL-AUGMENTED LARGE LANGUAGE MODELS (RA-LLMS)

The RAG framework in the era of LLMs consists of several major processes: retrieval, generation, and augmentation. In this section, we will introduce important techniques involved in each process.

Figure 2: Illustration of the basic Retrieval-Augmented Large Language Models (RA-LLMs) framework, which consists of three major components: retrieval, generation, and augmentation. Retrieval includes different procedures depending on the specific designs. The retrieved documents are further leveraged in generation with the augmentation module, which may be at different integration stages.

2.1 Retrieval

Given the query from the input of LLMs, the retriever is an information provider in RAG, aiming to return relevant knowledge by measuring the distance between the query and documents from the external knowledge sources. As shown in Figure 2, the retrieval component consists of several compulsory or optional procedures that function as a whole for effective information retrieval. The specific pipeline of the retrieval part is jointly determined by several design perspectives, such as retriever type and retrieval granularity. In this subsection, we will introduce the existing retrieval methods in RA-LLMs based on these key aspects.

2.1.1 Retriever Type. Retrieval methods can be generally categorized into two types, sparse and dense, based on the information encoding methods. Sparse retrieval is word-based and applied mostly in text retrieval, while dense retrieval embeds queries and external knowledge into vector spaces and can be applied to various data formats.

As a straightforward approach, sparse retrieval, e.g., TF-IDF and BM25 [106, 121], usually relies on inverted index matching along with the raw data input. For example, many studies directly apply BM25 for passage-level retrieval to facilitate their RAG [10, 52, 98, 137, 159, 160], where passages are specifically represented as a bag of words and ranked based on term and inverse document frequencies [50]. On top of offering supplementary content to enhance the input of the generator, sparse retrieval has also been used to find demonstrations for In-Context Learning (ICL) in RA-LLMs [2, 83, 107, 117, 141]. The main limitation of applying sparse retrieval in RAG is its no-training nature, which makes the retrieval performance rely heavily on the quality of the database and the query. Moreover, such fixed term-based methods only support similarity-based retrieval and cannot be adapted to other retrieval criteria that may exist in LLM applications, such as diversity [30].

Dense retrieval, on the contrary, embeds the query and documents into a continuous vector space with certain criteria, for example, semantic similarity [55]. Dense retrieval methods are usually trainable, therefore holding more flexibility and potential in adaptation. As the key component of dense retrievers, the embedding models have delicately different designs in existing RAG models. A simple design [56, 64, 136] is to directly use a part of the generation model as the embedding layer of the retriever, which might be able to enhance the alignment between the retrieval and generation processes. The BERT-based backbone [24] is widely applied in retrieval models. One common retriever design in RAG is to construct two-stream encoders with the BERT structure (one encoder for the query and the other for the documents), which is also called a bi-encoder [114, 135]. Early-stage RAG methods tend to freeze [6, 98] or partially freeze [66] the parameters of the retriever to perform general-level relevant knowledge extraction and pay more attention to knowledge leveraging and generator fine-tuning. Large-scale specialized pre-training further enhances RAG models to excel in more knowledge-intensive tasks. One typical success is the Dense Passage Retriever (DPR) [55], which uses a BERT-based backbone and is pre-trained specifically for the OpenQA task with question-answer pair data. A recent study [103] has also discovered that DPR training decentralizes how knowledge is stored in the network, creating multiple access pathways to the same information. With effective fine-tuning, bi-encoder retrievers are also applied widely in ICL-based RAG [72, 81, 86, 93, 107, 141]. Specifically, they have been more often used for sentence-embedding similarity-based retrieval, as well as for some special requirements in ICL, such as diverse example retrieval [141]. Another stream of dense retrievers widely applied in RA-LLMs uses only one encoder, which may be based on Transformer, BERT, or other off-the-shelf sequence modeling backbones. These one-encoder retrievers are generally pre-trained on large-scale unaligned documents by contrastive learning [103], and may therefore excel for their versatility, meaning that they can transfer and generalize better to new domains or tasks. Such general-purpose pre-trained retrievers, e.g., Contriever [40] and Spider [99], are more flexible to use in LLMs targeting various tasks and have demonstrated their effectiveness in many RA-LLM methods, such as In-Context RALM [98], Atlas [51], and Self-RAG [5].
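To make the sparse/dense distinction concrete, the sketch below scores one query against two passages with a self-contained BM25 implementation (the standard k1/b parameterization) and with cosine similarity over vectors from a toy hashing "encoder". In practice the dense side would use a trained encoder such as DPR [55] or Contriever [40]; the toy_encode function here is purely an illustrative stand-in.

```python
import math
from collections import Counter

# --- Sparse retrieval: BM25 over a bag-of-words representation ---------------
def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document for the query with the standard BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores

# --- Dense retrieval: cosine similarity between embeddings -------------------
def toy_encode(text, dim=64):
    """Hash tokens into a fixed-size count vector (stand-in for a trained encoder)."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

docs = [
    "BM25 ranks passages by term and inverse document frequencies.",
    "Dense retrievers embed queries and documents into a shared vector space.",
]
query = "how do dense retrievers embed documents"
print("BM25 scores:", bm25_scores(query, docs))
print("dense scores:", [cosine(toy_encode(query), toy_encode(d)) for d in docs])
```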


2.1.2 Retrieval Granularity. Retrieval granularity denotes the retrieval unit in which the corpus is indexed, e.g., document, passage, token, or other levels like entity. For RAG, the choice of retrieval granularity can significantly impact the overall performance of the model in terms of effectiveness and efficiency, as it determines the storage space for the database as well as the computational cost of searching [4]. Early-stage retrieval-augmented language models [10] propose to retrieve whole documents and then apply a machine comprehension model trained to detect answer spans in the returned documents, which focuses more on language reading and locating key information in the document. In generative language models, chunk retrieval (chunks are also called passages in some references [44, 52, 55]) is common, and has been used in both traditional and LLM-based RAG models such as REALM [44], RAG [66], and Atlas [51]. A more fine-grained retrieval, i.e., token retrieval, can instead be done with faster searching but brings more burden for database storage. Token retrieval is more suitable in cases requiring rare patterns or out-of-domain data [56], and it cooperates well with the every-token retrieval strategy applied in kNN-LM and other similar work [45, 88, 145]. In comparison, a text chunk may contain compact and complete information with less redundancy and irrelevancy, therefore becoming the mainstream retrieval text granularity in RAG.

Another major retrieval granularity proposed in RAG is entity retrieval. Unlike the above types of granularity, entity retrieval is designed from the perspective of knowledge rather than language. Févry et al. [38] introduce the Entities as Experts (EAE) model, which divides the parameter space of language models according to entity identity. The proposed EAE model aims to learn entity representations from the text along with other model parameters with the Wikipedia database and represent knowledge with an entity memory. At a more fine-grained level, de Jong et al. [21] propose to build the knowledge base by learning and retrieving mentions rather than entities. Overall, applying entity or mention-level retrieval in RAG would be more effective for entity-centric tasks, and more efficient in space compared to token-wise retrieval.

2.2 Generation

The design of the generator heavily depends on the downstream tasks. For most text generation tasks, Decoder-only and Encoder-Decoder are the two dominant structures [157]. The recent development of commercial closed-source large foundation models makes black-box generation models mainstream in RA-LLMs. In this part, we will briefly review studies with these two types of generators: parameter-accessible (white-box) and parameter-inaccessible (black-box).

2.2.1 Parameter-Accessible Generators (White-box). The Encoder-Decoder structure processes the input and the target independently with different sets of parameters, in which a cross-attention component is developed to connect input tokens to target tokens. Representative Encoder-Decoder models include T5 [97] and BART [65]. In comparison, Decoder-only models process inputs and targets after concatenation, which makes the representations of the two parts concurrently built layer-by-layer as they propagate up the network. These two types of generators are widely applied in existing RAG work. For example, RAG [66] and Re2G [42] employ BART, while FID [50] and EMDR2 utilize T5. There are other models [6, 73] leveraging a Transformer-based Encoder-Decoder architecture but with some customized design. Generators in RAG differ from general ones by incorporating retrieved data to enhance the generation accuracy and relevance. Furthermore, white-box generators allow parameter optimization and can be trained to adapt to different retrieval and augmentation approaches for better generation performance.

2.2.2 Parameter-Inaccessible Generators (Black-box). A certain proportion of LLMs are released without the disclosure of internal structures or the accessibility of parameters, especially particularly large-scale ones such as the GPT series [1], Codex [12], and Claude, which are called black-box generation models. These generators only allow the operations of feeding queries (input) and receiving responses (output), while not allowing the internal structure to be altered or parameters to be updated. From another perspective, LLMs, even those open for fine-tuning, are large in scale and difficult to tune for downstream domain-specific tasks with only a limited amount of data. Black-box RA-LLMs, therefore, focus more on the retrieval and augmentation processes, trying to enhance the generator by augmenting the input (also called the prompt in the context of LLMs) with better knowledge, guidance, or examples for the generation. For example, Rubin et al. [107] propose to train a prompt retriever with data labeled by language models themselves, which can be used to provide better examples for in-context learning, therefore enhancing the final generation performance. Xu et al. [137] propose to compress the retrieved documents before in-context integration, which can reduce the computational costs and also relieve the burden on LMs to identify relevant information in long retrieved documents.

2.3 Retrieval Integration for Generation (Augmentation)

Augmentation describes the technical process that integrates the retrieval and generation parts, which is the essential part of RA-LLMs. In this subsection, we introduce three main designs of augmentation, which are conducted at the input, output, and intermediate layers of the generator respectively, as illustrated in Figure 2.

2.3.1 Input-Layer Integration. A common way to integrate retrieved information/documents is to combine them with the original input/query and jointly pass them to the generator, which is called input-layer integration. For example, In-Context RALM [98] applies input-layer integration by specifically concatenating the original input and all retrieved documents into a single sequence as the new input for the generation model. Despite its effectiveness, such integration is limited by the number of retrieved documents, since the concatenated new input may be too long to be processed by the generation model. In-Context RALM specifically alleviates this limitation by removing tokens from the beginning of the new input. To avoid information loss with such a token-removing strategy, FID [50] employs a different integration method that processes each retrieved document independently in the encoder. This strategy is scalable to a large number of contexts as it only performs self-attention over one context at a time in the follow-up processing. Atlas [51] and REPLUG [114] apply a similar parallel integration by concatenating the query and one retrieved document at a time. In general, most black-box generation-based RAG methods apply input-layer integration since neither the intermediate layers of the generation model nor the output distribution is accessible.

More specifically for LLMs, input-layer integration may use the retrieved content as (additional) prompts or demonstrations on top of using it as a supplement to the original input as in traditional RAG [107]. Prompt retrieval aims to find suitable natural language prompts automatically through retrieval to teach the LLM to learn in context [7] or to induce the LLM to reason [133]. It may boost the zero-shot ability of LLMs without delicate prompt engineering. For example, Cheng et al. [16] propose to learn a prompt retriever based on input-prompt pair data with score labels resulting from a frozen LLM.
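The sketch below illustrates input-layer integration under a fixed context budget, including the front-truncation idea described above for In-Context RALM [98]. The whitespace tokenizer and the budget value are simplifying assumptions rather than any model's actual tokenizer or context limit.

```python
# Input-layer integration: concatenate retrieved documents with the query,
# then, if the sequence exceeds the context budget, drop tokens from the
# beginning (the truncation strategy described above for In-Context RALM [98]).

def assemble_input(query: str, retrieved_docs: list[str], max_tokens: int = 64) -> str:
    """Build the generator input, keeping the most recent tokens on overflow."""
    sequence = " ".join(retrieved_docs + [query])
    tokens = sequence.split()          # whitespace split as a stand-in tokenizer
    if len(tokens) > max_tokens:
        tokens = tokens[-max_tokens:]  # remove tokens from the beginning
    return " ".join(tokens)

docs = [
    "Document 1: background passage retrieved for the query ...",
    "Document 2: another retrieved passage with supporting details ...",
]
prompt = assemble_input("Which passage answers the question?", docs, max_tokens=16)
print(prompt)  # the query (and the tail of the context) always survives truncation
```

FiD-style parallel integration avoids this truncation altogether by encoding each retrieved document separately, at the cost of requiring access to the encoder.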


2.3.2 Output-Layer Integration. Another kind of augmentation is post-hoc, i.e., output-layer integration, which joins the retrieval and generation results. For example, kNN-LM [56] interpolates two next-token distributions in prediction: one induced by the LM and the other induced by the nearest neighbors from the retrieval corpus. Output-layer linear integration [43, 159] is flexible to apply since it can be plugged into most generation models without additional training. However, the simplicity of output-layer integration also limits the model's ability to reason about the retrieved text. To tackle this limitation, Yogatama et al. [145] propose to add an extra gating network to post-process the retrieved data and achieve comparatively better performance. For LLMs, output-layer integration is as reasonable and adaptive as input-layer integration. REFEED [148] proposes an answer-refining mechanism that applies an LLM to evaluate the retrieved information and adjust the initial answer accordingly to enhance the accuracy of the response. Similarly, Zhang et al. [154] propose the COMBO framework, which matches LLM-generated passages with retrieved counterparts into compatible pairs based on pre-trained discriminators. The passage pairs are then handled by a Fusion-in-Decoder-based model [50] to derive a final answer.

2.3.3 Intermediate-Layer Integration. Compared to the above two non-parametric approaches, a more engaging augmentation is to design a semi-parametric module to integrate the retrieved results through the internal layers of the generation model, which is called intermediate-layer integration. Such integration might add additional complexity but is promising to enhance the capability of the generation model with effective training. Typically, a Transformer module is introduced to incorporate retrieved information (mostly encoded into dense representations) into the generation model so that it interacts with the representations in the middle stages of generation. For example, RETRO [6] introduces a Chunked Cross-Attention (CCA) layer to process the retrieved chunks in the generator blocks, and Wu et al. [136] introduce the kNN-Augmented Attention Layer. Similarly, EAE [38] and TOME [21] use an Entity Memory and a MemoryAttention layer to incorporate the retrieved entities and entity mentions, respectively. Such intermediate-layer integration can use many blocks frequently and efficiently to enhance the capability of the whole RAG model. It offers an efficient alternative for incorporating the large number of frequently retrieved text chunks, which are challenging to process with input-layer integration due to the input length limit of LMs [6]. However, it also needs to be noted that intermediate-layer integration requires high access to the generation models, which is not feasible for most LLMs that are accessible only through inference APIs [85].

3 RA-LLMS TRAINING

Based on whether training is required or not, existing RAG methods can be categorized into two main classes: training-free approaches and training-based approaches. Training-free methods usually leverage the retrieved knowledge directly during inference time without introducing extra training, by inserting the retrieved text into the prompt, which is computationally efficient. However, one potential challenge is that the retriever and generator components are not specifically optimized for downstream tasks, which could easily lead to suboptimal utilization of the retrieved knowledge. To fully exploit the external knowledge, extensive methods have been proposed to fine-tune the retriever and generator, thereby guiding large language models to effectively adapt and integrate retrieved information. According to the training strategies, we categorize these training-based approaches into three classes: 1) Independent Training approaches independently train each component in the RAG procedure, 2) Sequential Training methods train one module first and freeze the well-trained component to guide the tuning process of the other part, and 3) Joint Training approaches train the retriever and generator simultaneously. In the following sections, we comprehensively review the training-free, independent training, sequential training, and joint training methods.

3.1 Training-free

With their huge number of parameters, large language models have exhibited human-level intelligence and achieved promising prediction performance on various downstream tasks. However, it is extremely challenging to frequently perform fine-tuning and update the knowledge stored in the model parameters [66] due to the considerable time and computational resources required. Recently, numerous studies have suggested enhancing large language models with retrieval mechanisms, enabling them to dynamically acquire new knowledge from external sources without extra training processes (i.e., training-free) [50, 52, 57], instead of relying solely on the implicit knowledge encoded in the model's parameters. These approaches have shown significant performance improvements on various knowledge-intensive tasks, such as open-domain question answering [66] and document summarization [120]. According to the different ways in which large language models utilize retrieved information, we categorize these training-free methods into two categories: 1) Prompt Engineering-based Methods integrate retrieved knowledge into the original prompt directly, and 2) Retrieval-Guided Token Generation Methods retrieve information to calibrate the token generation process.

3.1.1 Prompt Engineering-based Methods. As the LLMs' generation performance highly depends on the input query, numerous training-free RAG approaches employ external knowledge by refining the original prompts [52, 57, 71]. Specifically, the retrieved texts are usually used as contextual information and combined with the original prompt to guide the generation of large language models [50, 52, 57, 59, 71, 94, 129]. For example, In-Context RALM [98] keeps the large language model parameters unchanged and directly incorporates the retrieved document before the original prompt to augment the generation process. IRCoT [124] interleaves chain-of-thought (CoT) generation and knowledge retrieval steps, enabling the retrieval of more relevant information for the subsequent reasoning compared to standard retrieval methods that rely solely on the question as the query. Instead of retrieving knowledge from a large corpus, GENREAD [147] first prompts a large language model to generate contextual documents for the query, and then generates answers based on them. SKR [130] proposes guiding LLMs to determine whether they can answer a given question based on their internal knowledge, enabling flexible utilization of both internal and external knowledge by selectively calling the retriever. TOC [59] first retrieves relevant knowledge for ambiguous questions and recursively constructs a tree structure by clarifying an ambiguous question into multiple disambiguated questions, which are further aggregated to generate long-form answers.
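Both the output-layer integration of Section 2.3.2 and the retrieval-guided token generation methods discussed next rest on the same calibration step: a next-token distribution induced by retrieved nearest neighbors is interpolated with the LM's own distribution. The numpy sketch below illustrates that step with a toy vocabulary, toy neighbor distances, and an arbitrary interpolation weight, all of which are assumptions for illustration rather than values from any published system.

```python
import numpy as np

# kNN-LM-style output calibration [56]: interpolate the LM's next-token
# distribution with a distribution built from retrieved nearest neighbours.
# Vocabulary, distances, and the weight lam are toy values for illustration.

vocab = ["Spain", "England", "Qatar"]
p_lm = np.array([0.4, 0.35, 0.25])          # LM next-token distribution

# Retrieved neighbours: (distance to the query context, stored next-token index)
neighbours = [(0.2, 0), (0.5, 0), (0.9, 1)]

# p_kNN(w) is proportional to the sum of exp(-distance) over neighbours whose
# stored next token equals w.
p_knn = np.zeros(len(vocab))
for dist, tok in neighbours:
    p_knn[tok] += np.exp(-dist)
p_knn /= p_knn.sum()

lam = 0.3                                    # weight on the retrieval distribution
p_final = lam * p_knn + (1 - lam) * p_lm
print(dict(zip(vocab, np.round(p_final, 3))))
```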


3.1.2 Retrieval-Guided Token Generation Methods. In addition to directly integrating external knowledge into the original prompt, the auxiliary information can be employed to adjust the token generation process. For example, kNN-LMs [56] first retrieve the k most relevant contexts from the datastore based on the given query and compute a neighbor distribution based on the distances. The output distribution is calibrated by interpolating the neighbor distribution with the original model's output distribution. REST [46] is proposed to replace the parametric draft model with a non-parametric retrieval datastore, retrieving relevant tokens based on the current context for speculative decoding [9, 63, 122].

3.2 Independent Training

Independent training refers to training the retriever and the large language models (LLMs) as two entirely independent processes, in which there is no interaction between the retriever and the LLMs during the training process [55, 62, 160]. For the training of large language models, the negative log-likelihood loss is the most representative training objective [96, 123], which aims to guide the large language models to generate the desired output y based on the given input x, formulated as -log P_LLM(y|x). Regarding the retriever, it can be categorized into two types: 1) sparse retrievers [101, 106], and 2) dense retrievers [55, 62, 160]. The sparse retrievers usually exploit sparse features, e.g., word frequencies, to represent the documents and calculate the relevance scores based on task-specific metrics [68, 101, 106] such as TF-IDF and BM25. As for the dense retrievers, deep neural networks are employed to encode the query and documents into dense representations, and then the inner product is usually used to calculate relevance scores and retrieve the relevant external knowledge. For example, DPR [55] adopts two independent BERT [24] networks to encode the query and passages respectively, and trains these models by utilizing contrastive learning. CoG [62] proposes to train a prefix encoder and a phrase encoder for retrieval and reformulates text generation as multiple copy-and-paste operations from an existing source text collection.

3.3 Sequential Training

Independent training is an efficient approach to exploiting external knowledge during the generation process, since the retriever and generator can be trained offline and any off-the-shelf models can be utilized, avoiding extra training costs. To better enhance the synergy between the retriever and generator, several methods have been proposed to train the retriever and large language models sequentially. In these sequential training methods, the process typically begins with the independent pretraining of either the retriever or the generator, after which the pretrained module is fixed while the other module undergoes training. Note that various existing models (e.g., BERT [24], CLIP [95], T5 [97]) can be directly employed as the fixed retriever or generator, thereby bypassing the first pretraining process. Compared to independent training, sequential training involves coordinated training of the retriever and generator, where the trainable module benefits from the assistance of the fixed module. Based on different training orders, sequential training can be categorized into two classes: 1) Retriever First [5, 108, 109, 126], and 2) LLMs First [110, 114, 128].

3.3.1 Retriever First. These methods first train the retrieval model and then fix it. The large language models are then trained by utilizing the retrieved knowledge. For instance, RETRO [6] adopts a BERT model that is pretrained independently as the retriever, and an encoder-decoder architecture is trained to integrate retrieval chunks into the model's predictions. RALMs [146] adopts Google Search and the open-source COLBERTV2 [58] as the pretrained retriever and fine-tunes the large language model to effectively leverage the retrieved passages. ITER-RTGEN [105] utilizes the pretrained S-BERT [104] as the retriever and introduces an adaptive hybrid retrieval strategy for retrieving demonstrations. Additionally, it leverages T5 [97] as the generator, which undergoes further fine-tuning based on the target label and an input combining the original prompt with the retrieved demonstrations. SMALLCAP [102] proposes using CLIP [95], a powerful pretrained multimodal network, to encode the input image and the textual data of the external datastore and retrieve the most relevant items based on cosine similarity. A cross-attention layer is trained and GPT-2 [96] is used as the decoder to produce captions.

3.3.2 LLMs First. Similarly, we can also pre-train the large language models first, and then tune the retriever under the supervision of the well-trained LLMs. For example, DKRR [49] shows that the attention scores of a sequence-to-sequence model can indicate a document's relevance. Therefore, they propose to leverage the attention scores of a reader model to produce synthetic labels to train the retriever. AAR [149] proposes using a small language model to generate the supervision signal for training retrievers. The well-trained retriever can be further leveraged to enhance the performance of large black-box language models. RA-DIT [75] first fine-tunes the large language models to enhance their ability to leverage retrieved knowledge, and then trains the retriever to better align its output with the large language models. UPRISE [16] proposes a lightweight method to enhance the zero-shot performance of LLMs on unseen tasks by introducing a prompt retriever. A frozen LLM is employed to guide the fine-tuning process of the prompt retriever, and this retriever then retrieves prompts for different tasks with various LLMs during inference.

3.4 Joint Training

Joint training methods [17, 47, 54, 70, 159] employ the end-to-end paradigm to optimize the retriever and generator simultaneously. Instead of training each module sequentially, joint training methods effectively enhance the retriever's ability to locate external knowledge for generation and the generator's capacity to effectively leverage the retrieved information. For instance, RAG [66] minimizes the negative log-likelihood to jointly train the retriever and generator. REALM [44] adopts a similar training paradigm to that of RAG [66], and the Maximum Inner Product Search (MIPS) technique [15, 28, 100, 111] is used to locate the most relevant documents. To employ MIPS, all external documents are embedded first and a search index is produced for each embedding. An asynchronous index updating strategy [44, 48, 51, 119] is proposed to refresh the index every several hundred training steps to avoid the time consumption of re-indexing all documents.
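A common way to write such a joint objective is the negative log-likelihood marginalized over the retrieved documents, p(y|x) = sum over z of p(z|x) p(y|x, z), as minimized by RAG [66]. The sketch below computes this loss with toy retrieval scores and generator likelihoods; the numbers and the two-document setup are illustrative assumptions, and in a real system both factors would be differentiable model outputs so that gradients reach both the retriever and the generator.

```python
import numpy as np

# Marginalized negative log-likelihood for joint retriever/generator training,
# in the spirit of RAG [66]: p(y|x) = sum_z p_retriever(z|x) * p_generator(y|x,z).
# The retrieval scores and generator likelihoods below are toy numbers.

retrieval_scores = np.array([2.0, 0.5])       # retriever scores for two documents z
p_z_given_x = np.exp(retrieval_scores) / np.exp(retrieval_scores).sum()  # softmax

p_y_given_xz = np.array([0.6, 0.1])           # generator likelihood of target y per document

p_y_given_x = np.sum(p_z_given_x * p_y_given_xz)
loss = -np.log(p_y_given_x)
print(f"marginal likelihood = {p_y_given_x:.4f}, joint NLL loss = {loss:.4f}")
```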


4 APPLICATIONS

In this section, we will introduce some representative applications of retrieval-augmented large language models (RA-LLMs). To provide a clear overview of the applications of RA-LLMs, we will review them from three perspectives: NLP applications, downstream tasks, and domain-specific applications.

4.1 NLP Applications

Due to their intrinsic capability in text generation, RA-LLMs have various applications in the NLP field, such as question answering (QA) systems, chatbots, and fact verification.

4.1.1 QA Systems. QA systems aim to provide precise answers to users' queries. However, even when trained on extensive data, these systems may lack the latest information or specific domain knowledge that is not included in their training data [50, 79]. To address this limitation, the integration of RA-LLMs has played a crucial role in advancing the capabilities of QA systems by enhancing their ability to retrieve and synthesize relevant information [6, 50]. Specifically, RA-LLMs can provide coherent and contextually relevant answers by leveraging their retrieval component to access a vast knowledge base. For example, REALM [44] integrates a knowledge retriever that can retrieve information from a large corpus during pre-training, fine-tuning, and inference. This approach allows REALM to effectively retrieve from a vast knowledge corpus, thereby improving the accuracy of its responses. Similarly, Fusion-in-Decoder [50] retrieves passages from support documents and then fuses them with the questions to generate the answer, achieving higher accuracy. In addition, Borgeaud et al. [6] indicate that the quality of the answers may rely more on the result of retrieval.

4.1.2 ChatBot. A chatbot is designed to interact with users in a natural and conversational manner [76]. Different from QA systems, chatbots focus on maintaining a coherent and contextually rich conversation with the user. To enhance these capabilities, recent methods focus on integrating RA-LLMs [54, 61, 152] for their ability to augment the chatbot with relevant external knowledge, facilitating more engaging and context-rich interactions with users. For example, some studies [14, 41] retrieve relevant knowledge from static databases (e.g., a Wikipedia dump) to augment conversations. Komeili et al. [61] propose retrieving information from internet search to further augment conversation performance. Considering the dynamic nature of knowledge in the world, another model [125] further accesses large amounts of dynamic information in search engines to generate responses.

4.1.3 Fact Verification. Fact verification is a critical task in verifying the accuracy and reliability of information. With the need for trusted evidence, RA-LLMs are being utilized to enhance the capabilities of fact verification [51, 66]. Lewis et al. [66] first propose retrieving external knowledge to augment a range of knowledge-intensive tasks including fact verification. On the other hand, Atlas [51] examines the performance of RA-LLMs for fact verification under few-shot learning. Recently, Self-RAG [5] has made a notable impression by incorporating a self-reflective mechanism. Specifically, Self-RAG reflects on whether retrieved information is helpful and judges the reliability of the retrieved information, thereby greatly improving the verification accuracy.

4.2 Downstream Tasks

RA-LLMs can also be applied to various downstream tasks, such as recommendations and software engineering.

4.2.1 Recommendations. Recommender systems play an important role in modeling users' preferences and providing personalized recommendations [33–35, 127, 153, 158]. Recently, RA-LLMs have demonstrated great potential in providing personalized and contextually relevant recommendations by integrating retrieval and generation processes [25, 82, 134]. For example, Di Palma [25] proposes a simple retrieval-augmented recommendation model that leverages knowledge from movie or book datasets to enhance recommendations. Additionally, Lu et al. [82] further retrieve from reviews to enrich item information in recommender systems. CoRAL [134] utilizes reinforcement learning to retrieve collaborative information from the dataset and align it with semantic information for more accurate recommendations.

4.2.2 Software Engineering. The rise of RA-LLMs has influenced many aspects of software engineering [89, 142, 160]. For example, some studies propose the retrieval-augmented generation paradigm for code generation [160] and program repair [89]. Similarly, Parvez et al. [91] retrieve top-ranked code snippets or summaries from the codebase and aggregate them with the input to enhance code generation and summarization. In addition, RA-LLMs show potential in tabular data processing [67, 142] and Text-to-SQL semantic parsing [93, 113].

4.3 Domain-specific Applications

RA-LLMs have been widely adopted for various domain-specific tasks, such as AI for science and finance.

4.3.1 AI for Science. RA-LLMs have proven to be beneficial for scientific domains such as molecules and proteins. Molecule-related tasks include identifying a molecule's properties and predicting new molecules, thereby favoring drug discovery. Currently, some RA-LLMs have been applied to molecules by integrating retrieval of molecule structures and biomedical entities like proteins, molecules, and diseases [78, 131, 132, 140]. Li et al. [68] and Wang et al. [131] propose retrieval-based frameworks that retrieve from a database to guide molecule generation. Liu et al. [78] introduce a multi-modal molecule structure-text model that retrieves textual knowledge from a large-scale dataset for molecular property prediction. In addition, RA-LLMs also significantly influence protein representation and generation. For instance, RSA [84] queries protein sequences associated with a collection of structurally or functionally similar sequences in the database to enhance protein representations. Furthermore, Lozano et al. [80] present a clinical QA system based on retrieving published review articles.
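As an illustration of the retrieve-similar-examples pattern used by molecule-oriented RA-LLMs such as MolReGPT [68], the sketch below ranks stored molecules by a crude character-bigram Tanimoto similarity over SMILES strings and builds a few-shot prompt. The bigram "fingerprint", the example molecules, and their descriptions are toy assumptions standing in for real molecular fingerprints and datastores.

```python
# Retrieval-augmented in-context prompting for molecules, loosely following the
# retrieve-similar-examples idea used by MolReGPT [68]. The bigram "fingerprint"
# is a crude stand-in for real molecular fingerprints, and the example data are
# invented for illustration only.

def bigrams(smiles: str) -> set[str]:
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}

def tanimoto(a: set[str], b: set[str]) -> float:
    return len(a & b) / max(len(a | b), 1)

examples = [  # (SMILES, description) pairs playing the role of the datastore
    ("CCO", "a small alcohol"),
    ("CC(=O)O", "a simple carboxylic acid"),
    ("c1ccccc1", "an aromatic ring"),
]

query_smiles = "CCCO"
ranked = sorted(
    examples,
    key=lambda e: tanimoto(bigrams(query_smiles), bigrams(e[0])),
    reverse=True,
)

shots = "\n".join(f"SMILES: {s}\nDescription: {d}" for s, d in ranked[:2])
prompt = f"{shots}\nSMILES: {query_smiles}\nDescription:"
print(prompt)
```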


4.3.2 Finance. In the highly data-driven and information-intensive field of finance, RA-LLMs have proved to be a significant technology for enhancing decision-making [69, 143, 151]. For example, Zhang et al. [151] retrieve financial information from external sources, such as news platforms (e.g., Bloomberg and Reuters) and social media platforms (e.g., Twitter, Reddit), and combine it with the original query to enhance the precision of financial sentiment analysis. In addition, financial QA is another primary task of financial analysis, which usually extracts relevant knowledge from financial documents. As professional documents are usually stored in PDF format, Lin [74] introduces a PDF parser combined with RA-LLMs to retrieve knowledge from financial reports. On the other hand, Yepes et al. [143] propose a document chunking method based on structure instead of chunking based on paragraphs, further improving the quality of RA-LLM outputs.

5 FUTURE CHALLENGES AND OPPORTUNITIES

Since the studies of RA-LLMs are still at an early stage, we present some potential research directions that can be explored in the future in the field of RA-LLMs.

Trustworthy RA-LLMs. The essential objective of developing RAG-empowered LLMs is to enhance the capability of language models, thereby benefiting users and society by alleviating redundant and meaningless labor, increasing convenience, and spurring social progress. However, recent research indicates that RA-LLMs can be maliciously or unintentionally manipulated to make unreliable decisions and harm humans [22, 162], which may have serious consequences in safety-critical scenarios [11, 13, 31, 36, 77]. In addition, private retrieval databases carry a risk of leakage, raising concerns regarding the privacy of RA-LLMs [150]. Therefore, developing trustworthy RA-LLMs is of paramount importance as it can significantly mitigate the potential negative impacts of LLM technology and provide people with powerful AI models that can be fully trusted. To be specific, ideal trustworthiness in RA-LLM systems should possess the following characteristics: 1) robustness, 2) fairness, 3) explainability, and 4) privacy. For example, robustness means a trustworthy RA-LLM system should be robust against malicious or inadvertent perturbations introduced by attackers. Fairness indicates a trustworthy RA-LLM system ought to avoid discrimination during the decision-making process. Explainability requires a complete understanding of the intrinsic workings of RA-LLM systems, i.e., the predictions of RA-LLM systems should be explainable and transparent. Privacy entails safeguarding the private information housed within the datastore when establishing trustworthy RA-LLM systems.

Multi-Lingual RA-LLMs. The ability to leverage knowledge from multiple languages can greatly enhance the capabilities of retrieval-augmented large language models. As the world becomes increasingly interconnected, there is a growing need for AI systems that can understand and communicate across different languages. By incorporating multilingual knowledge retrieval and generation, these models can access and synthesize information from diverse linguistic sources, enabling more comprehensive and nuanced understanding and generation capabilities. Additionally, multilingual models can facilitate cross-cultural communication and knowledge sharing and break down language barriers, thereby bringing convenience to people across different regions of the world, especially those in areas with minority languages [53, 71]. For example, users from countries with less prevalent languages can utilize abundant English and Chinese corpora for knowledge retrieval, enhancing the performance of large language models in downstream tasks.

Multimodal RA-LLMs. Multimodal retrieval-augmented generation extends the knowledge sources beyond text to include various data modalities such as images, videos, and audio. By integrating various modalities, LLMs can leverage richer contextual information than single-modal RAG and develop a more comprehensive understanding of users' needs, bringing precise, fine-grained, and high-quality generation. For instance, an image or video can provide valuable visual cues that complement textual information, leading to more precise language generation [47, 161]. By effectively fusing multiple modalities, multimodal RA-LLMs can develop a more comprehensive understanding of the world, leading to more accurate and insightful outputs, benefiting a wide range of domains, including healthcare [161], drug discovery [115], molecular analysis [3, 78, 115], etc.

Quality of External Knowledge. As a commonly used datastore in current RAG systems, Wikipedia [55, 161] serves as a vast repository of external textual knowledge used to augment the generation process, containing millions of articles covering various disciplines. However, the reliability and accuracy of individual articles within Wikipedia vary significantly, and the introduction of texts that deviate from facts might even mislead the model's generation process. Therefore, it is crucial to enhance the quality of the external knowledge corpus and mitigate the negative impact of low-quality knowledge on the performance of LLMs. By enhancing the quality of the external knowledge and tailoring robust mechanisms for filtering out low-quality or unreliable information, RA-LLM systems might produce more accurate and reliable outputs, thereby improving their effectiveness in various real-world applications.

6 CONCLUSION

Retrieval-augmented generation (RAG), a cutting-edge AI technique, has achieved remarkable success across various applications, including recommendation, molecule generation, protein representation, and software engineering, owing to the potent capabilities of retrieval in providing supplementary information to enhance generation performance. Recently, increasing efforts have been made to alleviate the limitations of large language models (LLMs), such as hallucination and out-of-date internal knowledge, by leveraging retrieval to provide the latest auxiliary information and teaching LLMs to harness the retrieved external knowledge. With the rapid advancements in retrieval-augmented large language models (RA-LLMs), there is a pressing need for a comprehensive and systematic overview. To bridge this gap, in this paper, we comprehensively review RA-LLMs from the perspectives of model architecture, training strategy, and application area, providing researchers with an in-depth understanding. Moreover, since the studies of RA-LLMs are still at an early stage, we also discuss the current limitations and several potential directions for future research.

ACKNOWLEDGMENTS

The research described in this paper has been partly supported by the National Natural Science Foundation of China (project no. 62102335), General Research Funds from the Hong Kong Research Grants Council (project no. PolyU 15200021, 15207322, and 15200023), internal research funds from The Hong Kong Polytechnic University (project no. P0036200, P0042693, P0048625, P0048752, and P0051361), Research Collaborative Project no. P0041282, and SHTM Interdisciplinary Large Grant (project no. P0043302).


REFERENCES

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv:2303.08774 (2023).
[2] Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, and Marjan Ghazvininejad. 2023. In-context Examples Selection for Machine Translation. In ACL (Findings). 8857–8873.
[3] Miles C Andrews, Junna Oba, Chang-Jiun Wu, Haifeng Zhu, Tatiana Karpinets, Caitlin A Creasy, Marie-Andrée Forget, Xiaoxing Yu, Xingzhi Song, Xizeng Mao, et al. 2022. Multi-modal molecular programs regulate melanoma cell state. Nature communications 13, 1 (2022), 4000.
[4] Akari Asai, Sewon Min, Zexuan Zhong, and Danqi Chen. 2023. Retrieval-based language models and applications. In ACL (Tutorial). 41–46.
[5] Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2023. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In ICLR.
[6] Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In ICML. 2206–2240.
[7] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In NeurIPS.
[8] Stefan Buttcher, Charles LA Clarke, and Gordon V Cormack. 2016. Information retrieval: Implementing and evaluating search engines. Mit Press.
[9] Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, and John Jumper. 2023. Accelerating large language model decoding with speculative sampling. arXiv:2302.01318 (2023).
[10] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. 2017. Reading Wikipedia to Answer Open-Domain Questions. In ACL. 1870–1879.
[11] Jingfan Chen, Wenqi Fan, Guanghui Zhu, Xiangyu Zhao, Chunfeng Yuan, Qing Li, and Yihua Huang. 2022. Knowledge-enhanced Black-box Attacks for Recommendations. In KDD. 108–117.
[12] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv:2107.03374 (2021).
[13] Xiao Chen, Wenqi Fan, Jingfan Chen, Haochen Liu, Zitao Liu, Zhaoxiang Zhang, and Qing Li. 2023. Fairly adaptive negative sampling for recommendations. In WWW. 3723–3733.
[14] Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, and Jie Zhou. 2020. Bridging the gap between prior and posterior knowledge selection for knowledge-grounded dialogue generation. In EMNLP. 3426–3437.
[15] Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, and Wai Keung Wong. 2019. Deep supervised hashing with anchor graph. In ICCV. 9796–9804.
[16] Daixuan Cheng, Shaohan Huang, Junyu Bi, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Furu Wei, Weiwei Deng, and Qi Zhang. 2023. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. In EMNLP. 12318–12337.
[17] Xin Cheng, Di Luo, Xiuying Chen, Lemao Liu, Dongyan Zhao, and Rui Yan. 2024. Lift yourself up: Retrieval-augmented text generation with self-memory. In NeurIPS.
[18] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023. Palm: Scaling language modeling with pathways. J Mach Learn Res 24, 240 (2023), 1–113.
[19] W Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search engines: Information retrieval in practice. Vol. 520. Addison-Wesley Reading.
[28] Yujuan Ding, Wai Keung Wong, Zhihui Lai, and Zheng Zhang. 2020. Bilinear Supervised Hashing Based on 2D Image Features. IEEE Trans. Circuits Syst. Video Technol. 30, 2 (2020), 590–602.
[29] Yujuan Ding, Wai Keung Wong, Zhihui Lai, and Zheng Zhang. 2020. Discriminative dual-stream deep hashing for large-scale image retrieval. IP&M 57, 6 (2020), 102288.
[30] Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, and Denny Zhou. 2022. Compositional semantic parsing with large language models. In ICLR.
[31] Wenqi Fan, Tyler Derr, Xiangyu Zhao, Yao Ma, Hui Liu, Jianping Wang, Jiliang Tang, and Qing Li. 2021. Attacking black-box recommendations via copying cross-domain user profiles. In ICDE. 1583–1594.
[32] Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. arXiv:2405.06211 (2024).
[33] Wenqi Fan, Xiaorui Liu, Wei Jin, Xiangyu Zhao, Jiliang Tang, and Qing Li. 2022. Graph Trend Filtering Networks for Recommendation. In SIGIR. 112–121.
[34] Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. 2019. Graph neural networks for social recommendation. In WWW. 417–426.
[35] Wenqi Fan, Yao Ma, Qing Li, Jianping Wang, Guoyong Cai, Jiliang Tang, and Dawei Yin. 2020. A graph neural network framework for social recommendations. TKDE (2020).
[36] Wenqi Fan, Xiangyu Zhao, Xiao Chen, Jingran Su, Jingtong Gao, Lin Wang, Qidong Liu, Yiqi Wang, Han Xu, Lei Chen, et al. 2022. A Comprehensive Survey on Trustworthy Recommender Systems. arXiv:2209.10117 (2022).
[37] Wenqi Fan, Zihuai Zhao, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Jiliang Tang, and Qing Li. 2023. Recommender systems in the era of large language models (llms). arXiv:2307.02046 (2023).
[38] Thibault Févry, Livio Baldini Soares, Nicholas FitzGerald, Eunsol Choi, and Tom Kwiatkowski. 2020. Entities as Experts: Sparse Memory Access with Entity Supervision. In EMNLP. 4937–4951.
[39] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv:2312.10997 (2023).
[40] Izacard Gautier, Caron Mathilde, Hosseini Lucas, Riedel Sebastian, Bojanowski Piotr, Joulin Armand, and Grave Edouard. 2022. Unsupervised dense information retrieval with contrastive learning. J Mach Learn Res (2022).
[41] Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, and Michel Galley. 2018. A knowledge-grounded neural conversation model. In AAAI, Vol. 32.
[42] Michael R. Glass, Gaetano Rossiello, Md. Faisal Mahbub Chowdhury, Ankita Naik, Pengshan Cai, and Alfio Gliozzo. 2022. Re2G: Retrieve, Rerank, Generate. In NAACL-HLT. 2701–2715.
[43] Edouard Grave, Armand Joulin, and Nicolas Usunier. 2017. Improving Neural Language Models with a Continuous Cache. In ICLR.
[44] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In ICML. 3929–3938.
[45] Junxian He, Graham Neubig, and Taylor Berg-Kirkpatrick. 2021. Efficient Nearest Neighbor Language Models. In EMNLP (1). 5703–5714.
[46] Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, and Di He. 2023. Rest: Retrieval-based speculative decoding. arXiv:2311.08252 (2023).
[47] Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A Ross, and Alireza Fathi. 2023. Reveal: Retrieval-augmented visual-language pre-training with multi-source multimodal knowledge memory. In CVPR. 23369–23379.
[48] Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang,
[20] Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E Ho. 2024. Large legal and Bryan Catanzaro. 2023. Raven: In-context learning with retrieval augmented
fictions: Profiling legal hallucinations in large language models. arXiv:2401.01301 encoder-decoder language models. arXiv:2308.07922 (2023).
(2024). [49] Gautier Izacard and Edouard Grave. 2021. Distilling Knowledge from Reader to
[21] Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, and William W. Retriever for Question Answering. In ICLR.
Cohen. 2022. Mention Memory: incorporating textual knowledge into Trans- [50] Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with
formers through entity mention attention. In ICLR. Generative Models for Open Domain Question Answering. In EACL. 874–880.
[22] Gelei Deng, Yi Liu, Kailong Wang, Yuekang Li, Tianwei Zhang, and Yang Liu. [51] Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni,
2024. Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning. Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard
arXiv:2402.08416 (2024). Grave. 2023. Atlas: Few-shot Learning with Retrieval Augmented Language
[23] Ziqing Deng, Zhihui Lai, Yujuan Ding, Heng Kong, and Xu Wu. 2024. Deep Models. J Mach Learn Res 24, 251 (2023), 1–43.
Scaling Factor Quantization Network for Large-scale Image Retrieval. In ICMR. [52] Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-
851–859. Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Active Retrieval
[24] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Augmented Generation. In EMNLP. 7969–7992.
Pre-training of Deep Bidirectional Transformers for Language Understanding. [53] Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Winata,
In NAACL-HLT (1). 4171–4186. Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, and Graham Neubig.
[25] Dario Di Palma. 2023. Retrieval-augmented recommender system: Enhancing 2023. Multi-lingual and Multi-cultural Figurative Language Understanding. In
recommender systems with large language models. In RecSys. 1369–1373. ACL.
[26] Yujuan Ding, Yunshan Ma, Wenqi Fan, Yige Yao, Tat-Seng Chua, and Qing Li. [54] Minki Kang, Jin Myung Kwak, Jinheon Baek, and Sung Ju Hwang. 2023. Knowl-
2024. FashionReGen: LLM-Empowered Fashion Report Generation. In WWW. edge graph-augmented language models for knowledge-grounded dialogue
[27] Yujuan Ding, P. Y. Mok, Yunshan Ma, and Yi Bin. 2023. Personalized fashion generation. arXiv:2305.18846 (2023).
outfit generation with user coordination preference learning. IP&M 60, 5 (2023),
