
Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review

Arpita Vats (Santa Clara University, Santa Clara, USA)
Vinija Jain* (Stanford University; Amazon, Palo Alto, USA)
Rahul Raja (Carnegie Mellon University, Pittsburgh, USA)
Aman Chadha* (Stanford University; Amazon GenAI, Palo Alto, USA)

arXiv:2402.18590v3 [cs.IR] 19 Mar 2024
* Work does not relate to position at Amazon.

Abstract

The paper underscores the significance of Large Language Models (LLMs) in reshaping recommender systems, attributing their value to unique reasoning abilities absent in traditional recommenders. Unlike conventional systems lacking direct user interaction data, LLMs exhibit exceptional proficiency in recommending items, showcasing their adeptness in comprehending intricacies of language. This marks a fundamental paradigm shift in the realm of recommendations. Amidst the dynamic research landscape, researchers actively harness the language comprehension and generation capabilities of LLMs to redefine the foundations of recommendation tasks. The investigation thoroughly explores the inherent strengths of LLMs within recommendation frameworks, encompassing nuanced contextual comprehension, seamless transitions across diverse domains, adoption of unified approaches, holistic learning strategies leveraging shared data reservoirs, transparent decision-making, and iterative improvements. Despite their transformative potential, challenges persist, including sensitivity to input prompts, occasional misinterpretations, and unforeseen recommendations, necessitating continuous refinement and evolution in LLM-driven recommender systems.

1 Introduction

Recommendation systems are crucial for personalized content discovery, and the integration of Large Language Models (LLMs) in Natural Language Processing (NLP) is revolutionizing these systems [20]. LLMs, with their language comprehension capabilities, recommend items based on contextual understanding, eliminating the need for explicit behavioral data. The current research landscape is witnessing a surge in efforts to leverage LLMs for refining recommender systems, transforming recommendation tasks into exercises in language understanding and generation. These models excel in contextual understanding, adapting to zero and few-shot domains [7], streamlining processes, and reducing environmental impact. This paper delves into how LLMs enhance transparency and interactivity in recommender systems, continuously refining performance through user feedback. Despite challenges, researchers propose approaches to effectively leverage LLMs, aiming to bridge the gap between strengths and challenges for improved system performance. Essentially, this paper makes three significant contributions to the realm of LLMs in recommenders:

• Introducing a systematic taxonomy designed to categorize LLMs for recommenders.
• Systematizing the essential and primary techniques illustrating how LLMs are utilized in recommender systems, providing a detailed overview of current research in this domain.
• Deliberating on the challenges and limitations associated with traditional recommender systems, accompanied by solutions using LLMs in recommenders.

2 Background and Related Work

In this section, we provide a concise overview of pertinent literature concerning recommender systems and methods involving LLMs.

2.1 Recommender Systems

Traditional recommender systems follow Candidate Generation, Retrieval, and Ranking phases. However, the advent of LLMs brings a new perspective. Unlike conventional models, LLMs do not require separate embeddings for each user/item interaction. Instead, they use task-specific prompts encompassing user data, item information, and previous preferences. This adaptability allows LLMs to generate recommendations directly, dynamically adapting to various contexts without explicit embeddings. While departing from traditional models, this unified approach retains the capacity for personalized and contextually-aware recommendations, offering a more cohesive and adaptable alternative to segmented retrieval and ranking structures.

2.2 Large Language Models (LLMs)

LLMs, such as Llama [51], GPT [43], T5 [44], etc., are versatile in NLP. BERT is an encoder-only model with bidirectional attention, GPT employs a transformer decoder for one-directional processing, and T5 transforms NLP problems into text generation tasks. Recent LLMs like GPT-3 [4], Language Model for Dialogue Applications (LaMDA) [50], Pathways Language Model (PaLM) [6], and Vicuna excel in understanding human-like textual knowledge, employing In-Context Learning (ICL) [11] for context-based responses.
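To make the prompt-based workflow of Section 2.1 concrete, the sketch below is a minimal illustration (not code from any surveyed system): it assembles a task-specific prompt from a user's history and a candidate list and asks an LLM, via an assumed `query_llm` helper, to return a ranking directly, with no per-user or per-item embeddings trained.

```python
# Minimal sketch of prompt-based recommendation (illustrative only).
# `query_llm` stands in for any chat-completion API; it is an assumed helper.

def build_recommendation_prompt(user_history, candidates, top_k=3):
    """Assemble a task-specific prompt from user data and candidate items."""
    history_block = "\n".join(f"- {title}" for title in user_history)
    candidate_block = "\n".join(
        f"{idx}. {title}" for idx, title in enumerate(candidates, start=1)
    )
    return (
        "You are a recommender system.\n"
        f"The user recently interacted with:\n{history_block}\n\n"
        f"Candidate items:\n{candidate_block}\n\n"
        f"Return the {top_k} most relevant candidate numbers, most relevant first."
    )

def recommend(user_history, candidates, query_llm, top_k=3):
    """Ask the LLM to rank candidates; no user/item embeddings are trained."""
    prompt = build_recommendation_prompt(user_history, candidates, top_k)
    reply = query_llm(prompt)  # e.g. "2, 5, 1"
    picked = [int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()]
    return [candidates[i - 1] for i in picked[:top_k] if 1 <= i <= len(candidates)]
```

The same template can be extended with item attributes, dialogue context, or few-shot examples, which is the adaptability that the surveyed systems build on.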

Figure 1: Taxonomy of Recommendation in LLMs, encompassing LLMs in Recommendation Systems, Sequential and Conversational Recommender Systems, Personalized Recommender Systems, Knowledge Graph enhancements, Reranking, Prompts, and Fine-Tuned LLMs. It offers a condensed overview of models and methods within the Recommendation landscape. The branches of the taxonomy (with representative works) are:

• LLM-Powered Recommender Systems (§3.1): LlamaRec [68], CoLLM [74], RecMind [54], RecRec [52], P5 [17], RecExplainer [25], DOKE [67], RLMRec [45], RARS [9], GenRec [22], RIF [72], Recommender AI Agent [21], POSO [12]
• Off-the-shelf Recommender Systems (§3.2): RecAgent [53], AnyPredict [60], ZRRS [18], LGIR [12], MINT [39], LLM4Vis [54], ONCE [34], GPT4SM [40], TransRec [32], Agent4Rec [70], Collaborative LLMs [76]
• Sequential Recommender Systems (§3.3): PDRec [38], G-Meta [63], ELCRec [35], GPTRec [41], DRDT [59], LLaRA [31], E4SRec [28], RecInterpreter [66], VQ-Rec [19], One Model for All [49]
• Conversational Recommender Systems (§3.4): CRSs [48], LLMCRS [14], Chat-REC [16], ChatQA [36]
• Personalized Recommender Systems (§3.5): PALR [65], Enhanced Recommendation [71], PAP-REC [30], Health-LLM [23], Music Recommender [3], ControlRec [42]
• Knowledge Graph-enhanced Recommender Systems (§3.6): KoLA [57], KAR [61], LLMRG [55]
• Reranking in Recommender Systems (§3.7): Diverse Reranking [5], MultiSlot ReRanker [62], RankingGPT [72]
• Prompt Engineered LLMs in Recommender Systems (§4): Reprompting [46], ProLLM4Rec [64], UEM [10], POD [30], M6-Rec [7], PBNR [30], LLMRG [58], RecSys [13]
• Fine-tuned LLMs for Recommender Systems (§5): TALLRec [2], Flan-T5 [24], InstructRec [73], RecLLM [15], DEALRec [33], INTERS [77]
• Evaluating LLMs in Recommender Systems (§6): iEvaLM [58], FaiRLLM [72], RankingGPT [70]

3 LLMs in Recommender Systems

In this section, we will investigate how LLMs enhance deep learning-based recommender systems by playing crucial roles in user data collection, feature engineering, and scoring/ranking functions. Going beyond their role as mere components, LLMs actively govern the system pipeline, fostering interactive and explainable recommendation processes. They possess the capability to comprehend user preferences, orchestrate ranking stages, and contribute to the evolution of integrated conversational recommender systems.

3.1 LLM-Powered Recommender Systems Across Domains

In this section, we delve into the wide-reaching applications of LLM-powered recommender systems across diverse domains, from entertainment to shopping and conversational agents, illustrating the versatile impact of LLMs.

LlamaRec: Yue et al. [68] introduce LlamaRec, a two-stage recommendation system. In the first stage, a sequential recommender uses user history to efficiently choose candidate items. The selected candidates and user history are then fed into an LLM using a tailored prompt template. Instead of autoregressive generation, a verbalizer is used to convert LLM output logits into probability distributions, speeding up the inference process and enhancing efficiency. This approach overcomes the sluggishness typically encountered in text generation, resulting in more efficient recommendations.

RecMind: Wang et al. [58] introduce RecMind, an innovative recommender agent fueled by LLMs. RecMind, as an autonomous agent, is intricately designed to furnish personalized recommendations through strategic planning and the utilization of external tools. RecMind incorporates the Self-Inspiring (SI) planning algorithm, which empowers the agent by enabling it to consider all previously explored planning paths, thereby enhancing the generation of more effective recommendations. The agent's comprehensive framework includes planning, memory, and tools, collectively bolstering its capacities in reasoning, acting, and memory retention.

RecRec: Verma et al. [52] introduce RecRec, aiming to create algorithmic recourse-based explanations for content filtering-based recommender systems. RecRec offers actionable insights, suggesting specific actions to modify recommendations based on desired preferences. The authors advocate for an optimization-based approach, validating RecRec's effectiveness through empirical evaluations on real-world datasets. The results demonstrate its ability to generate valid, sparse, and actionable recourses that provide valuable insights for improving product rankings.

P5: Geng et al. [17] make a groundbreaking contribution by introducing a unified "Pretrain, Personalized Prompt & Predict" paradigm. This paradigm seamlessly integrates various recommendation tasks into a cohesive conditional language generation framework. It involves the creation of a carefully designed set of personalized prompts covering five distinct recommendation task families. Noteworthy is P5's robust zero-shot generalization capability, demonstrating its effectiveness in handling new personalized prompts and previously unseen items in unexplored domains.

RecExplainer: Lei et al. [25] introduce the use of LLMs as alternative models for interpreting and elucidating recommendations from embedding-based models, which often lack transparency. The authors introduce three methods for aligning LLMs with recommender models: behavior alignment, replicating recommendations using language; intention alignment, directly comprehending recommender embeddings; and hybrid alignment, combining both approaches. LLMs can adeptly comprehend and generate high-quality explanations for recommendations, thereby addressing the conventional tradeoff between interpretability and complexity.

DOKE: Yao et al. [67] introduce DOKE, a paradigm enhancing LLMs with domain-specific knowledge for practical applications. DOKE utilizes an external domain knowledge extractor to prepare and express knowledge in LLM-understandable formats, avoiding costly fine-tuning. Demonstrated in recommender systems, DOKE provides item attributes and collaborative filtering signals.

RLMRec: Ren et al. [45] discuss recommender system challenges related to graph-based models and ID-based data limitations. Introducing RLMRec, a model-agnostic framework, the paper integrates LLMs to improve recommendations by leveraging representation learning and aligning semantic spaces. The study establishes a theoretical foundation, and practical evaluations demonstrate RLMRec's robustness to noise and incomplete data in enhancing the recommendation process.

RARS: Di Palma [9] highlights that LLMs exhibit contextual awareness and a robust ability to adapt to previously unseen data. By amalgamating these technologies, a potent tool is formed for delivering contextual and pertinent recommendations, particularly in cold scenarios marked by extensive data sparsity. The work introduces an innovative approach named Retrieval-Augmented Recommender Systems, which merges the advantages of retrieval-based and generation-based models to augment the capability of RSs in offering pertinent suggestions.

GenRec: Ji et al. [22] introduce a new application of LLMs in recommendation systems, particularly enhancing user engagement with diverse data. The paper introduces the GenRec model, which leverages descriptive item information, leading to more sophisticated personalized recommendations. Experimental results confirm GenRec's effectiveness, indicating its potential for various applications and encouraging further research on LLMs in generative recommender systems.

Recommendation as Instruction Following: Zhang et al. [71] introduce a novel recommendation concept by expressing user preferences through natural language instructions for LLMs. It involves fine-tuning a 3B Flan-T5-XL LLM to align with recommender systems, using a comprehensive instruction format. This paradigm treats recommendation as instruction-following, allowing users flexibility in expressing diverse information needs.

Recommender AI Agent: Huang et al. [21] introduce RecAgent, a pioneering framework merging LLMs and recommender models for an interactive conversational recommender system. Addressing the strengths and weaknesses of each, RecAgent uses LLMs for language comprehension and reasoning (the "brain") and recommender models for item recommendations (the "tools"). The paper outlines essential components, including a memory bus for communication, dynamic demonstration-augmented task planning, and a reflection strategy for quality evaluation, creating a flexible and comprehensive system.
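The verbalizer idea described for LlamaRec above can be sketched as follows; this is a simplified illustration assuming a Hugging Face-style causal language model, not the authors' implementation. Each candidate is exposed in the prompt under an index letter (assumed to be a single token), and one forward pass yields a probability for every candidate at once instead of autoregressive decoding.

```python
import torch

def verbalizer_scores(model, tokenizer, prompt, num_candidates):
    """Score candidates from one forward pass instead of autoregressive decoding.

    Candidates are assumed to be listed in the prompt as (A), (B), (C), ...
    and each index letter is assumed to map to a single token.
    """
    index_letters = [chr(ord("A") + i) for i in range(num_candidates)]
    index_token_ids = [
        tokenizer.convert_tokens_to_ids(letter) for letter in index_letters
    ]
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits        # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]          # distribution over the next token
    candidate_logits = next_token_logits[index_token_ids]
    probs = torch.softmax(candidate_logits, dim=-1)   # probability per candidate
    ranking = torch.argsort(probs, descending=True).tolist()
    return probs.tolist(), ranking
```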

CoLLM: Zhang et al. [74] introduce CoLLM, emphasizing collaborative information modeling alongside text semantics in recommendation systems. CoLLM seamlessly incorporates collaborative details into LLMs using external traditional models, improving recommendations in both cold- and warm-start scenarios.

POSO: Dai et al. [8] introduce Personalized COld Start MOdules (POSO), enhancing pre-existing modules with user-group-specialized sub-modules and personalized gates for comprehensive representations. Adaptable to various modules like the Multi-layer Perceptron and Multi-head Attention, POSO shows significant performance improvement with minimal computational overhead.

3.2 Off-the-shelf LLM-based Recommender Systems

In this section, we examine recommender systems that operate without tuning, i.e., where no adjustments have been made to the LLM.

RecAgent: Wang et al. [53] explore the potential of LLMs for robust user simulation, particularly in reshaping traditional user behavior analysis. They focus on recommender systems, employing LLMs to conceptualize each user as an autonomous agent within a virtual simulator named RecAgent. The work introduces global functions for real-human playing and system intervention, enhancing simulator flexibility. Extensive experiments are conducted to assess the simulator's effectiveness from both agent and system perspectives.

MediTab: Wang et al. [60] introduce MediTab, a method enhancing scalability for medical tabular data predictors across diverse inputs. Using an LLM-based data engine, they merge tabular samples with distinct schemas. Through a "learn, annotate, and refinement" pipeline, MediTab aligns out-domain data with the target task, enabling it to make inferences for arbitrary tabular inputs without fine-tuning. Achieving impressive results on patient and trial outcome prediction datasets, MediTab demonstrates substantial improvements over supervised baselines and outperforms XGBoost models in zero-shot scenarios.

ZRRS: Hou et al. [19] formulate the recommendation problem as a conditional ranking task, using LLMs to address it with a designed template. Experimental results on widely-used datasets showcase promising zero-shot ranking capabilities. The authors propose special prompting and bootstrapping strategies, demonstrating effectiveness. These insights position zero-shot LLMs as competitive challengers to conventional recommendation models, particularly in ranking candidates from multiple generators.

LGIR: Du et al. [12] propose an innovative job recommendation method based on LLMs that overcomes limitations in fabricated generation. This comprehensive approach enhances the accuracy and meaningfulness of resume completion, even for users with limited interaction records. To address few-shot problems, the authors suggest aligning low-quality resumes with high-quality generated ones using Generative Adversarial Networks (GANs), refining representations and improving recommendation outcomes.

MINT: Mysore et al. [39] use LLMs for data augmentation in training Narrative-Driven Recommendation (NDR) models. LLMs generate synthetic narrative queries from user-item interactions using few-shot prompting. Retrieval models for NDR are trained with a combination of synthetic queries and original interaction data. Experiments show the approach's effectiveness, making it a successful strategy for training compact retrieval models that outperform alternatives and LLM baselines in narrative-driven recommendation.

PEPLER: Li et al. [26] introduce PEPLER, a system that generates natural language explanations for recommendations by treating user and item IDs as prompts. Two training strategies are proposed to bridge the gap between continuous prompts and the pre-trained model, aiming to enhance the performance of explanation generation. The researchers suggest that this approach could inspire others in tuning pre-trained language models more effectively. Evaluation of the generated explanations involves not only text quality metrics such as BLEU and ROUGE but also metrics focused on explainability from the perspective of item features. Results from extensive experiments indicate that PEPLER consistently outperforms state-of-the-art baselines.

LLM4Vis: Wang et al. [54] introduce LLM4Vis, a ChatGPT-based method for accurate visualization recommendations and human-like explanations with minimal demonstration examples. The methodology involves feature description, demonstration example selection, explanation generation, demonstration example construction, and inference steps. A new explanation generation bootstrapping method refines explanations iteratively, considering previous generations and using template-based hints.

ONCE: Liu et al. [34] introduce a combination of open-source and closed-source LLMs to enhance content-based recommendation systems. Open-source LLMs contribute as content encoders, while closed-source LLMs enrich training data using prompting techniques. Extensive experiments show significant effectiveness, with a relative improvement of up to 19.32% compared to existing models, highlighting the potential of both LLM types in advancing content-based recommendations.

GPT4SM: Peng et al. [40] propose three strategies to integrate the knowledge of LLMs into basic PLMs, aiming to improve their overall performance. These strategies involve utilizing GPT embeddings as a feature (EaaF) to enhance text semantics, using them as a regularization (EaaR) to guide the aggregation of text token embeddings, and incorporating them as a pre-training task (EaaP) to replicate the capabilities of LLMs. The experiments conducted by the researchers demonstrate that the integration of GPT embeddings enables basic PLMs to enhance their performance in both advertising and recommendation tasks.

TransRec: Lin et al. [32] introduce TransRec, a pioneering multi-facet paradigm designed to establish a connection between LLMs and recommendation systems. TransRec employs multi-facet identifiers, including ID, title, and attribute information, to achieve a balance of distinctiveness and semantic richness. Additionally, the authors present a specialized data structure for TransRec, ensuring precise in-corpus identifier generation. Substring indexing is adopted to encourage LLMs to generate content from various positions. The researchers implement TransRec on two core LLMs, specifically BART-large and LLaMA-7B.

Agent4Rec: Zhang et al. [69] introduce Agent4Rec, a movie recommendation simulator using LLM-empowered generative agents. These agents have modules for user profiles, memory, and actions tailored for recommender systems. The study explores how well LLM-empowered generative agents simulate real human behavior in recommender systems, evaluating alignment and deviations between agents and user preferences. Experiments delve into the filter bubble effect and uncover causal relationships in recommendation tasks.
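A minimal sketch of the conditional-ranking template behind the zero-shot ranking work of Hou et al. discussed above (our own wording; the published templates differ and additionally use bootstrapping and recency-focused prompting): the ordered interaction history is serialized and the LLM is asked only to reorder a fixed candidate list.

```python
def conditional_ranking_prompt(history, candidates):
    """Serialize an ordered interaction history and ask the LLM to rank candidates.

    The wording is illustrative; published templates add further devices such as
    recency emphasis and in-context examples to counter order/popularity biases.
    """
    history_lines = "\n".join(f"{i}. {t}" for i, t in enumerate(history, start=1))
    candidate_lines = "\n".join(f"[{i}] {t}" for i, t in enumerate(candidates, start=1))
    return (
        "I interacted with the following items in this order (oldest to newest):\n"
        f"{history_lines}\n\n"
        "Rank the candidate items below by how likely I am to interact with them "
        "next. Answer with the bracketed numbers only, best first.\n"
        f"{candidate_lines}"
    )

def parse_ranking(reply, num_candidates):
    """Extract a permutation of candidate indices from the LLM's reply."""
    seen, order = set(), []
    for token in reply.replace("[", " ").replace("]", " ").split():
        if token.isdigit():
            idx = int(token)
            if 1 <= idx <= num_candidates and idx not in seen:
                seen.add(idx)
                order.append(idx)
    # Append anything the model omitted so the output is always a full ranking.
    order += [i for i in range(1, num_candidates + 1) if i not in seen]
    return order
```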

Collaborative LLMs for Recommender Systems: Zhu et al. [76] introduce Collaborative LLMs as a novel recommendation system. It combines pretrained LLMs with traditional ones to address challenges in spurious correlations and language modeling. The methodology extends the LLM's vocabulary with special tokens for users and items, ensuring accurate user-item interaction modeling. Mutual regularization in pretraining connects collaborative and content semantics, and stochastic item reordering manages non-crucial item order.

3.3 LLMs in Sequential Recommender Systems

Sequential recommendation is an approach that focuses on the order of a user's past interactions to predict their next likely actions. It considers the sequence of items a user has viewed, purchased, or interacted with, rather than just overall popularity or static preferences. This allows for more personalized and timely recommendations that reflect the user's current interests and evolving tastes.

PDRec: Ma et al. [38] propose a Plug-in Diffusion Model for Sequential Recommendation. This innovative framework employs diffusion models as adaptable plugins, capitalizing on user preferences related to all items generated through the diffusion process to address the challenge of data sparsity. By incorporating time-interval diffusion, PDRec infers dynamic user preferences, adjusts historical behavior weights, and enhances potential positive instances. Additionally, it samples noise-free negatives from the diffusion output to optimize the model.

G-Meta: Xiao et al. [63] introduce the G-Meta framework, which is designed for the distributed training of optimization-based meta-learning recommendation models on GPU clusters. It optimizes computation and communication efficiency through a combination of data and model parallelism, and includes a Meta-IO pipeline for efficient data ingestion. Experimental results demonstrate G-Meta's ability to achieve accelerated training without compromising statistical performance. Since 2022, it has been implemented in Alibaba's core systems, resulting in a notable 4x reduction in model delivery time and significant improvements in business metrics. Its key contributions include hybrid parallelism, distributed meta-learning optimizations, and the establishment of a high-throughput Meta-IO pipeline.

ELCRec: Liu et al. [35] present ELCRec, a novel approach for intent learning in sequential recommendation systems. ELCRec integrates representation learning and clustering optimization within an end-to-end framework to capture users' intents effectively. It uses learnable parameters for cluster centers, incorporating clustering loss for concurrent optimization on mini-batches, ensuring scalability. The learned cluster centers serve as self-supervision signals, enhancing representation learning and overall recommendation performance.

GPTRec: Petrov et al. [41] introduce GPTRec, a GPT-2-based sequential recommendation model using the SVD Tokenisation algorithm to address vocabulary challenges. The paper presents a Next-K recommendation strategy, demonstrating its effectiveness. Experimental results on the MovieLens-1M dataset show that GPTRec matches SASRec's quality while reducing the embedding table by 40%.

DRDT: Wang et al. [59] introduce an improvement to LLM reasoning in sequential recommendations. The methodology introduces Dynamic Reflection with Divergent Thinking (DRDT) within a retriever-reranker framework. Leveraging a collaborative demonstration retriever, it employs divergent thinking to comprehensively analyze user preferences. The dynamic reflection component emulates human learning, unveiling the evolution of users' interests.

LLaRA: Liao et al. [31] introduce LLaRA, a framework for modeling sequential recommendations within LLMs. LLaRA adopts a hybrid approach, combining ID-based item embeddings from conventional recommenders with textual item features in LLM prompts. Addressing the "sequential behavior of the user" as a novel modality, an adapter bridges the gap between traditional recommender ID embeddings and the LLM input space. Utilizing curriculum learning, the researchers gradually transition from text-only to hybrid prompting during training, enabling LLMs to adeptly handle sequential recommendation tasks.

E4SRec: Li et al. [28] introduce E4SRec, seamlessly integrating LLMs with traditional recommender systems using item IDs. E4SRec efficiently generates ranking lists in a single forward process, addressing challenges in integrating IDs with LLMs and proposing an industrial-level recommender system with demonstrated superiority in real-world applications.

RecInterpreter: Yang et al. [66] propose RecInterpreter, evaluating LLMs for interpreting the representation space of sequential recommenders. Using multimodal pairs and lightweight adapters, RecInterpreter enhances LLaMA's understanding of ID-based sequential recommenders, particularly with sequence-residual prompts. It also enables LLaMA to determine the ideal next item for a user from generative recommenders like DreamRec.

VQ-Rec: Hou et al. [18] propose VQ-Rec, a novel method for transferable sequential recommenders using Vector-Quantized item representations. It translates item text into indices, generates representations, and employs enhanced contrastive pre-training with mixed-domain code representations. A differentiable permutation-based network guides a unique cross-domain fine-tuning approach, demonstrating effectiveness across six benchmarks in various scenarios.

K-LaMP: Baek et al. [1] propose enhancing LLMs by incorporating user interaction history with a search engine for personalized outputs. The study introduces an entity-centric knowledge store derived from users' web search and browsing activities, forming the basis for contextually relevant LLM prompt augmentations.

One Model for All: Tang et al. [49] present LLM-Rec, addressing challenges in multi-domain sequential recommendation. They use an LLM to capture world knowledge from textual data, bridging gaps between recommendation scenarios. The task is framed as a next-sentence prediction task for the LLM, representing items and users with titles. Increasing the pre-trained language model size enhances both fine-tuned and zero-shot domain recommendation, with slight impacts on fine-tuning performance. The study explores different fine-tuning methods, noting performance variations based on model size and computational resources.
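The LLaRA entry above describes an adapter that bridges conventional ID embeddings and the LLM input space. The sketch below is a simplified illustration with made-up dimensions, not the paper's architecture, showing how projected item-ID vectors can be placed alongside ordinary token embeddings.

```python
import torch
import torch.nn as nn

class IDToTokenAdapter(nn.Module):
    """Project a recommender's item-ID embedding into the LLM embedding space.

    Dimensions are illustrative: 64-d ID embeddings from a sequential
    recommender, 4096-d token embeddings for the language model.
    """

    def __init__(self, id_dim=64, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(id_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, id_embeddings):      # (batch, seq, id_dim)
        return self.proj(id_embeddings)    # (batch, seq, llm_dim)

def hybrid_prompt_embeddings(text_embeddings, id_embeddings, adapter):
    """Simplification: text tokens first, then projected item-ID vectors.

    A real system would place the projected vectors at the positions of item
    placeholders inside the prompt (and, as in LLaRA, train with a curriculum
    from text-only to hybrid prompting).
    """
    projected = adapter(id_embeddings)
    return torch.cat([text_embeddings, projected], dim=1)
```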

3.4 LLMs in Conversational Recommender Systems

This section explores the role of LLMs in Conversational Recommender Systems (CRSs) [48]. CRSs aim to provide quality recommendations through dialogue interfaces, covering tasks like user preference elicitation, recommendation, explanation, and item information search. However, challenges arise in managing sub-tasks, ensuring efficient problem-solving, and generating user-interaction-friendly responses in the development of effective CRSs.

LLMCRS: Feng et al. [14] introduce LLMCRS, a pioneering LLM-based Conversational Recommender System. It strategically uses the LLM for sub-task management, collaborates with expert models, and leverages the LLM's generation capabilities. LLMCRS incorporates instructional mechanisms and proposes fine-tuning with reinforcement learning from CRS performance feedback (RLPF) for conversational recommendations. Experimental findings show LLMCRS with RLPF outperforms existing methods, demonstrating proficiency in handling conversational recommendation tasks.

Chat-REC: Gao et al. [16] introduce Chat-REC, which uses an LLM to enhance its conversational recommender by summarizing user preferences from profiles. The system combines traditional recommendation methods with OpenAI's ChatGPT for multi-round recommendations, interactivity, and explainability. To handle cold item recommendations, an item-to-item similarity approach using external current information is proposed. In experiments, Chat-REC performs well in zero-shot and cross-domain recommendation tasks.

ChatQA: Liu et al. [36] introduce ChatQA, conversational question-answering models rivaling GPT-4's performance without synthetic data. They propose a two-stage instruction tuning method to enhance zero-shot capabilities. Demonstrating the effectiveness of fine-tuning a dense retriever on multi-turn QA datasets, they show it matches complex query rewriting models with simpler deployment.

3.5 LLMs in Personalized Recommender Systems

PALR: Yang et al. [65] introduce PALR, a versatile personalized recommendation framework addressing challenges in the domain. The process involves utilizing an LLM and user behavior to generate user profile keywords, followed by a retrieval module for candidate pre-filtering. PALR, independent of specific retrieval algorithms, leverages the LLM to provide recommendations based on users' historical behaviors. To tailor general-purpose LLMs for recommendation scenarios, user behavior data is converted into prompts, and an LLaMa 7B model is fine-tuned. PALR demonstrates competitive performance against state-of-the-art methods on two public datasets, making it a flexible recommendation solution.

Bridging LLMs and Domain-Specific Models for Enhanced Recommendation: Zhang et al. [73] address the information disparity between domain-specific models and LLMs for personalized recommendation. The approach incorporates an information sharing module, acting as a repository and conduit for collaborative training. This enables a reciprocal exchange of insights: domain models provide user behavior patterns, and LLMs contribute general knowledge and reasoning abilities. Deep mutual learning during joint training enhances this collaboration, bridging information gaps and leveraging the strengths of both models.

PAP-REC: Li et al. [30] propose PAP-REC, a framework that automatically generates personalized prompts for recommendation language models (RLMs) to enhance their performance in diverse recommendation tasks. Instead of depending on inefficient and suboptimal manually designed prompts, PAP-REC utilizes gradient-based methods to effectively explore the extensive space of potential prompt tokens. This enables the identification of personalized prompts tailored to each user, contributing to improved RLM performance.

Health-LLM: Jin et al. [23] introduce a framework integrating LLMs and medical expertise for enhanced disease prediction and health management using patient health reports. Health-LLM involves extracting informative features, assigning weighted scores by medical professionals, and training a classifier for personalized predictions. Health-LLM offers detailed modeling, individual risk assessments, and semi-automated feature engineering, providing professional and tailored intelligent healthcare.

Personalized Music Recommendation: Briand et al. [3] describe Deezer's shift to a fully personalized system for improved discoverability of new music releases. The use of cold start embeddings and contextual bandits leads to a substantial boost in clicks and exposure for new releases through personalized recommendations.

GIRL: Zheng et al. [75] introduce GIRL, a job recommendation approach inspired by LLMs. It uses Supervised Fine-Tuning (SFT) for generating Job Descriptions (JDs) from CVs, incorporating a reward model for CV-JD matching. Reinforcement Learning fine-tunes the generator with Proximal Policy Optimization, creating a candidate-set-free, job seeker-centric model. Experiments on a real-world dataset demonstrate significant effectiveness, signaling a paradigm shift in personalized job recommendation.

ControlRec: Qiu et al. [42] introduce ControlRec, a framework for contrastive prompt learning integrating LLMs into recommendation systems. User/item IDs and natural language prompts are treated as heterogeneous features and encoded independently. The framework introduces two contrastive objectives: Heterogeneous Feature Matching (HFM), aligning item descriptions with IDs based on user interactions, and Instruction Contrastive Learning (ICL), merging ID and language representations by contrasting output distributions for recommendation tasks.

3.6 LLMs for Knowledge Graph-enhanced Recommender Systems

This section explores how LLMs are utilized to enhance knowledge graphs within recommender systems, leveraging natural language understanding to enrich data representation and recommendation outcomes.

KoLA: Wang et al. [57] discuss the significance of incorporating knowledge graphs in recommender systems, as indicated by the benchmark evaluation KoLA [68]. Knowledge graphs enhance recommendation explainability, with embedding-based, path-based, and unified methods improving recommendation quality. Challenges in dataset sparsity, especially in private domains, are noted, and LLMs like symbolic-kg help address data scarcity, though issues like the phantom problem and common sense limitations must be tackled for optimal recommender system performance.
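As a generic illustration of the knowledge-grounded prompting theme of this subsection (it is not the specific pipeline of KoLA or KAR; the triple format is an assumption), the sketch below serializes knowledge-graph triples about candidate items into plain text so that the resulting prompt can be passed to any LLM for ranking.

```python
def serialize_triples(item, triples):
    """Render (head, relation, tail) triples about one item as plain text."""
    if not triples:
        return f"{item}: no known facts"
    facts = "; ".join(f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples)
    return f"{item}: {facts}"

def kg_augmented_prompt(user_history, candidates, kg):
    """Build a recommendation prompt enriched with per-item knowledge-graph facts.

    `kg` maps an item name to a list of (head, relation, tail) triples, e.g.
    {"Inception": [("Inception", "directed_by", "Christopher Nolan")]}.
    """
    knowledge = "\n".join(
        serialize_triples(item, kg.get(item, [])) for item in candidates
    )
    return (
        "User history:\n" + "\n".join(f"- {i}" for i in user_history) + "\n\n"
        "Background knowledge about the candidates:\n" + knowledge + "\n\n"
        "Candidates:\n"
        + "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, 1))
        + "\nRank the candidates for this user, best first, using the knowledge above."
    )
```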

KAR: Xi et al. [61] introduce KAR, the Open-World Knowledge Augmented Recommendation Framework with LLMs. KAR extracts reasoning knowledge on user preferences and factual knowledge on items from LLMs using factorization prompting. The resulting knowledge is transformed into augmented vectors using a hybrid-expert adaptor, enhancing the performance of any recommendation model. Efficient inference is ensured by preprocessing and prestoring LLM knowledge.

LLMRG: LLM Reasoning Graphs (LLMRG), by Wang et al. [57], utilize LLMs to create personalized reasoning graphs for robust and interpretable recommender systems. LLMRG connects user profiles and behavioral sequences, incorporating chained graph reasoning, divergent extension, self-verification, and self-improvement of knowledge bases. LLMRG enhances recommendations without extra user or item information, offering adaptive reasoning for a comprehensive understanding of user preferences and behaviors.

3.7 LLMs for Reranking in Recommender Systems

This section focuses on the utilization of LLMs for reranking within recommendation systems, emphasizing their role in enhancing the overall recommendation process.

Diverse Reranking: Carraro et al. [5] introduce LLMs for reranking recommendations to enhance diversity beyond mere relevance. In an initial study, the authors confirm LLMs' ability to proficiently interpret and perform reranking tasks with a focus on diversity. They then introduce a more rigorous methodology, prompting LLMs in a zero-shot manner to create a diverse reranking from an initial candidate ranking generated by a matrix factorization recommender.

MultiSlot ReRanker: Xiao et al. [62] propose MultiSlot ReRanker, a model-based framework for re-ranking in recommendation systems, addressing relevance, diversity, and freshness. It employs the efficient Sequential Greedy Algorithm and utilizes an OpenAI Gym simulator to evaluate learning algorithms for re-ranking under diverse assumptions.

Zero-Shot Ranker: Hou et al. [19] frame recommendation as a conditional ranking task, utilizing LLMs with carefully designed prompts based on sequential interaction histories. Despite promising zero-shot ranking abilities, challenges include accurately perceiving interaction order and susceptibility to popularity biases. The study suggests that specially crafted prompting and bootstrapping strategies can alleviate these issues, enabling zero-shot LLMs to competitively rank candidates from multiple generators compared to traditional recommendation models.

RankingGPT: Zhang et al. [72] introduce a two-stage training strategy to enhance LLMs for efficient text ranking. The initial stage involves pretraining the LLM on a vast weakly supervised dataset, aiming to improve its ability to predict associated tokens without altering its original objective. Subsequently, supervised fine-tuning is applied with constraints to enhance the model's capability in discerning relevant text pairs for ranking while preserving its text generation abilities.

4 Prompt Engineered LLMs in Recommender Systems

This section investigates the incorporation of prompt engineering with LLMs in the context of recommendation systems, emphasizing its role in refining and enhancing user prompts for improved recommendations.

Reprompting: Spurlock et al. [46] investigate ChatGPT as a conversational recommendation system, emphasizing realistic user interactions and iterative feedback for refining suggestions. The study investigates popularity bias in ChatGPT's recommendations, highlighting that iterative reprompting significantly improves relevance. Results show ChatGPT outperforms random and traditional systems, suggesting effective mitigation of popularity bias through strategic prompt engineering.

ProLLM4Rec: Xu et al. [64] introduce ProLLM4Rec, a comprehensive framework utilizing LLMs as recommender systems through prompt engineering. The framework focuses on selecting the LLM based on factors like availability, architecture, scale, and tuning strategies, and on crafting effective prompts that include task descriptions, user modeling, item candidates, and prompting strategies.

UEM: Doddapaneni et al. [10] introduce the User Embedding Module (UEM), designed to effectively handle extensive user preference histories expressed in free-form text, with the goal of incorporating these histories into LLMs to enhance recommendation performance. The UEM transforms user histories into concise representative embeddings, acting as soft prompts to guide the LLM. This approach addresses the computational complexity associated with directly concatenating lengthy histories.

POD: Li et al. [27] introduce PrOmpt Distillation (POD), proposed to enhance the efficiency of training recommendation models. Experimental results on three real-world datasets show the effectiveness of POD in both sequential and top-N recommendation tasks.

M6-Rec: Cui et al. [7] propose M6-Rec, an efficient model for sample-efficient open-domain recommendation, integrating all sub-tasks within a recommender system. It excels in prompt tuning, maintaining parameter efficiency. Real-world deployment insights include strategies like late interaction, parameter sharing, pruning, and early-exiting. M6-Rec excels in zero/few-shot learning across tasks like retrieval, ranking, personalized product design, and conversational recommendation. Notably, it is deployed on both cloud servers and edge devices.

PBNR: Li et al. [29] introduce PBNR, a distinctive news recommendation approach using personalized prompts to predict user article preferences, accommodating variable user history lengths during training. The study applies prompt learning, treating news recommendation as a text-to-text language task. PBNR integrates language generation and ranking loss for enhanced performance, aiming to personalize news recommendations, improve user experiences, and contribute to human-computer interaction and interpretability.

LLM-Rec: Lyu et al. [37] introduce LLM-Rec, a method with four prompting strategies: basic, recommendation-driven, engagement-guided, and a combination of recommendation-driven and engagement-guided prompting. Empirical experiments demonstrate that integrating augmented input text from the LLM significantly enhances recommendation performance, emphasizing the importance of diverse prompts and input augmentation techniques for improved recommendation capabilities with LLMs.
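To illustrate the soft-prompt idea behind the User Embedding Module (UEM) described above, the sketch below compresses a pooled user-history vector into a few prompt embeddings that are prepended to the token embeddings. The dimensions are illustrative, and the real module encodes the raw history with its own encoder rather than a single linear layer.

```python
import torch
import torch.nn as nn

class UserEmbeddingModule(nn.Module):
    """Compress a user-history representation into a few soft-prompt vectors."""

    def __init__(self, history_dim=768, llm_dim=4096, num_soft_tokens=8):
        super().__init__()
        self.num_soft_tokens = num_soft_tokens
        self.llm_dim = llm_dim
        self.compress = nn.Linear(history_dim, num_soft_tokens * llm_dim)

    def forward(self, history_vector):            # (batch, history_dim)
        soft = self.compress(history_vector)       # (batch, k * llm_dim)
        return soft.view(-1, self.num_soft_tokens, self.llm_dim)

def prepend_soft_prompt(uem, history_vector, token_embeddings):
    """Place the user's soft prompt before the tokenized task prompt."""
    soft_prompt = uem(history_vector)               # (batch, k, llm_dim)
    return torch.cat([soft_prompt, token_embeddings], dim=1)
```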

Recommender Systems in the Era of LLMs: Fan et al. [13] survey the innovative use of prompting in tailoring LLMs for specific downstream tasks, focusing on recommendation systems. They review methods involving task-specific prompts to guide LLMs, aligning downstream tasks with language generation during pre-training. The study discusses techniques like ICL and CoT in the context of recommendation tasks within RecSys. It also covers prompt tuning and instruction tuning, incorporating prompt tokens into LLMs and updating them based on task-specific recommendation datasets.

5 Fine-tuned LLMs in Recommender Systems

This section delves into the application of fine-tuned LLMs in recommendation systems, exploring their effectiveness in tailoring recommendations for enhanced user satisfaction.

TALLRec: Bao et al. [2] introduce a Tuning framework for Aligning LLMs with Recommendations (TALLRec) to optimize LLMs using recommendation data. This framework combines instruction tuning and recommendation tuning, enhancing the overall model effectiveness. Initially, a set of instructions is crafted, specifying task input and output. The authors implement two tuning stages: instruct-tuning focuses on generalization using self-instruct data from Stanford Alpaca, while rec-tuning structures limited user interactions into Rec Instructions.

Flan-T5: Kang et al. [24] conduct a study to assess various LLMs with parameter sizes ranging from 250 million to 540 billion on tasks related to predicting user ratings. The evaluation encompassed zero-shot, few-shot, and fine-tuning scenarios. For fine-tuning experiments, the researchers utilized Flan-T5-Base (250 million parameters) and Flan-U-PaLM (540 billion parameters). In zero-shot and few-shot experiments, GPT-3 models from OpenAI and the 540-billion-parameter Flan-U-PaLM were employed. The experiments revealed that the zero-shot performance of LLMs significantly lags behind traditional recommender models.

InstructRec: Zhang et al. [71] introduce InstructRec for LLM-based recommender systems, framing the task as instruction following. Using a versatile instruction format with 39 templates, generated fine-grained instructions cover user preferences, intentions, task forms, and contextual information. The 3-billion-parameter Flan-T5-XL model is tuned for efficiency, with the LLM serving as a reranker during inference. Selected instruction templates, along with operations like concatenation and persona shift, guide the ranking of the candidate item set.

RecLLM: Friedman et al. [15] propose a roadmap for a large-scale Conversational Recommender System (CRS) using LLM technology. The CRS allows users to control recommendations through real-time dialogues, refining suggestions based on direct feedback. To overcome the lack of conversational datasets, the authors use a controllable LLM-based user simulator for synthetic conversations. LLMs interpret user profiles as natural language for personalized sessions. The retrieval approach involves a dual-encoder architecture, with ranking LLMs explaining item selection. The models undergo fine-tuning with recommendation data, resulting in a successful proof-of-concept CRS using LaMDA for recommendations from public YouTube videos.

DEALRec: Lin et al. [33] introduce DEALRec, an innovative data pruning approach for efficient fine-tuning of LLMs in recommendation tasks. DEALRec identifies representative samples from extensive datasets, facilitating swift LLM adaptation to new items and user behaviors while minimizing training costs. The method combines influence and effort scores to select influential yet manageable samples, significantly reducing fine-tuning time and resource requirements. Experimental results demonstrate LLMs outperforming full-data fine-tuning with just 2% of the data.

INTERS: Zhu et al. [77] propose INTERS, an instruction tuning dataset designed to enhance the capabilities of LLMs in information retrieval tasks. Spanning 21 search-related tasks distributed across three key categories (query understanding, document understanding, and query-document relationship understanding), INTERS integrates data from 43 distinct datasets, including manually crafted templates and task descriptions. INTERS significantly boosts the performance of various publicly available LLMs in information retrieval tasks, addressing both in-domain and out-of-domain scenarios.

6 Evaluating LLMs in Recommender Systems

This section assesses the performance of LLMs in the context of recommendation systems, evaluating their effectiveness and impact on recommendation quality.

iEvaLM: Wang et al. [56] introduce an interactive evaluation approach called iEvaLM, centered on LLMs and utilizing LLM-based user simulators. This method enables diverse simulation of system-user interaction scenarios. The study emphasizes evaluating explainability, with ChatGPT demonstrating compelling generation of recommendations. This research enhances understanding of LLMs' potential in CRSs and presents a more adaptable evaluation approach for future studies involving LLM-based CRSs.

FaiRLLM: Zhang et al. [70] posit that there is a need to assess the fairness of LLMs used as recommenders. The evaluation focuses on various user-side sensitive attributes, introducing a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark includes well-designed metrics and a dataset with eight sensitive attributes across music and movie recommendation scenarios. Using FaiRLLM, an evaluation of ChatGPT reveals persistent unfairness toward specific sensitive attributes during recommendation generation.

RankGPT: Sun et al. [47] propose instructional techniques tailored for LLMs in the context of passage re-ranking tasks. They present a pioneering permutation generation approach. The evaluation process involves a comprehensive assessment of ChatGPT and GPT-4 across various passage re-ranking benchmarks, including a newly introduced NovelEval test set. Additionally, the researchers propose a distillation approach for acquiring specialized models utilizing permutations generated by ChatGPT.
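In the spirit of the fairness evaluation described for FaiRLLM above, the sketch below probes an LLM recommender by comparing the items it returns for a neutral request against those returned when a sensitive attribute is added. The `llm_recommend` helper and the simple Jaccard overlap are our assumptions and are simpler than the benchmark's own metrics.

```python
def jaccard(a, b):
    """Overlap between two recommendation lists (1.0 = identical item sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def fairness_probe(llm_recommend, base_query, attribute_values):
    """Compare recommendations for a neutral query vs. attribute-augmented queries.

    `llm_recommend(query)` is assumed to return a list of item names.
    A low overlap with the neutral list for some attribute value suggests the
    model treats that user group differently for the same stated preference.
    """
    neutral = llm_recommend(base_query)
    report = {}
    for value in attribute_values:
        augmented = llm_recommend(f"I am a {value}. {base_query}")
        report[value] = jaccard(neutral, augmented)
    return report

# Example usage (hypothetical recommender and attributes):
# fairness_probe(my_llm_recommender,
#                "Recommend ten pop songs I might like.",
#                ["young woman", "older man"])
```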

7 Conclusion [18] Yupeng Hou, Zhankui He, Julian McAuley, and Wayne Xin Zhao. 2023. Learn-
ing Vector-Quantized Item Representation for Transferable Sequential Recom-
In summary, this paper explores the transformative impact of LLMs menders (WWW ’23). Association for Computing Machinery, New York, NY,
in recommender systems, highlighting their versatility and cohesive USA, 1162–1171. https://doi.org/10.1145/3543507.3583434
[19] Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian
approach compared to traditional methods. LLMs enhance trans- McAuley, and Wayne Xin Zhao. 2024. Large Language Models are Zero-Shot
parency and interactivity, reshaping user experiences dynamically. Rankers for Recommender Systems. arXiv:2305.08845 [cs.IR]
The study delves into fine-tuning LLMs for optimized personalized [20] Xu Huang, Jianxun Lian, Yuxuan Lei, Jing Yao, Defu Lian, and Xing Xie. 2023.
Recommender AI Agent: Integrating Large Language Models for Interactive Rec-
content suggestions and addresses evaluating LLM models in rec- ommendations. arXiv:2308.16505 [cs.IR]
ommendations, providing guidance for researchers and practition- [21] Xu Huang, Jianxun Lian, Yuxuan Lei, Jing Yao, Defu Lian, and Xing Xie. 2023.
ers. Lastly, it touches upon the intricate ranking process in LLM- Recommender AI Agent: Integrating Large Language Models for Interactive Rec-
ommendations. arXiv:2308.16505 [cs.IR]
driven recommendation systems, offering valuable insights for de- [22] Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan,
signers and developers aiming to leverage LLMs effectively. This and Yongfeng Zhang. 2023. GenRec: Large Language Model for Generative
Recommendation. arXiv:2307.00457 [cs.IR]
comprehensive exploration not only underscores their current im- [23] Mingyu Jin, Qinkai Yu, Chong Zhang, Dong Shu, Suiyuan Zhu, Mengnan Du,
pact but also lays the groundwork for future advancements in rec- Yongfeng Zhang, and Yanda Meng. 2024. Health-LLM: Personalized Retrieval-
ommender systems. Augmented Disease Prediction Model. arXiv:2402.00746 [cs.CL]
[24] Wang-Cheng Kang, Jianmo Ni, and Nikhil Mehta. 2023. Do LLMs Un-
derstand User Preferences? Evaluating LLMs On User Rating Prediction.
arXiv:2305.06474 [cs.IR]
[25] Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, and Xing Xie. 2023.
References RecExplainer: Aligning Large Language Models for Recommendation Model In-
[1] Jinheon Baek, Nirupama Chandrasekaran, Silviu Cucerzan, Allen herring, and terpretability. arXiv:2311.10947 [cs.IR]
Sujay Kumar Jauhar. 2023. Knowledge-Augmented Large Language Models for [26] Lei Li, Yongfeng Zhang, and Li Chen. 2023. Personalized Prompt Learning for
Personalized Contextual Query Suggestion. arXiv:2311.06318 [cs.IR] Explainable Recommendation. arXiv:2202.07371 [cs.IR]
[2] Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xi- [27] Lei Li, Yongfeng Zhang, and Li Chen. 2023. Prompt Distillation for Efficient
angnan He. 2023. TALLRec: An Effective and Efficient Tuning Frame- LLM-based Recommendation. Association for Computing Machinery, New York,
work to Align Large Language Model with Recommendation. In Pro- NY, USA. https://doi.org/10.1145/3583780.3615017
ceedings of the 17th ACM Conference on Recommender Systems. ACM. [28] Xinhang Li, Chong Chen, Xiangyu Zhao, Yong Zhang, and Chunxiao Xing. 2023.
https://doi.org/10.1145/3604915.3608857 E4SRec: An Elegant Effective Efficient Extensible Solution of Large Language
[3] Léa Briand, Théo Bontempelli, and Walid Bendada. 2024. Let’s Get Models for Sequential Recommendation. arXiv:2312.02443 [cs.IR]
It Started: Fostering the Discoverability of New Releases on Deezer. [29] Xinyi Li, Yongfeng Zhang, and Edward C. Malthouse. 2023. PBNR: Prompt-
arXiv:2401.02827 [cs.IR] based News Recommender System. arXiv:2304.07862 [cs.IR]
[4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka- [30] Zelong Li, Jianchao Ji, Yingqiang Ge, Wenyue Hua, and Yongfeng Zhang.
plan, and Prafulla Dhariwal. 2020. Language Models are Few-Shot Learners. 2024. PAP-REC: Personalized Automatic Prompt for Recommendation Lan-
arXiv:2005.14165 [cs.CL] guage Model. arXiv:2402.00284 [cs.IR]
[5] Diego Carraro and Derek Bridge. 2024. Enhancing Recommendation Diversity [31] Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, and Xiang
by Re-ranking with Large Language Models. arXiv:2401.11506 [cs.IR] Wang. 2023. LLaRA: Aligning Large Language Models with Sequential Recom-
[6] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, and Paul Barham. 2022. PaLM: Scaling Language Modeling with Pathways. arXiv:2204.02311 [cs.CL]
[7] Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. M6-Rec: Generative Pretrained Language Models are Open-Ended Recommender Systems. arXiv:2205.08084 [cs.IR]
[8] Shangfeng Dai, Haobin Lin, Zhichen Zhao, Jianying Lin, Honghuan Wu, Zhe Wang, Sen Yang, and Ji Liu. 2021. POSO: Personalized Cold Start Modules for Large-scale Recommender Systems. arXiv:2108.04690 [cs.IR]
[9] Dario Di Palma. 2023. Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models (RecSys ’23). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3604915.3608889
[10] Sumanth Doddapaneni, Krishna Sayana, Ambarish Jash, Sukhdeep Sodhi, and Dima Kuzmin. 2024. User Embedding Model for Personalized Language Prompting. arXiv:2401.04858 [cs.CL]
[11] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. 2023. A Survey on In-context Learning. arXiv:2301.00234 [cs.CL]
[12] Yingpeng Du, Di Luo, Rui Yan, Hongzhi Liu, Yang Song, Hengshu Zhu, and Jie Zhang. 2023. Enhancing Job Recommendation through LLM-based Generative Adversarial Networks. arXiv:2307.10747 [cs.IR]
[13] Wenqi Fan, Zihuai Zhao, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, and Qing Li. 2023. Recommender Systems in the Era of Large Language Models (LLMs). arXiv:2307.02046 [cs.IR]
[14] Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, and Fei Sun. 2023. A Large Language Model Enhanced Conversational Recommender System. arXiv:2308.06212 [cs.IR]
[15] Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, and Hakim. 2023. Leveraging Large Language Models in Conversational Recommender Systems. arXiv:2305.07961 [cs.IR]
[16] Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System. arXiv:2303.14524 [cs.IR]
[17] Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. 2023. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt and Predict Paradigm (P5). arXiv:2203.13366 [cs.IR]
… menders. arXiv:2312.02445 [cs.IR]
[32] Xinyu Lin, Wenjie Wang, Yongqi Li, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2023. A Multi-facet Paradigm to Bridge Large Language Model and Recommendation. arXiv:2310.06491 [cs.IR]
[33] Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, and Tat-Seng Chua. 2024. Data-efficient Fine-tuning for LLM-based Recommendation. arXiv:2401.17197 [cs.IR]
[34] Qijiong Liu, Nuo Chen, Tetsuya Sakai, and Xiao-Ming Wu. 2023. ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models. arXiv:2305.06566 [cs.IR]
[35] Yue Liu, Shihao Zhu, Jun Xia, Yingwei Ma, Jian Ma, Wenliang Zhong, Guannan Zhang, Kejun Zhang, and Xinwang Liu. 2024. End-to-end Learnable Clustering for Intent Learning in Recommendation. arXiv:2401.05975 [cs.IR]
[36] Zihan Liu, Wei Ping, Rajarshi Roy, Peng Xu, Chankyu Lee, Mohammad Shoeybi, and Bryan Catanzaro. 2024. ChatQA: Building GPT-4 Level Conversational QA Models. arXiv:2401.10225 [cs.CL]
[37] Hanjia Lyu, Song Jiang, Hanqing Zeng, Qifan Wang, Si Zhang, Ren Chen, Chris Leung, Jiajie Tang, Yinglong Xia, and Jiebo Luo. 2023. LLM-Rec: Personalized Recommendation via Prompting Large Language Models. arXiv:2307.15780 [cs.CL]
[38] Haokai Ma, Ruobing Xie, Lei Meng, Xin Chen, Xu Zhang, Leyu Lin, and Zhanhui Kang. 2024. Plug-in Diffusion Model for Sequential Recommendation. arXiv:2401.02913 [cs.IR]
[39] Sheshera Mysore, Andrew McCallum, and Hamed Zamani. 2023. Large Language Model Augmented Narrative Driven Recommendations. arXiv:2306.02250 [cs.IR]
[40] Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, and Jianfeng Gao. 2023. Instruction Tuning with GPT-4. arXiv:2304.03277 [cs.CL]
[41] Aleksandr V. Petrov and Craig Macdonald. 2023. Generative Sequential Recommendation with GPTRec. arXiv:2306.11114 [cs.IR]
[42] Junyan Qiu, Haitao Wang, Zhaolin Hong, Yiping Yang, Qiang Liu, and Xingxing Wang. 2023. ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation. arXiv:2311.16441 [cs.IR]
[43] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training.
[44] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv:1910.10683 [cs.LG]
[45] Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. 2024. Representation Learning with Large Language Models for Recommendation. arXiv:2310.15950 [cs.IR]
[46] Kyle Dylan Spurlock, Cagla Acun, Esin Saka, and Olfa Nasraoui. 2024. ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback. arXiv:2401.03605 [cs.IR]
[47] Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. arXiv:2304.09542 [cs.CL]
[48] Yueming Sun and Yi Zhang. 2018. Conversational Recommender System. arXiv:1806.03277 [cs.IR]
[49] Zuoli Tang, Zhaoxin Huan, Zihao Li, Xiaolu Zhang, Jun Hu, Chilin Fu, Jun Zhou, and Chenliang Li. 2023. One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems. arXiv:2310.14304 [cs.IR]
[50] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, and Alicia Jin. 2022. LaMDA: Language Models for Dialog Applications. arXiv:2201.08239 [cs.CL]
[51] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288 [cs.CL]
[52] Sahil Verma, Ashudeep Singh, Varich Boonsanong, John P. Dickerson, and Chirag Shah. 2023. RecRec: Algorithmic Recourse for Recommender Systems. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. ACM. https://doi.org/10.1145/3583780.3615181
[53] Lei Wang, Jingsen Zhang, Hao Yang, Zhiyuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, and Ji-Rong Wen. 2023. When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm. arXiv:2306.02552 [cs.IR]
[54] Lei Wang, Songheng Zhang, Yun Wang, Ee-Peng Lim, and Yong Wang. 2023. LLM4Vis: Explainable Visualization Recommendation using ChatGPT. arXiv:2310.07652 [cs.HC]
[55] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. https://doi.org/10.1145/3331184.3331267
[56] Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Jingyuan Wang, and Ji-Rong Wen. 2023. Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models. arXiv:2305.13112 [cs.CL]
[57] Yan Wang, Zhixuan Chu, Xin Ouyang, Simeng Wang, Hongyan Hao, and Yue. 2024. Enhancing Recommender Systems with Large Language Model Reasoning Graphs. arXiv:2308.10835 [cs.IR]
[58] Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, and Yingzhen Yang. 2023. RecMind: Large Language Model Powered Agent For Recommendation. arXiv:2308.14296 [cs.IR]
[59] Yu Wang, Zhiwei Liu, Jianguo Zhang, Weiran Yao, Shelby Heinecke, and Philip S. Yu. 2023. DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation. arXiv:2312.11336 [cs.IR]
[60] Zifeng Wang, Chufan Gao, Cao Xiao, and Jimeng Sun. 2023. MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement. arXiv:2305.12081 [cs.LG]
[61] Yunjia Xi, Weiwen Liu, Jianghao Lin, and Xiaoling Cai. 2023. Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models. arXiv:2306.10933 [cs.IR]
[62] Qiang Charles Xiao, Ajith Muralidharan, Birjodh Tiwana, Johnson Jia, Fedor Borisyuk, Aman Gupta, and Dawn Woodard. 2024. MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation Systems. arXiv:2401.06293 [cs.AI]
[63] Youshao Xiao, Shangchun Zhao, Zhenglei Zhou, Zhaoxin Huan, Lin Ju, Xiaolu Zhang, Lin Wang, and Jun Zhou. 2024. G-Meta: Distributed Meta Learning in GPU Clusters for Large-Scale Recommender Systems. arXiv:2401.04338 [cs.LG]
[64] Lanling Xu, Junjie Zhang, Bingqian Li, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao, and Ji-Rong Wen. 2024. Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis. arXiv:2401.04997 [cs.IR]
[65] Fan Yang, Zheng Chen, Ziyan Jiang, Eunah Cho, Xiaojiang Huang, and Yanbin Lu. 2023. PALR: Personalization Aware LLMs for Recommendation. arXiv:2305.07622 [cs.IR]
[66] Zhengyi Yang, Jiancan Wu, Yanchen Luo, Jizhi Zhang, Yancheng Yuan, An Zhang, Xiang Wang, and Xiangnan He. 2023. Large Language Model Can Interpret Latent Space of Sequential Recommender. arXiv:2310.20487 [cs.IR]
[67] Jing Yao, Wei Xu, Jianxun Lian, Xiting Wang, Xiaoyuan Yi, and Xing Xie. 2023. Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations. arXiv:2311.10779 [cs.IR]
[68] Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, and Even Oldridge. 2023. LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking. arXiv:2311.02089 [cs.IR]
[69] An Zhang, Leheng Sheng, Yuxin Chen, Hao Li, Yang Deng, Xiang Wang, and Tat-Seng Chua. 2023. On Generative Agents in Recommendation. arXiv:2310.10108 [cs.IR]
[70] Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. 2023. Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23). ACM. https://doi.org/10.1145/3604915.3608860
[71] Junjie Zhang, Ruobing Xie, Yupeng Hou, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2023. Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach. arXiv:2305.07001 [cs.IR]
[72] Longhui Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, and Min Zhang. 2024. TSRankLLM: A Two-Stage Adaptation of LLMs for Text Ranking. arXiv:2311.16720 [cs.IR]
[73] Wenxuan Zhang, Hongzhi Liu, Yingpeng Du, Chen Zhu, Yang Song, Hengshu Zhu, and Zhonghai Wu. 2023. Bridging the Information Gap Between Domain-Specific Model and General LLM for Personalized Recommendation. arXiv:2311.03778 [cs.IR]
[74] Yang Zhang, Fuli Feng, Jizhi Zhang, Keqin Bao, Qifan Wang, and Xiangnan He. 2023. CoLLM: Integrating Collaborative Embeddings into Large Language Models for Recommendation. arXiv:2310.19488 [cs.IR]
[75] Zhi Zheng, Zhaopeng Qiu, Xiao Hu, Likang Wu, Hengshu Zhu, and Hui Xiong. 2023. Generative Job Recommendations with Large Language Model. arXiv:2307.02157 [cs.IR]
[76] Yaochen Zhu, Liang Wu, Qi Guo, Liangjie Hong, and Jundong Li. 2023. Collaborative Large Language Model for Recommender Systems. arXiv:2311.01343 [cs.IR]
[77] Yutao Zhu, Peitian Zhang, Chenghao Zhang, Yifei Chen, Binyu Xie, Zhicheng Dou, Zheng Liu, and Ji-Rong Wen. 2024. INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning. arXiv:2401.06532 [cs.CL]