Knowledge Graphs for Enhanced Recommendations
Taicheng Guo1*, Chaochun Liu2, Hai Wang2, Varun Mannam2, Fang Wang2, Xin Chen2, Xiangliang Zhang1, Chandan K. Reddy2
1University of Notre Dame  2Amazon
{tguo2, xzhang33}@nd.edu, {chchliu, ihaiwang, mannamvs, fwfang, xcaa, ckreddy}@amazon.com
arXiv:2410.19627v1 [cs.AI] 25 Oct 2024
Abstract

Language agents have recently been used to simulate human behavior and user-item interactions for recommendation systems. However, current language agent simulations do not understand the relationships between users and items, leading to inaccurate user profiles and ineffective recommendations. In this work, we explore the utility of Knowledge Graphs (KGs), which contain extensive and reliable relationships between users and items, for recommendation. Our key insight is that the paths in a KG can capture complex relationships between users and items, eliciting the underlying reasons for user preferences and enriching user profiles. Leveraging this insight, we propose Knowledge Graph Enhanced Language Agents (KGLA), a framework that unifies language agents and KGs for recommendation systems. In the simulated recommendation scenario, we position the user and item within the KG and integrate KG paths as natural language descriptions into the simulation. This allows language agents to interact with each other and discover sufficient rationale behind their interactions, making the simulation more accurate and aligned with real-world cases, thus improving recommendation performance. Our experimental results show that KGLA significantly improves recommendation performance (with a 33%-95% boost in NDCG@1 across three widely used benchmarks) compared to the previous best baseline method.

* Work was done as an intern at Amazon.

Introduction

Large Language Model (LLM) agents have demonstrated strong capabilities in conversation and decision-making across various tasks (Xi et al. 2023; Guo et al. 2024, 2023a). By enabling multiple LLM agents to cooperate, each agent can have its own profile, actions, and behaviors, simulating diverse human behaviors. Recent works have employed LLM agents to simulate real-world recommendation scenarios (Zhang et al. 2024b). In these simulations, each agent represents a user or an item in the recommendation system, and each agent maintains a memory that records the user's or item's profile. The interactions between agents simulate user interactions with items, dynamically updating the agents' memories to reflect changes in user preferences or the recognition of unique item features.

While LLM agents can capture more explainable and explicit user profiles through text during the recommendation process, previous work has only considered adding short, basic descriptions of users and items as prompts during the interactions. As a result, the updated memory for user agents is nonspecific and general due to a lack of rationale information about the user's choice (see Figure 1). With such insufficient memory profiles, LLMs struggle to identify precise user preferences and may recommend irrelevant items. In this work, we analyze the reason for insufficient user agent and item agent profiles: the simulation relies solely on simple descriptions, without rationalizing why users like or dislike certain items. Consequently, most user or item agent memories are generated by LLMs from limited information. Such profiles rely heavily on the pre-trained knowledge of LLMs, making them susceptible to being generic. User profiling is a critical task for recommender systems. Hence, the question of how to provide user agents with sufficient information during agent simulation, so as to obtain more rational and precise user profiles, remains crucial but unresolved.

Figure 1: Examples of user agent memory generated (a) without KG and (b) with KG. The user agent memory generated without KG contains only general descriptions, while the memory generated with KG includes more specific terms (highlighted in red), demonstrating more precise user preferences.

To tackle this issue, we aim to leverage a Knowledge Graph (KG), which contains entities with specific meanings, to provide extensive rationales that enhance agent simulation for recommendation. For example, a KG path can provide the rationale for why a user may like an item (User −mentions→ features −described as→ CD), thus helping to build better user and item agent profiles. Most previous work on using KGs for recommendation represents the knowledge from a KG as embeddings: a graph neural network is trained to obtain graph embeddings, which are then concatenated with user embeddings to represent user profiles. In contrast, since we use LLM agents to simulate users and items, where the number of interactions is limited and the user and item profiles are represented by text, the KG information should also be represented by text to effectively influence the agent simulation process.

Based on this motivation, we first identify the information from the KG that is required for recommendation. We formulate this as a 'path-to-text' problem: the user and item can be regarded as nodes in a KG, and the paths between them indicate the rationales for why the user chooses that item, thus helping to build more precise user agent memory. A few previous works focus on leveraging KGs for LLM agents (Jiang et al. 2024; Xu et al. 2024; Luo et al. 2023). However, to the best of our knowledge, most previous KG+Agent methods primarily focus on the question-answering task, where, given a question, the agent starts from a node in the KG and traverses the graph to find the answer node deductively. Contrary to previous work on KG-enhanced LLM agents, in our setting we know both the start and destination entities (user and item, respectively) in the KG, and we need the LLM agent to analyze the paths between these nodes and summarize the key rationales. To achieve this, we propose KGLA (Knowledge Graph Enhanced Language Agents), which consists of three modules — Path Extraction, Path Translation, and Path Incorporation — that extract, translate, and incorporate KG paths, respectively, to provide faithful rationale information for improving agent profiles in recommendation. Based on our proposed framework, we conduct an in-depth analysis of how our method improves agent simulation for recommendation, thereby enhancing recommendation performance.

To summarize, our key contributions in this work are:
• To the best of our knowledge, we are the first to leverage KGs and demonstrate their effectiveness in enhancing agent profiles for recommendation.
• We identify the challenges and propose a novel framework to extract, translate, and incorporate KG information into language-agent-based recommendation. The KG-enhanced framework provides better explanations of the recommendation process and is also applicable to various other KG+Agent simulation tasks.
• We demonstrate the effectiveness of our method through experiments on public datasets. The results show that our method consistently outperforms baselines (achieving relative improvements of 95.34%, 33.24%, and 40.79% in NDCG@1 on three widely used benchmarks, compared to the previous best baseline), and that the resulting agent memories are more precise than those of previous baselines.

Related Work

LLM Agents

Large language model-based agents are widely researched for various scenarios. These agents can take actions, interact with the environment, and obtain feedback. The key capabilities of LLM agents are: 1) Taking Action (Yao et al. 2022): prompting the agent to perform an action or make a decision; 2) Reflection (Shinn, Labash, and Gopinath 2023): after taking an action, the agent receives feedback from the environment, which allows it to reflect on the task and improve decision making in subsequent trials; and 3) Memory (Zhang et al. 2024c): agents store the lessons learned from reflection. In our work, during the simulation stage, the user agent first selects a preferred item from a list of items (Taking Action). Given the ground-truth item, the agent then reflects on its previous choice (Reflection). Subsequently, it updates its profile to record its preference (Memory). In the ranking stage, the user agent provides a preferred ranked list of items (Taking Action) based on the given list.

LLM Agents for Recommendation

Due to the strong capabilities of LLM agents, they have successfully been employed in recommendation tasks. Wang et al. (2024) propose two categories of agent-based recommendation: recommender-oriented and simulation-oriented. The recommender-oriented approach aims to develop a recommender agent equipped with enhanced planning, reasoning, memory, and tool-using capabilities to assist in recommending items to users. RecMind (Wang et al. 2023b) and InteRecAgent (Huang et al. 2023) both develop single agents with improved capabilities for recommendation. RAH (Shu et al. 2023) and MACRec (Wang et al. 2024) support collaboration among different types of agents to recommend items to users.

In contrast, the simulation-oriented approach focuses on using agents to simulate individual user behaviors and item characteristics. RecAgent (Wang et al. 2023a) and Agent4Rec (Zhang et al. 2024a) employ agents as user simulators to simulate interactions between users and recommender systems, investigating the plausibility of simulated user behaviors. AgentCF (Zhang et al. 2024b) explores simulating user-item interactions with user agents and item agents; the memories of both are dynamically updated during the simulation, and the final memory is used to recommend items to users.

Our work falls under the category of simulation-oriented approaches. Previous simulation work focused only on using agent memory to simulate user profiles for recommendation, neglecting the construction of high-quality user agent memories. Poor user agent memory can impair recommendation performance. Our work leverages KGs to incorporate comprehensive rationale information into the entire process of LLM-agent-based recommendation, thereby building well-grounded and precise user agent memories.

Knowledge Graph and LLM Agents

With the latest advances in LLMs, they are utilized in synergy with KGs to provide accurate information. Most previous works (Jiang et al. 2024; Xu et al. 2024; Luo et al. 2023) focus on synergizing LLMs and KGs for question-answering tasks. The synergizing paradigm is deductive: given a question, LLMs identify the starting entity node and then act as planners to generate relation paths, traversing the KG to reach potential entities that can answer the question. However, the synergizing paradigm between LLMs and KGs in recommendation systems is inductive and differs from previous methods. In this context, the starting entity node (user) and the target entity node (item) are already known within the KG. LLMs are responsible for analyzing all the paths between these two nodes and summarizing the key reasons why the user node can reach the target node. These rationales are then used to update the user agent's memory. In contrast to previous work, our approach is the first to leverage LLM agents to inductively analyze KG paths for reflection, thereby enhancing recommendation performance.

Methodology

Notation and Problem Definition

We first introduce the required notation and define the problem.

Recommender System. Let the user set be denoted by U and the item set by I. Each user u ∈ U has an interaction history [i1, i2, ..., in]. We have a ground-truth label y indicating whether the user likes an item (y = 1) or does not (y = 0).

Knowledge Graph. A KG is a structured representation of knowledge that contains abundant facts in the form of triples G = {(eh, r, et) | eh, et ∈ E, r ∈ R}, where eh and et are the head and tail entities, respectively, and r is the relation between them. E is the entity set of the KG and R is its relation set. A path in a KG is a sequence of triples p = e0 −r1→ e1 −r2→ ... −rl→ el, connecting the entity e0 to the entity el.

LLM Agent and LLM Recommendation. For the LLM agent, we denote its memory as M. During the simulation stage, the reflection process for the agent is represented by the function Reflection. In the ranking stage, given a user and a list of items, the LLM used to generate the ranked list is represented by the function LLM.

Overall Framework

The architecture of our proposed framework (see Figure 2 and Algorithm 1) has three stages: Initialization, Simulation, and Ranking. It is designed to combine LLM agents with KGs to enhance agent memory during simulation and improve recommendation performance during ranking. In the initialization stage, the memories of all users are set using the template "I enjoy ...," while item memories are initialized with the titles and categories of the items.

The simulation stage consists of two phases: Autonomous Interaction and Reflection. Given a user u with a chronological sequence of behavioral interactions [i1, i2, ..., in], the simulation stage aims to optimize the memories representing the user and item profiles by simulating real-world user-item interactions. At the beginning of the simulation, we initialize the user and item agent memories Mu and Mi using basic properties. Then, similar to most sequential recommendation settings (Wang et al. 2019; Guo et al. 2023b), for each user u we use the items [i1, i2, ..., in−1], i.e., all but the last item in of the behavior sequence, as positive items for simulation. At each step of the interaction between u and ij ∈ [i1, i2, ..., in−1], we randomly sample a negative item i− with high popularity among all items I to help the user agent refine its profile through comparison. We use the last item in as the ground-truth item in the ranking stage for testing.

For each user u and item i, we first position these nodes in the KG. Then, we use our Path Extraction module to extract all 2-hop paths P2ui = u → e1 → i and 3-hop paths P3ui = u → e1 → e2 → i from the KG. Based on the extracted paths, we apply our Path Translation module to convert P2ui to text T2ui and P3ui to text T3ui. Finally, we apply our Path Incorporation module to incorporate T2ui and T3ui into the LLM agents' simulation and ranking process.

Figure 2: The framework of our proposed KG-enhanced agent simulation for recommendation. Given (u, i+, i−), our framework retrieves and translates KG information, which is then incorporated into the simulation, guiding LLM agents to analyze the possible reasons for user choices based on the KG, summarize the user's precise preferences, and update the user agent's memory.

In the Autonomous Interaction phase, we ask the user agent to choose an item from i+ and i− based on the current user and item memories Mu, Mi+, and Mi−, as well as the 2-hop relations between the user and the items, T2ui+ and T2ui−. After each interaction, the user agent selects an item iselect, and we also ask it to provide an explanation yexp for this choice. To optimize the user and item agents, we derive a feedback signal by comparing the user agent's chosen item with the actually interacted item. In the Reflection phase, we inform the user agent whether its previous choice was correct or incorrect. Simultaneously, we incorporate the knowledge graph information T2ui and T3ui to enable the user agent to abductively analyze the reasons behind its choices and summarize the key factors to update its memory. We also update the memories of the item agents based on the information from KG paths.

In the ranking stage, we have user and item agents representing real-world user and item profiles. We focus on ranking the candidate items given a user u and a set of candidates {c1, ..., cn}, the user agent's memory Mu, the memories of all candidates {Mc1, ..., Mcn}, and the 2-hop KG information for each candidate. R is the final ranking list. In the following sections, we describe our detailed design for incorporating KG information with LLM agents for recommendation.

Knowledge Graph Path Extraction

This module aims to extract useful information from the knowledge graph G for building better user and item profiles. In recommendation systems, given a user u and their interaction history, the critical information needed is the rationale behind why u chooses or ignores item i. These rationales can provide detailed preference information that helps the user agent build a more accurate and comprehensive profile. In G, paths between two nodes can explain why one node has a relation to another node, which in turn can help explain why the user node u chooses or does not choose the item node i. Hence, our first task is to extract paths as additional context for the subsequent prompt. To achieve this, we propose the following path extraction procedure: for each user u and item i, we first extract the 2-hop knowledge path set P2ui and the 3-hop path set P3ui, where P2ui contains all 2-hop paths and P3ui contains all 3-hop paths from u to i.

Path Translation: Expressing 2-hop Paths via Text

All 2-hop and 3-hop paths extracted from the KG are represented as triples or quadruples. Since the input to the LLM is text, our objective is to express these 2-hop and 3-hop paths in a textual form that the LLM can understand. We identify two critical challenges in this translation process: description simplification and easy comprehensibility for the LLM.

Description simplification refers to the need to simplify the textual representation of the extracted paths, as the number of 2-hop and 3-hop paths can be large. If we do not shorten the text, its length may exceed the token limit of the LLM. Furthermore, LLMs may struggle to capture important factors from longer contexts (Liu et al. 2024). Easy comprehensibility for the LLM refers to the need to present the path information in a way that the LLM can easily comprehend. While directly adding all 2-hop path triples to the prompt is one approach (Shu et al. 2024), this method necessitates introducing the entire KG to the LLM, which is feasible only for very small graphs. Additionally, LLMs often strug-
gle to effectively process and analyze these direct triples. In recommendation scenarios, we require the LLM to act as an analyzer, examining the potential critical reasons behind user choices based on the relationships between users and items. Thus, we aim to translate the triples into more natural, human-like language to improve the LLM agent's understanding of their underlying meaning.

Algorithm 2 provides the details of expressing 2-hop paths via text, and some examples of converting 2-hop and 3-hop paths to text are shown in Figure 2. We first group all paths for each user-item pair by the overall edge type of the path. For each edge type (r1, r2), we have a subset of paths associated with it. Since the start entity (user) and the end entity (item) are the same within each subset, we merge all paths in the subset by concatenating the intermediate entities into a list. This reduces the number of 2-hop paths while still representing the relevant information. To make the text easier for the LLM to understand, we describe the merged paths by emphasizing the relationships between the user and the item. For example, given a merged path set u −r1→ (e1, e2, ..., en) −r2→ i with edge types r1 = mentions and r2 = described as, we describe it as "User mentions features e1, e2, ..., en, which are described as this item." This approach reduces the length of the 2-hop path information and explicitly prompts the LLM to perform better reasoning by highlighting the relationships.

Algorithm 1: KG-enhanced Recommendation
Require: Knowledge Graph G; user u; training samples for u: {(u, i1+, i1−), ..., (u, in+, in−)}, where i+ is a positive item and i− is a negative item sampled during autonomous interaction; testing samples for u: {c1, ..., cn}; functions EXP-2HOP and EXP-3HOP, which express 2-hop and 3-hop paths between u and an item for recommendation.
Ensure: Recommended rank list R for user u.
1: Stage 1: Initialization
2: Initialize user memory Mu and item memory Mi for each item
3: Stage 2: Simulation
4: for each (ik+, ik−) in the training samples do
5:   T2ui+, T3ui+ ← EXP-2HOP(G, u, ik+), EXP-3HOP(G, u, ik+, {(u, i1+, i1−), ..., (u, in+, in−)})
6:   T2ui−, T3ui− ← EXP-2HOP(G, u, ik−), EXP-3HOP(G, u, ik−, {(u, i1+, i1−), ..., (u, in+, in−)})
7:   // Autonomous Interaction
8:   iselect, yexp ← fLLM(Mu, Mi+, Mi−, T2ui+, T2ui−)
9:   // Reflection
10:  Mu ← Reflection(iselect, yexp, Mi+, Mi−, T2ui+, T2ui−, T3ui+, T3ui−)
11:  Mi+, Mi− ← Reflection(iselect, yexp, Mi+, Mi−, T2ui+, T2ui−, T3ui+, T3ui−)
12: end for
13: Stage 3: Ranking
14: {T2c1, ..., T2cn} ← {EXP-2HOP(G, u, c1), ..., EXP-2HOP(G, u, cn)}
15: R ← fLLM(Mu, {Mc1, ..., Mcn}, {T2c1, ..., T2cn})
16: Recommend rank list R to user u

Algorithm 2: EXP-2HOP(G, u, i)
Require: Knowledge Graph G, user u, and item i; function FIND-2HOP(G, u, i), which retrieves all 2-hop paths between u and i.
Ensure: Text T2ui containing natural language descriptions of the 2-hop paths between u and i.
1: P2ui ← FIND-2HOP(G, u, i)
2: Group the paths in P2ui by relation types (r1, r2) into a dictionary D[(r1, r2)], where keys are relation pairs and values are lists of intermediate entities
3: for each (r1, r2) in D do
4:   Formulate the sentence: "The user {r1} D[(r1, r2)] (list of entities), which are {r2} by the item."
5:   Append the sentence to T2ui
6: end for
7: return T2ui

Path Translation: Expressing 3-hop Paths via Text

Simplifying the description and making it easy for the LLM to understand becomes more challenging for 3-hop paths, because the number of 3-hop paths and relations is much larger than for 2-hop paths. Unlike the 2-hop case, which emphasizes relations, the objective of using 3-hop paths is to incorporate more descriptive information for updating the agents' memories in the Reflection stage.

Algorithm 3: EXP-3HOP(G, u, i, {(u, i1+, i1−), ..., (u, in+, in−)})
Require: Knowledge Graph G, user u, and item i; training samples for u: {(u, i1+, i1−), ..., (u, in+, in−)}, where i+ is a positive item and i− is a negative item sampled during autonomous interaction; function FIND-3HOP(G, u, i), which retrieves all 3-hop paths between u and i, and function GET-DESC(P), which returns the entities other than u and i involved in path P.
Ensure: Text T3ui containing natural language descriptions of the 3-hop paths between u and i.
1: Initialize: positive entity set E+ ← ∅, negative entity set E− ← ∅, non-informative entity set Su ← ∅
2: for each (ik+, ik−) in the training samples do
3:   E+ ← E+ ∪ GET-DESC(FIND-3HOP(G, u, ik+))
4:   E− ← E− ∪ GET-DESC(FIND-3HOP(G, u, ik−))
5: end for
6: Su ← E+ ∩ E− ▷ Identify non-informative common entities between positive and negative items
7: T3ui ← GET-DESC(FIND-3HOP(G, u, i))
8: Remove non-informative entities: T3ui ← T3ui \ Su
9: return T3ui
Algorithm 3 provides the details of expressing 3-hop paths via text. The number of 3-hop paths is very large, as shown in Table 2. In the Autonomous Interaction and Reflection phases of the simulation stage, where the user must select one item between the positive and negative items, the motivation for adding KG information is to provide the user agent with discriminative factors between these two items. Therefore, we only need the features that discriminate between the user's positive items and the user's negative items. Hence, we first construct a non-informative entity set Su for each user u. Specifically, entities that appear for both positive items and negative items do not provide useful information for distinguishing user preferences, and are thus added to this set. Then, for user u and item i, we extract the 3-hop paths from the KG, extract the descriptive entities, and filter them based on Su. The descriptive entities are pre-defined. For example, if a KG contains four types of entities (user U, item I, category C, and words W), we select only categories C and words W as descriptive entities. Finally, we obtain the filtered descriptive entities T3ui, which may indicate the user's potential rationales for their choice.

Path Incorporation: Incorporating Text

After obtaining the 2-hop and 3-hop text descriptions from the KG G, we need to incorporate these descriptions into the overall framework. As shown in Algorithm 1, we incorporate the translated 2-hop KG text into the Autonomous Interaction stage (line 8), the 2-hop and 3-hop KG text into the Reflection stage (lines 10 and 11), and the 2-hop KG text into the Ranking stage (line 14).

Experiments

Experiment Settings

We conducted extensive experiments to address the following research questions (RQs):
• RQ1: Does incorporating KG information enhance the recommendation performance?
• RQ2: How much do different types of KG information contribute to the overall performance?
• RQ3: How does KG information influence agent memory (user profiles) in simulation?
• RQ4: How does the enhanced agent's memory influence the ranking?
• RQ5: How much does our method reduce the input word count for the LLM?

Datasets. Following previous works, we conducted our experiments on three datasets containing a KG — CDs, Clothing, and Beauty — which comprise product review data from existing recommendation benchmarks (McAuley et al. 2015). The original datasets and the Knowledge Graph (KG) information are sourced from (Xian et al. 2019) (https://github.com/orcax/PGPR). To reduce the impact of expensive API calls and facilitate effective simulations, consistent with previous settings (Zhang et al. 2024b), we sample dense recommendation subsets, where most users have rated a large portion of the items. Specifically, we sample 100 users along with their corresponding items from each dataset. The dataset statistics are presented in Table 2. The Knowledge Graph comprises five types of entities:
• User: a user in the recommender system
• Item: a product to be recommended to users
• Feature: a product feature word from reviews
• Brand: the brand or manufacturer of a product
• Category: the category of a product
and eight kinds of relations:
• User −purchase→ Item
• User −mention→ Feature
• Item −described as→ Feature
• Item −belong to→ Category
• Item −produced by→ Brand
• Item −also bought→ another Item
• Item −also viewed→ another Item
• Item −bought together by same user→ another Item

Evaluation metrics. Similar to previous studies, we used NDCG@K as the evaluation metric for comparing different methods, with K set to 1, 5, and 10. We used all items except the last item of each user's behavior sequence as training data for the simulation stage, and the last item as the ground-truth item. To mitigate the randomness of the LLM, each experiment was repeated three times, and the mean and standard deviation are reported.

Baselines

We compared the proposed model with the following categories of baseline methods from the literature.

Conventional recommendation methods. We included BPR (Bayesian Personalized Ranking) (Rendle et al. 2012), which uses matrix factorization to learn latent representations of users and items and recommends based on these representations; Pop (popularity-based recommendation) (Ji et al. 2020), which ranks candidates by their popularity; and BM25 (Best Matching 25) (Robertson and Zaragoza 2009), which ranks candidates by their textual similarity to the user's past interactions.

Deep-learning based recommendation methods. We used SASRec (Kang and McAuley 2018), which captures the sequential patterns of users' historical interactions using a transformer encoder.
                               CDs                                       Clothing                                  Beauty
Method                         NDCG@1      NDCG@5      NDCG@10      NDCG@1      NDCG@5      NDCG@10      NDCG@1      NDCG@5      NDCG@10
Conventional Rec.    BPR       0.083±0.021 0.278±0.019 0.441±0.018  0.110±0.035 0.307±0.021 0.462±0.020  0.113±0.031 0.313±0.032 0.468±0.021
                     Pop       0.140±0.000 0.346±0.000 0.493±0.000  0.050±0.000 0.227±0.000 0.407±0.000  0.170±0.000 0.359±0.000 0.500±0.000
                     BM25      0.050±0.000 0.318±0.000 0.451±0.000  0.130±0.000 0.333±0.000 0.476±0.000  0.180±0.000 0.379±0.000 0.523±0.000
Deep-learning Rec.   SASRec    0.163±0.025 0.346±0.019 0.496±0.010  0.157±0.015 0.309±0.014 0.481±0.009  0.153±0.012 0.352±0.010 0.486±0.006
LLM-based Rec.       LLMRank   0.170±0.026 0.384±0.016 0.515±0.012  0.340±0.026 0.648±0.016 0.676±0.014  0.270±0.010 0.578±0.004 0.624±0.005
                     AgentCF   0.193±0.006 0.362±0.012 0.510±0.006  0.313±0.023 0.528±0.020 0.617±0.013  0.277±0.032 0.539±0.008 0.607±0.017
KGLA (Ours)                    0.377±0.006 0.637±0.009 0.675±0.005  0.453±0.021 0.699±0.016 0.732±0.010  0.390±0.010 0.655±0.003 0.691±0.003
Improvement over best baseline 95.34%      65.89%      31.07%       33.24%      7.87%       8.28%        40.79%      13.32%      10.74%

Table 1: The overall performance comparison in terms of the NDCG metric. The best result is in bold font, and the second best result is underlined.
Table 2: Statistics of the sampled datasets and the related RQ2: How much do different types of KG
KG. #Avg. 2-hop denotes the average number of 2-hop paths information contribute to the overall performance?
while #Avg. 3-hop denotes the average number of 3-hop
paths. To further investigate the effect of different types of KG in-
formation on recommendation performance, we conducted
ablation studies on the CD dataset. The results are presented
LLM-based recommendation methods. We used LLMRank (Hou et al. 2024) as a baseline, which leverages LLMs as a zero-shot ranker to rank candidate items based on the user's sequential interaction history. We also compared our method with AgentCF (Zhang et al. 2024b), which likewise builds an agent-simulation framework and uses the resulting agent memories for recommendation. Unlike AgentCF, our method adaptively designs strategies to synergize the agent with the KG.

Implementation details. We implemented all baseline methods (Hou et al. 2024) except AgentCF using the open-source repository at https://github.com/RUCAIBox/LLMRank. For all LLM-based methods (LLMRank, AgentCF, and ours), we employ Claude3-Haiku-20240307 as the LLM. We set the maximum number of tokens (max_tokens) to 20,000 and leave all other optional parameters at their default values: temperature 1, top_p 1, and top_k 250.

The ablation results in Table 3 show that incorporating 2-hop and 3-hop information from the KG gradually improves the performance metrics. This demonstrates that each module contributes positively to the model and that the modules complement each other, enabling the model to achieve better overall performance.

Method                        NDCG@1       NDCG@5       NDCG@10
AgentCF (Baseline)            0.193±0.006  0.362±0.012  0.510±0.006
KGLA (+KG 2-hop)              0.280±0.010  0.497±0.016  0.588±0.008
KGLA (+KG 3-hop)              0.257±0.029  0.518±0.011  0.596±0.012
KGLA (+KG 2-hop + KG 3-hop)   0.377±0.006  0.637±0.009  0.675±0.005

Table 3: The impact of 2-hop/3-hop KG path information.

RQ1: Does incorporating KG information enhance the recommendation performance?

The overall recommendation performance, measured by NDCG@K, is reported in Table 1. From the results, we make the following observations:
1) Our method significantly outperforms all baselines on all datasets. Compared to the previous best baseline, our method obtains 95.3%, 44.7%, and 40.8% improvements in NDCG@1 on the three datasets. This shows that incorporating KG information through our methods significantly enhances the model's ability to make accurate recommendations.
2) All LLM-based recommendation methods surpass the conventional and deep learning methods. This indicates that LLMs can improve the recommendation performance.

RQ3: How does KG information influence agent memory (user profiles) in simulation?

Our proposed methods aim to ensure that the incorporated KG text is easily comprehensible for LLMs. To investigate whether the LLM understands this text, we conducted a case study of our method with the KG, evaluating the impact of KG information on the agent's memory during the Reflection stage. As shown in Figure 3, during simulation with positive and negative items, the LLM can effectively infer the user's potentially preferred 2-hop features, including "garden", "sultry", and "chick", and then explore expanded 3-hop features, such as "sensual", to enrich the user profile.

RQ4: How does the enhanced agent's memory influence the ranking?

To investigate how the updated agent memory based on KG information influences the final recommendation, we show the following case from the recommendation stage.
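The intuition behind this ranking step is that a candidate sharing more KG-derived features with the user agent's memory should rank higher. A minimal illustrative sketch of that intuition (not the paper's actual prompt-based implementation; the function name and feature sets here are hypothetical):

```python
# Illustrative sketch: rank candidates by the overlap between each
# candidate's KG features and the features stored in the user agent's
# memory. In KGLA the ranking is performed by an LLM over textualized
# profiles; this only mirrors the underlying intuition.

def rank_candidates(user_memory_features, candidates):
    """Return candidate ids sorted by descending feature overlap."""
    memory = set(user_memory_features)

    def overlap(item_features):
        return len(memory & set(item_features))

    return sorted(candidates, key=lambda c: overlap(candidates[c]), reverse=True)

# Hypothetical example loosely following the Figure 4 case study:
user_memory = {"garden", "sultry", "chick", "sensual"}
candidates = {
    "candidate_1": {"garden", "sultry", "floral"},  # shares 2 features
    "candidate_2": {"woody", "citrus"},             # shares 0 features
}
print(rank_candidates(user_memory, candidates))  # candidate_1 ranked first
```

Under this view, enriching the memory with 2-hop and 3-hop features enlarges the overlap with truly preferred items, giving the agent more rationales to separate candidates.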
             Word Count              CDs      Clothing  Beauty
2-hop paths  # Avg. paths            13.38    15.56     20.36
             # Avg. original words   40.14    46.68     61.08
             # Avg. words            15.38    17.56     22.36
             Reduction Percentage    61.68%   62.38%    63.39%
3-hop paths  # Avg. paths            312.89   213.87    337.29
             # Avg. original words   1564.45  1069.35   1686.45
             # Avg. words            62.73    18.23     25.46
             Reduction Percentage    95.99%   98.30%    98.49%

Table 4: Word count of the original KG paths versus the text generated by our methods.
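The reduction percentages in this table follow directly from the average word counts, assuming reduction = (1 − generated/original) × 100. A quick check under that assumption:

```python
# Sanity check for the word-count table:
# reduction percentage = (1 - generated_words / original_words) * 100.

def reduction_pct(original_words, generated_words):
    """Percentage of words removed relative to the original path description."""
    return round((1 - generated_words / original_words) * 100, 2)

# 2-hop paths (CDs, Clothing, Beauty): matches the 61.68/62.38/63.39 row.
print(reduction_pct(40.14, 15.38), reduction_pct(46.68, 17.56), reduction_pct(61.08, 22.36))

# 3-hop paths: matches the 95.99/98.30/98.49 row.
print(reduction_pct(1564.45, 62.73), reduction_pct(1069.35, 18.23), reduction_pct(1686.45, 25.46))
```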
Figure 4: A case study of KG-enhanced ranking. Taking two candidates as examples: candidate 1 shares more common features with the user agent's memory, so the user agent can make the correct choice in the ranking stage.

As illustrated in Figure 4, with the precise and extensive user profiles and 2-hop relations, candidate 1 shares more common features with the user agent's memory, thus providing the user agent with more rationales for ranking the candidates.

RQ5: How much does our method reduce the input word count for the LLM?

A key objective of our proposed methods is to shorten the description of the KG. In this experiment, we compare the word count of the original paths with that of the text generated by our methods for 2-hop and 3-hop paths. As shown in Table 4, our methods achieve a substantial reduction in word count compared to the original path descriptions across all datasets: around 60% for 2-hop paths and 98% for 3-hop paths.

Conclusion

Using language agents for recommendation is a promising yet challenging task. Previous works on this topic have primarily focused on utilizing agents to simulate the recommendation process while neglecting the rationale behind recommendations, and have thus struggled to discover user preferences. In this paper, we propose KGLA, the first framework that explores the synergy between language agents and KGs for recommendation.

References

Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N. V.; Wiest, O.; and Zhang, X. 2024. Large language model based multi-agents: A survey of progress and challenges. arXiv preprint arXiv:2402.01680.
Guo, T.; Nan, B.; Liang, Z.; Guo, Z.; Chawla, N.; Wiest, O.; Zhang, X.; et al. 2023a. What can large language models do in chemistry? A comprehensive benchmark on eight tasks. Advances in Neural Information Processing Systems, 36: 59662–59688.
Guo, T.; Yu, L.; Shihada, B.; and Zhang, X. 2023b. Few-shot news recommendation via cross-lingual transfer. In Proceedings of the ACM Web Conference 2023, 1130–1140.
Hou, Y.; Zhang, J.; Lin, Z.; Lu, H.; Xie, R.; McAuley, J.; and Zhao, W. X. 2024. Large language models are zero-shot rankers for recommender systems. In European Conference on Information Retrieval, 364–381. Springer.
Huang, X.; Lian, J.; Lei, Y.; Yao, J.; Lian, D.; and Xie, X. 2023. Recommender AI agent: Integrating large language models for interactive recommendations. arXiv preprint arXiv:2308.16505.
Ji, Y.; Sun, A.; Zhang, J.; and Li, C. 2020. A re-visit of the popularity baseline in recommender systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1749–1752.
Jiang, J.; Zhou, K.; Zhao, W. X.; Song, Y.; Zhu, C.; Zhu, H.; and Wen, J.-R. 2024. KG-Agent: An efficient autonomous agent framework for complex reasoning over knowledge graph. arXiv preprint arXiv:2402.11163.
Kang, W.-C.; and McAuley, J. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM), 197–206. IEEE.
Liu, N. F.; Lin, K.; Hewitt, J.; Paranjape, A.; Bevilacqua, M.; Petroni, F.; and Liang, P. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12: 157–173.
Luo, L.; Li, Y.-F.; Haffari, G.; and Pan, S. 2023. Reasoning on graphs: Faithful and interpretable large language model reasoning. arXiv preprint arXiv:2310.01061.
McAuley, J.; Targett, C.; Shi, Q.; and Van Den Hengel, A. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 43–52.
Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
Robertson, S.; and Zaragoza, H. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4): 333–389.
Shinn, N.; Labash, B.; and Gopinath, A. 2023. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366.
Shu, D.; Chen, T.; Jin, M.; Zhang, Y.; Du, M.; and Zhang, Y. 2024. Knowledge Graph Large Language Model (KG-LLM) for link prediction. arXiv preprint arXiv:2403.07311.
Shu, Y.; Gu, H.; Zhang, P.; Zhang, H.; Lu, T.; Li, D.; and Gu, N. 2023. RAH! RecSys-Assistant-Human: A human-central recommendation framework with large language models. arXiv preprint arXiv:2308.09904.
Wang, L.; Zhang, J.; Yang, H.; Chen, Z.; Tang, J.; Zhang, Z.; Chen, X.; Lin, Y.; Song, R.; Zhao, W. X.; et al. 2023a. User behavior simulation with large language model based agents. arXiv preprint arXiv:2306.02552.
Wang, S.; Hu, L.; Wang, Y.; Cao, L.; Sheng, Q. Z.; and Orgun, M. 2019. Sequential recommender systems: Challenges, progress and prospects. arXiv preprint arXiv:2001.04830.
Wang, Y.; Jiang, Z.; Chen, Z.; Yang, F.; Zhou, Y.; Cho, E.; Fan, X.; Huang, X.; Lu, Y.; and Yang, Y. 2023b. RecMind: Large language model powered agent for recommendation. arXiv preprint arXiv:2308.14296.
Wang, Z.; Yu, Y.; Zheng, W.; Ma, W.; and Zhang, M. 2024. Multi-agent collaboration framework for recommender systems. arXiv preprint arXiv:2402.15235.
Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. 2023. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864.
Xian, Y.; Fu, Z.; Muthukrishnan, S.; De Melo, G.; and Zhang, Y. 2019. Reinforcement knowledge graph reasoning for explainable recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 285–294.
Xu, Y.; He, S.; Chen, J.; Wang, Z.; Song, Y.; Tong, H.; Liu, K.; and Zhao, J. 2024. Generate-on-Graph: Treat LLM as both agent and KG in incomplete knowledge graph question answering. arXiv preprint arXiv:2404.14741.
Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; and Cao, Y. 2022. ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.
Zhang, A.; Chen, Y.; Sheng, L.; Wang, X.; and Chua, T.-S. 2024a. On generative agents in recommendation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1807–1817.
Zhang, J.; Hou, Y.; Xie, R.; Sun, W.; McAuley, J.; Zhao, W. X.; Lin, L.; and Wen, J.-R. 2024b. AgentCF: Collaborative learning with autonomous language agents for recommender systems. In Proceedings of the ACM on Web Conference 2024, 3679–3689.
Zhang, Z.; Bo, X.; Ma, C.; Li, R.; Chen, X.; Dai, Q.; Zhu, J.; Dong, Z.; and Wen, J.-R. 2024c. A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501.