mathematics

Article

Attribute-Aware Graph Aggregation for Sequential Recommendation
Yiming Qu 1 , Yang Fang 2, * , Zhen Tan 1 and Weidong Xiao 1

1 National Key Laboratory of Information Systems Engineering Laboratory, National University of Defense
Technology, Changsha 410073, China; [email protected] (Y.Q.)
2 College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
* Correspondence: [email protected]

Abstract: In this paper, we address the challenge of dynamic evolution of user preferences
and propose an attribute-sequence-based recommendation model to improve the accuracy
and interpretability of recommendation systems. Traditional approaches usually rely on
item sequences to model user behavior, but ignore the potential value of attributes shared
among different items for preference characterization. To this end, this paper innovatively
replaces items in user interaction sequences with attributes, constructs attribute sequences
to capture fine-grained preference changes, and reinforces the prioritization of current
interests by maintaining the latest state of attributes. Meanwhile, the item–attribute
relationship is modeled using LightGCN and a variant of GAT, multi-level features are fused
using a gated attention mechanism, and rotary encoding is introduced to enhance the flexibility
of sequence modeling. Experiments on four real datasets (Beauty, Video Games, Men, and Fashion)
showed that the model in this paper significantly outperformed the benchmark model in
both NDCG@10 and Hit Ratio@10 metrics, with a highest improvement of 6.435% and
3.613%, respectively. The ablation experiments further validated the key role of attribute
aggregation and sequence modeling in capturing user preference dynamics. This work
provides a new concept for recommender systems that balances fine-grained preference
evolution with efficient sequence modeling.

Keywords: recommender system; sequential recommendation; item–attribute graph embedding

MSC: 68T07
Academic Editor: Pasquale De Meo

Received: 7 March 2025
Revised: 10 April 2025
Accepted: 21 April 2025
Published: 24 April 2025

Citation: Qu, Y.; Fang, Y.; Tan, Z.; Xiao, W. Attribute-Aware Graph Aggregation for Sequential
Recommendation. Mathematics 2025, 13, 1386. https://doi.org/10.3390/math13091386

Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Recommender systems aim to present users with candidate items that align with their
preferences, derived from their historical interactions. Significant efforts have been invested
to enhance performance through various approaches. Conventionally, recommendation
algorithms concentrate on calculating similarities between users and items [1–3], primarily
employing collaborative filtering techniques. This method is founded on the assumption
that users with similar tastes will likely favor similar items. With the evolution of deep
learning, neural-network-based approaches have surpassed traditional methods in terms
of their ability to more effectively represent complex user–item relationships. However,
user preferences are not static; they evolve over time. The static user embeddings that
these models initially learn fail to capture the dynamic nature of user demands. As user
preferences change, recommender systems need to adapt to these changes to continue
delivering relevant suggestions that resonate with the user’s current preferences. Sequential
recommendation systems [4–7] (SRS) have thrived in the field of recommendation
systems over the past few years to solve this problem, primarily because these systems are
predominantly based on users’ past behavior and are prevalent in nearly all e-commerce,
social media, and streaming platforms. Sequential recommendation systems process a
user’s history of interactions into a time-ordered sequence that reflects the evolution of
their preferences. Initially, research in this area primarily utilized Markov chains, which
posited that each interaction is dependent on the preceding one. More recent approaches,
such as those using recurrent neural networks [8] and self-attention mechanisms [9], have
shifted focus towards enhancing information propagation within sequences and mod-
eling relationships between items, yielding notable improvements. However, existing
approaches often treat items as the fundamental units that reflect user preferences, without
considering the attributes of these items, and they fail to explore the underlying reasons
behind each interaction. We contend that users consistently focus on specific attributes
during their interactions; different users prefer different attributes, and even the same
user's attribute preferences can change over time, resulting in a shift in their
interaction patterns with items. By identifying the attributes of items that users genuinely
care about and directly generating preferences based on these attributes, we can achieve a
more granular representation of both items and their attributes, thereby enhancing model
performance. Even the sequential recommendation models that do utilize item attributes [10]
tend to treat them as auxiliary information for learning item transitions, and do not examine
the deeper reasons why the items a user interacts with change. In contrast, we treat key
item attributes as the primary unit for fine-grained extraction of user preferences. To enhance
the accuracy of capturing user preferences, this paper introduces a novel methodology that
substitutes items with attributes, thereby constructing an attribute sequence for each item
sequence. It is crucial to acknowledge that different items may share identical attributes,
indicating that these attribute sequences should represent the progression and evolution of
user preferences over time. An effective solution to address this issue is to maintain the
most recent state of each attribute within the sequence. This approach not only simplifies
representation but also ensures that the most relevant and current user preferences are
given priority. By shifting the focus from items to attributes, our method offers a more
dynamic insight into user behavior, capturing subtle shifts in preferences that might go
unnoticed by conventional item-based sequence recommendation systems. This strategy
further enhances the generalization ability and capability of the model to anticipate changes
in user interests, thus improving the precision and relevance of recommendations. Additionally,
this paper introduces a new encoding scheme for attention networks, aimed at improving
evaluation metrics and stabilizing the model under data sparsity. The main contributions of
the paper are as follows:
• The paper constructs item–attribute interaction graphs to model correlations between
attributes, and attribute change sequences to model user preferences for attributes.
• The paper proposes new encoding methods for attention networks, to improve
experimental metrics and model stability.
• The paper reports experiments on four real datasets; the experimental results
show that our proposed model outperformed the state-of-the-art baseline models.

2. Related Work
2.1. Sequential Recommendation
In comparison with traditional recommendation methods, sequential recommendation
systems yield more accurate predictions by modeling the dependencies among items within
interaction sequences. This capability allows them to capture the dynamic preferences of
users. An archetypal solution in early sequential recommendation research involved the
use of Markov-Chain-based models [11], which model the transition of information within
the sequence to predict the subsequent item. Rendle and colleagues introduced person-
alized Markov chains that utilize transition matrices constructed from user consumption
sequences to capture both long-term and short-term user preferences. However, a signif-
icant drawback of Markov-chain-based methods is that the current interaction depends
on only one or a few recent interactions, thereby capturing only short-term dependencies,
while overlooking long-term relationships. With the advancement of deep learning, Recurrent
Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM)
and Gated Recurrent Units (GRUs), have been introduced into sequential recommendation.
Zhang modeled user click sequences using RNNs [12], enhancing prediction accuracy
by incorporating sequential dependencies. GRU4Rec [8] was among the first to employ
GRUs for modeling entire sessions, achieving more precise recommendations. To effectively
capture user sequential behavior and primary intent, NARM [13] incorporated an attention
mechanism within an RNN framework. Although these RNN-based methods demonstrated
satisfactory performance, they still faced challenges such as difficulties in parallelization
and learning long-term dependencies. To address these issues, inspired by the Transformer
model [14], SASRec [6] captures inter-item correlations through stacking self-attention
layers, eliminating the need for recurrent or convolutional networks. Recently, some
researchers have focused on modeling interaction times within sequences. To underline
the impact of time intervals on recommendation outcomes, TiSASRec [15] includes relative
time intervals and positional information within item embeddings. Chorus [16] designed
two distinct normal distribution temporal kernels for “complementary” and “substitute”
relationships, to explicitly model the evolution of user preferences over time. GDERec [17]
constructs a user–item interaction graph and captures the continuous-time dynamics of
user preferences using ordinary differential equations.

2.2. Attribute-Aware Recommendation


Attributes are ubiquitously available and provide essential descriptions of items,
leading to the development of numerous attribute-based recommendation methods. Early
approaches utilized models based on Factorization Machines [2] to model high-order
interactions among attributes. Pasricha and McAuley presented TransFM [18], which
substitutes the inner product typically used in sequential recommendations with squared
Euclidean distance, enhancing the model’s generalizability and transferability among
sample features. He and Chua introduced NFM [19], employing neural networks to learn
informative feature combinations, thus enhancing the capacity for multi-order interaction
among features.
Recent works [20,21] have embedded attribute information into feature vectors and
concatenated them for recommendation purposes. Cheng introduced the Wide&Deep
model [22], which combines deep neural networks with linear models for recommendation.
In this model, the deep component employs a multi-layer perceptron to connect feature
embedding vectors for learning feature interactions. Li proposed a Deep Heterogeneous
Autoencoder (DHA) [23] model that integrates attribute information into a shared feature
space, achieving improved predictive outcomes. Su and colleagues developed a multi-
matrix factorization approach, named MMF [24], which leverages common attributes
across different items to achieve state-of-the-art prediction accuracy. Lv introduced a
Sequential Deep Matching [25] (SDM) model that captures short-term preferences using a
multi-head attention mechanism; long-term preferences are captured by categorizing the
attributes of all items in a user’s long-term behavior sequence, which are then modeled by
an attention mechanism. Zhang proposed FDSA [26], a feature-level deep self-attention
network for sequential recommendations that employs distinct self-attention blocks for
item-level sequences and feature-level sequences, modeling item transition patterns and
feature transition patterns separately. CARCA [27], one of the state-of-the-art models, uses a
multi-layer multi-head self-attention module to extract user features contained in the user’s
historical behavior, and also utilizes a cross-attention module to fuse the target item and
the user’s features.

3. Problem Formulation
Let $U = \{u_1, u_2, \cdots, u_n\}$ denote the set of $n$ users, and $I = \{i_1, i_2, \cdots, i_m\}$ denote the set
of $m$ items. The interaction history of user $u \in U$ is denoted by $S_u = \{i_1, i_2, \cdots, i_{|S_u|}\}$ in
chronological order, where $|S_u|$ represents the total number of interactions in the sequence
of user $u$. Each item in the interaction sequence is associated with multiple attributes, such
as category, brand, and time of interaction. These attributes are collectively represented by
$A_u = \{a_1, a_2, \cdots, a_k\}$, where $k$ denotes the total number of distinct item attributes.
The primary objective of sequential recommendation systems is to accurately predict
the next item $i_{|S_u|+1}$ with which the user $u$ is most likely to engage. This prediction is
based upon the historical sequence of item interactions $S_u$ of the user $u$, and the attributes
$A_u$ associated with each item in the sequence. By analyzing the patterns within $S_u$ and
considering the characteristics specified in $A_u$, the system aims to enhance the precision of
its recommendation for the subsequent interaction.

4. Proposed Model
An overview of our proposed framework is shown in Figure 1. It consists of five
main components: (1) Input layer: takes the item sequences and the item attribute
information as input. (2) Attribute Attention Graph Embedding: applies a graph convolutional
network to the item–attribute graph, generates the attribute sequences, and applies an
attention network to those sequences. (3) Attribute Aggregation: aggregates the item and
attribute embeddings extracted at different granularities through the attention
mechanism. The item-level embeddings are used as a query to aggregate multi-class
embeddings into a single representation. (4) Rotary Encoding: rotationally transforms
the fused embeddings to capture the relative positional relationships in the sequence.
(5) FFN: outputs scores and makes recommendations.

Figure 1. Overview of the proposed framework.

Next, we describe each component in detail.


4.1. Attribute Attention Graph Embedding


In an effort to more precisely capture user interests, this paper suggests a novel
approach where items are replaced by their attributes, facilitating the construction of an
attribute sequence for every item sequence. It is important to recognize that different items
might possess the same attributes, indicating that these attribute sequences ought to reflect
the progression and evolution of user interests over time.
An intuitive method to address this is to maintain the most recent version of each
attribute within the sequence. Doing so not only simplifies the representation but also
ensures that the most relevant and recent user preferences are prioritized. By focusing on
attributes rather than items, our approach allows for a more dynamic understanding of user
behavior, capturing subtle shifts in preferences that might not be immediately apparent
through traditional item-based tracking. Additionally, this strategy enhances the model’s
ability to adapt to and predict changes in user interests, thereby improving the accuracy
and relevancy of our recommendations.
First, based on the attribute information Au of each item, an item–attribute graph can
be constructed as follows:
$$G_a = (N, E), \tag{1}$$

where $N = \{ i_m \cup a_k \mid i_m \in I, a_k \in A_u \}$ denotes the node set, including all items and their
attributes, and $E = \{ (i_m, a_k) \mid i_m \in I, a_k \in A_u \}$ denotes the edge set, where an edge
$(i_m, a_k)$ exists only if item $i_m$ contains the attribute $a_k$. Then, we use a GNN to model the
connectivity of the attribute graph with multi-hop aggregation. Here, we adopt LightGCN [28],
a classic GNN-based CF method, to model the item–attribute relationships:

$$e_i^{(l)} = \sum_{a \in N_i^C} \frac{e_a^{(l-1)}}{\sqrt{|N_i^C|\,|N_a|}}, \qquad e_a^{(l)} = \sum_{i \in N_a} \frac{e_i^{(l-1)}}{\sqrt{|N_a|\,|N_i^C|}}, \qquad l = 1, \cdots, L, \tag{2}$$

where $e_i^{(l)}, e_a^{(l)} \in \mathbb{R}^d$ represent the corresponding embeddings of the item $i$ and the attribute
$a$ in the $l$-th layer, respectively; $L$ is the number of aggregation layers; $d$ is the dimension of
embeddings; $N_a = \{ i \mid (a, i) \in E \}$ represents all the items that have the attribute $a$; and
$N_i^C = \{ a \mid (a, i) \in E \}$ represents all the attributes of the item $i$.
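As a rough illustration of Equation (2), the following Python sketch runs the symmetrically normalized propagation on a toy item–attribute graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `lightgcn_propagate`, the dictionary-based graph, and the toy embeddings are hypothetical, and the step in which LightGCN combines the embeddings of all layers is omitted.

```python
import math


def lightgcn_propagate(item_attrs, item_emb, attr_emb, num_layers):
    """Propagate embeddings over an item-attribute graph as in Equation (2).

    item_attrs: dict mapping each item to the attributes it has (the edges).
    item_emb / attr_emb: dicts mapping node ids to layer-0 embedding lists.
    """
    # N_i^C: attributes of item i; N_a: items that have attribute a.
    attrs_of = {i: set(attrs) for i, attrs in item_attrs.items()}
    items_of = {}
    for i, attrs in attrs_of.items():
        for a in attrs:
            items_of.setdefault(a, set()).add(i)

    e_i, e_a = dict(item_emb), dict(attr_emb)
    dim = len(next(iter(e_i.values())))
    for _ in range(num_layers):
        new_i, new_a = {}, {}
        for i, attrs in attrs_of.items():
            vec = [0.0] * dim
            for a in attrs:
                w = 1.0 / math.sqrt(len(attrs) * len(items_of[a]))
                vec = [v + w * x for v, x in zip(vec, e_a[a])]
            new_i[i] = vec
        for a, items in items_of.items():
            vec = [0.0] * dim
            for i in items:
                w = 1.0 / math.sqrt(len(items) * len(attrs_of[i]))
                vec = [v + w * x for v, x in zip(vec, e_i[i])]
            new_a[a] = vec
        # Both updates read the previous layer, so swap only after both are done.
        e_i, e_a = new_i, new_a
    return e_i, e_a


# Toy graph (hypothetical data): item i1 has attribute a1; item i2 has a1 and a2.
toy_items, toy_attrs = lightgcn_propagate(
    {"i1": ["a1"], "i2": ["a1", "a2"]},
    {"i1": [1.0, 0.0], "i2": [0.0, 1.0]},
    {"a1": [1.0, 1.0], "a2": [2.0, 0.0]},
    num_layers=1,
)
```

Because there are no weight matrices or nonlinearities, each layer is a normalized neighbor sum, so attributes shared by many items contribute with a smaller weight per edge.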
In addition, we construct a sequence of attribute changes based on the user’s sequence
of item interactions, using certain mapping rules to reflect the user’s preference for at-
tributes over time. The attribute change sequence construction rule is as follows: replace
each item of the item sequence with the attribute it has, to form an attribute sequence in
which only the last of the duplicated attributes is retained, and all the rest of the duplicated
attributes are deleted to form a new attribute sequence. This sequence is the generated
attribute change sequence [29].
Specifically, based on the previously defined item–attribute graph, we can
determine the exact location of each attribute in the item sequence:

$$\mathrm{Loc}(a_j) = \max \{ \mathrm{Pos}(i_m) \mid (i_m, a_j) \in E \}, \tag{3}$$

where the $\mathrm{Pos}(i_m)$ function denotes the original position of item $i_m$. The corresponding
attribute sequence $A_u^{seq} = \{ a_1^u, a_2^u, \ldots, a_{L_a}^u \}$ is obtained by sorting the locations and filtering
out attributes with fewer connections, where $L_a$ is a hyperparameter that denotes the
maximum length of the attribute sequence.
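The construction rule and Equation (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `build_attribute_sequence` and the toy data are hypothetical, and "connections" is interpreted here, as an assumption, as the number of items in the user's sequence that carry the attribute.

```python
def build_attribute_sequence(item_seq, item_attrs, max_len):
    """Turn an item sequence into an attribute change sequence.

    Each attribute is kept once, at the position of the latest item that has it
    (Loc in Equation (3)). If more than max_len attributes remain, those with
    the fewest connections are dropped (assumed meaning of "connections").
    """
    last_pos, degree = {}, {}
    for pos, item in enumerate(item_seq):
        for attr in item_attrs.get(item, ()):
            last_pos[attr] = pos  # later items overwrite earlier ones: max position
            degree[attr] = degree.get(attr, 0) + 1

    # Keep the max_len best-connected attributes; ties favor more recent ones.
    kept = sorted(last_pos, key=lambda a: (degree[a], last_pos[a]), reverse=True)[:max_len]
    # Order the survivors by their latest position in the item sequence.
    return sorted(kept, key=lambda a: last_pos[a])


# Hypothetical example: "nike" appears on the first and the last item,
# so only its last occurrence is retained.
example = build_attribute_sequence(
    ["i1", "i2", "i3"],
    {"i1": ["red", "nike"], "i2": ["blue"], "i3": ["nike"]},
    max_len=3,
)
```

In this example the raw attribute sequence would be red, nike, blue, nike; dropping the earlier duplicate of "nike" leaves red, blue, nike, which places the most recently reinforced attribute last.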
Based on the obtained attribute changing sequence, in order to build a bridge between
the interests extracted from the attributes and the items to be predicted, we deploy attribute-
based aggregation using a variant of GAT. For an attribute $a_i \in A_u$ in the attribute
sequence, we use all the attributes in the sequence $A_u$ to update the features of $a_i$:
normalized attention values between $a_i$ and every attribute in the sequence are computed
and used to aggregate their embeddings:

$$e_{a_i}^{S} = \sum_{a_j \in A_u} \frac{\exp\left(\mathrm{Leaky\_ReLU}(e_{a_i}^{T} e_{a_j})\right)}{\sum_{a_k \in A_u} \exp\left(\mathrm{Leaky\_ReLU}(e_{a_i}^{T} e_{a_k})\right)} \, e_{a_j}, \tag{4}$$

where $e_{a_i} \in \mathbb{R}^d$ denotes the embedding of $a_i$.
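To make Equation (4) concrete, here is a minimal Python sketch of the attention-weighted aggregation over an attribute sequence. The names `leaky_relu` and `attend_over_sequence` are hypothetical, and the negative slope of 0.2 is an assumption (the value commonly used with GAT), since the text does not state it.

```python
import math


def leaky_relu(x, slope=0.2):
    # slope=0.2 is an assumed default, not a value given in the paper.
    return x if x > 0 else slope * x


def attend_over_sequence(attr_embs):
    """Return, for every attribute embedding, the softmax-weighted sum of all
    embeddings in the sequence, with weights from Leaky_ReLU dot products."""
    updated = []
    for e_i in attr_embs:
        scores = [leaky_relu(sum(p * q for p, q in zip(e_i, e_j))) for e_j in attr_embs]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        updated.append([
            sum(w / total * e_j[d] for w, e_j in zip(exps, attr_embs))
            for d in range(len(e_i))
        ])
    return updated


# Two orthogonal toy embeddings: each attribute attends mostly to itself.
toy_out = attend_over_sequence([[1.0, 0.0], [0.0, 1.0]])
```

Unlike the original GAT, which scores a learned linear map of concatenated node features, this variant scores attribute pairs by their inner product, so no extra parameters appear in the sketch.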

4.2. Attribute Aggregation


We have now derived the augmented embeddings for both items and attributes, along
with the sequence of changes in these items and attributes. These embeddings are designed
to represent user preferences at two levels: basic embeddings at the item level, and more
detailed, fine-grained embeddings at the attribute level. Each type of embedding captures
different facets of user preferences. By employing an attention mechanism, we aggregate
these multiple embeddings into a final item representation. Specifically, this involves
calculating the similarity between the item-level embeddings and the various attribute-
level embeddings. Based on these similarity scores, we compute a weighted sum of the
attribute-level interests to form the final representation. In this attention mechanism, the
item-level embeddings serve as the query, while the attribute-level interests are used as
both keys and values:
s = Attention( Ei , Ea , Ea ), (5)

where Ei and Ea are the final embeddings formed by the item and attribute aggregated
embeddings through the fully connected layer, respectively.
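The aggregation in Equation (5) can be pictured with a single-query attention step. The sketch below assumes the scaled dot-product form used in the Transformer; the paper does not spell out the scoring function, so the $1/\sqrt{d}$ scaling, the function name `attention`, and the toy vectors are assumptions, and the fully connected projections and any gating are left out.

```python
import math


def attention(query, keys, values):
    """Single-query attention: the item-level embedding is the query, and the
    attribute-level embeddings act as both keys and values."""
    d = len(query)
    # Assumed scoring: scaled dot product between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return fused, weights


# Toy item embedding and two attribute embeddings (hypothetical values).
item_vec = [1.0, 0.0]
attr_vecs = [[1.0, 0.0], [0.0, 1.0]]
fused_vec, attn_weights = attention(item_vec, attr_vecs, attr_vecs)
```

The attribute whose embedding is most similar to the item embedding receives the largest weight, so the fused representation leans toward the attribute-level interests that match the item.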

4.3. Rotary Encoding


The final embedding is encoded next. Unlike most methods, which use a self-attentive
approach, this paper uses rotary encoding [30] to encode the fused embedding.
Absolute or relative positional encoding focuses on representing positions within a
fixed coordinate system or relative to other items in a sequence. Rotary encoding
operates differently, utilizing rotational transformations to encode positional
information. Rotary positional encoding, often referred to as RoPE, is a specific
position-encoding technique. The underlying motivation for RoPE is that the
orientation or rotation of positional embeddings can provide valuable information about
the relationships between items in a sequence. This approach is particularly beneficial in
scenarios where the inherent order or arrangement of items is not straightforward and
could benefit from a more flexible representation, such as a user’s search behavior within
the same session [31]. These embeddings capture both the linear order and rotational
relationships within the sequence. Rotary positional encoding rotates the
query and key matrices, thereby preserving the dot product between vectors. In a
two-dimensional setup, for a given input vector $s$ and a specified angle $\theta$, the encoding
function $f_\theta$ is defined as follows:

$$f_\theta(s) = R(\theta) \cdot W \cdot s, \tag{6}$$

where $R(\theta)$ denotes the rotation matrix parameterized by the angle $\theta$ through its sine
and cosine, and $W$ is the key or query projection matrix. More explicitly, for the key
matrix $K$, we have

$$\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \cdot \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix} \cdot \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}, \tag{7}$$

where $s_1$ and $s_2$ denote the components of $s$ in Equation (6) in two-dimensional space.
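The two-dimensional rotation in Equations (6) and (7) can be checked numerically. The sketch below assumes the usual RoPE convention in which the vector at position $m$ is rotated by the angle $m\theta$; the function names `rotate` and `rope_score`, the base angle 0.1, and the toy vectors are hypothetical, and the projection $W$ is taken as already applied.

```python
import math


def rotate(vec, angle):
    """Apply the 2-D rotation matrix R(angle) to a vector (x, y)."""
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def rope_score(q, k, pos_q, pos_k, theta=0.1):
    """Dot product of a query and a key after position-dependent rotation.

    Assumption: position m is encoded by rotating by m * theta, as in RoPE.
    """
    rq = rotate(q, pos_q * theta)
    rk = rotate(k, pos_k * theta)
    return rq[0] * rk[0] + rq[1] * rk[1]


# Already-projected toy query and key (hypothetical values).
q_vec, k_vec = (1.0, 2.0), (0.5, -1.0)
near = rope_score(q_vec, k_vec, pos_q=3, pos_k=1)
far = rope_score(q_vec, k_vec, pos_q=7, pos_k=5)
```

Both calls use the same offset of two positions, so `near` and `far` agree up to floating-point error: after rotation the attention score depends only on the relative position, and rotating both vectors by the same angle leaves their dot product unchanged.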


The encoded embedding is then passed through an FFN layer to produce the final item scores:

$$\hat{Y}_O = \mathrm{FFN\text{-}Out}(S_O), \tag{8}$$

where $S_O$ denotes the embedding after rotation encoding in Equation (7).

4.4. Model Optimization


As in previous work [27], we use the Adam optimizer to minimize the binary cross-entropy
loss used in CARCA, while masking padding items to prevent them from
contributing to the loss function. For a given sequence of items, the interaction of user $u$ is
$S_u = \{i_1, i_2, \cdots, i_{|S_u|}\}$. The input list $I^{u+} = \{i_1, \ldots, i_{|S_u|-1}\}$ is obtained by deleting the last
item of $S_u$. The positive target list $T^{u+} = \{i_2, \ldots, i_{|S_u|}\}$ is obtained by shifting the list $S_u$
one item to the right. The negative target list $T^{u-} = \{i_{rand_1}, \ldots, i_{rand_{N-1}}\}$ is obtained
by random generation. Then the loss function is

$$L = -\sum_{u \in U} \sum_{r \in T^{u+} \cup T^{u-}} \left[ Y_r \log(\hat{Y}_r) + (1 - Y_r) \log(1 - \hat{Y}_r) \right]. \tag{9}$$
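The list construction and Equation (9) can be sketched as follows for a single user. This is an illustrative sketch only: the helper names `build_training_lists` and `bce_loss` are hypothetical, negatives are drawn from items the user has not interacted with (a common convention that the text does not confirm), and the masking of padding positions is omitted.

```python
import math
import random


def build_training_lists(seq, all_items, rng):
    """Split one interaction sequence into inputs, positive and negative targets."""
    inputs = seq[:-1]      # I^{u+}: the sequence without its last item
    positives = seq[1:]    # T^{u+}: the sequence shifted by one position
    seen = set(seq)
    # Assumption: negatives are sampled uniformly from unseen items.
    candidates = [item for item in all_items if item not in seen]
    negatives = [rng.choice(candidates) for _ in positives]  # T^{u-}
    return inputs, positives, negatives


def bce_loss(pos_scores, neg_scores, eps=1e-12):
    """Binary cross-entropy of Equation (9) for predicted probabilities.

    Positive targets have label 1 and negative targets have label 0.
    """
    loss = 0.0
    for p in pos_scores:
        loss -= math.log(max(p, eps))
    for p in neg_scores:
        loss -= math.log(max(1.0 - p, eps))
    return loss


rng = random.Random(0)
inputs, positives, negatives = build_training_lists(
    ["a", "b", "c"], ["a", "b", "c", "d", "e"], rng
)
uninformed = bce_loss([0.5], [0.5])  # a model that predicts 0.5 everywhere
```

Each input position is paired with one positive and one negative target, so the model is trained to score the true next item above a randomly drawn alternative at every step of the sequence.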

5. Experimental Setup
5.1. Datasets
We chose the Amazon review data to assess the model across different types of attribute
encoding. For this purpose, we used four different real-world datasets extracted from product
reviews on Amazon. Basic statistical information for these four datasets is shown in Table 1.

Table 1. Dataset statistics.

Datasets      Users    Items     Interactions  Attributes  Sparsity

Beauty        52,204   57,289    394,908       6507        99.9868%
Video Games   31,013   23,715    287,107       506         99.9610%
Men           34,244   110,636   254,870       2048        99.9933%
Fashion       45,184   166,270   358,003       2048        99.9523%
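The Sparsity column is presumably the share of user–item pairs with no recorded interaction. The short sketch below recomputes it from the user, item, and interaction counts in Table 1; the function name `sparsity` is hypothetical, and the formula itself is an assumption about how the column was derived, since the text does not define it.

```python
def sparsity(num_users, num_items, num_interactions):
    """Fraction of the user-item matrix that is empty (assumed definition)."""
    return 1.0 - num_interactions / (num_users * num_items)


# Counts taken from the Beauty and Video Games rows of Table 1.
beauty = sparsity(52204, 57289, 394908)
video_games = sparsity(31013, 23715, 287107)
print(f"Beauty: {beauty:.4%}, Video Games: {video_games:.4%}")
```

Values this close to 100% mean that each user has interacted with only a tiny fraction of the catalog, which is the data-sparsity setting the model is meant to handle.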

These datasets are diverse and have been widely used for SRS under the leave-one-out
protocol [4–6,32–34]. Since there are multiple versions of these datasets, we chose the
preprocessed version provided by CARCA [27].
These datasets contain a variety of product categories and provide a diverse and com-
prehensive source for training and evaluating recommendation models. They provide the
necessary characteristics for each interaction, such as the user id, item id, timestamp, and addi-
tional vector context information. The specific information for each dataset is provided below:
The Beauty dataset includes the attributes of all beauty products, including fine-grained
categories and brands. The features are mainly categorical and discrete.
The Video Games dataset includes user interactions, reviews, and product details
specific to the video games category, such as price, brand, and categorical features. Most of
the attributes here are discrete and categorical.
The Men dataset contains a comprehensive collection of men’s clothing items. Attributes
are dense vectors of image-based features extracted from the last layer of a ResNet50
trained on the ImageNet dataset.
The Fashion dataset contains six categories of male and female clothing. Dense features
are extracted using the same ResNet50 method.

5.2. Baselines
SASRec [6]: A sequential recommendation model that utilizes self-attention networks to
capture the relationships between different time steps in the same sequence in a parallelized
manner. It then derives the sequence characteristics for each time step through weighted
summation.
BERT4Rec [7]: A model that employs a deep bidirectional self-attention mechanism
to model user behavior sequences, addressing the limitations of traditional unidirectional
models in capturing user behavior patterns.
S3Rec [5]: A model that leverages correlations within the original data to construct
self-supervised signals, enhancing data representation through pretraining methods to
improve sequence-based recommendations.
CARCA [27]: A recommender system model that integrates contextual information
with item attributes. The model enhances recommendation accuracy by using a
cross-attention mechanism to predict user preferences for the next item.

5.3. Evaluation Metrics


To evaluate the performance of our recommendation system, we employed two widely
used Top-K metrics: Hit Ratio (HR@K) and Normalized Discounted Cumulative Gain
(NDCG@K). These metrics provide complementary insights into the quality and
effectiveness of our recommendations. Hit Ratio@K measures the proportion of held-out
target items that appear in the top K positions of the recommended list, indicating how well
the system identifies relevant items. This metric focuses on the accuracy of the
recommendations. NDCG@K evaluates the ranking quality within the top K recommendations,
giving higher weight to items ranked closer to the top. This metric emphasizes the precision
of the ranking order, ensuring that the most relevant items are prioritized. The formulas for
these metrics are as follows:

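$$\mathrm{HR} = \frac{1}{S} \sum_{i=1}^{S} \mathrm{hits}(i), \tag{10}$$

$$\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}, \tag{11}$$

$$\mathrm{DCG} = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \tag{12}$$

$$\mathrm{IDCG} = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}, \tag{13}$$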

where $S$ denotes the total number of target items, and $\mathrm{hits}(i)$ denotes whether the $i$-th
target item is included in the list of items recommended by the model: if it is in the list, its
value is 1; otherwise it is 0. $rel_i$ denotes the relevance of the item at position $i$, and $p$ is
the length of the recommendation list. $REL$ denotes the set obtained by sorting the original
recall set $R$ according to the scores from largest to smallest; computing the DCG on this
sorted set yields the IDCG. The NDCG obtained after normalization is a relative value, which
enables comparisons between different users.
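Equations (10)–(13) translate directly into a few lines of Python. The sketch below is illustrative: the function names are hypothetical, relevance is passed as a dictionary from item to grade, and the ideal ranking is truncated to the top K, which is the usual convention for NDCG@K but an assumption with respect to the text.

```python
import math


def hit_ratio_at_k(ranked_lists, targets, k):
    """Share of target items that appear in the top-k of their ranked list."""
    hits = [1 if target in ranked[:k] else 0 for ranked, target in zip(ranked_lists, targets)]
    return sum(hits) / len(hits)


def dcg(relevances):
    """Discounted cumulative gain of a list of relevance grades, Equation (12)."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg_at_k(ranked, relevance, k):
    """NDCG@k for one ranked list; relevance maps item -> grade (0 if missing)."""
    gains = [relevance.get(item, 0) for item in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]  # assumed top-k truncation
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


# Toy example: the target "b" is ranked second in the first list and the
# target "q" is missing from the second list.
hr = hit_ratio_at_k([["a", "b", "c"], ["x", "y", "z"]], ["b", "q"], k=2)
ndcg = ndcg_at_k(["a", "b", "c"], {"b": 1}, k=3)
```

With a single held-out target per user, the IDCG is 1 and NDCG@K reduces to $1/\log_2(\text{rank} + 1)$ when the target is ranked within the top K, and 0 otherwise.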

5.4. Parameter Settings


The parameter settings for the baseline models were initialized according to the optimal
values recommended in the original papers. For our model, the number of attention heads
and the number of blocks were both set to 3, the L2 regularization weight was set to 0.0001,
and the other parameter settings for the four datasets are shown in Table 2.

Table 2. Parameter settings.

Parameter                               Beauty   Video Games   Men        Fashion

Learning Rate                           0.0001   0.0001        0.000006   0.00001
Max Sequence Length                     75       50            35         35
Dropout Rate                            0.5      0.5           0.3        0.3
Embedding Dimension                     90       90            390        390
Dimension of Fully Connected Layer      450      450           1950       1950
Maximum Length of Attribute Sequence    100      80            60         60

6. Experimental Results
Table 3 summarizes the recommendation performance of the baseline approach and
our proposed model on the four datasets. It is clear that the model outperformed the
other baseline methods on most datasets, because it is able to understand user prefer-
ences in a fine-grained way from the correlation and change trends of attributes. The
better performance for the Beauty dataset demonstrates the advantage of utilizing more
attribute information propagation in sequential recommendation. The model achieved an
NDCG@10 of 0.419 (highest improvement of 6.435% over CARCA) on the Beauty dataset
and a Hit Ratio@10 of 0.803 (highest improvement of 3.613% over CARCA) on the Video
Games dataset.

Table 3. Performance comparison.

Model      Beauty                 Video Games            Men                    Fashion
           NDCG@10  HitRatio@10   NDCG@10  HitRatio@10   NDCG@10  HitRatio@10   NDCG@10  HitRatio@10

SASRec     0.322    0.485         0.541    0.742         0.259    0.397         0.245    0.381
BERT4Rec   0.318    0.478         0.509    0.705         0.193    0.315         0.309    0.328
S3Rec      0.371    0.538         0.541    0.765         0.238    0.365         0.239    0.367
CARCA      0.394    0.574         0.567    0.775         0.349    0.550         0.381    0.591
Ours       0.419    0.593         0.575    0.803         0.361    0.544         0.378    0.624
The bold data in the table represents the experimental result with the highest indicators in the dataset. The
underlined data in the table represents the experimental result with the second highest indicators in the dataset.

Our model significantly outperformed all baseline models, except CARCA, across all
datasets, highlighting the importance of leveraging item attributes to learn user preferences.
Compared to the more advanced S3Rec model, our model achieved improvements of
6.285% in NDCG@10 and 4.967% in Hit Ratio@10 on the Video Games dataset, which had
the fewest item attributes. This improvement is attributed to the ability to incorporate item
attribute information when representing item sequences.
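The percentages quoted here are relative improvements over the baseline score. A small sketch, using the Video Games values from Table 3, shows the arithmetic; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(ours, baseline):
    """Relative gain of our score over a baseline score, in percent."""
    return (ours - baseline) / baseline * 100.0


# Video Games column of Table 3: ours versus S3Rec.
ndcg_gain = relative_improvement(0.575, 0.541)  # NDCG@10
hr_gain = relative_improvement(0.803, 0.765)    # Hit Ratio@10
print(f"NDCG@10: {ndcg_gain:.3f}%, Hit Ratio@10: {hr_gain:.3f}%")
```

Rounded to three decimals, these reproduce the 6.285% and 4.967% figures reported for the comparison with S3Rec.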
A rather counterintuitive result was that our proposed model slightly underperformed
CARCA on one of the metrics for the Men and Fashion datasets, both of which contained
more attributes than the Video Games dataset. This was mainly due to the fact that the item
attributes in these two datasets are image-based features, and the refinement process of
converting item features into attribute features in our model led to some information loss,
negatively impacting performance. On the other hand, this also explains why our model
performed better on the Video Games and Beauty datasets: these datasets predominantly feature
categorical attributes, from which our model extracts fine-grained attribute-level features more effectively.

6.1. Ablation Study


We performed ablation experiments on the two components of the model:
item–attribute graph aggregation and attribute sequence aggregation. Table 4 and
Figure 2 show the experimental results, where w/o attr agg and w/o attr seq denote
the variants with item–attribute graph aggregation removed and with
attribute sequence aggregation removed, respectively.

Table 4. Ablation study: effect of item–attribute graph aggregation and attribute sequence aggregation.

Model          Beauty                Video Games           Men                   Fashion
               NDCG@10  HitRatio@10  NDCG@10  HitRatio@10  NDCG@10  HitRatio@10  NDCG@10  HitRatio@10
w/o attr agg.  0.385    0.575        0.548    0.789        0.355    0.519        0.331    0.497
w/o attr seq.  0.371    0.549        0.527    0.781        0.359    0.519        0.346    0.508
Ours           0.419    0.593        0.575    0.803        0.361    0.544        0.378    0.624
The bold data in the table represents the experimental result with the highest indicators in the dataset.

Figure 2. Impact of item–attribute graph aggregation and attribute sequence aggregation.

We can see that, without attribute sequence aggregation (w/o attr seq), the model suffered
the greatest performance loss on the first two datasets. For example, on the Video
Games dataset, the NDCG@10 dropped from 0.575 to 0.527, and the Hit Ratio@10 dropped
from 0.803 to 0.781. On these two datasets, the variant w/o attr agg outperformed the
variant w/o attr seq, suggesting that directly utilizing attributes for feature extraction,
without learning how they change with user preferences, is not sufficient. On the latter
two datasets, the results of removing either component differed only slightly from each other.
In every case, both variants fell below the full model, which suggests that fine-grained
attribute-level information is crucial for user preference representation, and that the
attribute aggregation mechanism plays a key role in integrating this information into the model.
Compared to the two ablated variants, our full model utilizes both item-level features and
fine-grained attribute-level feature representations, which ensures that it captures detailed
user preferences. However, without relative positional encoding added to each attribute, the
model failed to capture changes in user preferences when extracting features from an
attribute sequence, which led to a performance degradation.
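
The comparison above can be made concrete by computing, for each dataset, how far each ablated variant falls below the full model. The sketch below uses only the values of Table 4; the helper names (metric_drops, more_harmful_removal) are hypothetical, and ranking variants by their NDCG@10 drop is our choice for illustration, not a criterion stated in the paper.

```python
# Table 4 values as (NDCG@10, HitRatio@10); "full" is the complete model.
TABLE_4 = {
    "Beauty": {"w/o attr agg": (0.385, 0.575),
               "w/o attr seq": (0.371, 0.549),
               "full": (0.419, 0.593)},
    "Video Games": {"w/o attr agg": (0.548, 0.789),
                    "w/o attr seq": (0.527, 0.781),
                    "full": (0.575, 0.803)},
    "Men": {"w/o attr agg": (0.355, 0.519),
            "w/o attr seq": (0.359, 0.519),
            "full": (0.361, 0.544)},
    "Fashion": {"w/o attr agg": (0.331, 0.497),
                "w/o attr seq": (0.346, 0.508),
                "full": (0.378, 0.624)},
}


def metric_drops(dataset: str) -> dict:
    """Absolute (NDCG@10, HitRatio@10) loss of each variant versus the full model."""
    rows = TABLE_4[dataset]
    full_ndcg, full_hr = rows["full"]
    return {
        variant: (round(full_ndcg - ndcg, 3), round(full_hr - hr, 3))
        for variant, (ndcg, hr) in rows.items()
        if variant != "full"
    }


def more_harmful_removal(dataset: str) -> str:
    """Variant whose removal costs more NDCG@10 (illustrative criterion)."""
    drops = metric_drops(dataset)
    return max(drops, key=lambda variant: drops[variant][0])


for name in TABLE_4:
    print(name, metric_drops(name), "->", more_harmful_removal(name))
```

By this criterion, removing attribute sequence aggregation is the costlier change on Beauty and Video Games, while on Men and Fashion the NDCG@10 loss is larger when item–attribute graph aggregation is removed.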

6.2. Hyperparameter Analysis


Figure 3 shows the performance of the model under different settings of the maximum
attribute sequence length L_a, which we set to 40, 60, 80, 100, and 120 to explore its impact.
Under our attribute sequence construction method, when L_a was small, the attribute
sequence retained only the first L_a attributes with the most item neighbors, which
represent the attributes most likely to be preferred by the user. When L_a was too large,
the attributes of the nearest items dominated the user preference extraction. The model
achieved the best results on the Men and Fashion datasets when L_a = 60, whereas on the
Beauty and Video Games datasets the best performance was achieved with an L_a of 100 and
80, respectively. This may be because these two datasets have longer
sequences and therefore require more attributes to reflect user preferences.

Figure 3. Impact of maximum attribute sequence length.
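
To illustrate the role of the maximum length L_a, the sketch below shows one possible reading of the truncation step: when a user's attribute sequence is longer than L_a, keep the L_a attributes with the most item neighbors and preserve their original order. The function name, the neighbor_count mapping, and the tie-breaking rule are all assumptions made for illustration; the paper's actual construction procedure is defined in its method section and may differ.

```python
from typing import Dict, List


def truncate_attribute_sequence(attributes: List[str],
                                neighbor_count: Dict[str, int],
                                max_len: int) -> List[str]:
    """Keep at most `max_len` attributes, preferring those with more item neighbors.

    Assumption: ties are broken in favor of the earlier position, and the
    surviving attributes keep their original relative order.
    """
    if len(attributes) <= max_len:
        return list(attributes)
    # Rank positions by descending neighbor count, then by position.
    ranked = sorted(
        range(len(attributes)),
        key=lambda i: (-neighbor_count.get(attributes[i], 0), i),
    )
    kept_positions = sorted(ranked[:max_len])
    return [attributes[i] for i in kept_positions]


# Hypothetical attributes and neighbor counts, for illustration only.
sequence = ["brand_a", "color_red", "category_skin", "size_small"]
counts = {"brand_a": 5, "color_red": 1, "category_skin": 9, "size_small": 3}
print(truncate_attribute_sequence(sequence, counts, max_len=2))
```

A small max_len keeps only the most connected attributes, whereas a large one lets every attribute through, which matches the trade-off discussed around Figure 3.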

We further investigated the effect of the initial embedding dimension of items and
attributes on the experimental results. With the remaining hyperparameters fixed, we varied
the embedding dimension d over 90, 190, 290, 390, 490, and 590. From Figure 4,
we can see that the model obtained the best performance on the Beauty and
Video Games datasets when d = 90. When d was large, the high-dimensional
extraction of item features prevented the model from learning the sequence prediction task
efficiently, which resulted in a lower recommendation performance on these two
datasets. The Men and Fashion datasets, on the other hand, required
a higher embedding dimension of d = 390 to achieve the best performance, due to having
more features.

Figure 4. Impact of embedding dimension.

7. Conclusions
In this paper, we proposed a graph-neural-network-based method that constructs
item–attribute graphs and builds sequences of attribute changes, in order to
quickly determine which attributes are valuable for modeling user preferences. An attention
mechanism was also introduced to integrate item-level and sequence-level information, so that
the effect of item attributes on the encoding of the user–item interaction sequence is fused
into the recommendation results. Experimental results on four different datasets showed
that our proposed model significantly outperformed several state-of-the-art models in the
item sequence recommendation task.

Author Contributions: Conceptualization, Y.Q. and W.X.; methodology, Y.Q. and Y.F.; software, Y.Q.
and W.X.; validation, Y.Q. and Y.F.; formal analysis, Y.Q.; investigation, Y.Q.; resources, Y.Q.; data
curation, Y.Q.; writing—original draft preparation,Y.Q.; writing—review and editing, Y.F., Z.T. and
W.X.; supervision, W.X.; project administration, W.X.; funding acquisition, W.X. All authors have read
and agreed to the published version of the manuscript.

Funding: This research was funded by National Key R&D Program of China No. 2022YFB3102600,
NSFC under grants Nos. 62306322 and 72371245.

Data Availability Statement: The datasets can be found at https://nijianmo.github.io/amazon/index.html
(accessed on 15 October 2024).

Acknowledgments: This work was partially supported by National Key R&D Program of China No.
2022YFB3102600, NSFC under grants Nos. 62306322 and 72371245.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE
International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 263–272.
2. Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia,
13–17 December 2010; pp. 995–1000.
3. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the
10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295.
4. Wu, L.; Li, S.; Hsieh, C.J.; Sharpnack, J. SSE-PT: Sequential recommendation via personalized transformer. In Proceedings of the
14th ACM Conference on Recommender Systems, Virtual, 22–26 September 2020; pp. 328–337.
5. Zhou, K.; Wang, H.; Zhao, W.X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; Wen, J.R. S3-rec: Self-supervised learning for
sequential recommendation with mutual information maximization. In Proceedings of the 29th ACM International Conference
on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 1893–1902.
6. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on
Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 197–206.
7. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder
representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge
Management, Beijing, China, 3–7 November 2019; pp. 1441–1450.
8. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015,
arXiv:1511.06939.
9. Hidasi, B.; Karatzoglou, A. Recurrent neural networks with top-k gains for session-based recommendations. In Proceedings
of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018;
pp. 843–852.
10. Dong, X.; Jin, B.; Zhuo, W.; Li, B.; Xue, T. Improving sequential recommendation with attribute-augmented graph neural
networks. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Virtual Event, 11–14 May
2021; pp. 373–385.
11. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In
Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
12. Zhang, Y.; Dai, H.; Xu, C.; Feng, J.; Wang, T.; Bian, J.; Wang, B.; Liu, T.Y. Sequential click prediction for sponsored search with
recurrent neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31
July 2014; Volume 28.
13. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM
on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428.
14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In
Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December
2017.
15. Li, J.; Wang, Y.; McAuley, J. Time interval aware self-attention for sequential recommendation. In Proceedings of the 13th
International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 322–330.
16. Wang, C.; Zhang, M.; Ma, W.; Liu, Y.; Ma, S. Make it a chorus: Knowledge-and time-aware item modeling for sequential
recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information
Retrieval, Virtual, 24–25 July 2020; pp. 109–118.
17. Qin, Y.; Ju, W.; Wu, H.; Luo, X.; Zhang, M. Learning graph ode for continuous-time sequential recommendation. IEEE Trans.
Knowl. Data Eng. 2024, 36, 3224–3236.
18. Pasricha, R.; McAuley, J. Translation-based factorization machines for sequential recommendation. In Proceedings of the 12th
ACM Conference on Recommender Systems, Vancouver, BC, Canada, 2–7 October 2018; pp. 63–71.
19. He, X.; Chua, T.S. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM
SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 355–364.
20. Shi, S.; Zhang, M.; Liu, Y.; Ma, S. Attention-based adaptive model to unify warm and cold starts recommendation. In Proceedings
of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018;
pp. 127–136.
21. Zhou, G.; Zhu, X.; Song, C.; Fan, Y.; Zhu, H.; Ma, X.; Yan, Y.; Jin, J.; Li, H.; Gai, K. Deep interest network for click-through rate
prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London,
UK, 19–23 August 2018; pp. 1059–1068.
22. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide
& deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems,
Boston, MA, USA, 15 September 2016; pp. 7–10.
23. Li, T.; Ma, Y.; Xu, J.; Stenger, B.; Liu, C.; Hirate, Y. Deep heterogeneous autoencoders for collaborative filtering. In Proceedings of
the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 1164–1169.
24. Su, Y.; Erfani, S.M.; Zhang, R. MMF: Attribute interpretable collaborative filtering. In Proceedings of the 2019 International Joint
Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
25. Lv, F.; Jin, T.; Yu, C.; Sun, F.; Lin, Q.; Yang, K.; Ng, W. SDM: Sequential deep matching model for online large-scale recommender
system. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China,
3–7 November 2019; pp. 2635–2643.
26. Zhang, T.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Wang, D.; Liu, G.; Zhou, X. Feature-level deeper self-attention network for
sequential recommendation. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 4320–4326.
27. Rashed, A.; Elsayed, S.; Schmidt-Thieme, L. Context and attribute-aware sequential recommendation via cross-attention. In
Proceedings of the 16th ACM Conference on Recommender Systems, Seattle, WA, USA, 18–23 September 2022; pp. 71–80.
28. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for
recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information
Retrieval, Virtual, 24–25 July 2020; pp. 639–648.
29. Ding, Q.; Sun, T.; Zhou, M. Attribute-driven Interest Modeling for Sequential Recommendation. In Proceedings of the 2022
International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
30. Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing
2024, 568, 127063.
31. Lopez-Avila, A.; Du, J.; Shimary, A.; Li, Z. Positional encoding is not the same as context: A study on positional encoding for
Sequential recommendation. arXiv 2024, arXiv:2405.10436.
32. He, R.; McAuley, J. VBPR: Visual bayesian personalized ranking from implicit feedback. In Proceedings of the AAAI Conference
on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
33. Hou, M.; Wu, L.; Chen, E.; Li, Z.; Zheng, V.W.; Liu, Q. Explainable fashion recommendation: A semantic attribute region guided
approach. arXiv 2019, arXiv:1905.12862.
34. Steck, H. Embarrassingly shallow autoencoders for sparse data. In Proceedings of The World Wide Web Conference, San
Francisco, CA, USA, 13–17 May 2019; pp. 3251–3257.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
