
Neural Processing Letters (2023) 55:5357–5376

https://doi.org/10.1007/s11063-022-10917-3

Graph Neural Network for Context-Aware Recommendation

Asma Sattar1 · Davide Bacciu1

Accepted: 6 June 2022 / Published online: 20 June 2022


© The Author(s) 2022

Abstract
Recommendation problems are naturally tackled as a link prediction task in a bipartite graph
between user and item nodes, labelled with rating information on edges. To provide personal
recommendations and improve the performance of the recommender system, it is necessary
to integrate side information along with user-item interactions. The integration of context is a
key success factor in recommendation systems because it allows catering for user preferences
and opinions, especially when this pertains to the circumstances surrounding the interaction
between users and items. In this paper, we propose a context-aware Graph Convolutional
Matrix Completion which captures structural information and integrates the user’s opinion on
items along with the surrounding context on edges and static features of user and item nodes.
Our graph encoder produces user and item representations with respect to context, features
and opinion. The decoder takes the aggregated embeddings to predict the user-item score
considering the surrounding context. We have evaluated the performance of our model on
five publicly available datasets and compared it with state-of-the-art algorithms. Throughout,
we show how it can effectively integrate user opinion along with surrounding context
to produce a final node representation which is aware of the favourite circumstances of the
particular node.

Keywords Recommender Systems · Context-aware Recommendation · Deep learning for
Graphs · Graph Neural Networks

1 Dipartimento di Informatica, Università di Pisa, L.Go B. Pontecorvo 3, Pisa 56121, Pisa, Italy

Symbols:
$A_{uvc}$ : 3D matrix between users, items and context
$A_r$ : 2D user’s opinion matrix
$U_C$ : 2D user’s contextual importance matrix
$V_C$ : 2D item’s contextual importance matrix
$U_F$ : 2D user’s static importance matrix
$V_F$ : 2D item’s static importance matrix
$N_{F_u}$ : Total number of user’s features
$N_{F_v}$ : Total number of item’s features
$z_u^o$ : user’s opinion representation
$z_v^o$ : item’s opinion representation
$z_u^c$ : user’s contextual representation
$z_v^c$ : item’s contextual representation
$z_u^f$ : user’s feature representation
$z_v^f$ : item’s feature representation

Abbreviations:
GCMC: Graph convolution matrix completion
GCMC + feat: Graph convolution matrix completion with user and item features
cGCMC: context aware Graph convolution matrix completion
cGCMC + feat: context aware Graph convolution matrix completion with user and
item features

1 Introduction

With the rapid development of e-commerce and social media platforms in the last few years,
recommender systems have gathered notable attention [1, 2]. They provide a methodology
to identify user’s requirements and predict the interest by mining the user’s history and
their interactions with items (e.g., purchase, watch, click, and read). Recommender systems
can take various forms depending upon the application, e.g., playlist generator for video
and music services (Netflix, YouTube), friend suggestions on Instagram and Facebook, and
product suggestion on eBay and Amazon. One of the most common and general approaches
for recommendation is Collaborative Filtering (CF) [3, 4], which assumes that similar users have
similar preferences and hence like similar items. This approach models explicit feedback
(e.g., ratings) or implicit feedback (e.g., clicks, reads) to reconstruct the user’s interactions.
Recently, approaches based on Graph Neural Networks (GNNs) have been demonstrated to
be highly effective on various tasks defined over relational data, such as protein structure
and knowledge graphs [5]. The main idea of GNN is to produce the representation of a node
by aggregating features from its neighbouring nodes iteratively, as shown in Fig. 1. Each
GNN layer gathers all k-hop nearby node embeddings (messages) and summarizes them via
an aggregation function (e.g., sum). After aggregation, the node’s current state is updated.
Many of these approaches treat recommendation tasks as link prediction in bipartite graphs
via matrix completion [6, 7]. The bipartite graph can be represented as an adjacency matrix
between user and item nodes, where the task is to predict entries inside the matrix (also
known as link prediction). Recently, many researchers contributed towards the development
of GNN-based collaborative filtering for modelling user-item interactions in the form of a
message passing neural network between user and item nodes [8, 9].
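
The neighbourhood aggregation described above can be illustrated with a minimal Python sketch. It assumes sum aggregation and a deliberately simple update rule (the aggregated message is added to the node's own state, with no learned weights). The function name `message_passing_step` and the toy user-item graph are hypothetical and serve only as an illustration, not as the implementation used in this work.

```python
def message_passing_step(features, neighbours):
    """One message-passing round: sum neighbour states, then update each node."""
    updated = {}
    for node, own in features.items():
        # Aggregate (sum) the current states of all direct neighbours.
        agg = [0.0] * len(own)
        for nb in neighbours[node]:
            agg = [a + x for a, x in zip(agg, features[nb])]
        # Assumed update rule: own state plus aggregated message.
        updated[node] = [o + a for o, a in zip(own, agg)]
    return updated


# Toy bipartite graph: users u1, u2 and items i1, i2.
features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "i1": [1.0, 1.0], "i2": [2.0, 0.0]}
neighbours = {"u1": ["i1", "i2"], "u2": ["i2"], "i1": ["u1"], "i2": ["u1", "u2"]}

h1 = message_passing_step(features, neighbours)
```

Applying the step k times lets information from k-hop neighbours reach each node, which is the effect depicted in Fig. 1.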
A wide range of techniques including CF based approaches for recommender systems
solely focus on rating information provided by users. Despite the popularity of these
approaches, they have limited performance in real-world applications as they neglect side
information such as static features of nodes (user’s and item’s profile), and surrounding con-
text information (e.g., mood, time, weather) that can improve performance by enhancing the
personalization in recommender systems. The surrounding context reflects the fact that user
choices change with time and are highly dependent on the context under which they interact
with the item. For example, time and weather information highly impact the choice of users
in restaurant recommendation, while the user’s mood influences which song they are most
likely to listen to. As such, it is important to develop context-aware recommender systems that
can effectively accommodate the static features of users, as well as surrounding context
information, while making predictions [10]. The contextual prefiltering technique [11] filters the
originally available data based on current context information, and recommendation is based
on the filtered data. On the other hand, the contextual postfiltering paradigm [12] takes the
recommendation results from the two-dimensional recommendation techniques and filters
these results based on the current context. In [13], a context based recommendation problem is
mapped to a tensor completion task, which is inspired by the CF approach (matrix completion),
but it suffers from high complexity. SocialMF [14] integrates a trust factor as a social context
between users in the social network to enhance the performance of the matrix factorization
approach. Following this line of research, several deep learning based matrix factorization
approaches have been proposed for context-aware recommendation tasks [15–17].
The existing approaches for context-aware recommendation are incapable of capturing
deep dynamic user-item-context interactions, and discount the fact that the same person can
behave differently when interacting with the same item under different contexts [18]. It is
therefore reasonable to expect an improvement in the quality of personalized recommendations
when incorporating dynamic context information. This is the key focus and motivation
underlying this work. Figure 2 represents the user’s interaction with items considering the
knowledge about the surrounding context. This can also be represented as a bipartite graph
between users and items, with edges labelled with context and ratings/opinion. We introduce
a novel GNN based matrix completion approach with an attention mechanism that effectively
integrates the following three kinds of information from the graph (Fig. 2), between user and
item nodes:
• user’s opinion/rating on items;
• context information on edges between users and items;
• static features of users and items.

Fig. 1 Graph Neural Network with message passing up to k-hop neighbours. Each neighbouring node or edge
shares information and impacts the others’ updated embeddings

Fig. 2 User’s interaction with the item (e.g., movie) is surrounded by certain context (e.g., weather, mood,
weekend) that influences the user’s opinion on the item. This data takes the form of a 3D matrix between user, item
and context

In particular, we leverage a context-aware graph convolutional autoencoder for matrix
completion. Our graph convolutional autoencoder learns from the static features of nodes,
user-item interaction information (rating), and context. We also introduce an attention
factor for the three kinds of embeddings (static feature-based, opinion-based, and context-based)
generated by the encoder. The resulting embeddings are given as input to the decoder with
the objective to reconstruct a matrix with minimum loss. A preliminary version of this work
has appeared as a conference article [19]. This work extends the original article by including:
• multiple aggregation functions for user-item opinion graph inside a customized weight
sharing graph convolutional network;
• the attention mechanism for integrating multiple representations for users and items, i.e.,
opinion, contextual and static feature representation;
• a performance evaluation of the proposed algorithm on two additional datasets for music
and travel recommendation;
• an extended analysis of the algorithm to include the impact of the attention mechanism
for the aggregation of multiple representations.

2 Related Work

The vast majority of the work in the field of context-aware recommendation frameworks has
been devoted to the improvement of matrix factorization (MF) approaches. These approaches
work by decomposing the user-item interaction matrix into lower dimension matrices [20, 21].
Despite their good performance, these approaches are unable to capture the user/item-context
correlation as they consider context as features of the user and item [22]. The Neural Factorization
Machine (NFM) is a deep learning method to model high-order nonlinear feature interactions
for sparse data [15]. In [23], a neural network model has been proposed that captures the
impact of context on users and items. It learns the importance of context, but the simplicity of
this model limits the ability to capture the real influence of the relationship between features.
Recently, GNN based approaches have been introduced to tackle recommendation tasks
on graph-structured representations of the problem [24]. These methods are suitable for
modelling the interaction of nodes on graph structural features in a flexible and explicit
way. Fi-GNN [25] utilizes a graph structure to naturally represent the characteristics of
multiple feature fields, in which every node corresponds to a feature field, and these different
fields can interact through edges to model the node interaction in the graph. STAR-GCN [7]
stacks multiple identical GCN encoder-decoders combined with intermediate supervision to
improve the final prediction performance. GCMC [6] leverages the bipartite graph between
user and item nodes to learn the node representations. Both GCMC and STAR-GCN treat
all neighbours of a node equally. IGMC [26] is an inductive approach for user-item matrix
completion recommendation tasks, which does not consider any side information.
Previous GNN based collaborative filtering approaches [27, 28] are unable to capture
the collaborative filtering effect, as they discard the collaborative signals that are hidden
in user-item interaction. In [8], the NGCF model successfully encodes user-item high-order
connectivity by exploiting the user-item bipartite graph. GCF-YA [29] is a deep graph neural
network implementation of collaborative filtering, based on information propagation and an
attention mechanism to predict missing links between users and items. GraphRec [30] tackles
social recommendation by aggregating the historical behaviour of individuals from user-user
and user-item bipartite graphs for recommendation.
Context information on the user has been successfully used to improve recommendation
performance [16, 31]. Recently, we have seen work on dynamic graphs that integrates interaction
times as context information [32–34]. DGCF [35] integrates the time interval between
the previous and current interaction of user-item pairs inside their embedding to get the latest
node representations for recommendation. DyRep is an inductive deep learning approach
that learns from the temporally evolving interactions between user and item nodes.
These approaches solely consider time information and are hence unable to integrate any
other context information.
The above GNN based approaches consider the rating information as the user’s opinion on
the edges between the user and item nodes in a bipartite graph. Some approaches only consider
user and item static features, or integrate time as a context to capture a dynamically evolving
environment. All these approaches ignore the surrounding context information that can
improve performance. In the following, we show how it is possible to extend such approaches
to consider dynamic and time-varying contextual features influencing recommendations.

3 Problem Definition

We have categorized data for context-aware recommendation into four categories: items,
users, context, and interactions. Context can be defined as the surrounding knowledge that
is associated with the user-item interaction, e.g., time, company, mood, location, etc. In this
work, we have defined a 3D rating/opinion interaction matrix between user, item and context
$A_{uvc} \in \mathbb{R}^{N_u \times N_v \times N_c}$, where $N_u$ is the total number of users, $N_v$ represents the total number of
items and $N_c$ is the total number of different contexts (as shown in Fig. 2). The rating scale
ranges from one to five stars such that $A_{uvc} \in \{1, \ldots, 5\}^{N_u \times N_v \times N_c}$, except for the InCarMusic
dataset, where the maximum rating is six. Users and items are associated with multiple static features
describing the characteristics of individuals. For example, static user features are gender and age,
and static product features can be colour, brand, category, etc. Let $N_{F_u}$ and $N_{F_v}$ represent the
total number of features of users and items, respectively. The importance of the contextual
features varies from person to person and from item to item.
Given such data, the recommendation problem is then cast as a task aiming to predict the
existence of a labelled link between a user and an item considering the knowledge about the
surrounding context. This work aims to introduce context information to matrix completion
tasks with mechanisms for finding which context attributes are important for a target user
and items. Details of the learning model are discussed in Sect. 4.

4 Context-Aware GNN model

In this section, we present our link prediction model for the bipartite graph between users and
items with context information on edges. We extend the graph convolutional autoencoder in
[6] (GCMC + feat, in the following). GCMC + feat leverages rating information using
a 2D user-item opinion/rating matrix along with static node features, ignoring the context
information on edges. The major contribution of our approach, dubbed context-aware graph
convolutional matrix completion ($cGCMC_F$), is to utilize context features on the edges. The
proposed architecture has three main blocks, shown in Fig. 3. From top to bottom: the first
block represents the input data, i.e., user’s opinion/rating on items, the profile of users and
items, the user-item-context interaction graph with edges labeled with context and rating, and
the favourite context of users and items. The second block represents the graph encoder.
Inside the graph encoder, GCMC + feat operates on the 2D user-item rating matrix, while
$cGCMC_F$ is our proposed extension that leverages context information on edges and maps
user-item-context interaction to a 3D matrix. The graph encoder is composed of two graph
convolutional neural network layers and two dense neural network layers. Each layer operates
on different data to produce user and item representations with respect to rating opinion, static
node features, and context information. This multiple perspective representation for each
user and item is accumulated without attention weights in our algorithms $cGCMC_{old}$ and
$cGCMC_{F,old}$ [19], while in cGCMC and $cGCMC_F$ we provide the accumulation along
with the attention mechanism. Further details regarding the encoder part are explained in
Sect. 4.1. The decoder (discussed in Sect. 4.2) utilizes the encoded representations to predict
the link in a bipartite graph.

Fig. 3 High-level architecture of the proposed context-aware graph convolutional autoencoder. User’s opinion
on item is modeled using a local weight sharing GCN. User and item features, as well as user-context and
item-context, are modeled using dense neural networks, while user-context-item interaction is modeled with a GCN
with global weight sharing

4.1 Graph Encoder

Our graph encoder takes the following data as input:

1. User’s Opinion on Items. The matrix $A \in \mathbb{R}^{N_u \times N_v + R}$ represents user’s rating/opinion
on items. This matrix is composed of $A_r$ sub-matrices where $A_r \in \mathbb{R}^{N_u \times N_v}$ and
$r \in \{1, 2, \ldots, R\}$.
$$A = \big[\, [A_1]_{N_u \times N_v} \;\; [A_2]_{N_u \times N_v} \;\; \cdots \;\; [A_R]_{N_u \times N_v} \,\big] \tag{1}$$
$$A_r[u][v] = 1 \iff (u, v) = r : r \in \{1, 2, \ldots, R\} \tag{2}$$
2. Static User’s features. The matrix $U_F \in \mathbb{R}^{N_u \times N_{F_u}}$ consists of normalized static feature
attributes for users.
3. Static Item’s features. The matrix $V_F \in \mathbb{R}^{N_v \times N_{F_v}}$ consists of normalized static feature
attributes for items.
4. Surrounding Context of User-Item Interaction ($A_{uvc}$). We have represented user-item-context
interaction using a 3D matrix $A_{uvc} \in \mathbb{R}^{N_u \times N_v \times N_c}$. This binary matrix contains
information about the surrounding context under which the user has provided a specific
opinion on the item. For example, if user $U_A$ has rated item $V_B$ with rating 5 under context
$c_1, c_2, c_3 \in \{c_1, c_2, c_3, \ldots, c_{N_c}\}$, then this matrix contains an entry set to 1 for $U_A$, $V_B$
and $c_1, c_2, c_3$.
5. Favourite Context of Users. The matrix $U_C \in \mathbb{R}^{N_u \times N_c}$ denotes the importance of
context for individual users. We use information from the matrix $A$ (Eq. 1) to give more
weight ($\alpha$) to the context in which a user has given a high rating, compared to the
context under which the user has given a lower rating.
6. Favourite Context of Items. The matrix $V_C \in \mathbb{R}^{N_v \times N_c}$ uses $A_r$ in a similar way as $U_C$
above. The value of the context attributes of an item is high if it is more likely to get a high
rating under a specific context, thus giving more importance to the context attributes
under which an item is rated highly.
Next, we explain how a graph encoder operates on the matrices defined above, to learn the
representations of users and items with respect to rating, context and static features.
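
As an illustration of the first input, the following minimal Python sketch builds one binary sub-matrix per rating level from a sparse set of ratings, reading Eq. 2 as "the entry for (u, v) in the r-th sub-matrix is 1 exactly when user u gave item v the rating r". The function name `rating_submatrices` and the dictionary-based input format are assumptions made for this example only.

```python
def rating_submatrices(ratings, n_users, n_items, n_levels):
    """Return [A_1, ..., A_R], where A_r[u][v] = 1 iff user u rated item v with r."""
    subs = [[[0] * n_items for _ in range(n_users)] for _ in range(n_levels)]
    for (u, v), r in ratings.items():
        subs[r - 1][u][v] = 1  # rating levels start at 1, list indices at 0
    return subs


# Hypothetical toy data: (user index, item index) -> rating in {1, ..., 5}.
ratings = {(0, 0): 5, (0, 2): 3, (1, 1): 5}
A = rating_submatrices(ratings, n_users=2, n_items=3, n_levels=5)
```

Each observed rating sets exactly one entry across all sub-matrices, so the sub-matrices partition the edges of the bipartite graph by edge type.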

4.1.1 User-Rating-Item Representation

The user opinions represented in the adjacency matrix $A$ (Eq. 1) map the user’s liking
for items in the bipartite graph. We have a local weight sharing graph convolutional layer
for modelling user’s opinion. The local weight sharing mechanism allows having different
convolutional weights based on the edge types. The number of weight matrices is equal
to the number of available rating levels $R$. The customized message propagation for graph
convolutions uses an edge type-specific parameter matrix $W_r$. After the message propagation
step, we aggregate the incoming messages at each node by two alternative types of aggregation
functions: sum and stack.
• stack aggregation: concatenating all edge specific matrices along their first dimension.
• sum aggregation: performing an addition of all edge-specific matrices.
Overall, this edge specific message propagation is more effective compared to the general
global message propagation. Our model selection experiments considered summation and
concatenation as alternatives, and we have selected the former for its best overall performance
(in validation). Details of this spectral convolutional layer are defined in the following:
  
$$z_u^o = \underset{i:0 \to R}{\mathrm{Agg}}\big(GCN(X_v, A_i)\big) = \sigma\Big(\underset{i:0 \to R}{\mathrm{Agg}}\big(\tilde{A}_i X_v W_i^v\big)\Big) \tag{3}$$
$$z_v^o = \underset{i:0 \to R}{\mathrm{Agg}}\big(GCN(X_u, A_i^T)\big) = \sigma\Big(\underset{i:0 \to R}{\mathrm{Agg}}\big(\tilde{A}_i^T X_u W_i^u\big)\Big) \tag{4}$$

where $X_u$ and $X_v$ are the unique one-hot vectors for the user and item nodes. The term $R$ is
the maximal rating a user can give to an item, $W_i^u$ and $W_i^v$ represent $R$ trainable weight
matrices and $\sigma$ is a non-linear activation function such as ReLU. The matrices $\tilde{A}_i$ and $\tilde{A}_i^T$ are
the normalized adjacency matrix $A_i$ and its transpose, respectively.
$$\tilde{A}_i = D^{-1/2} A_i D^{-1/2} \quad \forall\, i = 0 \text{ to } R \tag{5}$$
where the term $D$ represents a diagonal degree matrix, containing the square root of degree
on the diagonal. Similarly, $A_i^T$ is normalized to get $\tilde{A}_i^T$ (using Eq. 5).
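
The edge-type-specific propagation with sum aggregation can be sketched in plain Python as follows. This is a simplified illustration under several assumptions that are not stated in the text: the normalization divides each entry by the square root of the product of the user degree and the item degree, the one-hot item inputs form an identity matrix (so they drop out of the product), and the weights are fixed toy values rather than trained parameters. All function names are hypothetical.

```python
import math


def normalize(adj):
    """Symmetric normalization: entry / sqrt(user degree * item degree)."""
    row_deg = [sum(row) for row in adj]
    col_deg = [sum(col) for col in zip(*adj)]
    return [[a / math.sqrt(row_deg[u] * col_deg[v]) if a else 0.0
             for v, a in enumerate(row)]
            for u, row in enumerate(adj)]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def user_opinion_embedding(adjs, weights):
    """Sum over rating levels of normalized A_i times W_i, followed by ReLU."""
    n_users, dim = len(adjs[0]), len(weights[0][0])
    total = [[0.0] * dim for _ in range(n_users)]
    for adj, w in zip(adjs, weights):
        msg = matmul(normalize(adj), w)  # one-hot item inputs act as identity
        total = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(total, msg)]
    return [[max(0.0, x) for x in row] for row in total]


# Two users, two items, two rating levels (toy values).
adjs = [[[1, 0], [0, 0]],   # edges with rating level 1
        [[0, 1], [1, 1]]]   # edges with rating level 2
weights = [[[1.0, 0.0], [0.0, 1.0]],
           [[2.0, 0.0], [0.0, 2.0]]]
z_u = user_opinion_embedding(adjs, weights)
```

Replacing the running sum with a concatenation of the per-level messages would give the stack alternative mentioned above.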

4.1.2 Context Representation

The user-item-context interaction matrix $A_{uvc}$ is normalized by dividing each context attribute
by the total count of context attributes recorded at the time of user-item interaction. The
normalized context attributes are further accumulated to get $A_c \in \mathbb{R}^{N_u \times N_v}$.
$$A_c[u][v] = \sum_{i=0}^{N_c^{uv}} \frac{c_i^{uv}}{N_c^{uv}} \tag{6}$$
where $u$ and $v$ are user and item indexes in the matrix, $N_c^{uv}$ represents the count of occurrences
of context $c$ when user $u$ has rated item $v$, and $c_i^{uv}$ denotes the individual context value under
which user $u$ has rated item $v$.
We propose to leverage graph convolutions to model user-context-item interactions in
the matrix $A_c$, with the same message propagation rule as used for modelling user’s opinion
(Eqs. 3 and 4) but with a single global weight matrix. We denote the user and
item representations with respect to context attributes as $z_u^{c_1}$ and $z_v^{c_1}$, respectively. The user’s
behaviour varies with the change in the surrounding context, which makes them react differently
to the same item under different contexts. Similarly, an item gets a different rating when
the surrounding context changes. This makes the context information naturally dynamic. For
modelling this dynamic user-context and item-context relation, we performed a statistical
analysis of the training data and identified an importance factor $\alpha$ for each user and item,
respectively. The $\alpha$ factor gives more importance to the favourite context of users and items. We
have stored the extracted user preferences in $U_C$:
$$U_C[u][c] = \sum_{i,j}^{N_u,\, N_c^{uv_i}} A_{uvc}[u][v_i][c_j] * \alpha[r] : r \in \{1, \cdots, R\} \tag{7}$$

where $N_u$ here denotes the neighbours of user $u$, and $N_c^{uv}$ represents the number of context attributes
in which the user provides opinion $r$. We have obtained the context importance for each
item in a similar way (Eq. 7) and stored it in $V_C$. Both matrices are normalized to have values
between 0 and 1. We use a simple dense neural network layer to process this information.
The weight matrices chosen for this purpose are initialized randomly from a uniform distribution and
node dropout is applied to the hidden layers to prevent overfitting. The operations on this
layer are defined as:
$$z_u^{c_2} = \sigma(U_C W_3^c + b^c) \tag{8}$$
$$z_v^{c_2} = \sigma(V_C W_4^c + b^c) \tag{9}$$
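
To make the construction of the favourite-context matrix more concrete, the sketch below accumulates, for each user, the importance factor of the given rating over every context attribute active in the interaction, in the spirit of Eq. 7. The specific importance values (rating divided by the maximum rating) and the row-wise scaling by the maximum are assumptions chosen for illustration, since the text does not specify them; the function name `favourite_context` is likewise hypothetical.

```python
def favourite_context(interactions, n_users, n_contexts, alpha):
    """Accumulate alpha[rating] over the active contexts of each interaction.

    interactions: list of (user, item, rating, active_context_ids).
    """
    u_c = [[0.0] * n_contexts for _ in range(n_users)]
    for user, _item, rating, contexts in interactions:
        for c in contexts:
            u_c[user][c] += alpha[rating]
    # Assumed normalization: scale each user's row by its maximum, giving [0, 1].
    for row in u_c:
        top = max(row)
        if top > 0:
            row[:] = [x / top for x in row]
    return u_c


# Hypothetical importance factors: higher ratings weigh more.
alpha = {r: r / 5 for r in range(1, 6)}
interactions = [(0, 0, 5, [0, 2]), (0, 1, 1, [1]), (1, 0, 4, [2])]
U_C = favourite_context(interactions, n_users=2, n_contexts=3, alpha=alpha)
```

In this toy example, user 0 ends up with a high weight on contexts 0 and 2 (under which a rating of 5 was given) and a low weight on context 1 (under which a rating of 1 was given). The same procedure applied per item would give the item counterpart.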

To get the final user’s and item’s context representation, we have integrated $z_u^{c_1}$ with $z_u^{c_2}$, and
$z_v^{c_1}$ with $z_v^{c_2}$.
$$z_u^c = \sigma\big((z_u^{c_1} \oplus z_u^{c_2})\, W_5^c + b^c\big) \tag{10}$$
$$z_v^c = \sigma\big((z_v^{c_1} \oplus z_v^{c_2})\, W_6^c + b^c\big) \tag{11}$$
where $W$ represents trainable weight matrices and $b$ is a bias.

4.1.3 User’s and Item’s Profile Representation

The static features of user and item nodes are represented as $U_F$ and $V_F$, respectively. We
have not given these features directly as input to the graph convolution layer as they degrade
the performance in case of sparse user-item content features. Therefore, we have a separate
dense neural network layer to get the static feature representation for user and item nodes.
$$z_u^f = \sigma(U_F W_3^f + b^f) \tag{12}$$
$$z_v^f = \sigma(V_F W_4^f + b^f) \tag{13}$$
where $W_3^f$ and $W_4^f$ represent trainable weight matrices and $b^f$ is a bias.

4.1.4 Accumulation with Attention

We have accumulated the user’s representation from rating/opinion (Eq. 3), features (Eq. 12)
and context (Eq. 10) perspective. Here, we introduce the learnable attention weights for
the three representations in $cGCMC_F$. In $cGCMC_{old}$ [19], we have accumulated these
embeddings without considering any learnable attention weights. The last layer of the graph
encoder is a dense neural network layer and is responsible for producing the final embedding
with or without attention weights. For $cGCMC_F$, the user’s final representation is defined as:
$$z_u = \sigma\big((w_u^o * z_u^o \oplus w_u^c * z_u^c \oplus w_u^f * z_u^f)\, W_6 + b\big) \tag{14}$$

Similarly, the item’s representations from rating/opinion, context and feature perspective are
concatenated after applying attention weights to get the final item embedding.
$$z_v = \sigma\big((w_v^o * z_v^o \oplus w_v^c * z_v^c \oplus w_v^f * z_v^f)\, W_7 + b\big) \tag{15}$$
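
The accumulation step can be sketched as follows: each of the three representations is scaled by its attention weight, the results are concatenated, and a dense layer with ReLU produces the final embedding. The sketch assumes scalar attention weights and uses fixed toy values in place of learned parameters; the function name `fuse` is hypothetical.

```python
def relu(vec):
    return [max(0.0, x) for x in vec]


def fuse(z_o, z_c, z_f, w_o, w_c, w_f, weight, bias):
    """Weighted concatenation of three embeddings followed by a dense layer."""
    # Scale each representation by its (assumed scalar) attention weight.
    concat = ([w_o * x for x in z_o]
              + [w_c * x for x in z_c]
              + [w_f * x for x in z_f])
    # Dense layer: concat (1 x d_in) times weight (d_in x d_out) plus bias.
    out = [sum(c * weight[i][j] for i, c in enumerate(concat)) + bias[j]
           for j in range(len(bias))]
    return relu(out)


# Toy opinion, context and feature embeddings with toy attention weights.
z = fuse(z_o=[1.0, 2.0], z_c=[3.0], z_f=[4.0],
         w_o=0.5, w_c=1.0, w_f=0.25,
         weight=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, -1.0]],
         bias=[0.0, 0.0])
```

Setting all three attention weights to 1 reduces the sketch to the plain concatenation used in the variant without attention.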

4.2 Decoder

We use a bilinear decoder that takes the context-aware embeddings of user-item interaction and
reconstructs the rating matrix $\hat{A}$ between users and items. Here, we address this problem as
a classification task and each rating is treated as a separate class. The decoder produces a
probability distribution over all classes through a bilinear operation:
$$\hat{A}_{ij} = \sum_{r \in R} p(\hat{A}_{ij} = r) : \quad p(\hat{A}_{ij} = r) = \frac{e^{u_i^T Q_r v_j}}{\sum_{k \in R} e^{u_i^T Q_k v_j}} \tag{16}$$
where $Q_r$ are $R$ trainable matrices of dimension $D \times D$, $D$ is the hidden dimension of the user’s
and item’s embeddings obtained from the encoder and $R$ are the available rating levels. In our
setting, we defined $Q_r$ as:
$$Q_r = \sum_{k=1}^{n_b} \alpha_{kr} W_s \tag{17}$$
Here, $k$ ranges over the linear functions, whose number $n_b$ is chosen to be lower than the number of rating
levels to avoid overfitting. The term $\alpha_{kr}$ is a learnable coefficient and $W_s$ represents the weight matrix.
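
The softmax over bilinear scores in Eq. 16 can be illustrated with the short Python sketch below. It uses fixed toy matrices in place of the trainable ones and subtracts the maximum score before exponentiating, a standard numerical-stability step that does not change the result; the function names are hypothetical.

```python
import math


def bilinear(u, q, v):
    """Compute the scalar u^T Q v for nested-list inputs."""
    return sum(u[a] * q[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))


def rating_distribution(u, v, q_list):
    """Softmax over the bilinear scores, one score per rating level."""
    scores = [bilinear(u, q, v) for q in q_list]
    top = max(scores)  # subtracting the max leaves the softmax unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example with two rating levels and 2-dimensional embeddings.
u = [1.0, 0.0]
v = [0.0, 1.0]
q_list = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, math.log(3)], [0.0, 0.0]]]
probs = rating_distribution(u, v, q_list)
```

With these values the two scores are 0 and ln 3, so the second rating level receives three times the probability mass of the first.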
We have tested our model with different settings and represented them with different
names: cGCMC and $cGCMC_F$. cGCMC models the effect of context along with an opinion
matrix, while $cGCMC_F$ brings the context effect with opinion as well as static features. We
have tested both models with and without the attention mechanism. We found that the attention
mechanism improved the performance.

4.2.1 Rating Prediction and Model Training

We evaluate the performance of the proposed algorithm using MAE (Eq. 18) and RMSE
(Eq. 19) metrics with respect to the rating assigned by the user to their interaction with the
item. The choice of these metrics over classification based ones is driven by the nature of the
ratings, which is ordinal rather than multinomial. Hence it is important to capture how closely
the prediction approximates the expected rating (which is not the case for classification-based
metrics). Our model is trained in end-to-end fashion by minimizing the root mean square
error between the actual ($A_{ij}$) and reconstructed rating ($\hat{A}_{ij}$).
$$MAE = \sum_{i,j} \frac{|\hat{A}_{i,j} - A_{i,j}|}{n} \tag{18}$$
$$RMSE = \sqrt{\sum_{i,j} \frac{(\hat{A}_{i,j} - A_{i,j})^2}{n}} \tag{19}$$

where $n$ represents the cardinality of user-item pairs.
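
For reference, both metrics can be computed as in the following minimal Python sketch, where the predicted and actual ratings of the evaluated user-item pairs are given as flat lists. The toy values are hypothetical.

```python
import math


def mae(pred, true):
    """Mean absolute error over the evaluated user-item pairs."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error over the evaluated user-item pairs."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


predicted = [4.0, 2.0, 5.0, 3.0]
actual = [5.0, 2.0, 3.0, 3.0]
mae_value = mae(predicted, actual)    # (1 + 0 + 2 + 0) / 4 = 0.75
rmse_value = rmse(predicted, actual)  # sqrt((1 + 0 + 4 + 0) / 4) = sqrt(1.25)
```

RMSE penalizes the single error of 2 more heavily than MAE does, which is why the two metrics are usually reported together.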

5 Experiments

5.1 Datasets

To demonstrate the effectiveness of our proposed algorithms cGCMC and $cGCMC_F$, we
conduct experiments on five real-world publicly available datasets for movies, music and
travel. We summarize the statistics of the datasets in Table 1, where density is defined as the ratio
between the number of edges and the cardinality of the (user, item) pairs.
Table 1 The statistical information defining number of users, items and context attributes along with the edge density and rating levels for each of the datasets used in our
experiments

Dataset                   LDOS-CoMoDa   DePaul   Travel-STS   InCarMusic   Tijuana-Restaurant
No of Users               268           97       325          1042         50
No of Items               4381          79       249          139          40
No of Context variables   12            3        14           8            6
Rating Scale              1−5           1−5      1−5          1−6          1−5
No of Ratings             2278          2270     2534         4012         1422
Density                   0.0154        0.6581   0.0313       0.6872       0.711

LDOS-CoMoDa¹ is a popular movie dataset collected from a survey. This dataset contains
user’s opinions on a movie considering the surrounding context. The context information
includes location (home, friend’s house, public place), time (morning, afternoon, evening,
night), day-type (working day, weekend, holiday), weather (sunny, cloudy, rainy, stormy,
snowy), decision (movie choices by themselves or users were given a movie), mood (positive,
negative, neutral), season (summer, winter, spring, autumn), endEmo i.e., emotional state at
the end of watching movie (sad, happy, angry, surprised, neutral, scared, disgusted), domEmo
i.e., emotional state experienced most when watching movie (sad, happy, angry, surprised,
neutral, scared, disgusted), interaction (1st interaction with a movie, Nth interaction with a
movie), physical (ill, healthy), companion (alone, friends, partner, family, colleagues, parents,
public). Besides this information, LDOS-CoMoDa also has profile features for users (gender,
age, city, country) and movies (director, language, actor, genre).

¹ https://www.lucami.org/en/research/ldos-comoda-dataset/

DePaulMovie² is a movie dataset collected by researchers of the DePaul University, with
ratings acquired by survey. Students have been asked to rate movies subject to 3 context
variables: location (home, cinema), time (weekend, weekday), and companion (partner,
family, alone) information. This dataset does not have user’s and item’s profile features.
The Travel-STS³ dataset contains information about places visited by tourists. The context
information includes distance (nearby, far away), time available (half a day, one day, more
than one day), temperature (warm, hot, burning, cool, cold, freezing), season (summer, winter,
spring, autumn), crowdedness (empty, crowded, not crowded), mood (happy, active, sad,
lazy), budget (high spender, budget traveler, price for quality), weather (sunny, cloudy, rainy,
clear sky, thunderstorm, snowing), companion (with children, with friends/colleagues, alone,
with family, with girlfriend/boyfriend), weekend (weekday, weekend), travel goal (visiting
friends, religion, business, health care, education, social event, scenic/landscape, hedonistic/fun,
activity/sport), means of transport (bicycle, car, public transport, no transportation
means) and knowledge of surrounding (returning visitor, completely new area, citizen of the
area). This dataset also contains user profile features (age, gender).
The InCarMusic³ dataset consists of music tracks recommended to passengers based on the
surrounding contextual information. The context information includes driving style (sport
driving, relaxed driving), road type (highway, city, serpentine), landscape (mountains, coast
line, urban, country side), sleepiness (sleepy, awake), traffic conditions (busy road, free road,
traffic jam), mood (happy, active, sad, lazy), weather (sunny, cloudy, rainy, snowing), and
natural phenomena (day time, morning, night, afternoon).
Tijuana Restaurant³ is a restaurant dataset gathered via a survey consisting of 8 inquiries
to people about various neighbouring cafes. Every restaurant picked was assessed
multiple times, once for every possible context setting. The context information includes
combinations of time and location ($c_1$: weekday and school, $c_2$: weekday and home, $c_3$:
weekday and work, $c_4$: weekend and school, $c_5$: weekend and home, and $c_6$: weekend and
work).
The density values in Table 1 represent the fraction of positive links between the nodes.
The Tijuana-Restaurant dataset has a small number of nodes connected by a high number of
edges, while the LDOS-CoMoDa dataset has a greater number of nodes connected by few
edges (compared to the other datasets). Overall, the effect of high or low density values on the
performance of our models is shown to be negligible in Sect. 6.
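
As a rough illustration of the density measure, the sketch below computes the fraction of possible user-item pairs that carry an observed link. The function name and the node and edge counts are made up for the example; they are not the statistics reported in Table 1, and the exact definition used there may differ.

```python
def bipartite_density(n_users, n_items, n_edges):
    """Fraction of possible user-item pairs that carry an observed link."""
    if n_users <= 0 or n_items <= 0:
        raise ValueError("the graph needs at least one user and one item")
    return n_edges / (n_users * n_items)


# Hypothetical counts, chosen only to show the two regimes described above.
dense_graph = bipartite_density(n_users=50, n_items=40, n_edges=1400)
sparse_graph = bipartite_density(n_users=120, n_items=1200, n_edges=2300)
print(f"dense: {dense_graph:.3f}, sparse: {sparse_graph:.3f}")
```

A graph with few nodes and many edges yields a value close to 1, while a graph with many nodes and few edges yields a value close to 0.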

5.2 Implementation Setup

Our PyTorch implementation⁴ of the cGCMC and cGCMC_F models is publicly available.
We have used 60% data as a training set, 20% as a validation set and 20% as a test set for
each dataset. The data splitting is performed five times. Each time the data is shuffled with a
different random seed before dividing into splits. The average performance of all algorithms
after five runs with different random splits is presented in Sect. 6.
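
The splitting protocol can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the released implementation; the function name `split_indices`, the sample count and the seeds 0 to 4 are assumptions made for the example.

```python
import random


def split_indices(n_samples, seed, train_frac=0.6, val_frac=0.2):
    """Shuffle sample indices with the given seed and cut them 60/20/20."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed-specific shuffle
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, roughly 20%
    return train, val, test


# Five independent splits, one per random seed (the seeds are arbitrary).
splits = [split_indices(1000, seed) for seed in range(5)]
for seed, (train, val, test) in enumerate(splits):
    print(seed, len(train), len(val), len(test))
```

Each model would then be trained and evaluated once per split, and the reported score is the mean over the five runs.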

2 https://cran.r-project.org/web/packages/contextual/vignettes/
3 https://github.com/irecsys/CARSKit/blob/master/context-aware_data_sets/
4 https://github.com/asmaAdil/cGCMC


Table 2 The average time (sec.) taken by cGCMC and cGCMC_F for each dataset

Algorithm  Dataset             Avg. time per epoch  Avg. time for prediction
cGCMC_F    LDOS-CoMoDa         0.29                 0.04
           Travel-STS          0.089                0.016
cGCMC      LDOS-CoMoDa         0.197                0.02
           Travel-STS          0.079                0.014
           DePaul              0.033                0.006
           InCarMusic          0.039                0.007
           Tijuana-Restaurant  0.017                0.004

Table 3 cGCMC_F encoder and decoder layers and their respective best output dimension hyperparameter
values

cGCMC_F  Layer                                Output dimension
Encoder  Layer-1: Local weight sharing GCN    d_o = 500
         Layer-2: Dense layer                 d_f = 10
         Layer-3: Global weight sharing GCN   d_c1 = 150
         Layer-4: Dense layer                 d_c2 = 10
         Layer-5: Dense layer                 150
Decoder  Layer-6: Bilinear decoder            R probabilities
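
Layer-6 in Table 3 outputs one probability per rating level. A minimal sketch of such a bilinear decoder is given below, following the general form used in GCMC [6] (a softmax over the scores u^T Q_r v, one matrix Q_r per rating level). The embedding values, the matrices and the rating levels 1 to 5 are illustrative assumptions, not the trained parameters of cGCMC_F.

```python
import math


def bilinear_scores(u, v, q_matrices):
    """Return one score u^T Q_r v for every rating-specific matrix Q_r."""
    scores = []
    for q in q_matrices:
        score = sum(u[i] * q[i][j] * v[j]
                    for i in range(len(u)) for j in range(len(v)))
        scores.append(score)
    return scores


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_rating(probabilities, ratings):
    """Predicted score as the expectation over the rating levels."""
    return sum(p * r for p, r in zip(probabilities, ratings))


ratings = [1, 2, 3, 4, 5]          # assumed rating levels
user = [0.5, -1.0]                 # toy user embedding
item = [1.0, 2.0]                  # toy item embedding
# Toy matrices: Q_r = (r - 3) * identity, so higher levels get larger
# scores exactly when the user-item dot product is positive.
q_matrices = [[[r - 3.0, 0.0], [0.0, r - 3.0]] for r in ratings]

probabilities = softmax(bilinear_scores(user, item, q_matrices))
prediction = expected_rating(probabilities, ratings)
print(probabilities, prediction)
```

Taking the expectation over rating levels yields a real-valued prediction that can be compared with the observed rating.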

5.2.1 Computational cost

We report the computational costs (in seconds) of cGCMC and cGCMC_F, obtained by
computing the average time required by a single training epoch and the average time required by
the prediction step (i.e., on the whole test set). Results are presented in Table 2.

5.2.2 Hyper-parameters

We have evaluated our approach under different configurations. The best value for each
hyper-parameter is shown in bold. We searched the embedding size of the user's
opinion representation d_o in [300, 400, 500, 600], of the static feature representation d_f in
[5, 10, 15, 20, 25], and of the contextual representation d_c1 in [50, 100, 150, 200, 250] (for the GCN
layer) and d_c2 in [5, 10, 15, 20, 25] (for the dense layer), as shown in Table 3. We chose the batch
size from [40, 80, 120, 150, 200]. The last layer of the encoder is set to produce embeddings
of size 75. The node dropout rate P_drop is tuned in [0.3, 0.4, 0.5, 0.6, 0.7]. P_drop is the
probability of randomly dropping all outgoing messages from a specific node, so that the model is
trained under a denoising setup. The importance factor α is set to [0.2, 0.3, 0.5, 0.7, 0.8], with one
value for each rating level r ∈ R. These values are initially chosen randomly, subject to the
constraint α[r1] < α[r2] ⇐⇒ r1 < r2. Any set of initial values can be used, provided that it
satisfies this constraint, which encodes the idea that the context in which the user gives a
high rating should have more weight. The attention weights for the opinion, feature, and context
representations are first set to random values and then learned, so that each representation
receives an appropriate weight before the three are combined. All neurons use the ReLU nonlinearity, and
Adam is employed as the optimization algorithm. The model is trained for 200 epochs. For the
baseline algorithms, all parameters are initialized as described in the corresponding papers.
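
The node dropout step described above can be sketched as follows. This is an illustrative standard-library version, not the released PyTorch code; the dictionary layout of the messages and the name `node_dropout` are assumptions, and any rescaling of the surviving messages is left out.

```python
import random


def node_dropout(outgoing, p_drop, rng):
    """Drop ALL outgoing messages of a node with probability p_drop.

    `outgoing` maps a sender node id to the list of messages it sends.
    """
    kept = {}
    for sender, messages in outgoing.items():
        if rng.random() >= p_drop:  # node survives with probability 1 - p_drop
            kept[sender] = messages
    return kept


rng = random.Random(0)
outgoing = {f"user_{i}": [f"msg_{i}_a", f"msg_{i}_b"] for i in range(10)}
kept = node_dropout(outgoing, p_drop=0.5, rng=rng)
print(sorted(kept))
```

Dropping at the node level, rather than dropping individual messages, removes a sender's whole contribution at once, which is what makes the setup resemble denoising.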


5.3 Benchmarks

In the evaluation phase, we measure predictive performance on the test set in terms of
mean absolute error (MAE) and root mean square error (RMSE). We compare our approach
with the following link prediction algorithms from the literature:
• SocialMF [14] is a matrix factorization approach that exploits user-user trust information
along with the user's opinion on items to predict items for users.
• SVD++ [36] improves the conventional SVD approach by allowing the joint use of
explicit (e.g., the user's rating opinion) and implicit (e.g., purchases, visited items)
information.
• PMF [37] is a matrix factorization approach for sparse datasets. It exploits only the
user-item interactions to learn user and item embeddings, forgoing the context
features.
• BiasedMF [38] is an improvement on traditional matrix factorization that incorporates
user, item, and global bias factors.
• GCMC [6] models the user's opinion by leveraging the rating matrix between users and items
for the matrix completion task.
• GCMC+feat [6] extends GCMC by integrating static features inside the user and item
nodes for link prediction in a bipartite graph.
• GraphRec_uv^uu [30] exploits the social relations between users along with user-item
interactions for link prediction in the user-item bipartite graph.
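
The two error measures used for all of the algorithms above can be sketched in a few lines. The ratings in the example are made up for illustration and do not come from any of the evaluated datasets.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; penalizes large deviations more than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


observed = [5, 3, 4, 1]    # made-up test ratings
predicted = [4, 3, 5, 3]   # made-up model outputs
print(mae(observed, predicted), rmse(observed, predicted))
```

For both measures, lower values indicate better predictions.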

6 Performance Comparison

Table 4 presents a comparison between the previous version of our algorithm (marked with the
subscript 'old') and the extended version, and Table 5 presents the performance comparison of our
approach with other state-of-the-art algorithms. Two of our datasets (LDOS-CoMoDa and
Travel-STS) contain user and item (description) features along with the user's opinion on items and
context information. For the other three datasets (DePaul, InCarMusic, Tijuana-Restaurant),
we have only the user's opinion on the item and contextual information. The algorithms that
integrate user and item feature information are not applicable to the latter category of
datasets (indicated in the tables with NA, for "Not Applicable").
• A clear performance difference can be seen between the old and extended versions of our
model on all datasets (provided in Table 4). This is due to the newly introduced
attention factor in the last layer of the encoder.
• Basic matrix factorization approaches, PMF and BiasedMF, which solely model
user-item interactions as isolated instances, ignore side information and thus limit their
representation ability. These approaches generally perform worse than the other baseline
algorithms because they cannot integrate knowledge about the surroundings.
• SVD++, SocialMF, and GraphRec_uv^uu perform better than basic matrix factorization
approaches, as they capture and integrate knowledge about an individual user in the
form of social trust or implicit feedback. Despite integrating side information,
these approaches perform worse than our method, which benefits from learning the
surrounding context.
• When comparing our proposed algorithm with GNN-based approaches (GCMC and
GCMC+feat), we can identify a significant improvement in performance, which stems from
the capability of providing context-aware recommendations.

Table 4 Test performance comparison with state-of-the-art algorithms. Best results are marked in bold letters

ALGORITHM           LDOS-CoMoDa                 DePaul                      Travel-STS                  InCarMusic                  Tijuana-Restaurant
                    MAE           RMSE          MAE           RMSE          MAE           RMSE          MAE           RMSE          MAE           RMSE
cGCMC               0.77 ± 0.01   1.03 ± 0.01   1.03 ± 0.01   1.21 ± 0.01   0.85 ± 0.02   1.03 ± 0.02   1.15 ± 0.01   1.25 ± 0.01   0.93 ± 0.01   1.23 ± 0.01
cGCMC_old [19]      0.938 ± 0.01  1.15 ± 0.01   1.04 ± 0.01   1.21 ± 0.01   0.96 ± 0.02   1.17 ± 0.02   1.19 ± 0.01   1.30 ± 0.01   1.07 ± 0.01   1.28 ± 0.01
cGCMC_F             0.853 ± 0.01  1.10 ± 0.01   NA            NA            0.91 ± 0.02   1.12 ± 0.02   NA            NA            NA            NA
cGCMC_F,old [19]    0.918 ± 0.01  1.127 ± 0.01  NA            NA            0.932 ± 0.02  1.14 ± 0.02   NA            NA            NA            NA

Table 5 Test-set performance comparison with state-of-the-art algorithms. Best results are marked in bold

ALGORITHM           LDOS-CoMoDa                 DePaul                      Travel-STS                  InCarMusic                  Tijuana-Restaurant
                    MAE           RMSE          MAE           RMSE          MAE           RMSE          MAE           RMSE          MAE           RMSE
cGCMC               0.77 ± 0.01   1.03 ± 0.01   1.03 ± 0.01   1.21 ± 0.01   0.85 ± 0.02   1.03 ± 0.02   1.15 ± 0.01   1.25 ± 0.01   0.93 ± 0.01   1.23 ± 0.01
cGCMC_F             0.853 ± 0.01  1.10 ± 0.01   NA            NA            0.91 ± 0.02   1.12 ± 0.02   NA            NA            NA            NA
GCMC [6]            1.12 ± 0.01   1.33 ± 0.01   1.18 ± 0.00   1.42 ± 0.00   1.09 ± 0.02   1.32 ± 0.02   1.21 ± 0.01   1.42 ± 0.01   1.25 ± 0.01   1.47 ± 0.01
GCMC+feat [6]       1.001 ± 0.01  1.24 ± 0.01   NA            NA            0.95 ± 0.01   1.23 ± 0.01   NA            NA            NA            NA
SocialMF [14]       0.96 ± 0.01   1.28 ± 0.02   1.06 ± 0.01   1.29 ± 0.01   1.12 ± 0.01   1.46 ± 0.01   1.34 ± 0.01   1.56 ± 0.02   1.09 ± 0.01   1.28 ± 0.01
SVD++ [36]          1.10 ± 0.01   1.45 ± 0.01   1.17 ± 0.02   1.40 ± 0.01   1.20 ± 0.01   1.36 ± 0.02   1.20 ± 0.01   1.41 ± 0.01   1.13 ± 0.01   1.32 ± 0.01
PMF [37]            1.38 ± 0.00   1.75 ± 0.00   1.19 ± 0.01   1.44 ± 0.01   1.14 ± 0.00   1.49 ± 0.00   1.37 ± 0.01   1.58 ± 0.01   1.30 ± 0.01   1.54 ± 0.01
BiasedMF [38]       1.46 ± 0.02   1.78 ± 0.02   1.20 ± 0.02   1.46 ± 0.02   1.13 ± 0.01   1.45 ± 0.01   1.45 ± 0.02   1.65 ± 0.02   1.41 ± 0.02   1.68 ± 0.02
GraphRec_uv^uu [30] 1.16 ± 0.02   1.32 ± 0.02   1.25 ± 0.03   1.45 ± 0.03   1.20 ± 0.02   1.36 ± 0.02   1.25 ± 0.02   1.40 ± 0.02   1.18 ± 0.01   1.34 ± 0.01

Fig. 4 Effect of importance factor α on cGCMC in terms of MAE (bar chart comparing α and α-ablated on LDOS-CoMoDa, Tijuana-Restaurant, Travel-STS, InCarMusic and DePaul)

Overall, our model outperforms all baseline approaches on all datasets, providing sufficient
grounding to state the importance of being able to take into consideration the surrounding
knowledge of the context to provide accurate recommendations.

6.1 Impact of Context Modeling

The major contribution of our approach is to organize context features on edges, together with
the user-item interaction, in an effective way. We use the importance factor α to learn the
preferred surrounding context features of the target user and item for context-aware link
prediction. We therefore run an ablation study to validate the rationale for and the usefulness
of α. As explained earlier, the importance of context varies from person to person, and
different context attributes affect items differently. Fig. 4 demonstrates the positive effect of
capturing this importance factor in our model. We attribute this to prioritizing the contexts that
are important for users and items by giving them more weight.
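
To make the role of α concrete, the sketch below checks the ordering constraint α[r1] < α[r2] ⇐⇒ r1 < r2 on the values [0.2, 0.3, 0.5, 0.7, 0.8] and uses them to weight context vectors by the rating they were observed with. The rating scale 1 to 5, the weighted-average aggregation and the toy context vectors are assumptions made for illustration; the exact aggregation used by the model is defined in the method section, not here.

```python
ALPHA = {1: 0.2, 2: 0.3, 3: 0.5, 4: 0.7, 5: 0.8}  # assumed rating scale 1..5


def is_monotone(alpha):
    """True if a higher rating level always has a strictly larger weight."""
    levels = sorted(alpha)
    return all(alpha[a] < alpha[b] for a, b in zip(levels, levels[1:]))


def weighted_context(interactions, alpha):
    """Alpha-weighted average of the context vectors of one user.

    `interactions` is a list of (rating, context_vector) pairs.
    """
    total = sum(alpha[rating] for rating, _ in interactions)
    size = len(interactions[0][1])
    return [sum(alpha[rating] * context[k] for rating, context in interactions)
            / total for k in range(size)]


# Toy user: one 5-star interaction in context A, one 1-star in context B.
interactions = [(5, [1.0, 0.0]), (1, [0.0, 1.0])]
with_alpha = weighted_context(interactions, ALPHA)
ablated = weighted_context(interactions, {r: 1.0 for r in ALPHA})
print(is_monotone(ALPHA), with_alpha, ablated)
```

With α, the context of the highly rated interaction dominates the aggregate; with uniform weights, which mimic the ablated setting, both contexts count equally.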

6.2 Impact of Attention Weights

We have three kinds of representations for each user and item (opinion, feature,
and context, as mentioned in Sect. 4.1). To accumulate these three representations,
we found that concatenation performs better than summation, which is why we report
results with concatenation only. We have
introduced learnable attention weights for each representation before accumulating them.
These learnable weights give a different significance to each representation (i.e., opinion,
contextual, and feature representation for users and items) in the final embedding. The user's
(or item's) opinion representation contains information about the neighbouring nodes with
respect to opinion information. Similarly, the contextual representation contains information
about the neighbouring nodes with respect to contextual information. The final representation
for the user is an accumulation of these along with a dense feature representation. We believe
that each of these representations has its own impact on the final node representation, scaled by
a factor that we call a learnable weight. For some users, the opinion-based neighbourhood
may be more important than the context-based neighbourhood, and vice versa.

Fig. 5 Effect of accumulation with attention in terms of MAE (bar chart comparing with-attention and attention-ablated on LDOS-CoMoDa, Tijuana-Restaurant, Travel-STS, InCarMusic and DePaul)

We performed an ablation study of this design to demonstrate the effectiveness and rationale of
the weighted representations (Eq. 14 and Eq. 15). The positive impact of attention weights on cGCMC_F
(LDOS-CoMoDa and Travel-STS) and on cGCMC (DePaul, InCarMusic, Tijuana-Restaurant
and Travel-STS) in terms of MAE is shown in Fig. 5.
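
The two accumulation strategies compared in this section can be sketched as follows. The vectors and weights are toy values standing in for the learned representations and attention weights, and using one scalar weight per representation is an assumption made for the example.

```python
def weighted_concat(representations, weights):
    """Scale each representation by its weight, then concatenate."""
    combined = []
    for vector, weight in zip(representations, weights):
        combined.extend(weight * x for x in vector)
    return combined


def weighted_sum(representations, weights):
    """Alternative accumulation: weighted element-wise sum (equal sizes only)."""
    size = len(representations[0])
    if any(len(vector) != size for vector in representations):
        raise ValueError("summation needs representations of equal size")
    return [sum(w * v[k] for v, w in zip(representations, weights))
            for k in range(size)]


opinion = [1.0, 2.0]
feature = [3.0, 4.0]
context = [5.0, 6.0]
weights = [0.5, 2.0, 1.0]  # stand-ins for learned attention weights

concatenated = weighted_concat([opinion, feature, context], weights)
summed = weighted_sum([opinion, feature, context], weights)
print(concatenated, summed)
```

Concatenation keeps the three weighted representations in separate coordinates, whereas summation mixes them into a single vector and requires them to share one size.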

7 Conclusion

We have focused our work on emphasizing the impact of knowledge about the surrounding
context on user-item interaction. To this end, we have organized context, opinion, and item
features into a bipartite graph and an associated multidimensional matrix. We approached
the resulting matrix completion task using a graph convolutional autoencoder. Our graph
encoder captures the context information along with opinion in user-item interactions. We
also showed how the model leverages context information to capture the user’s behaviour
in relation to the surrounding context, giving attention to the most important contextual
aspects of the user and item. Furthermore, the bilinear decoder predicts the labelled edges
between the user and item. To demonstrate the effectiveness of our approach, we tested it
on five public datasets, showing significant improvements over state-of-the-art baselines.
We have conducted various experiments to verify the benefit of context representation.
The application of our model is not limited to product recommender systems in smart
devices, e.g., music/movie/travel/fashion recommendations. The model can also be developed
further for intelligent predictions in specific domains, such as personal
medical reminders for elders and smart device setting controllers based on the surrounding
context.
In this work, the accumulative approach unifies all context information, neglecting the
dynamic nature of some contextual attributes. This may result in losing the diversity of
individual context attributes. In the future, we would like to explore multi-dimensional edge
feature-based GNNs and multi-way interactions between users and items to capture more
realistically the dynamic behaviours. Furthermore, we intend to investigate the use of separate
embeddings for user and item contexts and to evaluate the performance on a large-scale dataset.
Separately, we want to extend our model to deal with heterogeneous graphs, which
consist of nodes of different types and carry different context information on different edges.

Funding Open access funding provided by Università di Pisa within the CRUI-CARE Agreement.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. Katarya R, Verma OP (2016) Recent developments in affective recommender systems. Physica A 461:182–190
2. Karimova F (2016) A survey of e-commerce recommender systems. Eur Sci J 12(34):75–89
3. Li X, Li D (2019) An improved collaborative filtering recommendation algorithm and recommendation
strategy. Mobile Information Systems, p 3560968
4. Zarzour H, Maazouzi F, Soltani M, Chemam C (2018) An improved collaborative filtering recommen-
dation algorithm for big data. In: IFIP International Conference on Computational Intelligence and Its
Applications, pp 660–668. Springer
5. Bacciu D, Errica F, Micheli A, Podda M (2020) A gentle introduction to deep learning for graphs. Neural
Netw 129:203–221. https://doi.org/10.1016/j.neunet.2020.06.006
6. Berg Rvd, Kipf TN, Welling M (2017) Graph convolutional matrix completion. arXiv preprint
arXiv:1706.02263
7. Zhang J, Shi X, Zhao S, King I (2019) Star-gcn: Stacked and reconstructed graph convolutional networks
for recommender systems. In: The 28th International Joint Conference on Artificial Intelligence, pp
4264–4270
8. Wang X, He X, Wang M, Feng F, Chua T-S (2019) Neural graph collaborative filtering. In: Proceedings of
the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval,
pp 165–174
9. Wang X, Jin H, Zhang A, He X, Xu T, Chua T-S (2020) Disentangled graph collaborative filtering.
In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in
Information Retrieval, pp 1001–1010
10. Shi Y, Larson M, Hanjalic A (2014) Collaborative filtering beyond the user-item matrix: A survey of the
state of the art and future challenges. ACM Computing Surveys (CSUR) 47(1):1–45
11. Baltrunas L, Amatriain X (2009) Towards time-dependant recommendation based on implicit feedback.
In: Workshop on Context-aware Recommender Systems (CARS’09), pp 25–30. Citeseer
12. Panniello U, Tuzhilin A, Gorgoglione M, Palmisano C, Pedone A (2009) Experimental comparison of
pre-vs. post-filtering approaches in context-aware recommender systems. In: Proceedings of the Third
ACM Conference on Recommender Systems, pp 265–268
13. Karatzoglou A, Amatriain X, Baltrunas L, Oliver N (2010) Multiverse recommendation: n-dimensional
tensor factorization for context-aware collaborative filtering. In: Proceedings of the Fourth ACM Confer-
ence on Recommender Systems, pp 79–86
14. Jamali M, Ester M (2010) A matrix factorization technique with trust propagation for recommendation in
social networks. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp 135–142
15. He X, Chua T-S (2017) Neural factorization machines for sparse predictive analytics. In: Proceedings of
the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval,
pp 355–364
16. Lian J, Zhou X, Zhang F, Chen Z, Xie X, Sun G (2018) xdeepfm: Combining explicit and implicit
feature interactions for recommender systems. In: Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, pp 1754–1763
17. Xin X, Chen B, He X, Wang D, Ding Y, Jose J (2019) Cfm: Convolutional factorization machines for
context-aware recommendation. IJCAI 19:3926–3932


18. Liu H, Zhang H, Hui K, He H (2015) Overview of context-aware recommender system research. In: 3rd
International Conference on Mechatronics, Robotics and Automation. Atlantis Press
19. Sattar A, Bacciu D (2021) Context-aware graph convolutional autoencoder. In: International Work-
Conference on Artificial Neural Networks, pp 279–290. Springer
20. Peng H, Jin Y, Lv X, Wang X (2019) A context aware poi recommendation algorithm based on matrix
decomposition. J Comput Sci 42:1797–1811
21. Baltrunas L, Ludwig B, Ricci F (2011) Matrix factorization techniques for context aware recommendation.
In: Proceedings of the Fifth ACM Conference on Recommender Systems, pp 301–304
22. Gao Q, Ma P (2020) Graph neural network and context-aware based user behavior prediction and recom-
mendation system research. Computational Intelligence and Neuroscience, p 8812370
23. Chen J, Zhang H, He X, Nie L, Liu W, Chua T-S (2017) Attentive collaborative filtering: Multimedia
recommendation with item-and component-level attention. In: Proceedings of the 40th International ACM
SIGIR Conference on Research and Development in Information Retrieval, pp 335–344
24. Wu S, Zhang W, Sun F, Cui B (2020) Graph neural networks in recommender systems: A survey. arXiv
preprint arXiv:2011.02260
25. Li Z, Cui Z, Wu S, Zhang X, Wang L (2019) Fi-gnn: Modeling feature interactions via graph neural
networks for ctr prediction. In: Proceedings of the 28th ACM International Conference on Information
and Knowledge Management, pp 539–548
26. Zhang M, Chen Y (2020) Inductive matrix completion based on graph neural networks. In: International
Conference on Learning Representations
27. Zheng L, Lu C-T, Jiang F, Zhang J, Yu PS (2018) Spectral collaborative filtering. In: Proceedings of the
12th ACM Conference on Recommender Systems, pp 311–319
28. Wu Y, Liu H, Yang Y (2018) Graph convolutional matrix completion for bipartite edge prediction. In:
KDIR, pp 49–58
29. Yin R, Li K, Zhang G, Lu J (2019) A deeper graph neural network for recommender systems. Knowl-Based
Syst 185:105020
30. Fan W, Ma Y, Li Q, He Y, Zhao E, Tang J, Yin D (2019) Graph neural networks for social recommendation.
In: The World Wide Web Conference, pp 417–426
31. Rendle S (2010) Factorization machines. In: 2010 IEEE International Conference on Data Mining, pp
995–1000. IEEE
32. Trivedi R, Farajtabar M, Biswal P, Zha H (2018) Representation learning over dynamic graphs. arXiv
preprint arXiv:1803.04051
33. Rossi E, Chamberlain B, Frasca F, Eynard D, Monti F, Bronstein M (2020) Temporal graph networks for
deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637
34. Li X, Zhang M, Wu S, Liu Z, Wang L, Yu PS (2020) Dynamic graph collaborative filtering. In: 2020
IEEE International Conference on Data Mining (ICDM), pp 322–331. IEEE
35. Sankar A, Wu Y, Gou L, Zhang W, Yang H (2018) Dynamic graph representation learning via self-attention
networks. arXiv preprint arXiv:1812.09430
36. Xian Z, Li Q, Li G, Li L (2017) New collaborative filtering algorithms based on svd++ and differential
privacy. Mathematical Problems in Engineering
37. Mnih A, Salakhutdinov RR (2008) Probabilistic matrix factorization. In: Advances in Neural Information
Processing Systems, pp 1257–1264
38. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer
42(8):30–37

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
