100% found this document useful (1 vote)

192 views14 pages

A Hierarchical Attention Model For Social Contextual Image Recommendation

This document summarizes a research paper that proposes a hierarchical attention model for social contextual image recommendation. The model aims to better capture the complex factors that influence users' image preferences on social media platforms. It identifies three key aspects that affect preferences: upload history, social influence, and owner admiration. A hierarchical attention network is designed to model these aspects at different levels and learn their importance for recommendation. Experimental results on real-world datasets show the proposed model outperforms alternatives in social image recommendation.

Uploaded by

Krishna Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

192 views14 pages

A Hierarchical Attention Model For Social Contextual Image Recommendation

Uploaded by

Krishna Kumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2019.2913394, IEEE
Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 1

A Hierarchical Attention Model for Social

Contextual Image Recommendation
Le Wu Member, IEEE, Lei Chen, Richang Hong, Member, IEEE, Yanjie Fu,
Xing Xie, Senior Member, IEEE, Meng Wang, Senior Member, IEEE

Abstract—Image based social networks are among the most popular social networking services in recent years. With tremendous
images uploaded everyday, understanding users’ preferences on user-generated images and making recommendations have become
an urgent need. In fact, many hybrid models have been proposed to fuse various kinds of side information (e.g., image visual
representation, social network) and user-item historical behavior for enhancing recommendation performance. However, due to the
unique characteristics of the user generated images in social image platforms, the previous studies failed to capture the complex
aspects that influence users’ preferences in a unified framework. Moreover, most of these hybrid models relied on predefined weights
in combining different kinds of information, which usually resulted in sub-optimal recommendation performance. To this end, in this
paper, we develop a hierarchical attention model for social contextual image recommendation. In addition to basic latent user interest
modeling in the popular matrix factorization based recommendation, we identify three key aspects (i.e., upload history, social influence,
and owner admiration) that affect each user’s latent preferences, where each aspect summarizes a contextual factor from the complex
relationships between users and images. After that, we design a hierarchical attention network that naturally mirrors the hierarchical
relationship (elements in each aspects level, and the aspect level) of users’ latent interests with the identified key aspects. Specifically,
by taking embeddings from state-of-the-art deep learning models that are tailored for each kind of data, the hierarchical attention
network could learn to attend differently to more or less content. Finally, extensive experimental results on real-world datasets clearly
show the superiority of our proposed model.

1 I NTRODUCTION
There is an old saying “a picture is worth a thousand extreme data sparsity of the user-image interaction behavior
words”. When it comes to social media, it turns out that limits the recommendation performance [2], [26]. On one
visual images are growing much more popularity to at- hand, some recent works proposed to enhance recommen-
tract users [14]. Especially with the increasing adoption dation performance with visual contents learned from a
of smartphones, users could easily take qualified images (pre-trained) deep neural network [18], [49], [5]. On the
and upload them to various social image platforms to other hand, as users perform image preferences in social
share these visually appealing pictures with others. Many platforms, some social based recommendation algorithms
image-based social sharing services have emerged, such as utilized the social influence among users to alleviate data
Instagram1 , Pinterest2 , and Flickr3 . With hundreds of millions sparsity for better recommendation [33], [24], [3]. In sum-
of images uploaded everyday, image recommendation has mary, these studies partially solved the data sparsity issue
become an urgent need to deal with the image overload of social-based image recommendation. Nevertheless, the
problem. By providing personalized image suggestions to problem of how to better exploit the unique characteristics
each active user in image recommender system, users gain of the social image platforms in a holistical way to enhance
more satisfaction for platform prosperity. E.g., as reported recommendation performance is still under explored.
by Pinterest, image recommendation powers over 40% of In this paper, we study the problem of understanding
user engagement of this social platform [30]. users’ preferences for images and recommending images in
Naturally, the standard recommendation algorithms pro- social image based platforms. Fig. 1 shows an example of
vide a direct solution for the image recommendation a typical social image application. Each image is associated
task [2]. For example, many classical latent factor based with visual information. Besides showing likeness to im-
Collaborative Filtering (CF) algorithms in recommender ages, users are also creators of these images with the upload
systems could be applied to deal with user-image inter- behavior. In addition, users connect with others to form a
action matrix [26], [40], [26]. Successful as they are, the social network to share their image preferences. The rich
heterogeneous contextual data provides valuable clues to
• L. Wu, L. Chen, R. Hong, M. Wang are with the School of Computer and
infer users’ preferences to images. Given rich heterogeneous
Information, Hefei University of Technology, Hefei, Anhui 230009, China. contextual data, the problem of how to summarize the
Emails: {lewu.ustc, chenlei182979,hongrc.hfut,eric.mengwang}@gmail.com. heterogeneous social contextual aspects that influence users’
• Y. Fu is with the Department of Computer Science, University of preferences to these highly subjective content is still unclear.
Missouri-Rolla, Rolla, MO, USA. Email: [email protected].
• X. Xie is with Microsoft Research, Beijing, China. What’s more, in the preference decision process, different
Email: [email protected]. users care about different social contextual aspects for their
1. https://www.instagram.com personalized image preference. E.g. Lily likes images that are
2. https://www.pinterest.com similar to her uploaded images, while Bob is easily swayed
3. https://www.flickr.com by social neighbors to present similar preference as her

1041-4347 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TKDE.2019.2913394, IEEE
Transactions on Knowledge and Data Engineering
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2

Fig. 1. An overall framework of social contextual image recommendation, where the left part shows the data characteristics of the
platform, and the right part shows our proposed model.

social friends. In other words, the unique user preference cial contextual aspects that affect users’ preferences
for balancing these complex social contextual aspect makes from heterogeneous data sources.
the recommendation problem more challenging. 2) We design a hierarchical attention network to model
To address the challenges mentioned above, in this the hierarchical structure of social contextual recom-
paper, we design a hierarchical attention model for social mendation. In the attention networks, we feed em-
image recommendation. The proposed model is built on the beddings from state-of-the-art deep learning models
popular latent factor based models, which assumes users that are tailored for each kind of data into the
and items could be projected in a low latent space [34]. attention networks. Thus, the attention networks
In our proposed model, for each user, in addition to basic could learn to attend differently based on the rich
latent user interest vector, we identify three key aspects (i.e., contextual information for user interest modeling.
upload history, social influence and owner admiration) that 3) We conduct extensive experiments on real-world
affect each user’s preference, where each aspect summarizes datasets. The experimental results clearly show the
a contextual factor from the complex relationships between effectiveness of our proposed model.
users and images. Specifically, the upload history aspect
summarizes each user’s uploaded images to characterize
her interest. The social influence aspect characterizes the 2 R ELATED W ORK
influence from the social network structure, and the owner We summarize the related work in the following four cate-
admiration aspect depicts the influence from the uploader gories.
of the recommended image. The three key aspects are General Recommendation. Recommender systems
combined to form the auxiliary user latent embedding. could be classified into three categories: content based meth-
Furthermore, since not all aspects are equally important for ods, Collaborative Filtering (CF) and the hybrid models [2].
personalized image recommendation, we design a hierar- Among all models for building recommender systems, la-
chical attention structure that attentively weight different tent factor based models from the CF category are among
aspects for each user’s auxiliary embedding. The proposed the most popular techniques due to their relatively high
hierarchical structure aims at capturing the following two performance in practice [40], [34], [39]. These latent factor
distinctive characteristics. First, as social contextual recom- based models decomposed both users and items in a low
mendation naturally exhibits the hierarchical structure (var- latent space, and the preference of a user to an item could
ious elements from each aspect, and the three aspects of each be approximated as the inner product between the corre-
user), we likewise construct user interest representation sponding user and item latent vectors. In the real-world
with a hierarchical structure. In the hierarchical structure, applications, instead of the explicit ratings, users usually
we first build auxiliary aspect representations of each user, implicitly express their opinions through action or inaction.
and then aggregate the three aspect representations into an Bayesian Personalized Ranking (BPR) is such a popular
auxiliary user interest vector. Second, as different elements latent factor based model that deals with the implicit feed-
within each aspect, and different aspects are differentially back [40]. Specifically, BPR optimized a pairwise based
informative for each user in the recommendation process, ranking loss, such that the observed implicit feedbacks are
the hierarchical attention network builds two levels of at- preferred to rank higher than that of the unobserved ones.
tention mechanisms that apply at the element level and the As users may simultaneously express their opinions with
aspect level. several kinds of feedbacks (e.g., click behavior, consumption
We summarize the contributions of this paper as follows: behavior). SVD++ is proposed to incorporate users’ different
feedbacks by extending the classical latent factor based
1) We study the problem of image recommendation models, assuming each user’s latent factor is composed of
in social image based platforms. By considering the a base latent factor, and an auxiliary latent factor that can
uniqueness of these platforms, we identify three so- be derived from other kinds of feedbacks [26]. Due to the

performance improvement and extensibility of SVD++, it is the graph structural information into a low latent space,
widely studied to incorporate different kinds of information, such that each node is represented as an embedding in this
e.g., item text [58], multi-class preference of users [36]. latent space. Many network embedding models have been
Image Recommendation. In many image based so- proposed [37], [44], [48], [47]. The network embedding could
cial networks, images are associated with rich context in- be used for the attention networks. We distinguish from
formation, e.g., the text in the image, the hashtags. Re- these works as the focus of this paper is not to advance
searchers proposed to apply factorization machines for im- the sophisticated network embedding models. We put em-
age recommendation by considering the rich context in- phasis on how to enhance recommendation performance by
formation [6]. Recently, deep Convolutional Neural Net- leveraging various data embeddings.
works(CNNs) have been successfully applied to analyzing Attention Mechanism. Neural science studies have
visual imagery by automatic image representation in the shown that people focus on specific parts of the input rather
modeling process [27]. Thus, it is a natural idea to leverage than using all available information [22]. Attention mech-
visual features of CNNs to enhance image recommendation anism is such an intuitive idea that automatically models
performance [18], [28], [17], [5]. E.g., VBPR is an extension of and selects the most pertinent piece of information, which
BPR for image recommendation, on top of which it learned learns to assign attentive weights for a set of inputs, with
an additional visual dimension from CNN that modeled higher (lower) weights indicate that the corresponding in-
users’ visual preferences [18]. There are some other image puts are more informative to generate the output. Attention
recommendation models that tackled the temporal dynam- mechanism is widely used in many neural network based
ics of users’ preferences to images over time [17], or users’ tasks, such as machine translation [4] and image caption-
location preferences for image recommendation [35], [49], ing [53]. Recently, the attention mechanism is also widely
[35]. As well studied in the computer vision community, in used for recommender systems [19], [52], [43], [41]. Given
parallel to the visual content information from deep CNNs, the classical collaborative filtering scenario with user-item
images convey rich style information. Researchers showed interaction behavior, NAIS extended the classical item based
that many brands post images that show the philosophy recommendation models by distinguishing the importance
and lifestyle of a brand [14], images posted by users also of different historical items in a user profile [19]. With users’
reflect users’ personality [13]. Recently, Gatys et al. proposed temporal behavior, the attention networks were proposed
a new model of extracting image styles based on the feature to learn which historical behavior is more important for
maps of convolutional neural networks [10]. The proposed the user’s current temporal decision [31], [32]. A lot of
model showed high perceptual quality for extracting image attention based recommendation models have been devel-
style, and has been successfully applied to related tasks, oped to better exploit the auxiliary information to improve
such as image style transfer [11], and high-resolution image recommendation performance. E.g., ANSR is proposed with
stylisation [12]. We argue that the visual image style also a social attention module to learn adaptive social influence
plays a vital role for evaluating users’ visual experience in strength for social recommendation [43]. Given the review
recommender systems. Thus, we leverage both the image or the text of an item, attention networks were developed
content and the image style for recommendation. to learn informative sentences or words for recommenda-
Social Contextual Recommendation. Social scientists tion [15], [41]. While the above models perform the standard
have long converged that a user’s preference is similar to vanilla attention to learn to attend on a specific piece of
or influenced by her social connections, with the social information, the co-attention mechanism is concerned to
theories of homophily and social influence [3]. With the learn attention weights from two sequences [21], [56], [46].
prevalence of social networks, a popular research direction E.g., in the hashtag recommendation with both text and
is to leverage the social data to improve recommendation image information, the co-attention network is designed to
performance [33], [23], [24], [51]. E.g., Ma et al. proposed a learn which part of the text is distinctive for images, and si-
latent factor based model with social regularization terms multaneously the important visual features for the text [56].
for recommendation [33]. Since most of these social recom- Besides, researchers have made a comprehensive survey the
mendation tasks are formulated as non-convex optimizing attention based recommendation models [57]. In some real-
problems, researchers have designed an unsupervised deep world applications, there exists hierarchical structure among
learning model to initialize model parameters for better the data, several pioneering works have been proposed to
performance [9]. Besides, ContextMF is proposed to fuse deal with this kind of relationship [54], [29]. E.g., a hierar-
the individual preference and interpersonal influence with chical attention model is proposed to model the hierarchical
auxiliary text content information from social networks [24]. relationships of word, sentence and document for document
As the implicit influence of trusts and ratings are valuable classification [54]. Our work borrows ideas from the atten-
for recommendation, TrustSVD is proposed to incorporate tion mechanism, and we extend this idea by designing a
the influence of trusted users on the prediction of items for hierarchical structure to model the complex social contex-
an active user [16]. The proposed technique extended the tual aspects that influence users’ preferences. Nevertheless,
SVD++ with social trust information. Social recommenda- different from the natural hierarchical structure of words,
tion has also been considered with social circle [38], online sentences and documents in natural language processing,
social recommendation [59], social network evolution [50], the hierarchial structure that influences a user’s decision
and so on. from complex heterogenous data sources is summarized by
Besides, as the social network could be seen as a graph, our proposed model. Specifically, our proposed model has
the recent surge of network embedding is also closely re- a two-layered hierarchical structure with the bottom layer
lated to our work [8]. Network embedding models encode attention network that summarizes each aspect from the

various elements of this aspect. By taking the output of

each aspect from the bottom layer, the top-layer attention
network learns the importance of the three aspects.
The work that is most similar to ours is the Attentive
Collaborative Filtering (ACF) for image and video recom-
mendation [5]. By assuming there exists item level and
component level implicitness that underlines a user’s pref-
erence, an attention based recommendation model is pro-
posed with the component level attention and the item level
attention. Our work borrows the idea of applying attention
mechanism for recommendation, and it differs from ACF
and previous works from both the research perspective
and the application point. From the technical perspective, Fig. 2. The image embedding process. For each image i, the
we model the complex social contextual aspects of users’ original image is passed through a VGG19 network. We use the
interests from heterogeneous data sources in a unified rec- vector of the last connected layer, i.e., fic as its content repre-
sentation. The Gram matrices Gl on the feature responses of a
ommendation model. In contrast, ACF only leverages the number of layers are computed. We concatenate the vectorized
image (video) content information. From the application representations of the typical Gram matrix sequences as the
view, our proposed model could benefit researchers and image style representation, i.e., fis .
engineers in related areas when heterogeneous data are
available.
the visual images, and then give the problem definition.
Please note that, the problem of how to design sophisticated
3 H ETEROGENEOUS DATA E MBEDDING AND network embedding techniques, and the visual image fea-
P ROBLEM D EFINITION tures are well researched. Since the focus of this paper is not
to advance these topics, we adopt state-of-the-art models
In a social image platform, there are a set of users U (|U | = M )
and put emphasis on enhancing the recommendation per-
and a set of images V (|V | = N ). Besides rating images as
formance with the rich social contextual information.
standard recommender systems, users also perform two
For the social network S, the social embedding part
kinds of behaviors: uploading images and building social links.
tries to learn the distributed representation of each user
We represent users’ three kinds of behaviors with three
in the social network S, which encodes social relations in
matrices: a rating matrix R ∈ RM ×N , an upload matrix
a continuous vector space. Since the focus of this paper
L ∈ RN ×M , and a social link matrix S ∈ RM ×M . Each
is not to design more sophisticated models for network
element rai in the rating matrix R represents the implicit
embedding, we exploit Deepwalk [37] for social embedding
rating preference of user a to image i, with rai = 1 denotes
as it is time-efficient and shows high performance in many
user a likes image i, otherwise it equals 0. sba = 1 if user
network based applications. Deepwalk takes S as input and
a follows (connects to) user b, otherwise it equals 0. If
outputs the social latent representation E ∈ Rd×M , with the
the social platform is undirected, a connects to b means
a-th column ea denotes the latent representation of user a.
sab = 1 and sba = 1. We use sa = [s1a , s2a , ..., sM a ] to
denote the social connections of a, i.e., the a-th column of S. For each image, it provides rich information including its
Please note that different from traditional social networking content as well as its style. Traditionally, convolution neural
platforms (e.g., the social movie sharing platform), users networks have enjoyed great success for learning useful
in these platforms are both image consumers (i.e., reflected image visual content features in recent years [27], [42]. We
in the rating behavior) and image creators (reflected in the choose VGG19 for visual content feature extraction as it is
upload behavior). Each element lia in the upload matrix L a state-of-the-art convolutional neural network architecture
denotes whether the image i is uploaded (created) by user that shows powerful capability to capture the image seman-
a. In other words, if a is the creator of image i, then lia = 1, tics [42]. As commonly adopted by many works, we use
otherwise it equals 0. Since each image can be uploaded by the 4096 dimensional representation in the last connected
only one user, we have M
P layer in VGG19 as the visual content representation, i.e.,
a=1 lia = 1. For ease of explanation,
we use Ci to denote the creator of image i. And the image each image i’s visual feature fic has 4096 dimensions [18],
upload history of a is denoted as la , i.e., the a-th column of [55].
L. Without confusion, we use a, b, c to represent users and Besides image content representation, the image style
i, j, k to denote items. also plays a vital role for users’ visual experience. When
users browse images in social platforms, their preferences
are not only decided by “what is the content of the image?”,
3.1 Heterogeneous Data Embedding but also “does the style of the image meets my preference?”.
Since there are heterogeneous data sources in this platform, To this end, for each image i, besides its content represen-
it is natural to adopt the state-of-the-art data embedding tation fic , we propose to borrow state-of-the-art image style
techniques to preprocess the social network S and the visual representation models to capture its style representation fis .
images. The learned embeddings are easier to be exploited We choose a popular image style representation method
by the following proposed model than directly dealing with proposed by Gatys et al. [10]. This method has shown high
the heterogeneous data sources. Next, we would first briefly perceptual quality and is widely used in many image-style
introduce the embedding models for the social network and based tasks [11], [12]. This style describing model is based

on the powerful feature spaces learned by convolutional Specifically, in the above definition, sa , la , and Ci denotes
neural networks, with the assumption that the styles are the inputs of the three social contextual aspects, i.e., upload
agnostic to the spatial information in the image hidden rep- history aspect, social influence aspect and the creator admi-
resentations. With the trained VGG19 architecture, suppose ration aspect.
a layer l has Nl distinct filter feature maps, each of which In the following of this paper, we use bold capital letters
is vectorized into a size of Ml . Let Bl ∈ RNl ×Ml denotes the to denote matrices, and small bold letters to denote vectors.
filter at layer l, with bljk is the activation of the j-th filter at For any matrix (e.g., social graph S), its i-th column vector
position k . A summary Gram statistic is proposed to discard is denoted as the corresponding small letter with a subscript
the spatial information in the feature maps by computing index i (e.g., the i-th column of S is denoted as sa ). We list
their relations as: some mathematical notations in Table 1.
l
X
gij = blik bljk (1) TABLE 1
k Mathematical Notations
l Notations Description
where Gl ∈ RNl ×Nl is the Gram matrix, with gij denotes the
U userset, |U | = M
correlation between feature map i and j in layer l. Natu-
V imageset, |V | = N
rally, the set of Gram matrices G1 , G2 , ..., GL from different a,b,c,u user
layers of VGG19 provides descriptions of the image style. i,j,k,v image
In practice, researchers found that the style representations R ∈ RM ×N rating matrix, with rai denotes
on layers ‘con1 1’, ‘conv2 1’, ‘con3 1’, ‘con4 1’ and ‘con5 1’ whether a likes image i
S ∈ RM ×M social network matrix, with sba denotes
can well represent the textures of an image [11], [12]. As the whether a follows b
sizes of these Gram matrices are very large, we downsample L ∈ RN ×M upload matrix, with lia denotes
each Gram matrix into a fixed size of 32 × 32, and then whether a uploads image i
concatenate the vector representation of the downsampled sa ∈ RM the a-th column of S,
which denotes the social connections of a
Gram matrices of the five layers. Since there are 5 Gram
la ∈ RN the a-th column of L,
matrices and each each vectorized Gram matrix has 1024 which denotes the uploaded history of a
dimensions, the style representation fis of image i has 5120 Ci ∈ U the creator (owner) of image i, Ci = [a : Lai = 1]
dimensions. ea the social embedding of user a from
social embedding matrix E ∈ Rd×M
fic the visual content representation of image i
3.2 Problem Definition fis the visual style representation of image i
Given the social matrix S and upload matrix L, we identify
three key social contextual aspects, i.e., social influence, 4 T HE P ROPOSED M ODEL
upload history, and the creator admiration that may in- In this section, we present our proposed Hierarchical
fluence users’ preferences. Specifically, the social influence Attentive Social Contextual recommendation (HASC) model
aspect from each user a’s social network structure sa is well for image recommendation.
recognized as an important factor in the recommendation As shown in Fig. 3, HASC is a hierarchical neural net-
process [33], [24]. The social influence states that, each active work that models users’ preferences for to unknown images
user is influenced by her social connections, leading to the from two attention levels with social contextual modeling.
similar preferences between social connections [3]. Besides, The top layered attention network depicts the importance
for each user-item pair (a, i), we could get an upload history of the three contextual aspects (i.e., upload history, social
list la of user a, and the creator Ci of image i from the up- influence and creator admiration) for users’ decision, which
load matrix L. Based on this observation, we design the two is derived from the bottom layered attention networks that
contextual aspects in users’ preference decision process: an aggregate the complex elements within each aspect. Given a
upload history aspect that explains the consistency between user a and an image i with three identified social contextual
her upload history la and her preference for images, and the aspects, we use γal (l = 1, 2, 3) to denote a’s attentive
creator admiration aspect that shows the admiration from the degree for aspect l on the top layer (denoted as the aspect
creator Ci . These three contextual aspects characterise each importance attention with orange part in the figure). A large
user’s implicit feedback to images from various contextual attentive degree denotes the current user cares more about
situations from the heterogeneous social image data. Now, this aspect in image recommendation process. Besides, as
we define the social contextual image recommendation there are various elements within the upload history context
problem as: la and social influence context sa . We use αaj to denote a’s
Definition 1. [PROBLEM DEFINITION] Given the user preference degree for image j in the upload history context
rating matrix R, the upload matrix L, and the social la (lja = 1), with a larger value of αaj indicates that a’s
network S in a social image platform, with the social current interest is more coherent with uploaded image j by
embedding ea of each user a, and the content repre- user a. Similarly, we use βab to denote the influence strength
sentation fic and style representation fis of each image of the b to a in social neighbor context sa (sba = 1), with
i, the social contextual recommendation task aims at: a larger value of βab indicates that a is more likely to be
predicting each user a’s unknown preference for image influenced by b. Please note that, for each user a and image
i with the three social contextual aspects (sa , la , Ci ) and i, different from the upload history aspect and the social
the heterogeneous data embeddings (ea , fic and fis ) as influence aspect, the creator admiration aspect is composed
g(a, i, sa , la , Ci , ea , fac , fas ); of one element Ci (the creator). Thus, this aspect does not

Fig. 3. The overall architecture of the proposed HASC model.

have any sub layers and it is directly sent to the top layer. attentive weights (γal , αaj , and βab ) rely on our carefully
We use three attention sub-networks to learn these attentive designed attention networks that take various information
scores in a unified model. as input. We leave the details of how to model these three
Objective Prediction Function. In addition to param- attention networks in the following subsections. Next, we
eterize each user a with a base embedding pa and each show the soundness of the objective predicted function.
item i with a base embedding wi as many latent factor Relations to Other Models. By rewriting the predicted
based models [40], [26], we also take the inputs of the preference score in Eq (2), we have:
three social contextual aspects: sa , la , and Ci . To model the
complex contextual aspects, we extend the classical latent Basic Latent Factor Model Item Neighborhood Model
factor models and assume each user and each item has two z }| { N
X
embeddings. Specifically, each user a is associated with a r̂ai = pTa wi + γa1 αaj lja xTj wi
j=1
base embedding pa from the base embedding matrix P | {z }
to denote her base latent interest in the standard latent M
factor based models, and an auxiliary embedding vector qa
X
+ γa2 sba βab qTb wi + γa3 qTCi wi ,
from the auxiliary embedding matrix Q . This auxiliary user b=1
| {z }
embedding vector characterizes each user’s preference from
| {z } Owner Admiration Bias
Social Neighborhood Model
the social contextual aspects that could not be detected by
(3)
standard user-image rating behavior. Similarly, each image i
is also associated with two embeddings: a base embedding where the first part is a basic latent factor model, and the
wi from the item base embedding matrix W to denote following three parts are extracted from the three contextual
the basic image latent vector, and an auxiliary vector xi aspects. In the last three terms, xTj wi can be seen as the
from the item auxiliary embedding matrix X to characterize similarity function between image i and the user’s uploaded
each image from the social contextual inputs. Thus, by image j in the neighborhood-based collaborative filtering
combining the attention mechanism with the embeddings, from the upload history aspect [26]. qTb wi represents the
we model each user a’s predicted preference to image i as a social neighbor’s preference to image i with the social
hierarchical attention: influence aspect. As each image is uploaded by a creator,
the last term models the creator admiration aspect. This is
r̂ai = wiT (pa + γa1 x
ea + γa2 q
ea + γa3 qCi )
quite natural in the real-world, as we always like to follow
N M
X X some specific creators’ updates.
where x
ea = lja αaj xj , q
ea = sba βab qb . (2)
j=1 b=1
Please note that, if we replace all the attention scores
In the above prediction function, the representations of with equal weights (i.e., αai = PN 1 l , βab = P 1M sba ,
j=1 ja b=1
three contextual aspects are seamlessly incorporated in a and γal = 13 ), our model turns to an enhanced SVD++ model
holistic way. Specifically, the first line of Eq.(2) is a top with rich social contextual information modeling [26], [58].
layer attention network that aggregates the three contex- However, this fixed weight assignment treats each user,
tual aspects for user embedding. The detailed attention each aspect, and the elements in each aspect equally. This
subnetworks of the upload history attention and the social simply configuration neglects that each user has different
influence attention are listed in the second row. In fact, the considerations for these three contextual aspects. By using

hierarchical attention networks, we could learn each user’s of this paper, for ease of explanation, we also omit the
attentive weights from their historical behaviors. dimension reduction for the visual embeddings (i.e., Wc
and Ws ) whenever they are appeared for the attention
4.1 Hierarchical Attention Network Modeling modeling. Then, the final attentive upload history score αaj
In this subsection, we would follow the bottom-up step to is obtained by normalizing the above attention scores as:
model the hierarchical attention networks in detail. Specif- exp(αaj )
αaj = PN . (5)
ically, we would first introduce the two bottom layered k=1 exp(lka αak )
attention networks: the upload history attention network
After we obtain the attentive upload history score αaj ,
and the social influence attention network, followed by the
the upload history context of user a, denoted as x
ea , is cal-
top layered aspect importance attention network that is
culated as a weighted combination of the learned attentive
based on the bottom layered attention networks.
upload history scores:
Upload History Attention. The goal of the upload his- N
tory attention is to select the images from each user a’s x
ea =
X
lja αaj xj . (6)
upload history that are representative to a’s preferences, and j=1
then aggregate this upload history contextual information to
Social Influence Attention. The social influence atten-
characterize each user. Given each image j that is uploaded
tion module tries to select the influential social neighbors
by a, we model the upload history attentive score αaj as a
from each user a’s social connections, and then summarizes
three-layered attention neural network:
these social neighbors’ influences into a social contextual
αaj = w1 × σ(W1 [pa , qa , xj , wj , ea , vector. If user a follows b, we use βab to denote the social
Wc fjc , Ws fjs , Wc fac , Ws fas ]) (4) influence strength of b to a. Then, the social attentive score
βab could be calculated as:
where Θu = [Wc , Ws , W1 , w1 ] is the parameter set in this βab = w2 σ(W2 [pa , pb , qa , qb , ea , eb , fac , fas ]), (7)
three layered attention network, and σ(x) is a non-linear
activation function. Specifically, as the dimensions of visual where Θs = [W2 , w2 ] are the parameters in the social
content embeddings (i.e., fjc and fac ) and the visual style influence attention network. This social influence attention
embeddings (i.e., fjs and fas ) are much higher than the part also contains three kinds of data embeddings: the user
dimensions of other kinds of embeddings, Wc ∈ RD×4096 interest embeddings of pa , pb , qa , qb , the social embeddings
and Ws ∈ RD×5120 are the parameters of the bottom layer of ea and eb , and the visual embeddings of user a with
that performs dimension reduction of the visual content and content representation fac and style representation fas .
style representations. W1 ∈ R(8D+d)×d1 denotes the matrix Then, the final attentive social influence score βab is
parameter of the second layer in the attention network, as all obtained by normalizing the above attention scores as:
data embedding vectors has D dimensions except the social exp(βab )
embedding ea has d1 dimensions. And w1 ∈ Rd1 is the βab = PM . (8)
c=1 exp(sca βac )
vector parameter of the third layer in the attention network.
In this attention modeling process, we take three different After we obtain the attentive social influence score βab ,
kinds of embeddings as input: the social context of user a, denoted as qea , is calculated as
the a weighted combination as:
• Latent Embedding: the latent embedding includes M
[pa , qa , xj , wj ], where pa and qa are the basic and
X
qea = sba βab qb . (9)
auxiliary embeddings of user a, and xj and wj are b=1
the basic and auxiliary embeddings of item j . Since each image is uploaded by one creator, for each
• Social Embedding: the social embedding part con- image i, the corresponding uploader is represented as Ci .
tains the learned social embedding ea of each user, Correspondingly, the owner appreciation context could be
which models the global and local structure of each simply represented as the the auxiliary embedding qCi from
user in the social network S. the user auxiliary embedding matrix Q.
• Visual Embedding: the visual embedding part in- Aspect Importance Attention Network. The aspect im-
cludes the visual representations of user a and item portance attention network takes the contextual represen-
j . Specifically, each image is characterized by con- tation of each aspect from the bottom layered attention
tent representation fjc and style representation fjs . networks as input, and models the importance of each
Besides, as users show their preferences for images aspect in the user’s decision process. Specifically, for each
from their historical implicit feedbacks, each user a’s pair of user a and image i, we have two contextual rep-
visual content representation and style PN
representa- resentations from the bottom layer of HASC as: upload
r fc
c
tion can also be summarized as: fa = PN rai i , fas =
i=1
history contextual representation x e a , the social influence
PN i=1 ai
s
i=1 rai fi
. contextual representation q e a , and the owner appreciation
P N
r
i=1 ai contextual representation qCi . Then, the aspect importance
By feeding all the sophisticated designed embeddings score γal (l=1, 2, 3) is modeled with an aspect importance
from heterogeneous data sources as the input, the upload attention network as:
history attention network learns to focus on the specific γal = w3 σ(W3 al ), (10)
information. Please note that, we omit the bias terms in
the attention network without confusion. In the following

3 3
where Θa = [W , w ] is the parameter set of this attention Algorithm 1 The learning algorithm of HASC
network, and al (l = 1, 2, 3) denotes the input of the top Input: Rating matrix R, social matrix S, Uploader matrix L;
layered attention network, which is the output of the bottom batch size m; max epoch T ;
layered attention networks, i.e., a1 = x fa is the upload Output: Latent embedding matrix Θ1 = [P, Q, W, X] and
parameters in the attention networks Θ2 ;
history contextual representation, a2 = q fa is the social 1: Initialize Θ with a Gaussian distribution with a mean of
influence contextual representation, and a3 = qa denotes 0 and a standard variation of 0.1;
the representation of current active user a. 2: for epoch ← 1 to T do
Then, the final aspect importance score γal is obtained 3: Get training data D with randomly selected 5 times
by normalizing the above attention scores as: negative feedbacks< a, i, j > (a ∈ U, i ∈ Ra , j ∈ V −Ra );
|D|
exp(γal ) 4: for mini epoch ← 1 to m do
γal = P3 . (11) 5: Get mini batch : randomly select m pairs
k=1 exp(γak ) < ak , ik , j k >m
k=1 in the training data;
For each user a, the learned aspect importance scores 6: for Each pair < ak , ik , j k > in the mini batch do
are tailored to each user, which distinguish the importance 7: Compute predicted rating of positive
of the three social contextual aspects in the user’s decision item r̂ai (Eq.(2));
8: Compute predicted rating of negative
process. For all learned aspect importance scores, the larger item r̂aj (Eq.(2));
the value, the more likely the user’s decision is influenced 9: Compute the loss Lk (Eq.(12));
by this corresponding social contextual aspect. 10: end for
1 Pm k
11: Update Θ with loss as m k=1 L ;
12: end for
4.2 Model Learning 13: end for
As we focus on implicit feedbacks of users, similar as the 14: Return Θ1 = [P, Q, W, X] and parameters in the atten-
tion Θ2 .
widely used ranking based loss function in ranking based
latent factor models [40], we also design a ranking based
loss function as: dataset from one of the largest social image sharing platform
M
min L =
X X
s(r̂ai − r̂aj ) + λ||Θ1 || 2
(12)
Flickr, which is extended from the widely used NUS-WIDE
Θ
a=1 (i,j)∈Da
dataset [7], [45]. NUS-WIDE contains nearly 270,000 images
with 81 human defined categories from Flickr. Based on
where s(x) is a sigmoid function that transforms the input this initial data, we get the uploader information according
into range (0, 1). Θ = [Θ1 , Θ2 ], with Θ1 = [P, Q, W, X] denotes to the image IDs provided in NUS-WIDE dataset from the
the embedding matrices and Θ2 = [Θu , Θs , Θa ] denotes the public APIs of Flickr. We treat all the uploaders as the initial
parameters in each attention network. λ is a regularization userset, and the associated images as the imageset. We then
term that regularizes the user and image embeddings. Da = crawl the social network of the userset, and the implicit
{(i, j)|i ∈ Ra ∧j ∈ V − Ra } is the training data for a with Ra feedbacks of the userset to the imageset.
the imageset that a positively shows feedback. After data collection, in data preprocessing process, we
All the parameters in the above loss function are differ- filter out users that have less than 2 rating records and 2
entiable. In practice, we implement HASC with TensorFlow social links. We also filter out images that have less than
to train model parameters with mini batch Adam. The 2 records. We call the filtered dataset as F L. As shown in
detailed training algorithm is shown in Algorithm 1. In prac- Table 2, this dataset is very sparse with about 0.15% density.
tice, we could only observe positive feedbacks of users with Besides, we further filter F L dataset to ensure each user
huge missing unobserved values, similar as many implicit and each image have at least 10 rating records. This leads
feedback works, for each positive feedback, we randomly to a smaller but denser dataset as F S. Table 2 shows the
sample 5 missing unobserved feedbacks as pseudo negative statistics of the two datasets after pruning. Please note that
feedbacks at each iteration in the training process [50], [49], the number of images is much more than that of the users.
[5]. As each iteration the pseudo negative samples change, This is consistent with the observation that the number of
each missing value gives very weak negative signal. images usually far exceeds that of users in social image
platforms [1], as each user could be a creator to upload
5 E XPERIMENTS
multiple images. In data splitting process, we follow the
In this section, we show the effectiveness of our proposed leave-one-out procedure in many research works [5], [20].
HASC model. Specifically, we would answer the following Specifically, for each user, we select the last rating record as
questions: Q1: How does our proposed model perform com- the test data, and the remaining data are used as the training
pared to the baselines (Sec. 5.2)? Q2: How does the model data. To tune model parameters, we randomly select 5% of
perform under different sparsity (Sec. 5.3)? Q3: How does the training data to constitute the validation dataset.
the proposed social contextual aspects and the hierarchical
attention perform (Sec. 5.4)? TABLE 2
The statistics of the two datasets.
Dataset Users Images Ratings Social Links Rating Density
5.1 Experimental Settings F S 4,418 31,460 761,812 184,991 0.55%
Dataset. To the best of our knowledge, there is no public F L 8,358 105,648 1,323,963, 378,713 0.15%

available dataset that contains heterogenous data sources in Evaluation Metrics Since we focus on recommending
a social image based network as described in Fig. 1. To show images to users, we use two widely adopted ranking metric
the effectiveness of our proposed model, we crawl a large for top-K recommendation evaluation: the Hit Ratio (HR)

0.55
0.33

NDCG@K
HR@K

0.45 BPR BPR

VBPR
0.28 VBPR
ACF ACF
SR SR
ContextMF ContextMF
VPOI VPOI
0.35 HASC HASC
0.23
5 6 7 8 9 10 5 6 7 8 9 10
Top-K Top-K
(a) HR@K on F S (b) NDCG@K on F S

0.6
0.40

NDCG@K
HR@K

BPR BPR
VBPR VBPR
0.5 0.35
ACF ACF
SR SR
ContextMF ContextMF
VPOI VPOI
HASC HASC
0.4 0.3
5 6 7 8 9 10 5 6 7 8 9 10
Top-K Top-K
(c) HR@K on F L (d) NDCG@K on F L
Fig. 4. Overall performance of different models on the two datasets. (Better viewed in color.)

and Normalized Discounted Cumulative Gain (NDCG) [18], • ACF: it models the item level and component level
[5]. HR measures the percentage of images that are liked by attention for image recommendation with two atten-
users in the top-K list, and NDCG gives a higher score to tion networks. For fair comparison, we enrich this
the hit images that are ranked higher in the ranking list. As baseline by leveraging the upload history as users’
the image size is huge, it is inefficient to take all images as auxiliary feedback in this model [5].
candidates to generate recommendations. For each user, we • VPOI: it is a visual based POI recommendation al-
randomly select 100 unrated images as candidates, and then gorithm. This algorithm relies on the collective ma-
mix them with the records in the validation and test data to trix factorization to consider the associated images
select the top-K results. This evaluation process is repeated with each POI and the uploaded images of each
for 10 times and we report the average results [18], [5]. For user. To adapt the POI recommendation to image
both metrics, the larger the value, the better the ranking recommendation, we treat each image as a POI and
performance. the uploaded images of each user as the associated
Baselines. We compare our proposed HASC model with images of her. [49].
the following baselines:
Parameter setting. In the social embedding process with
• BPR: it is a classical ranking based latent factor Deepwalk [37], we set the parameters as: the window size
based model for recommendation with competing w = 10 and walks per vertex ρ = 80. The social embedding
performance. This method has been well recognized size d is set in the range [32, 64, 128]. We find when d = 128,
as a strong baseline for recommendation [40]. the social embedding reaches the best performance. Hence,
• SR: it is a social based recommendation model that we set d = 128 in Deepwalk. There are two important
encodes the social influence among users with social parameters in our proposed model: the dimension D of
regularization in classical latent factor based mod- the user and image embeddings, and the regularization
els [33]. parameter λ in the objective function (Eq.(12)). We choose
• ContextMF: this method models various social con- D in [10, 15, 20, 30] and λ in [0.001, 0.01, 01], and perform
textual factors, including item content topic, user grid search to find the best parameters. The best setting
personal interest, and inter-personal influence in is D = 15 and λ = 0.01. We find the dimension of the
a unified social contextual recommendation frame- attention networks does not impact the results much. Thus,
work [24]. we empirically set the dimensions of the parameters in
• VBPR: it extends BPR by modeling both the visual the attention networks as 20 (i.e., parameters in Θ2 ). The
and latent dimensions of users’ preferences in a uni- activation function σ(x) is set as the Leakly ReLU. To
fied framework, where the visual content dimension initialize the model, we randomly set the weights in the
is derived from a pre-trained VGG network. attention networks with a Gaussian distribution of mean

0 and standard deviation 0.1. Since the objective function BPR baseline can not work well under this situation as
of HASC is non-convex, we initialize P and W from the it only modeled the sparse user-image implicit feedbacks.
basic BPR model, and Q and X with the same Gaussian Under this situation, the improvement is significant for all
distribution as the parameters of the attention networks to models over BPR as these models utilized different auxiliary
speed up convergence. We use mini-batch Adam to optimize data for recommendation. E.g., when users have less than 4
the model, where the batch size is set as 512 and the initial ratings, our proposed HASC model improves over BPR by
learning rate is set as 0.0005. There are several parameters more than 35%. As user rating scale increases, the perfor-
in the baselines, for fair comparison, all the parameters in mance of all models increase quickly with more training
the baselines are also tuned to have the best performance. rating records, and HASC still consistently outperforms the
For all models, we stop model training when both the HR@5 baselines.
and NDCG@5 on the validation dataset begins to decrease. TABLE 3
The improvement of using different attention mechanism
5.2 Overall Performance compared to BPR.

Fig. 4 shows the overall performance of all models on HR@K Bottom Layer Top Layer F S F L
Attention Attention HR NDCG HR NDCG
and NDCG@K on the two datasets with varying sizes of AVG AVG 6.44% 10.28% 5.54% 9.02%
K , where the top two subfigures depict the results on F S MAX MAX 5.82% 9.55% 4.98% 8.10%
dataset and the bottom two subfigures depict the results on AVG ATT 7.33% 11.15% 5.95% 9.93%
F L dataset. As shown in this figure, our proposed HASC MAX ATT 6.84% 10.96% 5.72% 9.55%
ATT AVG 12.75% 19.23% 8.30% 13.28%
model always performs the best. With the increase of the ATT MAX 12.20% 18.56% 8.02% 12.85%
top-K list size, the performance of all models increase. The ATT ATT 14.57% 22.55% 10.67% 16.70%
performance trend is consistent over different top-K values
TABLE 4
and different metrics. We find that considering either the The improvement of modeling different contextual aspects with
social network or the visual image information could allevi- our proposed model compared to BPR.(U:upload history, S:
ate the data sparsity problem and improve recommendation social influence, C: creator admiration)
performance. E.g., VBPR improves over BPR about 3% by F S F L
Aspects
incorporating the visual information in the modeling pro- HR NDCG HR NDCG
cess. ACF further improves VBPR by assigning the attentive U 8.70% 16.52% 6.44% 11.03%
S 9.63% 16.78% 5.29% 9.65%
weights to different images the user rated and uploaded
C 8.57% 14.53% 4.37% 7.93%
in the past. SR also has better performance as it leverages U+S+C 14.57% 22.55% 10.67% 16.70%
the social network information, and ContextMF further im-
proves the performance with content modeling. On average, 5.4 Attention Analysis
our proposed model shows about 20% improvement over
In this part, we conduct experiments to give more detailed
BPR baseline, and more than 10% improvement over the
analysis of the proposed attention network. We would eval-
best baselines on both datasets with regard to the NDCG@5
uate the soundness of the designed attention structure and
metric. Last but not the least, by comparing the results of
the superiority of combining the various data embeddings
F S and F L, we observe that for each method, the results
for attention modeling.
on F L always outperform F S. We guess a possible reason
In the experiments, we use the Leakly ReLU as the
is that, though F S is denser than F L, the larger F L has
activation function σ(x) for attention modeling, and then
nearly two times as many records as F S for training. As
attentively combine the elements of each set with a soft
the overall trend is similar on the two metrics with different
attention. Alternately, instead of attentively combining all
values of K , in the following of the subsections, for page
the elements, a direct solution is to use the hard atten-
limit, we only show the top-5 results.
tion with MAX operation that selects the element with
the largest attentive score at each layer of the hierarchi-
5.3 Performance under Different Data Sparsity cal attention network. E.g., for the upload history aspect,
A key characteristic of our proposed model is that it alle- Max learns the attentive upload history score in Eq.(6) as:
viates the data sparsity issue with various social contextual e a = xj , where lja = 1 ∧ (∀lka = 1, αja ≥ αka ). Partic-
x
aspects modeling. In this subsection, we investigate the per- ularly, if we simply set the attentive scores with the average
formance of various models under different data sparsity. pooling (i.e., αai = |L1a | , βab = |S1a | , γal = 13 ), our model
We mainly focus on the F L dataset as it is more chal- degenerates to an enhanced SVD++ with social contextual
lenging with sparser user rating records compared to the modeling but without any attentive modeling. If we do not
denser F S dataset. Specifically, we bin users into different model any social contextual aspects, our model degenerates
groups based on the number of the observed feedbacks in to the BPR model [40]. Table 3 shows the results of different
the training data, and then show the performance under attention mechanism. As shown in this table, the best results
different groups. Fig. 5 shows the results, where the left part are achieved by using our proposed attention mechanism,
summarizes the user group distribution of the training data followed by AVG and MAX. We guess a possible reason is
and the right part depicts the performance with different that: each user’s interests are diversified, and it is challeng-
data sparsity. As shown in the left part, more than 5% ing to infer each user’s interests from the limited training
users have less than 4 ratings, and 20% users have less data. If we simply using a hard attention with the maximum
than 16 ratings with more than 100 thousand images on value or adopting average aggregation, many valuable con-
the F L dataset. When the rating scale is very sparse, the textual information is neglected in this process. Besides, we

0.35
30 BPR

Percentage(%)
VBPR

NDCG@5
0.30
20 ACF
SR
ContextMF
0.25
10 VPOI
HASC
0 0.20
[0,4) [4,16) [16 64) [64 256) [256,) [0,4) [4,16) [16,64) [64,256) [256,)
Num. of ratings for each user Num. of ratings for each user

Fig. 5. Performance under different sparsity.

observe that ATT that operates at the bottom layer achieves the social embeddings (i.e., ea ), and the visual embeddings
much better performance than its counterparts that operates with content representations (i.e., fic of image i and fac of
on the top layer (e.g., the comparison results between the user a) and style representations (i.e., fis of image i and
fourth row and the sixth row). Since each aspect at the fas of user a ). Table 5 shows the performance of HASC
bottom layer usually contains much more elements than the with different kinds of input embeddings. From this table,
top layer, attentively summarizing each contextual aspect at we have several observations. First, as the auxiliary latent
the bottom layer would provide valuable information for embedding representation could model each user and each
the top layer. In contrast, if we use AVG or MAX at the item from the rich social contextual information, taking the
bottom layer, the results are not satisfactory when we use auxiliary embeddings could improve the performance than
“ATT” at the second layer, since the input of the second solely feeding the base embeddings for attention modeling.
layer lacks many important information. Second, the improvement of social embeddings is not very
After showing the soundness of our proposed atten- significant. We guess a possible reason is that, the social
tion structure, Table 4 presents the performance of using influence aspect already considers the social neighborhood
different contextual aspects with our proposed hierarchical information for users’ interest modeling. As the social em-
attention. As shown in this table, each aspect improves beddings represent the overall social network with both
the performance. By combining all social contextual aspects local and global structure, the improvement is limited with
with hierarchical attention, the model reaches the best per- the additional global network structure modeling. Third, we
formance. observe that the improvement of the visual embeddings is
very significant. Both the content and the style informa-
TABLE 5 tion could enhance the recommendation performance. By
Performance of different kinds of inputs for attention modeling combining content and style embeddings, the performance
(Base: base embedding, Aux: auxiliary embedding, Soc: social
embedding, Vis C(Vis C): visual content(style) embedding). further improves. This observation empirically shows the
with “Base” denotes the base embedding, “Aux” denotes the complementary relationship of content and style in visual
auxiliary embedding, ”Soc” denotes the social embedding, and images. Last but not least, by feeding the three differ-
“Vis C”, “Vis S”, “Vis CS” denotes the visual content feature, ent kinds of data embeddings into the attention network
visual style feature, and both visual features.
embedding, the proposed HASC could achieve the best
F S F L performance.
Input Embedding
HR NDCG HR NDCG
Base 0.358 0.257 0.439 0.319 In the previous experiments, we use the DeepWalk as
Base+Aux 0.366 0.264 0.445 0.323 the social network embedding model to obtain the social
Base+Aux+Soc 0.367 0.270 0.450 0.331 network embedding vector of each user. Now we would
Base+Aux+Vis C 0.388 0.278 0.453 0.335 show the effectiveness of adopting different network em-
Base+Aux+Vis S 0.383 0.275 0.451 0.332
Base+Aux+Vis CS 0.393 0.282 0.464 0.342
bedding techniques. We choose two state-of-the-art network
Base+Aux+Soc+Vis CS 0.400 0.289 0.475 0.347 embedding models: LINE [44] and GCN [25], and compare
the performance. The results are shown in Table 6. As can
be seen from this table, when the item visual embeddings
TABLE 6 are not incorporated, using the advanced graph embedding
Performance of different kinds of social embedding techniques
for the attention modeling. techniques (e.g., GCN), could partially improve the recom-
mendation performance, as these advanced models could
F S F L better capture the social network structure. When all the
Input Embedding
HR NDCG HR NDCG
Base+Aux+DeepWalk 0.367 0.270 0.450 0.331 input embeddings are incorporated, these advanced graph
Base+Aux+LINE 0.369 0. 273 0.452 0.334 embedding models show similar performance compared to
Base+Aux+GCN 0.371 0.276 0.459 0.340 the DeepWalk based network embedding model. We guess
Base+Aux+DeepWalk+Vis CS 0.400 0.289 0.475 0.347 the reason is that, as stated as Table 5, the improvement of
Base+Aux+LINE+Vis CS 0.400 0.289 0.474 0.345
Base+Aux+GCN+Vis CS 0.401 0.290 0.475 0.348
the social embedding is not as significant as the visual based
input for attention modeling when all the input embeddings
Besides, in the attention modeling process, we also learn are considered.
the attentive weights by modeling different kinds of input Attention Weights Visualization. Besides given the
embeddings from the heterogeneous data sources. For each overall results of different attention modeling setting, we
attention layer, it consists the following kinds of inputs: give a visualization of the learned attention weights of
the latent interest representations of base embeddings (i.e., users from the F L dataset. Firstly, for each user, we group
pa and wi ) and auxiliary embeddings (i.e., qa and xi ), her into three categories according the aspect that has the

analyze this user’s records and guess a possible reason is

max ra1 that: the style and the content of the test image has rarely
appeared in the user’s training data. As this test image
Owner admiration ra3
0.6
max ra2
0.5
differs from the distribution of the training images of this
max ra3 user, most models could not perform well. However, the
0.4
C model that leverages the owner admiration shows better
0.3
results than the remaining models, as this user has liked
0.2 several images uploaded by the owner. This example gives
0.1 us an intuitive explanation that shows when our proposed
0.5
model may not perform very well. Nevertheless, we must
0.6
0.4
0.4
0.5 notice that this case is caused by the situation that the test
0.3
0.3
0.2 0.2 pattern is not consistent with the patterns in the training
0.1 0.1
data, which is uncommon. Therefore, we could empirically
Upload history ra1
Social influence ra2 conclude that our proposed model shows the best results for
Fig. 6. Visualization of aspect weights of randomly sampled most cases.
users.
6 C ONCLUSIONS
largest attention value. In other words, for each user a in In this paper, we have proposed a hierarchical attentive
the first group, she has the largest aspect weight for upload social contextual model of HASC for social contextual image
history, i.e., γa1 > γa2 ∧ γa1 > γa3 . Then, for each group, recommendation. Specifically, in addition to user interest
we randomly select 10% of users and visualize them in modeling, we have identified three social contextual aspects
Fig. 6. As observed in this figure, each randomly sampled that influence a user’s preference to an image from hetero-
user has her own attentive weights for balancing the three geneous data: the upload history aspect, the social influence
contextual aspects. Besides, most of the users belong to the aspect, and the owner admiration aspect. We designed a
first group that has the largest value of the upload history hierarchical attention network that naturally mirrored the
aspect, which empirically shows that many users show hierarchical relationship of users’ interest given the three
similar preferences between their uploaded images and the identified aspects. In the meantime, by feeding the data
liked images. This observation is also consistent with Table 4 embedding from rich heterogeneous data sources, the hier-
that shows leveraging the upload history has the largest archical attention networks could learn to attend differently
performance gain compared to the remaining two aspects to more or less important content. Extensive experiments on
on F L dataset. real-world datasets clearly demonstrated that our proposed
HASC model consistently outperforms various state-of-the-
5.5 Case Study art baselines for image recommendation.
In order to better understand the proposed model, we
visualize several typical users and the experimental results R EFERENCES
of different recommendation models in Fig.7. In this figure,
[1] Flickr Statistics. https://expandedramblings.com/index.php/
each row represents a user. The first column shows the flickr-stats/, 2017. [Online; accessed 20-Jan-2018].
images liked by the user in the training data, and the second [2] G. Adomavicius and A. Tuzhilin. Toward the next generation
column shows the test image of each user in the test data. of recommender systems: A survey of the state-of-the-art and
possible extensions. TKDE, 17(6):734–749, 2005.
Please note that, due to page limit, we only show six typical [3] A. Anagnostopoulos, R. Kumar, and M. Mahdian. Influence and
training images of each user if she has rated more than 6 correlation in social networks. In KDD, pages 7–15. ACM, 2008.
images in the training data. The third column shows the [4] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation
NDCG@5 results of different models. Specifically, to vali- by jointly learning to align and translate. In ICLR, 2015.
[5] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua. Attentive
date the effectiveness of different aspects in the modeling collaborative filtering: Multimedia recommendation with item-
process, we use U, S, and C to denote the three simplified and component-level attention. In SIGIR, pages 335–344. ACM,
versions of our proposed HASC model that only consider 2017.
the upload history aspect (i.e., γa2 = γa3 = 0), the social [6] T. Chen, X. He, and M.-Y. Kan. Context-aware image tweet
modelling and recommendation. In MM, pages 1018–1027. ACM,
influence aspect(i.e., γa1 = γa3 = 0), and the owner ad- 2016.
miration aspect(i.e., γa1 = γa2 = 0). We present the learned [7] T.-S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng. Nus-
attention weights of different aspects of our proposed HASC wide: a real-world web image database from national university
of singapore. In MM, page 48. ACM, 2009.
model in the fourth column. The last column gives some [8] P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network
intuitive explanations of the experimental results. As shown embedding. TKDE, 2018.
in this figure, by learning the importance of different aspects [9] S. Deng, L. Huang, G. Xu, X. Wu, and Z. Wu. On deep learning
with attentive modeling, HASC could better learn each for trust-aware recommendations in social networks. TNNLSS,
28(5):1164–1177, 2017.
user’s preference from various social contextual aspects. [10] L. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis using
Thus, it shows the the best performance for the users in convolutional neural networks. In NIPS, pages 262–270, 2015.
the first three rows. In the fourth row, we present a case [11] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using
convolutional neural networks. In CVPR, pages 2414–2423, 2016.
that all the models do not perform well expect than the
[12] L. A. Gatys, A. S. Ecker, M. Bethge, A. Hertzmann, and E. Shecht-
simplified C model from HASC that leverages the single man. Controlling perceptual factors in neural style transfer. In
creator admiration aspect into consideration. We carefully CVPR, pages 3985–3993, 2017.

Train Test NDCG@5 Attention weights Results explanation

U 0.32 BPR 0.22 Upload ra1 0.49

For the test image, its style resembles many
a S 0.45 SVD++ 0.43 0.26 images in the training data. 1/7 of a’s followers’

……
Social ra2
C 0.44 HASC 0.68 0.24 liked the image. User a has liked 1/8 of the
Owner ra3
images by the owner.

U 0.36 BPR 0.18 Upload rb1 0.46 For the test image, its style and content looks
like the training images in the first row. None of
b S 0.28 SVD++ 0.43 0.22

……
Social rb2 b’s followers’ liked the image. User a has liked
C 0.35 HASC 0.86 Owner rb3 0.30 1/3 of the images by the owner.

U 0.48 BPR 0.30 0.36 For the test image, its content looks like many
Upload rc1
0.33 images in the training data. 1/4 of a’s followers’
……

S 0.44 SVD++ 0.52 rc2

c Social
liked the image. User c has liked 1/10 of the
C 0.26 HASC 0.66 Owner rc3 0.31
images by the owner.

U 0.36 BPR 0.08 0.40 For the test image, its content and style of rarely
Upload rd1
S 0.28 SVD++ 0.52 0.29
appeared in the user d’s training data. The user
d rd2
……

Social
liked 3/7 of the images uploaded by the owner.
C 0.68 HASC 0.44 Owner rd3 0.31 None of user d’s followers’ like this image.

Fig. 7. The case study of several typical users. In this figure, each row represents a user. The first and the second column are the
training and test images of the user. The Top-5 recommendation results of NDCG@5 are shown in the third column. In the third
column, the left three models are simplified versions of our proposed HASC model that only leverage one aspect, and the model
with best performance is shown with bold italic letters.

[13] F. Gelli, X. He, T. Chen, and T.-S. Chua. How personality affects [29] J. Li, M.-T. Luong, and D. Jurafsky. A hierarchical neural autoen-
our likes: Towards a better understanding of actionable images. In coder for paragraphs and documents. arXiv:1506.01057, 2015.
MM, pages 1828–1837. ACM, 2017. [30] D. C. Liu, S. Rogers, R. Shiau, D. Kislyuk, K. C. Ma, Z. Zhong,
[14] F. Gelli, T. Uricchio, X. He, A. Del Bimbo, and T.-S. Chua. Beyond J. Liu, and Y. Jing. Related pins at pinterest: The evolution of a
the product: Discovering image posts for brands in social media. real-world recommender system. In WWW, pages 583–592, 2017.
In MM. ACM, 2018. [31] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang. Stamp: short-term
[15] Y. Gong and Q. Zhang. Hashtag recommendation using attention- attention/memory priority model for session-based recommenda-
based convolutional neural network. In IJCAI, pages 2782–2788, tion. In SIGKDD, pages 1831–1839. ACM, 2018.
2016. [32] P. Loyola, C. Liu, and Y. Hirate. Modeling user session and intent
[16] G. Guo, J. Zhang, and N. Yorke-Smith. A novel recommendation with an attention-based encoder-decoder architecture. In RecSys,
model regularized with user trust and item ratings. TKDE, pages 147–151. ACM, 2017.
28(7):1607–1620, 2016. [33] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender
[17] R. He, C. Fang, Z. Wang, and J. McAuley. Vista: a visually, systems with social regularization. In WSDM, pages 287–296.
socially, and temporally-aware model for artistic recommendation. ACM, 2011.
In Recsys, pages 309–316. ACM, 2016. [34] A. Mnih and R. R. Salakhutdinov. Probabilistic matrix factoriza-
[18] R. He and J. McAuley. Vbpr: Visual bayesian personalized ranking tion. In NIPS, pages 1257–1264, 2008.
from implicit feedback. In AAAI, pages 144–150, 2016. [35] W. Niu, J. Caverlee, and H. Lu. Neural personalized ranking for
[19] X. He, Z. He, J. Song, Z. Liu, Y.-G. Jiang, and T.-S. Chua. Nais: Neu- image recommendation. In WSDM, pages 423–431. ACM, 2018.
ral attentive item similarity model for recommendation. TKDE, [36] W. Pan and Z. Ming. Collaborative recommendation with multi-
2018. class preference context. IEEE Intelligent Systems, 32(2):45–51, 2017.
[20] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua. Neural [37] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning
collaborative filtering. In WWW, pages 173–182, 2017. of social representations. In KDD, pages 701–710. ACM, 2014.
[21] B. Hu, C. Shi, W. X. Zhao, and P. S. Yu. Leveraging meta- [38] X. Qian, H. Feng, G. Zhao, and T. Mei. Personalized recommenda-
path based context for top-n recommendation with a neural co- tion combining user interest and social circle. TKDE, 26(7):1763–
attention model. In SIGKDD, pages 1531–1540. ACM, 2018. 1777, 2014.
[22] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual [39] S. Rendle. Factorization machines with libfm. TIST, 3(3):57, 2012.
attention for rapid scene analysis. PAMI, 20(11):1254–1259, 1998. [40] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme.
[23] M. Jamali and M. Ester. A matrix factorization technique with trust Bpr: Bayesian personalized ranking from implicit feedback. In
propagation for recommendation in social networks. In RecSys, UAI, pages 452–461. AUAI Press, 2009.
pages 135–142. ACM, 2010. [41] S. Seo, J. Huang, H. Yang, and Y. Liu. Interpretable convolutional
[24] M. Jiang, P. Cui, F. Wang, W. Zhu, and S. Yang. Scalable recom- neural networks with dual local and global attention for review
mendation with social contextual information. TKDE, 26(11):2789– rating prediction. In Recsys, pages 297–305. ACM, 2017.
2802, 2014. [42] K. Simonyan and A. Zisserman. Very deep convolutional net-
[25] T. N. Kipf and M. Welling. Semi-supervised classification with works for large-scale image recognition. In ICLR, 2015.
graph convolutional networks. In ICLR, 2017. [43] P. Sun, L. Wu, and M. Wang. Attentive recurrent social recommen-
[26] Y. Koren. Factorization meets the neighborhood: a multifaceted dation. In SIGIR, pages 185–194. ACM, 2018.
collaborative filtering model. In KDD, pages 426–434. ACM, 2008. [44] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line:
[27] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classifi- Large-scale information network embedding. In WWW, pages
cation with deep convolutional neural networks. In NIPS, pages 1067–1077, 2015.
1097–1105, 2012. [45] J. Tang, X. Shu, G.-J. Qi, Z. Li, M. Wang, S. Yan, and R. Jain. Tri-
[28] C. Lei, D. Liu, W. Li, Z.-J. Zha, and H. Li. Comparative deep clustered tensor completion for social-aware image tag refinement.
learning of hybrid representations for image recommendations. In PAMI, 39(8):1662–1674, 2017.
CVPR, pages 2545–2553, 2016. [46] Y. Tay, A. T. Luu, and S. C. Hui. Multi-pointer co-attention

networks for recommendation. In SIGKDD, pages 2309–2318. Richang Hong (M’12) is currently a professor
ACM, 2018. at HFUT. He received the Ph.D. degree from
[47] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and USTC, in 2008. He has co-authored over 60 pub-
Y. Bengio. Graph attention networks. In ICLR, 2018. lications in the areas of his research interests,
[48] D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. which include multimedia question answering,
In KDD, pages 1225–1234. ACM, 2016. video content analysis, and pattern recognition.
[49] S. Wang, Y. Wang, J. Tang, K. Shu, S. Ranganath, and H. Liu. What He is a member of the Association for Computing
your images reveal: Exploiting visual contents for point-of-interest Machinery. He was a recipient of the best paper
recommendation. In WWW, pages 391–400, 2017. award in the ACM Multimedia 2010.
[50] L. Wu, Y. Ge, Q. Liu, E. Chen, R. Hong, J. Du, and M. Wang.
Modeling the evolution of users’ preferences and social links in
social networking services. TKDE, 29(6):1240–1253, 2017.
[51] L. Wu, P. Sun, R. Hong, Y. Ge, and M. Wang. Collaborative neural
social recommendation. TSMC: Systems, 2019.
[52] J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua. Attentional
factorization machines: Learning the weight of feature interactions
via attention networks. In IJCAI, pages 3119–3125.
[53] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov,
R. Zemel, and Y. Bengio. Show, attend and tell: Neural image Yanjie Fu received his Ph.D. degree from
caption generation with visual attention. In ICML, pages 2048– Rugters University in 2016, the B.E. degree from
2057, 2015. University of Science and Technology of China
[54] Z. Yang, D. Yang, C. Dyer, X. He, A. J. Smola, and E. H. Hovy. in 2008, and the M.E. degree from Chinese
Hierarchical attention networks for document classification. In Academy of Sciences in 2011. He is currently
HLT-NAACL, pages 1480–1489, 2016. an Assistant Professor at the Missouri University
[55] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W.-Y. Ma. Collaborative of Science and Technology. His general interests
knowledge base embedding for recommender systems. In KDD, are data mining and big data analytics. He has
pages 353–362. ACM, 2016. published proficiently in referred journals and
[56] Q. Zhang, J. Wang, H. Huang, X. Huang, and Y. Gong. Hashtag conference proceedings, such as IEEE TKDE,
recommendation for multimodal microblog using co-attention ACM TKDD, IEEE TMC and ACM SIGKDD.
network. In IJCAI, pages 3420–3426, 2017.
[57] S. Zhang, L. Yao, A. Sun, and Y. Tay. Deep learning based
recommender system: A survey and new perspectives. CSUR,
52(1):5, 2019.
[58] S. Zhang, L. Yao, and X. Xu. Autosvd++: An efficient hybrid
collaborative filtering model via contractive auto-encoders. In
SIGIR, pages 957–960. ACM, 2017.
[59] Z. Zhao, H. Lu, D. Cai, X. He, and Y. Zhuang. User preference
learning for online social recommendation. TKDE, 28(9):2522– Xing Xie (SM’09) is currently a senior re-
2534, 2016. searcher in Microsoft Research Asia, and a
guest PhD advisor at USTC. His research inter-
est include spatial data mining, location-based
services, social networks, and ubiquitous com-
puting. In recent years, he was involved in the
program or organizing committees of over 70
conferences and works. Especially, he initiated
the LBSN workshop series and served as pro-
Le Wu is currently an assistant professor at the gram co-chair of ACM Ubicomp 2011. He is a
Hefei University of Technology (HFUT), China. senior member of ACM and the IEEE, and a
She received the Ph.D. degree from the Univer- distinguished member of China Computer Federation (CCF).
sity of Science and Technology of China (USTC).
Her general area of research interests is data
mining, recommender systems and social net-
work analysis. She has published more than
30 papers in referred journals and conferences.
Dr. Le Wu is the recipient of the Best of SDM
2015 Award, and the Distinguished Dissertation
Award from China Association for Artificial Intel-
ligence (CAAI) 2017. Meng Wang is a professor at the Hefei Univer-
sity of Technology, China. He received his B.E.
degree and Ph.D. degree in the Special Class
for the Gifted Young and the Department of
Electronic Engineering and Information Science
from the University of Science and Technology
of China (USTC), Hefei, China, in 2003 and
2008, respectively. His current research interests
Lei Chen is currently working towards the M.S. include multimedia content analysis, computer
degree at Hefei University of Technology, China. vision, and pattern recognition. He has authored
He received the B.S. degree from Anhui Uni- more than 200 book chapters, journal and con-
versity in 2016. His research interests include ference papers in these areas. He is the recipient of the ACM SIGMM
multimedia analysis and data mining. Rising Star Award 2014. He is an associate editor of IEEE Transactions
on Knowledge and Data Engineering (IEEE TKDE), IEEE Transactions
on Circuits and Systems for Video Technology (IEEE TCSVT), IEEE
Transactions on Multimedia (IEEE TMM), and IEEE Transactions on
Neural Networks and Learning Systems (IEEE TNNLS).

Common questions

The model employs a hierarchical attention network that differentially weights the three social contextual aspects (upload history, social influence, and owner admiration) based on their relevance to each specific user. The aspect importance attention on the top layer assigns different weights to these aspects, allowing the system to prioritize them according to their significance for the user's image recommendation process, thus providing a more customized recommendation experience .

The hierarchical attention network model ensures more adaptive image recommendations by modeling contextual aspects through attention mechanisms that dynamically adjust the weightage based on each user's interactions and preferences. Unlike traditional collaborative filtering which relies heavily on fixed user-item interactions, this model's attentional layers can pivot focus based on contextual relevance, such as recent uploads or social influence, yielding recommendations that better reflect current user interests and behaviors .

The hierarchical attention network distinguishes itself from the basic latent factor model like SVD++ by incorporating attentional weights specific to each user's preferences and the contextual importance of different aspects. Unlike SVD++, which treats aspects equally, the hierarchical model adapts weights dynamically by considering the user’s upload history, social influences, and creator admiration, thereby personalizing recommendations based on more nuanced behavioral patterns .

The hierarchical structure in the model is crucial for managing complex relationships in social image platforms by organizing various contextual elements into a cohesive framework. It constructs user interest representation by first creating auxiliary aspect representations, then combining them into an auxiliary user interest vector. This approach allows the model to process different levels of information and prioritize them according to relevance, effectively capturing the nuances of social interactions and individual preferences in recommendations .

The hierarchical attention model in social image recommendations enhances traditional latent factor models by incorporating three key social contextual aspects: upload history, social influence, and owner admiration, which helps in capturing complex user preferences. The model uses a hierarchical structure to weigh these aspects differently based on their importance to each user, improving recommendation quality by leveraging rich contextual information .

The upload history attention in the hierarchical model selects and aggregates images from each user's upload history that best represent their preferences. It uses a three-layer neural network to calculate the upload history attentive score, ensuring that selected images reflect the user’s current interests. This personalized attention helps in tailoring recommendations to be more relevant based on past behavior .

User-specific attentive weights significantly enhance the recommendation model's performance by customizing the importance of different social contextual aspects for each individual user. These weights ensure that the model captures the unique preference profiles and interactions of each user with social networks and content creators, leading to more accurate and personalized image recommendations. This tailored approach addresses the inherent variability in user behavior and content interest, making the system adaptable and responsive to individual user dynamics .

The proposed model integrates visual content and style embeddings to enhance image recommendation performance. Visual embeddings are included in the upload history attention network, where they help characterize the user's interest by recognizing patterns in the visual appeal of previously uploaded images. These embeddings help the model understand the aesthetic preferences of users, contributing to more precise recommendations that align with their visual tastes .

The three social contextual aspects play specific roles in the image recommendation model: 1) Upload history summarizes user interests based on previously uploaded images. 2) Social influence assesses the influence from the user's social network. 3) Owner admiration evaluates the influence of the uploader of the recommended image. Each aspect provides a different perspective on user preferences, which are combined into an auxiliary user latent embedding to improve recommendation accuracy .

Replacing attention scores with equal weights in the model could significantly deteriorate recommendation quality by disregarding the differential importance of each element and aspect for different users. Such a configuration treats all users and their interactions uniformly, ignoring the personalized nuances of user preferences derived from historical behavior and social context. This lack of personalization could lead to less accurate recommendations, as the model would fail to recognize the distinct influence of social contexts on individual user choices .

A Hierarchical Attention Model For Social Contextual Image Recommendation
No ratings yet
A Hierarchical Attention Model For Social Contextual Image Recommendation
4 pages
Social Media Popularity Prediction Based On Multi Modal Self Attention Mechanisms
No ratings yet
Social Media Popularity Prediction Based On Multi Modal Self Attention Mechanisms
8 pages
Meta-Path Augmented Sequential Recommendation With Contextual Co-Attention Network
No ratings yet
Meta-Path Augmented Sequential Recommendation With Contextual Co-Attention Network
24 pages
A Sequence-Oblivious Generation Method For Context
No ratings yet
A Sequence-Oblivious Generation Method For Context
11 pages
Learn To Personalized Image Search From The Photo Sharing Websites
No ratings yet
Learn To Personalized Image Search From The Photo Sharing Websites
4 pages
Bibliography & Abbreviations
No ratings yet
Bibliography & Abbreviations
3 pages
Context-Awareness and Viewer Behavior Prediction in Social-TV Recommender Systems Survey and Challenge
No ratings yet
Context-Awareness and Viewer Behavior Prediction in Social-TV Recommender Systems Survey and Challenge
9 pages
Applied Sciences
No ratings yet
Applied Sciences
19 pages
A Social Network-Based Recommender System SNRS
No ratings yet
A Social Network-Based Recommender System SNRS
32 pages
A Neural Influence Diffusion Model For Social Recommendation
No ratings yet
A Neural Influence Diffusion Model For Social Recommendation
10 pages
DiffNet++ A Neural in Uence and Interest
No ratings yet
DiffNet++ A Neural in Uence and Interest
14 pages
Popular Tag Recommendation by Neural Network in Social Media
No ratings yet
Popular Tag Recommendation by Neural Network in Social Media
13 pages
Learn To Personalized Image Search From The Photo Sharing Websites
No ratings yet
Learn To Personalized Image Search From The Photo Sharing Websites
7 pages
A Hybrid Recommendation Model in Social Media Based On Deep Emotion Analysis and Multi-Source View Fusion
No ratings yet
A Hybrid Recommendation Model in Social Media Based On Deep Emotion Analysis and Multi-Source View Fusion
16 pages
Sensors 17 00631 PDF
No ratings yet
Sensors 17 00631 PDF
27 pages
A Memory-Efficient Approach To The Scalability of
No ratings yet
A Memory-Efficient Approach To The Scalability of
14 pages
Lightgt: A Light Graph Transformer For Multimedia Recommendation
No ratings yet
Lightgt: A Light Graph Transformer For Multimedia Recommendation
10 pages
Improving Image Tag Recommendation Using Favorite Image Context
No ratings yet
Improving Image Tag Recommendation Using Favorite Image Context
4 pages
Big Data Recommendation
No ratings yet
Big Data Recommendation
9 pages
Enhancing Collaborative Filtering by User Interest Expansion Via Personalized Ranking
No ratings yet
Enhancing Collaborative Filtering by User Interest Expansion Via Personalized Ranking
16 pages
Outbrain Click Prediction Analysis
No ratings yet
Outbrain Click Prediction Analysis
8 pages
Ad Recommendation in A Collapsed and Entangled World
No ratings yet
Ad Recommendation in A Collapsed and Entangled World
11 pages
Mathematics 13 01386
No ratings yet
Mathematics 13 01386
13 pages
J166 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)
No ratings yet
J166 A Novel Context-Aware Recommender Systems Based On A J166 Deep Sequential Learning Approach (CReS)
24 pages
ETCW26
No ratings yet
ETCW26
6 pages
Web Crawling Based Context Aware Recommender Syste
No ratings yet
Web Crawling Based Context Aware Recommender Syste
25 pages
Temporal-Contextual Recommendation in Real-Time: Yifei Ma Balakrishnan (Murali) Narayanaswamy
No ratings yet
Temporal-Contextual Recommendation in Real-Time: Yifei Ma Balakrishnan (Murali) Narayanaswamy
9 pages
SEAN Recommendation
No ratings yet
SEAN Recommendation
11 pages
Group Attention For Collaborative: Filtering With Sequential Feedback and Context Aware Attributes
No ratings yet
Group Attention For Collaborative: Filtering With Sequential Feedback and Context Aware Attributes
22 pages
Incorporating Service Proximity Into Web Service Recommendation Via Tensors Recomposition
No ratings yet
Incorporating Service Proximity Into Web Service Recommendation Via Tensors Recomposition
118 pages
New Literature
No ratings yet
New Literature
8 pages
Social Networks Science Design, Implementation, Security, and Challenges From Social Networks Analysis To Social Networks Intelligence
No ratings yet
Social Networks Science Design, Implementation, Security, and Challenges From Social Networks Analysis To Social Networks Intelligence
15 pages
Rich Human Feedback For Text-to-Image Generation: Research/google-Research/tree/master/richhf 18k
No ratings yet
Rich Human Feedback For Text-to-Image Generation: Research/google-Research/tree/master/richhf 18k
19 pages
FuzzAttention On Session-Based Recommender System
No ratings yet
FuzzAttention On Session-Based Recommender System
6 pages
03-Knowledge-Based Recommender Systems - Overview and Research Directions
No ratings yet
03-Knowledge-Based Recommender Systems - Overview and Research Directions
19 pages
Personality-Aware Product Recommendation System Based On User Interests Mining and Metapath Discovery
No ratings yet
Personality-Aware Product Recommendation System Based On User Interests Mining and Metapath Discovery
13 pages
Effective Content Recommendation in New Media Leveraging Algorithmic Approaches
No ratings yet
Effective Content Recommendation in New Media Leveraging Algorithmic Approaches
10 pages
Personalized Image Search
No ratings yet
Personalized Image Search
7 pages
Mediator in Social Network For User Interest Activity in Big Data
No ratings yet
Mediator in Social Network For User Interest Activity in Big Data
5 pages
Research Article Hierarchical Social Recommendation Model Based On A Graph Neural Network
No ratings yet
Research Article Hierarchical Social Recommendation Model Based On A Graph Neural Network
10 pages
Research Article: An Effective Recommender Algorithm For Cold-Start Problem in Academic Social Networks
No ratings yet
Research Article: An Effective Recommender Algorithm For Cold-Start Problem in Academic Social Networks
12 pages
Two-Stage Next-Item Recommendation System
No ratings yet
Two-Stage Next-Item Recommendation System
11 pages
Graph Neural Network For Context-Aware Recommendation: Asma Sattar Davide Bacciu
No ratings yet
Graph Neural Network For Context-Aware Recommendation: Asma Sattar Davide Bacciu
20 pages
Ijcai2018 Yin
No ratings yet
Ijcai2018 Yin
7 pages
XGBoost for Facebook Friend Recommendations
No ratings yet
XGBoost for Facebook Friend Recommendations
7 pages
Mathematical Problems in Engineering - 2019 - Latif - Content Based Image Retrieval and Feature Extraction A Comprehensive
No ratings yet
Mathematical Problems in Engineering - 2019 - Latif - Content Based Image Retrieval and Feature Extraction A Comprehensive
21 pages
2023 KEDIR Pattern Based Hybrid Book Recommendation System
No ratings yet
2023 KEDIR Pattern Based Hybrid Book Recommendation System
12 pages
N2VSCDNNR: A Local Recommender System
No ratings yet
N2VSCDNNR: A Local Recommender System
11 pages
Programming Questions Recommendation System
No ratings yet
Programming Questions Recommendation System
12 pages
HADOOP Based Recommendation Algorithm For Micro-Video URL
No ratings yet
HADOOP Based Recommendation Algorithm For Micro-Video URL
9 pages
AI-Driven Social Media Recommendations
No ratings yet
AI-Driven Social Media Recommendations
16 pages
Minor Report
No ratings yet
Minor Report
47 pages
Learning Hierarchical Review Graph Representations For Recommendation
No ratings yet
Learning Hierarchical Review Graph Representations For Recommendation
14 pages
Social Re-Ranking for Image Retrieval
No ratings yet
Social Re-Ranking for Image Retrieval
8 pages
SIGIR2020 ShiShaoyun
No ratings yet
SIGIR2020 ShiShaoyun
10 pages
Be Java Projets 2020-21
No ratings yet
Be Java Projets 2020-21
40 pages
Diffurec: A Diffusion Model For Sequential Recommendation: Zihao Li and Chenliang Li
No ratings yet
Diffurec: A Diffusion Model For Sequential Recommendation: Zihao Li and Chenliang Li
28 pages
Electronic Commerce Research and Applications: Mehmet Türkay Yoldar, U Ğur Özcan T
No ratings yet
Electronic Commerce Research and Applications: Mehmet Türkay Yoldar, U Ğur Özcan T
17 pages
Pharmacy Management System in Zanzibar
100% (2)
Pharmacy Management System in Zanzibar
31 pages
Ia 2014 1002742 1
No ratings yet
Ia 2014 1002742 1
131 pages
Fast Nearest Neighbor Search With Keywords1
No ratings yet
Fast Nearest Neighbor Search With Keywords1
13 pages
Synthetic Social Media Data Generation
No ratings yet
Synthetic Social Media Data Generation
16 pages
Query Optimization
No ratings yet
Query Optimization
30 pages
Heuristic Query Optimization Guide
No ratings yet
Heuristic Query Optimization Guide
30 pages
Pharmacy Management System in Zanzibar
100% (2)
Pharmacy Management System in Zanzibar
31 pages
Fast Nearest Neighbor Search With Keywords1
No ratings yet
Fast Nearest Neighbor Search With Keywords1
13 pages
Hash-Based String Similarity Search
No ratings yet
Hash-Based String Similarity Search
14 pages
Basepaper (Water Fraud)
No ratings yet
Basepaper (Water Fraud)
7 pages
A Particle Swarm Optimized Learning Model of Fault Classification in Web-Apps
No ratings yet
A Particle Swarm Optimized Learning Model of Fault Classification in Web-Apps
10 pages
Project Designing
No ratings yet
Project Designing
22 pages
Weakly-Supervised Deep Embedding For Product Review Sentiment Analysis
No ratings yet
Weakly-Supervised Deep Embedding For Product Review Sentiment Analysis
12 pages
Interrupt Vector Table (IVT)
No ratings yet
Interrupt Vector Table (IVT)
3 pages
Reviving Sequential Program Birthmarking For Multithreaded Software Plagiarism Detection PDF
No ratings yet
Reviving Sequential Program Birthmarking For Multithreaded Software Plagiarism Detection PDF
23 pages
The Impact of Raw Materials Price Volatility On Cost of Goods Sold (COGS) For Product Manufaturing
No ratings yet
The Impact of Raw Materials Price Volatility On Cost of Goods Sold (COGS) For Product Manufaturing
14 pages
8086 Microprocessor Architecture Overview
100% (1)
8086 Microprocessor Architecture Overview
162 pages
Online Product Quantization for ANN
No ratings yet
Online Product Quantization for ANN
14 pages
Keyword-Aware Travel Route Framework
No ratings yet
Keyword-Aware Travel Route Framework
14 pages
Understanding the Program Segment Prefix
No ratings yet
Understanding the Program Segment Prefix
3 pages
File Compression Project
No ratings yet
File Compression Project
2 pages
CAP 07 - Applying Activity Theory To Video Analysis - How To Make Sense of Video Data in Human - Computer Interaction
No ratings yet
CAP 07 - Applying Activity Theory To Video Analysis - How To Make Sense of Video Data in Human - Computer Interaction
17 pages
Consumer Behavior Midterm Review
No ratings yet
Consumer Behavior Midterm Review
14 pages
Passage 1 Dinosaur Footprints Extinction
No ratings yet
Passage 1 Dinosaur Footprints Extinction
22 pages
What Is - Body Doubling - in The Context of ADHD
No ratings yet
What Is - Body Doubling - in The Context of ADHD
3 pages
Understanding Behavioral Assessment Techniques
100% (1)
Understanding Behavioral Assessment Techniques
69 pages
Elwood Technologymemorycollective 2015
No ratings yet
Elwood Technologymemorycollective 2015
9 pages
Stages of Concentration
No ratings yet
Stages of Concentration
20 pages
Modeling Consumer Learning From Online Product Reviews
No ratings yet
Modeling Consumer Learning From Online Product Reviews
17 pages
Phenomenological Mapping and Comparison of Shamanic Buddhist Yogic and Schizophrenic Experiences
No ratings yet
Phenomenological Mapping and Comparison of Shamanic Buddhist Yogic and Schizophrenic Experiences
30 pages
Allsup - Creating An Educational Framework For Popular Music in Public Schools. Anticipating The Second-Wave
No ratings yet
Allsup - Creating An Educational Framework For Popular Music in Public Schools. Anticipating The Second-Wave
28 pages
Steps in Summary Writing: Combining Main Ideas
75% (4)
Steps in Summary Writing: Combining Main Ideas
2 pages
Listening vs. Hearing
No ratings yet
Listening vs. Hearing
27 pages
Clinical Behavior Analysis and RFT
No ratings yet
Clinical Behavior Analysis and RFT
28 pages
Psychology Practicals Guide
No ratings yet
Psychology Practicals Guide
212 pages
Ped 101 Midterm Reviewer
100% (2)
Ped 101 Midterm Reviewer
6 pages
FLASH SONAR PROGRAM Helping Blind People Learn To See
No ratings yet
FLASH SONAR PROGRAM Helping Blind People Learn To See
27 pages
Furr - Structuring The Group Experience
No ratings yet
Furr - Structuring The Group Experience
22 pages
Impaired Verbal Communication
80% (5)
Impaired Verbal Communication
1 page
Margulies 2016
No ratings yet
Margulies 2016
6 pages
Article Pour Evaluation Neurosciences Wulff (2021)
No ratings yet
Article Pour Evaluation Neurosciences Wulff (2021)
12 pages
Parenting Tips for Ages 2-8
No ratings yet
Parenting Tips for Ages 2-8
2 pages
Chapter 3 - Consumer Behaviour 2021
No ratings yet
Chapter 3 - Consumer Behaviour 2021
25 pages
Classroom Management Guide
No ratings yet
Classroom Management Guide
3 pages
Definition of A Business Meeting
0% (1)
Definition of A Business Meeting
4 pages
Understanding Commitment and Leadership
No ratings yet
Understanding Commitment and Leadership
4 pages
Science Current Directions in Psychological: Working Memory Capacity As Executive Attention
No ratings yet
Science Current Directions in Psychological: Working Memory Capacity As Executive Attention
6 pages
Learn To Meditate
100% (1)
Learn To Meditate
23 pages
The Power of Now
No ratings yet
The Power of Now
7 pages
08 The Foundation For Incident Command
No ratings yet
08 The Foundation For Incident Command
35 pages
Mitchell Et Al 1997
100% (3)
Mitchell Et Al 1997
35 pages

A Hierarchical Attention Model For Social Contextual Image Recommendation

Uploaded by

A Hierarchical Attention Model For Social Contextual Image Recommendation

Uploaded by

This article has been accepted for publication in a future issue of this journal, but has not been

A Hierarchical Attention Model for Social

various elements of this aspect. By taking the output of

Fig. 3. The overall architecture of the proposed HASC model.

0.45 BPR BPR

Fig. 5. Performance under different sparsity.

analyze this user’s records and guess a possible reason is

Train Test NDCG@5 Attention weights Results explanation

U 0.32 BPR 0.22 Upload ra1 0.49

S 0.44 SVD++ 0.52 rc2

Common questions

How does the proposed model address the unequal importance of different social contextual aspects for each user?

How does the proposed hierarchical attention network model ensure more adaptive image recommendations compared to traditional collaborative filtering methods?

How does the hierarchical attention network distinguish itself from the basic latent factor model like SVD++?

What role does the hierarchical structure play in handling complex relationships in social image platforms?

How does the hierarchical attention model improve social image recommendations over traditional latent factor models?

In what way does the upload history attention in the hierarchical model contribute to personalizing image recommendations?

Explain the importance of user-specific attentive weights in enhancing the recommendation model's performance.

Discuss how the proposed model utilizes visual embeddings and their influence on image recommendation performance.

What are the roles of the three social contextual aspects in the proposed image recommendation model?

Evaluate the impact of replacing attention scores with equal weights in the model. What consequences might this have on recommendation quality?

You might also like