

Cross Domain Recommendation via Bi-directional Transfer Graph Collaborative Filtering Networks

Meng Liu, Jianjun Li*, Guohui Li, Peng Pan
Meng Liu, Jianjun Li and Peng Pan: School of Computer Science and Technology, Huazhong University of Science and Technology, China
Guohui Li: School of Software Engineering, Huazhong University of Science and Technology, China
{sunshinel,jianjunli}@hust.edu.cn, [email protected], [email protected]
*Jianjun Li is the corresponding author.

ABSTRACT

Data sparsity is a challenging problem that most modern recommender systems are confronted with. By leveraging knowledge from relevant domains, the cross-domain recommendation technique can be an effective way of alleviating the data sparsity problem. In this paper, we propose a novel Bi-directional Transfer learning method for cross-domain recommendation that uses a Graph Collaborative Filtering network as the base model (BiTGCF). BiTGCF not only exploits the high-order connectivity in the user-item graph of a single domain through a novel feature propagation layer, but also realizes a two-way transfer of knowledge across two domains by using the common users as the bridge. Moreover, distinct from previous cross-domain collaborative filtering methods, BiTGCF fuses users' common features and domain-specific features during transfer. Experimental results on four couple benchmark datasets verify the effectiveness of BiTGCF over state-of-the-art models in terms of bi-directional cross-domain recommendation.

CCS CONCEPTS

• Information systems → Recommender systems; • Computing methodologies → Transfer learning.

KEYWORDS

Recommender Systems; Collaborative Filtering; Transfer Learning; Graph Convolution Network

ACM Reference Format:
Meng Liu, Jianjun Li, Guohui Li, and Peng Pan. 2020. Cross Domain Recommendation via Bi-directional Transfer Graph Collaborative Filtering Networks. In The 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19–23, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3340531.3412012

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CIKM '20, October 19–23, 2020, Virtual Event, Ireland. © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-6859-9/20/10. https://doi.org/10.1145/3340531.3412012

1 INTRODUCTION

With the rapid increase of commodity types and quantities, personalized recommendation, which can predict the purchase intention of users, has become one of the most important services on the Internet. Personalized recommendation aims to predict a group of items that users are more likely to purchase in the future by fully exploiting users' historical interactions.

Collaborative filtering (CF) is a widely used method [18, 27] for personalized recommendation, which learns the recommender model based on the interaction history of similar users or items. Generally speaking, the key component in CF models is to learn the latent features (embeddings) of users and items effectively and then perform the prediction based on these embeddings. Traditional CF methods, represented by matrix factorization (MF), obtain the latent factors of users and items by factorizing the user-item interaction matrix [26]. Neural CF models, such as NeuMF [10], replace the inner product with multiple neural network layers to obtain an effective matching function [5, 6, 23, 25]. Due to the powerful nonlinear fitting ability of neural networks, neural CF models in general achieve better fitting results and have gradually become the mainstream.

In recent years, inspired by the success of graph convolutional networks (GCN) in effectively extracting features in non-Euclidean spaces, some researchers have tried to exploit the user-item bipartite graph structure by propagating embeddings on it, aiming at achieving more effective embeddings [1, 4, 14, 31]. For example, Wang et al. [29] proposed NGCF, which follows the same propagation rules as in GCN (including feature transformation, neighborhood aggregation and nonlinear activation) to capture the high-order connectivity between users and items by stacking multiple feature propagation layers, and achieves promising results. Recently, He et al. [9] found that two common designs in GCNs, the transformation function and nonlinear activation, have no positive effect on collaborative filtering and may even degrade its performance. They proposed LightGCN, which greatly simplifies the design of NGCF but can yield better performance. In general, the integration of higher-order neighbor information makes GCN-based methods a great success. However, due to the large number of items in real life, recommender systems inevitably face the problem of data sparsity, which has become the main factor limiting the effectiveness of existing models.

An effective solution to the data sparsity problem is transferring knowledge [13] from other related domains by transfer learning. In real life, a user inevitably interacts with multiple domains to meet the demands of her life. When the interaction history is sparse in domain A, it is natural to consider getting some common knowledge from a correlated domain B that includes more data. In recent years, cross-domain collaborative filtering (CDCF) [2, 11, 12, 15, 22, 28] has attracted increasing research attention. But just as every coin has two sides: while the correlations between domains make CDCF possible, the differences between domains also render it difficult to transfer knowledge.


Early on, CodeBook Transfer (CBT) [15] was proposed to first compress the dense rating matrix of the auxiliary domain into a cluster-level rating pattern, called a codebook, by orthogonal nonnegative matrix tri-factorization (ONMTF), and then realize knowledge transfer by sharing the codebook. Later, some variants of CBT that follow a similar transfer mechanism were proposed [7, 21, 24]. This kind of method does not require the users of the two domains to overlap. Unlike CBT's two-stage migration, collective matrix factorization (CMF) [28] collectively factorizes the rating matrices of two domains with the same users (or items), and transfers knowledge by sharing the users' (or items') latent features [28]. This is an effective way to refine users' features in a single domain with users' features learned from two domains. On this basis, some improvements have been proposed [17, 19]. For example, CoNet [11] takes a neural network as the basic model and uses cross connection units to improve the learning of matching functions in the current domain, while PPGN [35] extracts more effective common user features by applying GCN on the joint interaction graph of the two domains.

Figure 1: (a) Different preferences of a user in the book and movie domains; (b) Different ways to extract a user's features in different domains: previous CDCF methods vs. BiTGCF. The circles with the same color in (b) represent common features.

Despite their effectiveness, most current CDCF methods only focus on refining user representations by sharing better common features, without considering users' domain-specific features, which may limit the effect of transfer when the domain-specific features take a major proportion. Take Figure 1 as an example: there are two domains, Film and Book. Some of the user features in these two domains should be domain-specific; for instance, in the film domain a user shows her preference for music, frames, etc., which is not available in the user's features in the book domain. But in existing CDCF methods, after transferring (sharing) a user's features, the user's preferences in the two domains are exactly the same. Consequently, the matched movies based on the shared features may not be satisfactory.

In view of the limitation of existing CDCF methods and the powerful feature extracting ability of GCN, in this work we propose a novel Bi-directional Transfer learning method for cross-domain recommendation that uses a Graph Collaborative Filtering network as the base model (BiTGCF). The major differences between BiTGCF and previous work lie in: (1) A new feature propagation module. Inspired by LightGCN, we remove the non-linear activation function and the transformation matrices in our GCN model for collaborative filtering, which greatly reduces the number of model parameters as well as the risk of model overfitting. But different from LightGCN, we retain the inner product operation, the self-connection operation and the layer combination manner. We argue that these operations are beneficial for increasing the feature flow among nodes, which in turn can boost the recommendation performance. Our experimental study validates this conjecture. (2) A bi-directional feature transfer module. Compared with previous CDCF methods, our model takes into account domain-specific features in different domains when refining user features. A simple yet effective balancing mechanism is designed to balance a user's common features and domain-specific features. Through the bidirectional knowledge transfer between the two domains, our model can improve the recommendation performance of both domains simultaneously. The main contributions of this paper are as follows:

• By using a graph collaborative filtering network as the base model, we propose BiTGCF, a novel bi-directional transfer learning model for cross-domain recommendation. In BiTGCF, we design a new feature propagation module, which borrows the idea from LightGCN to simplify the feature propagation model in a more reasonable manner. Moreover, we propose a novel knowledge transfer module, which extends the flow of features from in-domain to inter-domain and, more importantly, considers the integration of users' common features and domain-specific features.
• We empirically demonstrate that the proposed model BiTGCF outperforms the state-of-the-art approaches (both single- and cross-domain) on four couple cross-domain datasets. In addition, extensive experiments are conducted to verify the effectiveness and applicability of our feature transfer module.

The rest of this paper is organized as follows: Section 2 discusses related work; Section 3 formally defines the research problem and briefly reviews two representative recommendation models based on GCN; Section 4 details our proposed model; Section 5 presents the experimental study; finally, Section 6 draws a conclusion.

2 RELATED WORK

2.1 Model-based CF Methods

Collaborative filtering (CF) is a commonly used technology in modern recommender systems that parameterizes users and items as embeddings and reconstructs the interaction history to learn the embedding parameters. The core of CF lies in how to design the model so that it can learn more effective embeddings. Earlier CF models, such as matrix factorization (MF), project the user (or item) ID into an embedding space and model the matching relationship between user and item via the inner product. The development of neural networks provides new ideas for learning the projection and matching functions in CF models. For example, NeuMF [10] uses stacked fully connected layers to replace the inner product in MF, and DMF [33] replaces the linear projection in MF with stacked fully connected layers. Recently, researchers have found that different historical interactions contribute differently to the prediction of current interactions.


To this end, attention mechanisms, such as ACF [3] and DeepICF [32], were introduced to automatically learn the importance of each historical interaction.

2.2 Graph Convolutional Networks based Recommendation

Inspired by the development of graph neural networks [14, 16, 30], there have been some efforts on exploiting the user-item interaction graph to infer users' preferences. GC-MC [31] applies the graph convolutional network to exploit the connections between users and items when encoding interaction features. SpectralCF [36] utilizes a spectral convolution operation to explore all possible connectivity between users and items in the spectral domain. However, the eigen-decomposition in SpectralCF, which is a necessary step, is very time-consuming. Recently, Wang et al. [29] proposed the Neural Graph Collaborative Filtering (NGCF) framework to integrate GCN into the embedding process. By stacking multiple embedding propagation layers, NGCF can capture the collaborative signal in high-order connectivities between users and items. However, its design is rather burdensome. LR-GCCF [4] removes the non-linear activation function to facilitate tuning on large datasets. More importantly, it takes a residual learning approach to explain the rationale for concatenating all the layers' outputs. Later, LightGCN [9] simplifies NGCF by removing the operations, such as the activation function and the transformation function, that have no positive impact on collaborative filtering.

2.3 Transfer Learning and Cross Domain Collaborative Filtering

In recent years, transfer learning has emerged as a new learning framework to address the data sparsity problem by extracting and transferring knowledge from related domains. Cross Domain Collaborative Filtering (CDCF) is the application of transfer learning to recommendation, which focuses on how to transfer knowledge (features) in an effective way.

There are various ways to transfer knowledge, such as collective matrix factorization (CMF) [28] and codebook transfer [7, 15], both of which apply matrix factorization (MF) in each domain. These approaches transfer interaction information from an auxiliary domain to improve the performance in a target domain with a shallow model. Specifically, CMF jointly factorizes the rating matrices of the two domains by sharing the user latent factors. This method effectively realizes the transfer and improvement of common user hidden features. The rise of deep learning has contributed a lot to the development of CDCF, and some studies have tried to fuse CDCF with deep learning, such as CoNet [11] and its heterogeneous variants [19]. With MLP as the basic model, CoNet shares user features in the embedding process and completes the transfer of interaction features between the two domains through cross-mapping. DARec [34] extracts and transfers patterns from rating matrices in a related domain, following the idea of domain adaptation. Later, DDTCDR [17] utilizes user information and items' metadata from online platforms through an autoencoder, then adopts a latent orthogonal mapping to extract user preferences over multiple domains. PPGN [35] adopts a graph convolutional network to explore the high-order connectivity between users and items on the joint interaction graph of the two domains, and then transfers knowledge by sharing user features. Compared with the shallow cross-domain matrix factorization models, the deep transfer methods generally exhibit better performance, owing to their stronger feature extraction ability.

3 PRELIMINARY

3.1 Problem Definition

We consider two domains D_A and D_B. The set of users in both domains is shared, denoted by U (of size m = |U|). Let the sets of items in D_A and D_B be I_A (of size n_a = |I_A|) and I_B (of size n_b = |I_B|), respectively. The purpose of bi-directional cross-domain transfer is to improve the recommendation performance in both domains. We consider Top-N recommendation with implicit feedback in each domain. Let R^A ∈ R^{m×n_a} (R^B ∈ R^{m×n_b}, resp.) denote the user-item interaction matrix of D_A (D_B, resp.) from users' implicit feedback, where an entry r^A_{ui} ∈ {0, 1} (r^B_{uj} ∈ {0, 1}, resp.) is 1 if the interaction between user u and item i (item j, resp.) is observed, and 0 otherwise. The recommendation problem with implicit feedback is abstracted as learning a function to estimate the scores of the unobserved entries in the interaction matrix, which are later used for ranking. Specifically, for domain A,

\hat{r}^A_{ui} = f(u, i \mid \Theta)    (1)

where f is the interaction function, \Theta represents all learnable parameters, and \hat{r}^A_{ui} is the predicted score. For matrix factorization (MF) techniques, the matching function is the fixed dot product. For deep-learning based CF, such as NeuMF [10], the interaction is implemented by non-linear neural networks.

Obviously, extracting satisfactory embeddings for users and items is the key to better recommendation. Recently, GCN has shown its powerful ability in capturing the collaborative signal in high-order connectivities for more effective embedding learning. In view of this, in our transfer learning approach for cross-domain recommendation, each domain is also modeled by a graph convolutional network, and the GCNs of both domains are jointly learned to improve the performance through bidirectional high-order feature transfer. Before introducing our model in detail, we briefly review two representative recommendation models based on GCN, NGCF [29] and LightGCN [9], in the following subsection.

3.2 Brief Review of NGCF and LightGCN

The basic idea of GCN is to aggregate the features of neighbors so as to obtain better feature expressions of nodes on the graph. A GCN layer can in general be abstracted as:

e_u^{(k+1)} = \mathrm{AGG}\big( e_u^{(k)}, \{ e_i^{(k)} : i \in \mathcal{N}_u \} \big)    (2)

where AGG(·) is an aggregation function such as a weighted-sum aggregator or a mean aggregator, e_u^{(k)} and e_i^{(k)} respectively denote the refined embeddings of user u and item i after k layers of propagation, and N_u denotes the first-hop neighbors of user u. In order to better understand the application of GCN in recommendation, we briefly introduce the embedding propagation in NGCF and LightGCN. Note that we only show the user feature propagation process in these two models; the item feature propagation process can be obtained analogously.
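To make the abstraction in Equation (2) concrete, the following is a minimal NumPy sketch (ours, not taken from any of the cited papers) of a single aggregation step with a mean aggregator; the toy embeddings and the choice to include the node's own embedding in the average are illustrative assumptions.

```python
import numpy as np

def mean_aggregate(e_u, neighbor_embs):
    """One abstract GCN step (Eq. 2) with a mean aggregator: the user's
    refined embedding is the average of its neighbors' current embeddings
    (here we also keep e_u itself in the average)."""
    msgs = np.vstack([e_u] + list(neighbor_embs))
    return msgs.mean(axis=0)

# Toy example: a user with two interacted items in a 4-d embedding space.
e_u = np.array([0.1, 0.2, 0.0, 0.3])
neighbors = [np.array([0.4, 0.0, 0.1, 0.2]), np.array([0.0, 0.3, 0.2, 0.1])]
e_u_next = mean_aggregate(e_u, neighbors)
```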


• Embedding Propagation Layer in NGCF:

e_u^{(k+1)} = \sigma\Big( W_1^{(k)} e_u^{(k)} + \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|\,|\mathcal{N}_i|}} \big( W_1^{(k)} e_i^{(k)} + W_2^{(k)} ( e_i^{(k)} \odot e_u^{(k)} ) \big) \Big)    (3)

where \sigma(·) is the non-linear activation function, W_1^{(k)} and W_2^{(k)} are trainable transformation matrices, N_u and N_i denote the first-hop neighbors of u and i, and ⊙ is the element-wise product. Distinct from conventional graph convolutional networks that consider the contribution of e_i only, NGCF additionally encodes the interaction between e_i and e_u into the message being passed via e_i ⊙ e_u. The final user embedding is obtained by concatenating the outputs of the L layers (i.e., {e_u^{(1)}, e_u^{(2)}, ..., e_u^{(L)}}) with the initial embedding e_u^{(0)}. Finally, the predictive score is obtained by taking the inner product between the final user embedding and the final item embedding.

• Embedding Propagation Layer in LightGCN:

e_u^{(k+1)} = \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|\,|\mathcal{N}_i|}} e_i^{(k)}    (4)

LightGCN greatly simplifies NGCF by removing the activation function and the transformation matrices (common in GCN but unfavorable for CF). Besides, it uses a weighted sum in layer fusion to replace the self-connection in Equation (3) and the concatenation in the fusion layer. Compared to NGCF, LightGCN reduces the risk of model overfitting and achieves better performance.
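As a reference point for the design choices discussed above, here is a minimal NumPy sketch of one LightGCN-style propagation layer following Equation (4); the toy interaction matrix, embedding size and initialization are our own assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical toy interaction matrix R (3 users x 4 items).
R = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
d = 8
E_user = np.random.randn(3, d) * 0.01   # e_u^(0)
E_item = np.random.randn(4, d) * 0.01   # e_i^(0)

# Symmetric normalization 1 / sqrt(|N_u| |N_i|) from Eq. (4).
deg_u = R.sum(axis=1, keepdims=True)    # |N_u|
deg_i = R.sum(axis=0, keepdims=True)    # |N_i|
norm = R / np.sqrt(deg_u * deg_i)

# One LightGCN propagation layer: users aggregate item embeddings and
# vice versa, with no transformation matrix and no non-linear activation.
E_user_next = norm @ E_item
E_item_next = norm.T @ E_user
```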
In this work, instead of utilizing the embedding propagation rule of NGCF or LightGCN directly, we design our own feature propagation layer. We borrow the idea of LightGCN to simplify the design of GCN, in a more reasonable manner.

4 PROPOSED MODEL

We first introduce the overall structure of our BiTGCF model in Section 4.1, and then detail the two core components of BiTGCF, feature propagation and feature transfer, in Sections 4.2 and 4.3, respectively. Finally, we introduce the model training in Section 4.4.

4.1 Architecture Overview

As depicted in Figure 2, the proposed model BiTGCF mainly includes three modules: (1) an embedding layer that offers the initialization of user embeddings and item embeddings; (2) a feature propagation and transfer module (with multiple layers) that refines the initial embeddings of users and items by feature propagation in-domain and feature transfer inter-domain; and (3) a prediction layer that concatenates the refined embeddings from the different layers and outputs the probability that the given user-item pair is a positive interaction.

Figure 2: An illustration of the architecture of BiTGCF (the arrowed lines present the flow of information), in which the red circle represents the current node according to the input.

Embedding: This module maps the ID of a user u (an item i) into an embedding vector e_u^{(0)} ∈ R^d (e_i^{(0)} ∈ R^d), where d denotes the embedding size. For ID embedding, this module can also be seen as building a parameter matrix as an embedding look-up table, which is optimized in an end-to-end manner. Specifically, for domain A,

e_{u_a}^{(0)} = P^\top x_{u_a}, \qquad e_i^{(0)} = Q^\top x_i    (5)

where P and Q are learnable parameter matrices of users and items, respectively, and x_u and x_i are one-hot encodings of the IDs of user u ∈ U and item i ∈ I_A, respectively. Note that we use e_{u_a}^{(0)} and e_{u_b}^{(0)} to denote the embedding vectors of the same user u in D_A and D_B, respectively. Analogously, we can derive e_{u_b}^{(0)} and e_j^{(0)} for domain B, where j ∈ I_B.

Feature Propagation and Transfer: As shown in Figure 2, in this module we feed [e_{u_a}^{(0)}, e_i^{(0)}, e_{u_b}^{(0)}, e_j^{(0)}] through L graph convolution layers to refine the embeddings of users and items. This module consists of two components: feature propagation (of both users and items) within each domain, and feature transfer (of users only) between the two domains. We leverage the user-item interaction graphs to propagate and transfer embeddings as follows,

e_{u_a}^{(k+1)} = f_T^A\big( f_P^A(e_{u_a}^{(k)}),\ f_P^B(e_{u_b}^{(k)}) \big)
e_{u_b}^{(k+1)} = f_T^B\big( f_P^A(e_{u_a}^{(k)}),\ f_P^B(e_{u_b}^{(k)}) \big)
e_i^{(k+1)} = f_P^A(e_i^{(k)})
e_j^{(k+1)} = f_P^B(e_j^{(k)})    (6)

where e_{u_a}^{(k)} and e_i^{(k)} respectively denote the refined embeddings of u and i after k layers of propagation in D_A, e_{u_b}^{(k)} and e_j^{(k)} respectively denote the refined embeddings of u and j after k layers of propagation in D_B, f_P^A(·) and f_P^B(·) respectively denote the feature propagation functions in D_A and D_B and will be defined in Section 4.2, and f_T^A(·,·) and f_T^B(·,·) respectively denote the feature transfer functions in D_A and D_B and will be defined in Section 4.3.

It is worth mentioning that in each layer the feature transfer acts only on user features. Nevertheless, the refinement of item features can be achieved with the help of the high-order connectivity between the two domains through the feature propagation module.
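The per-layer scheme of Equation (6) can be summarized by the following Python sketch; the function names and the list-based layer cache are hypothetical, and the four callables stand for the propagation functions of Section 4.2 (applied over each domain's user-item graph; we simplify by using one callable per domain for both user and item embeddings) and the transfer functions of Section 4.3.

```python
# A sketch of the per-layer scheme in Eq. (6); not the released implementation.
def refine_embeddings(e_ua, e_i, e_ub, e_j, num_layers,
                      prop_a, prop_b, transfer_a, transfer_b):
    layers = [(e_ua, e_i, e_ub, e_j)]            # layer-0 (initial) embeddings
    for _ in range(num_layers):
        pa_u, pb_u = prop_a(e_ua), prop_b(e_ub)  # f_P^A(e_ua), f_P^B(e_ub)
        # Users are fused across the two domains; items stay in-domain.
        e_ua = transfer_a(pa_u, pb_u)            # f_T^A, Eq. (6)
        e_ub = transfer_b(pa_u, pb_u)            # f_T^B, Eq. (6)
        e_i, e_j = prop_a(e_i), prop_b(e_j)
        layers.append((e_ua, e_i, e_ub, e_j))
    return layers    # all layers are kept for the concatenation in Eq. (7)
```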


Figure 3: Illustration of Cross-domain Connectivity

Figure 4: Illustration of Feature Propagation (legend: self-connection, join edges, inner product, edge weights)


Take the path marked in red in Figure 3 for example: i_1 gets part of the message from its first-order (one-hop) neighbor u_{a1} to refine its feature. Similarly, u_{b1} also gets part of the message from its first-order neighbor j_1 to refine its feature. Then, through the transfer module, we can get the path i_1 ← u_{a1} ⇆ u_{b1} ← j_1 (or i_1 → u_{a1} ⇆ u_{b1} → j_1), indicating that the connectivity between j_1 and i_1 can be learned.

Prediction: After propagating and transferring with L layers, we obtain multiple feature representations for users and items. The concatenation of the multiple feature vectors learned from different-order neighbors results in a stronger and more robust joint representation for users and items. As such, we concatenate them to get the final representation vectors for user u in D_A and item i as follows:

e_{u_a} = e_{u_a}^{(0)} \| \cdots \| e_{u_a}^{(L)}, \qquad e_i = e_i^{(0)} \| \cdots \| e_i^{(L)}    (7)

where ∥ denotes the concatenation operation. Finally, we adopt the dot product, which is simple and non-parametric, to estimate the probability of the user interacting with the target item,

\hat{r}^A_{ui} = \hat{y}_A(u, i) = \sigma( e_{u_a}^\top e_i )    (8)

where \sigma(·) is the sigmoid function that maps real values to probabilities. Note that domain B can be processed similarly to obtain \hat{r}^B_{uj}.
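A minimal NumPy sketch of the prediction step in Equations (7) and (8); the toy layer outputs are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(user_layers, item_layers):
    """Eqs. (7)-(8): concatenate the L+1 layer outputs of a user and an
    item, then score the pair with a sigmoid over the dot product."""
    e_u = np.concatenate(user_layers)   # e_u^(0) || ... || e_u^(L)
    e_i = np.concatenate(item_layers)   # e_i^(0) || ... || e_i^(L)
    return sigmoid(e_u @ e_i)

# Toy example with L = 2 (three 4-d vectors per node); values are random.
u_layers = [np.random.randn(4) for _ in range(3)]
i_layers = [np.random.randn(4) for _ in range(3)]
score = predict(u_layers, i_layers)     # probability of interaction
```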
4.2 Feature Propagation

Feature propagation aims to refine nodes' features on the current graph by aggregating messages from neighbors. Inspired by NGCF [29] and LightGCN [9], we design a new feature propagation rule in this work. We remove the non-linear activation function and the transformation matrices in our GCN model for collaborative filtering; the effectiveness of this operation has been verified in LightGCN [9]. But different from LightGCN, we retain the inner product operation, whose effectiveness has been verified by the experiments in NGCF. Moreover, we use the same layer fusion operation as that in NGCF, which, according to [4], is equivalent to using residual prediction. Specifically, for a connected user-item pair (u, i) in domain A, we define the propagation functions of u's and i's features as follows,

f_P^A(e_{u_a}^{(k)}) = e_{u_a}^{(k)} + \sum_{i \in \mathcal{N}_{u_a}} \frac{1}{\sqrt{|\mathcal{N}_{u_a}|\,|\mathcal{N}_i|}} \big( e_i^{(k)} + e_i^{(k)} \odot e_{u_a}^{(k)} \big)
f_P^A(e_i^{(k)}) = e_i^{(k)} + \sum_{u \in \mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_{u_a}|\,|\mathcal{N}_i|}} \big( e_{u_a}^{(k)} + e_{u_a}^{(k)} \odot e_i^{(k)} \big)    (9)

where N_{u_a} and N_i denote the first-hop neighbors of user u and item i, e_i^{(k)} and e_{u_a}^{(k)} are the representations of item i and user u passed from layer k, ⊙ denotes the element-wise product, and the symmetric normalization term 1/\sqrt{|\mathcal{N}_{u_a}|\,|\mathcal{N}_i|} follows the design of standard GCN to avoid the scale of the embeddings increasing with propagation operations; it has also been used in NGCF and LightGCN. Note that for domain B, f_P^B(·) can be defined similarly.

The process of feature propagation is depicted in Figure 4. The message passed to the center node u along the connected edges from the neighbor nodes (items) includes two parts: (1) the normalized weighted feature; and (2) its inner product with the center node u. The former is the general operation of GCN to integrate information from the neighbor nodes, while the latter ensures that the larger the match score between the user and the item, the greater the value passed to the center node. Finally, we add the self-connection of the center node u to retain the information of the original features.

4.3 Feature Transfer

Feature transfer is the key component of BiTGCF, which realizes bidirectional user feature transfer between D_A and D_B. As mentioned in Section 1, we take both common features and domain-specific features into consideration, which is different from existing CDCF methods that only consider learning common features. We define the feature transfer functions of u in D_A and D_B as follows,

f_T^A(\cdot\,, \cdot) = \frac{1}{2}\big( C^{(k)} + A^{(k)} \big)    (10)
f_T^B(\cdot\,, \cdot) = \frac{1}{2}\big( C^{(k)} + B^{(k)} \big)    (11)

Specifically,

C^{(k)} = l_{u_a} f_P^A(e_{u_a}^{(k)}) + l_{u_b} f_P^B(e_{u_b}^{(k)})    (12)
A^{(k)} = \lambda_a f_P^A(e_{u_a}^{(k)}) + (1 - \lambda_a) f_P^B(e_{u_b}^{(k)})    (13)
B^{(k)} = (1 - \lambda_b) f_P^A(e_{u_a}^{(k)}) + \lambda_b f_P^B(e_{u_b}^{(k)})    (14)

where l_{u_a} and l_{u_b} are user-related weight factors for domains A and B respectively, and \lambda_a and \lambda_b are hyper-parameters within the range [0, 1] that control the retention ratio of user features in the corresponding domain. Note that l_{u_a} (l_{u_b}) is calculated as the ratio of the number of u's interacted items in D_A (D_B) to the number of items u interacted with in both domains, i.e.,

l_{u_a} = \frac{|\mathcal{N}_{u_a}|}{|\mathcal{N}_{u_a}| + |\mathcal{N}_{u_b}|}    (15)
l_{u_b} = 1 - l_{u_a}    (16)
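The transfer functions of Equations (10)-(16) reduce to a few lines of vector arithmetic. The following NumPy sketch (our reading, not the released implementation) fuses the propagated user features of the two domains; the toy interaction counts and λ values are illustrative.

```python
import numpy as np

def transfer(p_a, p_b, n_ua, n_ub, lam_a, lam_b):
    """Eqs. (10)-(16): fuse the propagated user features p_a = f_P^A(e_ua^(k))
    and p_b = f_P^B(e_ub^(k)) into the next-layer user embeddings of both
    domains. n_ua / n_ub are the user's interaction counts in each domain."""
    l_ua = n_ua / (n_ua + n_ub)          # Eq. (15)
    l_ub = 1.0 - l_ua                    # Eq. (16)
    C = l_ua * p_a + l_ub * p_b          # common features, Eq. (12)
    A = lam_a * p_a + (1 - lam_a) * p_b  # D_A-specific features, Eq. (13)
    B = (1 - lam_b) * p_a + lam_b * p_b  # D_B-specific features, Eq. (14)
    e_ua_next = 0.5 * (C + A)            # f_T^A, Eq. (10)
    e_ub_next = 0.5 * (C + B)            # f_T^B, Eq. (11)
    return e_ua_next, e_ub_next

# Toy 4-d propagated features; counts and lambdas are illustrative only.
p_a, p_b = np.random.randn(4), np.random.randn(4)
e_ua, e_ub = transfer(p_a, p_b, n_ua=12, n_ub=4, lam_a=0.8, lam_b=0.8)
```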


It is clear that Equation (12) adopts the idea of feature propagation on the graph, forming improved features by aggregating the features from f_P^A(e_{u_a}^{(k)}) and f_P^B(e_{u_b}^{(k)}). In this way, we can regard C^{(k)} as the common features derived for u under both D_A and D_B. Moreover, the design of the weight factors l_{u_a} and l_{u_b} means that the more interaction data there is for u in D_A (or D_B), the more the features in D_A (or D_B) contribute to the common features, and vice versa.

A^{(k)} and B^{(k)} are used to preserve the user's domain-specific features in D_A and D_B, respectively. We use the hyper-parameters \lambda_a and \lambda_b to control the retention ratio. For example, when \lambda_a = \lambda_b = 1.0, 100% of the user features in D_A and D_B are retained, so user features have the largest specificity in both domains. When \lambda_a = \lambda_b = 0.5, the specificity in users' features disappears, the same user has the same features in both domains, and the transfer module in BiTGCF becomes the same as that in existing CDCF methods.

Finally, we place equal emphasis on the obtained common features and domain-specific features to balance them, as shown in Equations (10) and (11). This operation, though simple, is very effective in keeping the model performance stable, as validated in our experimental study.
4.4 Model Training

The deep model calculates gradients to update the parameters through the loss function. Therefore, a suitable loss function should not only prevent the model from falling into local optima but also accelerate model convergence. For recommender systems, two types of loss functions are widely used: point-wise, which focuses on predicting scores more accurately, and pair-wise [8], which focuses on learning ranks more accurately. In this paper, we consider the point-wise loss function and leave the pair-wise version to future work. The most commonly used point-wise loss is the squared loss, but it is not suitable for implicit feedback. Hence, following several previous works, we employ the binary cross-entropy function as the loss function, defined as

L(\hat{r}_{ui}, r_{ui}) = - \sum_{(u,i) \in R^+ \cup R^-} \big( r_{ui} \log \hat{r}_{ui} + (1 - r_{ui}) \log(1 - \hat{r}_{ui}) \big) + \lambda \| \Theta \|_2^2    (17)

where R^+ is the set of observed interactions, R^- is a set randomly sampled from the unobserved interactions, and \lambda controls the L2 regularization strength to prevent overfitting. \Theta = \{E_u^0, E_i^0\}, where E_u^0 (E_i^0) is the initial embedding matrix of all users (items). In order to improve the accuracy on both domains, we define the joint loss function as follows,

L_{joint} = \min_{f_T^A, f_P^A, f_T^B, f_P^B} L(\hat{r}^A_{ui}, r^A_{ui}) + L(\hat{r}^B_{uj}, r^B_{uj})    (18)

We adopt mini-batch Adam to optimize the model and update the parameters. Moreover, similar to NGCF, we introduce dropout mechanisms to prevent the neural network from overfitting. Specifically, we drop out the messages being propagated in Equation (9) with a certain probability during training, and disable this operation during testing. Note that BiTGCF is a bi-directional transfer model in which the data of the two domains participate in training at the same time but are evaluated separately.
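A sketch of Equations (17) and (18) in NumPy, under the assumption that predictions and labels for the sampled pairs have already been collected into arrays; the epsilon term is our addition for numerical safety.

```python
import numpy as np

def bce_loss(r_hat, r, theta, lam):
    """Binary cross-entropy with L2 regularization, Eq. (17).
    r_hat: predicted probabilities over observed (R+) and sampled
    unobserved (R-) pairs; r: their 0/1 labels; theta: the initial
    embedding matrices being regularized."""
    eps = 1e-8  # numerical safety, our addition
    ce = -np.sum(r * np.log(r_hat + eps) + (1 - r) * np.log(1 - r_hat + eps))
    l2 = lam * sum(np.sum(p ** 2) for p in theta)
    return ce + l2

def joint_loss(r_hat_a, r_a, r_hat_b, r_b, theta, lam):
    """Joint objective over both domains, Eq. (18): the two per-domain
    losses are simply summed and minimized together."""
    return bce_loss(r_hat_a, r_a, theta, lam) + bce_loss(r_hat_b, r_b, theta, lam)
```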
5 EXPERIMENT

We first describe the experimental setup in Section 5.1, and then compare the proposed model with state-of-the-art methods in Section 5.2. To justify the effectiveness of the transfer learning module, we study its adaptability in Section 5.3, as well as its performance under different data sparsity levels in Section 5.4. Finally, the impact factor analysis is presented in Section 5.5.

5.1 Experimental setup

5.1.1 Dataset. We evaluate our proposed model on real-world datasets from Amazon (http://jmcauley.ucsd.edu/data/amazon/), including two couple datasets, Electronics (Elec for short) & Cell Phones and Accessories (Cell for short), and Sports and Outdoors (Sport for short) & Clothing, Shoes and Jewelry (Cloth for short). Moreover, in order to show that BiTGCF still has good transfer capability between domains with lower similarity, the cross pairings of the two groups, Sport & Cell and Elec & Cloth, are used as the third and fourth couple datasets. For the data in these four couple datasets, we first transform them into implicit data, where each entry is marked as 0 or 1, indicating whether the user has rated the item. Then, we filter the datasets to retain users with more than 5 ratings and items with more than 10 ratings, and extract the overlapping users in both domains. Table 1 summarizes the detailed statistics of the four couple datasets.

Table 1: Statistics of the datasets
Dataset #Users #Items #Interactions Density
Elec 3,325 39,463 118,879 0.091%
Cell 3,325 18,462 53,732 0.088%
Sport 9,928 32,310 102,540 0.032%
Cloth 9,928 41,303 97,757 0.024%
Sport 4,998 22,101 55,556 0.050%
Cell 4,998 14,618 47,444 0.065%
Elec 15,761 53,309 226,626 0.027%
Cloth 15,761 51,865 136,844 0.017%

5.1.2 Evaluation Protocol. We adopt the widely used leave-one-out evaluation method. Specifically, we take a random sample from each user's interaction history as the test set, and the remaining interactions are used for training. Then we randomly select 99 items from each user's non-interacted items to form negative samples. The recommendation model predicts scores for the 100 records of each user (99 negative samples and 1 positive sample) and outputs the top-N items. We use the commonly used Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) to evaluate the ranking performance. For both measures, we truncate the ranked list at 10. Hence, HR measures whether the test item is present in the top-10 list, and NDCG measures the ranking quality by assigning higher scores to hits at top ranks.
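The protocol above can be computed per user as in the following sketch (ours, with hypothetical names); the positive item's rank among the 100 candidates determines both metrics, and the per-dataset numbers are averages over all test users.

```python
import numpy as np

def hr_ndcg_at_k(scores, pos_index, k=10):
    """Leave-one-out metrics for one user: `scores` holds the model's
    predictions for 100 candidates (99 sampled negatives plus the
    held-out positive at position `pos_index`). HR@k is 1 if the
    positive ranks in the top k; NDCG@k gives higher credit to
    higher ranks."""
    rank = int(np.sum(scores > scores[pos_index]))  # 0-based rank of positive
    hr = 1.0 if rank < k else 0.0
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return hr, ndcg
```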
5.1.3 Compared Methods. We compare BiTGCF with both single-domain and cross-domain recommendation models. We leave out the comparison with DDTCDR [17] and DARec [34], because they both use an autoencoder as a feature extractor for pre-training, which differs from the end-to-end form of the methods to be evaluated.


Table 2: Performance comparison in terms of HR and NDCG. Best performance is in boldface and the best baselines are underlined. (Columns 3-6 are single-domain methods, columns 7-10 are cross-domain methods, and the last two are ours.)

Dataset Metric | BPRMF+ MLP+ NGCF+ LightGCN+ | CMF CDFM CoNet PPGN | GCF+ BiTGCF
Elec HR | 0.3471 0.4448 0.4665 0.4701 | 0.3894 0.3998 0.4484 0.4337 | 0.5871 0.6036
Elec NDCG | 0.2224 0.2814 0.2969 0.2990 | 0.2457 0.2513 0.2890 0.2664 | 0.3682 0.3767
Cell HR | 0.4271 0.4848 0.4983 0.5185 | 0.4239 0.4117 0.4899 0.4386 | 0.6198 0.6571
Cell NDCG | 0.2919 0.3108 0.3358 0.3253 | 0.2631 0.2599 0.3102 0.2773 | 0.4119 0.4210
Sport HR | 0.2784 0.3660 0.4251 0.4623 | 0.3108 0.3196 0.3743 0.3484 | 0.5421 0.5346
Sport NDCG | 0.1778 0.2169 0.2681 0.2910 | 0.1837 0.1874 0.2254 0.2038 | 0.3414 0.3375
Cloth HR | 0.2250 0.3464 0.3723 0.4011 | 0.2930 0.2501 0.3442 0.3239 | 0.5101 0.5242
Cloth NDCG | 0.1348 0.2073 0.2359 0.2570 | 0.1612 0.1509 0.2090 0.1819 | 0.3067 0.3113
Sport HR | 0.2990 0.3663 0.3858 0.3978 | 0.3157 0.3345 0.3700 0.3316 | 0.5018 0.5442
Sport NDCG | 0.2025 0.2368 0.2527 0.2498 | 0.1930 0.2163 0.2392 0.2260 | 0.3122 0.3374
Cell HR | 0.3315 0.4764 0.4926 0.4970 | 0.3918 0.4574 0.4824 0.4418 | 0.5486 0.5654
Cell NDCG | 0.2211 0.3080 0.3173 0.3074 | 0.2286 0.1605 0.3134 0.2851 | 0.3566 0.3709
Elec HR | 0.2780 0.4877 0.5007 0.5086 | 0.4580 0.4106 0.4931 0.4633 | 0.5357 0.5359
Elec NDCG | 0.1761 0.3191 0.3235 0.3288 | 0.3032 0.2817 0.3220 0.2920 | 0.3549 0.3575
Cloth HR | 0.2262 0.3562 0.3632 0.3662 | 0.3196 0.3017 0.3461 0.3221 | 0.4003 0.4384
Cloth NDCG | 0.1371 0.2160 0.2238 0.2199 | 0.1907 0.1801 0.2015 0.1934 | 0.2461 0.2555

• BPRMF [26] is a classical single-domain model, which learns the user and item factors via matrix factorization and a pair-wise ranking loss.
• MLP [10] is a single-domain deep model, which uses deep neural networks to learn the matching function.
• NGCF [29] is a single-domain GCN-based model. It first captures the high-order connectivity information in the embedding function by stacking multiple embedding propagation layers, then concatenates the obtained embeddings and uses the inner product to make predictions.
• LightGCN [9] is also a single-domain GCN-based model, evolved from NGCF. It simplifies the design of the feature propagation component by removing the non-linear activation and the transformation matrices. Moreover, it adopts a different layer combination strategy from NGCF.
• CMF [28] is a multi-relation learning approach which factorizes the matrices of domains A and B simultaneously by sharing the user latent factors. It is a shallow model, which first jointly learns on the two domains, and then optimizes the target domain.
• CDFM [20] is a cross-domain shallow model. It takes the user's interaction history from auxiliary domains as context to generate recommendations on the target domain with factorization machines.
• CoNet [11] is a cross-domain deep model, which transfers knowledge across domains by cross connections between the base networks. It jointly learns on two domains and optimizes both domains simultaneously. Note that our model BiTGCF differs from CoNet in that CoNet transfers user-item interaction features, while BiTGCF transfers user features.
• PPGN [35] is a cross-domain deep model, which fuses the interaction information of the two domains into one graph and shares the user features learned from the joint interaction graph by stacking multiple graph convolution layers. Finally, it inputs the learned embeddings into a domain-specific MLP structure to learn the matching function.
• GCF is a degenerate version of BiTGCF without the feature transfer layer. It is a single-domain GCN-based model.

Note that in order to exclude the influence of the loss function on the results, we change the loss function of NGCF and LightGCN from BPR to cross entropy, so as to unify the loss function among all the deep-learning based methods. Moreover, in order to more accurately evaluate the effect of our transfer module, for the single-domain methods BPRMF, MLP, NGCF, LightGCN and GCF, we use 'model-name+' to denote the same model trained on the mixed datasets. For example, given MLP+ and the couple dataset Elec & Cell, MLP+ uses Elec combined with Cell as the training set, and is then tested on Elec and Cell respectively.

5.1.4 Parameter Settings. For BPRMF, we use the BPRMF class in LightFM (https://github.com/lyst/lightfm), a popular CF library, for training, and vary the number of epochs from 0 to 40 to get the best result. For CMF, we use a Python version written with reference to the original Matlab code, and the parameters are randomly initialized from a Gaussian N(0, 0.01). For CDFM, we process and encode the data according to the paper, then feed it to PyFM (https://github.com/coreylynch/pyFM) for training. For the deep-structured MLP (https://github.com/hexiangnan/neural_collaborative_filtering) and CoNet (http://home.cse.ust.hk/ghuac/), we keep the optimal configuration in their papers: the hidden layers are configured as [64, 32, 16, 8] and the negative sampling ratio is set to 4. For NGCF (https://github.com/xiangwang1223/neural_graph_collaborative_filtering) and LightGCN (https://github.com/kuandeng/LightGCN), we use the published source code and only change the loss function from BPR to cross entropy. Moreover, we set the embedding propagation layers as [64, 64, 64], the learning rate as 0.001, the mini-batch size as 1024, the negative sampling ratio as 4, and the message dropout ratio as 0.1 for NGCF and 0 for LightGCN, respectively.

For PPGN, we use the source code provided by the authors and change the data pipeline. We also set the negative sampling ratio as 4, and tune the number of GCN layers from 3 to 5 as the paper states. For GCF and BiTGCF, we implement them in TensorFlow and use the same parameter settings as those of NGCF. We use the Xavier initializer to initialize the parameters of NGCF, LightGCN and our methods. Moreover, early stopping is performed if HR@10 on the test data does not increase for 5 successive epochs. Note that CoNet, PPGN and BiTGCF are bi-directional transfer models, which means we obtain the results on a couple dataset at the same time, while the performance of the single-domain methods is obtained by first training on the mixed datasets and then evaluating separately on each dataset.

5.2 Performance Comparison

Table 2 shows the summarized results of our experiments on the four couple datasets in terms of two metrics, HR@10 and NDCG@10. Due to space concerns, the performance of the single-domain methods on the single datasets is not shown in Table 2. In fact, in our experiments, the performance of the single-domain models on the mixed datasets is in general better than that on the single datasets, and partial results regarding this point are reported in Section 5.3. From Table 2, we have the following key observations:

• For single-domain methods, MLP+ consistently outperforms BPRMF+, demonstrating the importance of exploring the nonlinear interaction relation between users and items. GCN-based methods, including NGCF+, LightGCN+ and GCF+, consistently outperform MLP+. This validates the effect of mining high-order connectivities for better recommendation performance. The improvement of LightGCN+ over NGCF+ may come from the enhancement of its generalization ability, as it is less prone to overfitting. Further, compared with NGCF+ and LightGCN+, GCF+ retains the inner product and the self-connection operation to appropriately increase the feature flow among nodes. This might be the reason that GCF+ gains superior performance over LightGCN+.
• Cross-domain methods vs. single-domain methods. The superior performance of CMF and CDFM over BPRMF+ provides evidence that, compared to directly mixing the two datasets, the knowledge transferred from the auxiliary domain does improve the performance on the target domain. The performance of CoNet is better than MLP+ in most cases, which validates the effect of the cross connection unit; however, the improvement over directly mixing the two datasets is very limited. Moreover, we can see that NGCF beats CoNet on almost all the datasets. This demonstrates that a transfer learning mechanism, if not well designed, is not as effective as mining high-order connectivities for better embeddings. Surprisingly, PPGN is worse than NGCF and even worse than CoNet. This might be caused by the datasets, since PPGN has been shown to outperform CoNet on CD&Music and Book&Movie in [35]. We conjecture that the poor performance of PPGN on datasets with large distribution gaps comes from its use of the same propagation layer on the joint graph of the two domains. BiTGCF achieves the best performance among the evaluated single-domain and cross-domain methods on all the datasets, which indicates that it is on the right track to exploit the high-order connections in-domain and inter-domain using graph structures and to fuse common features with domain-specific features through the feature transfer module.

5.3 The Applicability of the Transfer Module

In order to verify the applicability of our transfer module, we incorporate the GCN-based models NGCF and LightGCN with our transfer module, and name the derived methods NGCF* and LightGCN*. Figure 5 shows the results.

Figure 5: The applicability of the transfer module. (Panels (a)/(c): HR; panels (b)/(d): NDCG.)

In Figures 5(a) and 5(b), the two domains of the two couple datasets (Elec&Cell, Sport&Cloth) have higher similarity, while in Figures 5(c) and 5(d), the two domains of the two couple datasets (Elec&Cloth, Sport&Cell) have lower similarity. To indicate the pairing information of the current dataset, the notation 'target'_'source' is adopted. For example, Elec_Cell means the Elec dataset with the participation of Cell.

Comparing the first point with the second point for each method, we observe a growth trend in Figures 5(a) and 5(b) for all the methods in most cases, as well as a downward trend in Figures 5(c) and 5(d). The reason might be that mixing data from two similar domains increases the amount of training data, which promotes the training of the model and in turn improves the performance. On the contrary, when the two domains are not similar, simply mixing them may lead to model confusion, which makes the model difficult to train and thus degrades the performance.

Comparing the third point with the first two for each method, we can conclude that the methods with the transfer mechanism in general achieve the best results. This validates the effectiveness of our transfer module. In summary, compared with directly mixing datasets, knowledge transfer yields better performance.


Table 3: Statistics of the split datasets.
Dataset Group Num≤ #Users #Interactions Density
Sport G1 8 2,769 17,008 0.028%
Sport G2 14 1,306 14,207 0.049%
Sport G3 30 726 14,422 0.090%
Sport G4 323 197 9,919 0.23%
Cell G1 6 2,209 11,930 0.037%
Cell G2 10 1,638 13,256 0.055%
Cell G3 20 893 12,410 0.095%
Cell G4 155 258 9,848 0.26%

Table 4: Results on Sport_Cell. The improvement is that of BiTGCF over the better of GCF and GCF+.
Metric Method G1 G2 G3 G4
HR GCF 0.5081 0.5253 0.5840 0.6447
HR GCF+ 0.4778 0.5061 0.5606 0.5935
HR BiTGCF 0.5314 0.5330 0.5923 0.6374
HR Improv. 4.59% 1.47% 1.42% -1.13%
NDCG GCF 0.3061 0.3307 0.3747 0.4267
NDCG GCF+ 0.3001 0.3125 0.3536 0.4035
NDCG BiTGCF 0.3230 0.3326 0.3760 0.4301
NDCG Improv. 4.87% 1.03% 1.54% 1.75%

Table 5: Results on Cell_Sport. The improvement is that of BiTGCF over the better of GCF and GCF+.
Metric Method G1 G2 G3 G4
HR GCF 0.5147 0.5549 0.6036 0.7093
HR GCF+ 0.5093 0.5516 0.6011 0.6852
HR BiTGCF 0.5283 0.5720 0.6103 0.6860
HR Improv. 2.64% 3.08% 1.11% -3.28%
NDCG GCF 0.3340 0.3530 0.4032 0.4910
NDCG GCF+ 0.3290 0.3506 0.4015 0.4789
NDCG BiTGCF 0.3419 0.3636 0.4251 0.4787
NDCG Improv. 2.37% 3.00% 5.43% -2.51%

Figure 6: Effect of Transfer Layer Number (panels (a)/(c): HR; panels (b)/(d): NDCG)

Figure 7: Effect of λ (panels (a)/(c): HR; panels (b)/(d): NDCG)

5.4 Performance of the Transfer Module w.r.t. Data Sparsity Levels

The number of user interactions is an important factor that affects recommendation performance. In order to test the influence of data sparsity levels, we divide the dataset into groups with different sparsity according to the number of user interactions. Specifically, following the principle of minimizing the difference in the total number of interactions between the groups, all users are divided into four groups G1, G2, G3 and G4 in order of increasing number of interactions (a sketch of one such grouping follows at the end of this subsection). Table 3 shows the split results of Sport & Cell, in which users with no more than 8, 14, 30 and 323 interactions are divided into G1, G2, G3 and G4, respectively. From Table 4, we can find that the improvements achieved in the first two groups (4.59% and 1.47%) are more significant than those in the last two (1.42% and -1.13%). The result indicates that the feature transfer module helps improve recommendation performance for relatively inactive users (with fewer interacted items). A similar trend can be observed from Table 5.
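The paper does not spell out the exact splitting script, so the following Python sketch is only our reading of the grouping rule stated above; the function name and the equal-interaction heuristic are assumptions.

```python
import numpy as np

def split_user_groups(inter_counts, n_groups=4):
    """Sort users by interaction count and cut the sorted list so that
    each group holds roughly the same total number of interactions,
    yielding groups G1..G4 of increasing user activity."""
    order = np.argsort(inter_counts)
    sorted_counts = np.asarray(inter_counts)[order]
    cum = np.cumsum(sorted_counts)
    bounds = [int(np.searchsorted(cum, cum[-1] * (g + 1) / n_groups))
              for g in range(n_groups - 1)]
    return np.split(order, bounds)  # list of user-index arrays, G1..G4
```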


5.5 Impact Factor Analysis

5.5.1 Number of feature transfer layers. The feature transfer module is the core of BiTGCF, and its effect is verified by changing the number of transfer layers while fixing the number of feature propagation layers. In particular, we vary the layer number L in the range {0, 1, 2, 3}. More specifically, based on the single-domain model GCF (with three propagation layers but no transfer layers), transfer layers are added starting from the top layer. For example, when L = 1, one feature transfer component is added to the last feature propagation layer; when L = 2, two feature transfer components are added to the last two feature propagation layers.

From Figure 6, we can see that as L increases, the performance of BiTGCF also increases. The reason may be that the feature transfer module connects the different domains, enabling the model to mine higher-order relationships between nodes by stacking multiple feature propagation and transfer layers. Moreover, it can be seen from Figures 6(c) and 6(d) that when L = 3, the performance of BiTGCF degrades on some datasets. This indicates that mining higher-order connectivities is not always beneficial for performance. However, when L = 0, the performance of BiTGCF is always the worst, which demonstrates the effect of our transfer module.

5.5.2 Hyper-parameters λ_a & λ_b. In this set of experiments, we analyze the influence of the hyper-parameters λ_a and λ_b on the performance. For simplicity, we always consider a single hyper-parameter λ = λ_a = λ_b. Figure 7 shows the results, with a step size of 0.1, ranging over [0.5, 1]. Figures 7(a) and 7(b) show the results of BiTGCF on Elec & Cell and Sport & Cloth, respectively. The overall best result is achieved when λ is around 0.5 and 0.6. The results on Sport & Cell and Elec & Cloth are shown in Figures 7(c) and 7(d), where the best result appears when λ = 0.8. The result implicitly reflects that, when the similarity between the two domains is low, it is necessary to retain more domain-specific features.

6 CONCLUSION

In this paper, we proposed BiTGCF for Top-N cross-domain recommendation by combining the idea of high-order feature propagation on graph structures with transfer learning. Inspired by NGCF and LightGCN, we designed a new feature propagation module, which simplifies feature propagation in a more reasonable manner to reduce the risk of model over-fitting while still preserving some non-linearity to boost recommendation performance. Moreover, we took the first step toward considering the domain-specific features of different domains when refining user features, and designed a simple but effective balancing mechanism to balance users' common features and domain-specific features. By combining inter-domain feature transfer with in-domain feature propagation, our model realizes a more efficient representation and transfer of users' common features and their integration with domain-specific features. Remarkable performance improvements on several benchmark datasets demonstrate the effectiveness of our BiTGCF model.

ACKNOWLEDGMENTS

The work was supported by the National Natural Science Foundation of China under Grant No. 61672252, and the Fundamental Research Funds for the Central Universities under Grant No. 2019kfyXKJC021.

REFERENCES
[1] Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara. 2019. Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study. Data Science and Engineering 4, 3 (2019), 269–289.
[2] Bin Cao, Nathan Nan Liu, and Qiang Yang. 2010. Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains. In Proceedings of ICML. Omnipress, 159–166.
[3] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In Proc. of SIGIR. ACM, 335–344.
[4] Lei Chen, Le Wu, Richang Hong, Kun Zhang, and Meng Wang. 2020. Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach. In Proc. of AAAI. 27–34.
[5] Zhi-Hong Deng, Ling Huang, Chang-Dong Wang, Jian-Huang Lai, and Philip S. Yu. 2019. DeepCF: A Unified Framework of Representation Learning and Matching Function Learning in Recommender System. In Proceedings of AAAI. 61–68.
[6] Travis Ebesu, Bin Shen, and Yi Fang. 2018. Collaborative Memory Network for Recommendation Systems. In Proc. of SIGIR. 515–524.
[7] Sheng Gao, Hao Luo, Da Chen, Shantao Li, Patrick Gallinari, and Jun Guo. 2013. Cross-domain recommendation via cluster-level latent factor model. In Proceedings of ECML-PKDD. 161–176.
[8] Ruining He and Julian McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Thirtieth AAAI. 144–150.
[9] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proc. of SIGIR. 639–648.
[10] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of WWW. 173–182.
[11] Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. CoNet: Collaborative Cross Networks for Cross-Domain Recommendation. In Proceedings of CIKM. 667–676.
[12] Guangneng Hu, Yu Zhang, and Qiang Yang. 2019. Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text. In Proceedings of WWW. ACM, 2822–2829.
[13] Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Can Zhu. 2013. Personalized Recommendation via Cross-Domain Triadic Factorization. In Proc. of WWW. 595–606.
[14] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of ICLR (Poster).
[15] Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Can Movies and Books Collaborate? Cross-Domain Collaborative Filtering for Sparsity Reduction. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. 2052–2057.
[16] Guohao Li, Matthias Müller, Ali K. Thabet, and Bernard Ghanem. 2019. DeepGCNs: Can GCNs Go As Deep As CNNs?. In Proceedings of ICCV. 9266–9275.
[17] Pan Li and Alexander Tuzhilin. 2020. DDTCDR: Deep Dual Transfer Cross Domain Recommendation. In Proc. of WSDM. ACM, Houston, USA, 331–339.
[18] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing 1 (2003), 76–80.
[19] Jian Liu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Fuzhen Zhuang, Jiajie Xu, Xiaofang Zhou, and Hui Xiong. 2019. Deep Cross Networks with Aesthetic Preference for Cross-domain Recommendation. CoRR abs/1905.13030 (2019). http://arxiv.org/abs/1905.13030
[20] Babak Loni, Yue Shi, Martha A. Larson, and Alan Hanjalic. 2014. Cross-Domain Collaborative Filtering with Factorization Machines. In Proceedings of the 36th European Conference on Advances in Information Retrieval Research. 656–661.
[21] Zhongqi Lu, Weike Pan, Evan Wei Xiang, Qiang Yang, Lili Zhao, and Erheng Zhong. 2013. Selective Transfer Learning for Cross Domain Recommendation. In Proceedings of ICDM. SIAM, 641–649.
[22] Zhongqi Lu, Yin Zhu, Sinno Jialin Pan, Evan Wei Xiang, Yujing Wang, and Qiang Yang. 2014. Source Free Transfer Learning for Text Classification. In Proceedings of AAAI. AAAI Press, 122–128.
[23] Jingwei Ma, Jiahui Wen, Mingyang Zhong, Weitong Chen, and Xue Li. 2019. MMM: Multi-source Multi-net Micro-video Recommendation with Clustered Hidden Item Representation Learning. Data Science and Engineering 4, 3 (2019), 240–253.
[24] Orly Moreno, Bracha Shapira, Lior Rokach, and Guy Shani. 2012. TALMUD: Transfer Learning for Multiple Domains. In Proceedings of CIKM. ACM, 425–434.
[25] ThaiBinh Nguyen and Atsuhiro Takasu. 2018. NPE: Neural Personalized Embedding for Collaborative Filtering. In Proceedings of IJCAI. 1583–1589.
[26] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proc. of UAI. 452–461.
[27] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proc. of WWW. 285–295.
[28] Ajit P. Singh and Geoffrey J. Gordon. 2008. Relational Learning via Collective Matrix Factorization. In Proceedings of SIGKDD. ACM, 650–658.
[29] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proc. of the 42nd ACM SIGIR. 165–174.
[30] Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. 2019. Simplifying Graph Convolutional Networks. In Proceedings of ICML. 6861–6871.
[31] Yuexin Wu, Hanxiao Liu, and Yiming Yang. 2018. Graph Convolutional Matrix Completion for Bipartite Edge Prediction. In Proceedings of IC3K. 49–58.
[32] Feng Xue, Xiangnan He, Xiang Wang, Jiandong Xu, Kai Liu, and Richang Hong. 2019. Deep Item-based Collaborative Filtering for Top-N Recommendation. ACM Trans. Inf. Syst. 37, 3 (2019), 33:1–33:25.
[33] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep Matrix Factorization Models for Recommender Systems. In Proceedings of IJCAI. 3203–3209.
[34] Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns. In Proceedings of IJCAI. 4227–4233.
[35] Cheng Zhao, Chenliang Li, and Cong Fu. 2019. Cross-Domain Recommendation via Preference Propagation GraphNet. In Proceedings of CIKM. 2165–2168.
[36] Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, and Philip S. Yu. 2018. Spectral Collaborative Filtering. In Proc. of ACM RecSys. 311–319.