Cross Domain Recommendation Via Bi-Directional Transfer Graph Collaborative Filtering Networks
ABSTRACT

Data sparsity is a challenging problem that most modern recommender systems are confronted with. By leveraging knowledge from relevant domains, the cross-domain recommendation technique can be an effective way of alleviating the data sparsity problem. In this paper, we propose a novel Bi-directional Transfer learning method for cross-domain recommendation that uses a Graph Collaborative Filtering network as the base model (BiTGCF). BiTGCF not only exploits the high-order connectivity in the user-item graph of a single domain through a novel feature propagation layer, but also realizes the two-way transfer of knowledge across two domains by using the common users as the bridge. Moreover, distinct from previous cross-domain collaborative filtering methods, BiTGCF fuses users' common features and domain-specific features during transfer. Experimental results on four couple benchmark datasets verify the effectiveness of BiTGCF over state-of-the-art models in terms of bi-directional cross-domain recommendation.

CCS CONCEPTS

• Information systems → Recommender systems; • Computing methodologies → Transfer learning.

KEYWORDS

Recommender Systems; Collaborative Filtering; Transfer Learning; Graph Convolution Network

ACM Reference Format:
Meng Liu, Jianjun Li, Guohui Li, and Peng Pan. 2020. Cross Domain Recommendation via Bi-directional Transfer Graph Collaborative Filtering Networks. In The 29th ACM International Conference on Information and Knowledge Management (CIKM '20), October 19–23, 2020, Virtual Event, Ireland. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3340531.3412012

∗ Jianjun Li is the corresponding author.

1 INTRODUCTION

With the rapid increase of commodity types and quantities, personalized recommendation, which can predict the purchase intention of users, has become one of the most important services on the Internet. Personalized recommendation aims to predict a group of items that users are likely to purchase in the future by fully exploiting users' historical interactions.

Collaborative filtering (CF) is a widely used method for personalized recommendation [18, 27], which learns the recommender model based on the interaction history of similar users or items. Generally speaking, the key component of CF models is to learn the latent features (embeddings) of users and items effectively and then make predictions based on these embeddings. Traditional CF methods, represented by matrix factorization (MF), obtain the latent factors of users and items by factorizing the user-item interaction matrix [26]. Neural CF models, such as NeuMF [10], replace the inner product with multiple neural network layers to learn a more effective matching function [5, 6, 23, 25]. Due to the powerful nonlinear fitting ability of neural networks, neural CF models in general achieve better fitting results and have gradually become the mainstream.

In recent years, inspired by the success of graph convolutional networks (GCN) in effectively extracting features in non-Euclidean spaces, some researchers have tried to exploit the user-item bipartite graph structure by propagating embeddings on it, aiming at achieving more effective embeddings [1, 4, 14, 31]. For example, Wang et al. [29] proposed NGCF, which follows the same propagation rules as in GCN (including feature transformation, neighborhood aggregation and nonlinear activation) to capture the high-order connectivity between users and items by stacking multiple feature propagation layers, and achieves promising results. Recently, He et al. [9] found that two common designs in GCNs, the transformation function and the nonlinear activation, have no positive effect on collaborative filtering and may even degrade the performance. They proposed LightGCN, which greatly simplifies the design of NGCF but yields better performance. In general, the integration of higher-order neighbor information makes GCN-based methods a great success. However, due to the large number of items in real life, recommender systems inevitably face the problem of data sparsity, which has become the main factor limiting the effectiveness of existing models.

An effective solution to the data sparsity problem is transferring knowledge [13] from other related domains by transfer learning. In real life, a user inevitably interacts with multiple domains to meet the demands of her life. When the interaction history in domain A is sparse, it is natural to consider obtaining some common knowledge from a correlated domain B that includes more data. In recent years, cross-domain collaborative filtering (CDCF) [2, 11, 12, 15, 22, 28] has attracted increasing research attention.
But like every coin has two sides: while the correlations between domains make CDCF possible, the differences between domains also render it difficult to transfer knowledge. Early on, CodeBook Transfer (CBT) [15] was proposed to first compress the dense rating matrix of the auxiliary domain into a cluster-level rating pattern, called a codebook, by orthogonal nonnegative matrix tri-factorization (ONMTF), and then realize knowledge transfer by sharing the codebook. Later, some variants of CBT that follow a similar transfer mechanism were proposed [7, 21, 24]. This kind of method does not require the

[Figure 1: Previous CDCF methods extract and share only the common features across domains (e.g., writing style, author, type, music, frames, actors), whereas BiTGCF fuses the extracted common features with domain-specific features.]
to the prediction of current interactions. To this end, attention mechanisms, such as ACF [3] and DeepICF [32], were introduced to automatically learn the importance of each historical interaction.

2.2 Graph Convolutional Networks based Recommendation

Inspired by the development of graph neural networks [14, 16, 30], there have been some efforts on exploiting the user-item interaction graph to infer user preference. GC-MC [31] applies a graph convolutional network to exploit the connections between users and items when encoding interactive features. SpectralCF [36] utilizes a spectral convolution operation to explore all possible connectivity between users and items in the spectral domain. However, the eigen-decomposition in SpectralCF, which is a necessary step, is very time-consuming. Recently, Wang et al. [29] proposed the Neural Graph Collaborative Filtering (NGCF) framework to integrate GCN into the embedding process. By stacking multiple embedding propagation layers, NGCF can capture the collaborative signal in high-order connectivities between users and items. However, its designs are rather burdensome. LR-GCCF [4] removes the non-linear activation function to facilitate tuning on large datasets. More importantly, it takes a residual learning approach to explain the rationale of concatenating all the layers' outputs. Later, LightGCN [9] simplifies NGCF by removing operations, such as the activation function and the transformation function, that have no positive impact on collaborative filtering.

2.3 Transfer Learning and Cross Domain Collaborative Filtering

In recent years, transfer learning has emerged as a new learning framework to address the data sparsity problem by extracting and transferring knowledge from related domains. Cross Domain Collaborative Filtering (CDCF) is the application of transfer learning in recommendation, which focuses on how to transfer knowledge (features) in an effective way.

The ways to transfer knowledge are various. Collective matrix factorization (CMF) [28] and codebook transfer [7, 15] are based on matrix factorization (MF) applied in each domain. These approaches transfer interaction information from an auxiliary domain to improve the performance in a target domain with a shallow model. Specifically, CMF jointly factorizes the rating matrices of the two domains by sharing the user latent factors. This method effectively realizes the transfer and improvement of common user hidden features. The rise of deep learning has contributed a lot to the development of CDCF, and some studies have tried to fuse CDCF with deep learning, such as CoNet [11] and its heterogeneous variants [19]. With MLP as the basic model, CoNet shares user features in the embedding process and completes the transfer of interaction features between the two domains through cross-mapping. DARec [34] extracts and transfers patterns from rating matrices in related domains, following the idea of domain adaptation. Later, DDTCDR [17] utilizes user information and items' metadata from online platforms by using an autoencoder, and then adopts latent orthogonal mapping to extract user preferences over multiple domains. PPGN [35] adopts a graph convolutional network to explore the high-order connectivity between users and items on the joint interaction graph of the two domains, and then transfers knowledge by sharing user features. Compared with the shallow cross-domain matrix factorization models, the deep transfer methods generally exhibit better performance, due to their stronger feature extraction ability.

3 PRELIMINARY

3.1 Problem Definition

We consider two domains D_A and D_B. The set of users is shared by both domains and denoted by U (of size m = |U|). Let the sets of items in D_A and D_B be I_A (of size n_a = |I_A|) and I_B (of size n_b = |I_B|), respectively. The purpose of bi-directional cross-domain transfer is to improve the recommendation performance in both domains. We consider Top-N recommendation with implicit feedback in each domain. Let R^A ∈ ℝ^{m×n_a} (R^B ∈ ℝ^{m×n_b}, resp.) denote the user-item interaction matrix of D_A (D_B, resp.) from users' implicit feedback, where an entry r^A_ui ∈ {0, 1} (r^B_uj ∈ {0, 1}, resp.) is 1 if the interaction between user u and item i (item j, resp.) is observed, and 0 otherwise. The recommendation problem with implicit feedback is abstracted as learning a function to estimate the scores of the unobserved entries in the interaction matrix, which are later used for ranking. Specifically, for domain A,

r̂^A_ui = f(u, i | Θ)    (1)

where f is the interaction function, Θ represents all learnable parameters, and r̂^A_ui is the predicted score. For matrix factorization (MF) techniques, the matching function is the fixed dot product. For deep-learning based CF, such as NeuMF [10], the interaction is implemented by non-linear neural networks.

Obviously, extracting satisfactory embeddings for users and items is the key to better recommendation. Recently, GCN has shown its powerful ability in capturing the collaborative signal in high-order connectivities for more effective embedding learning. In view of this, in our transfer learning approach for cross-domain recommendation, each domain is also modeled by a graph convolutional network, and the GCNs of both domains are jointly learned to improve the performance through bi-directional high-order feature transfer. Before introducing our model in detail, we briefly review two representative GCN-based recommendation models, NGCF [29] and LightGCN [9], in the following subsection.

3.2 Brief Review of NGCF and LightGCN

The basic idea of GCN is to aggregate the features of neighbors so as to obtain a better feature expression of the nodes on the graph. A GCN layer in general can be abstracted as:

e_u^(k+1) = AGG(e_u^(k), {e_i^(k) : i ∈ N_u})    (2)

where AGG(·) is an aggregation function such as a weighted-sum aggregator or a mean aggregator, e_u^(k) and e_i^(k) respectively denote the refined embeddings of user u and item i after k propagation layers, and N_u denotes the first-hop neighbors of user u. In order to better understand the application of GCN in recommendation, we briefly introduce the embedding propagation in NGCF and LightGCN. Note that we only show the user feature propagation process of these two models; the item feature propagation process can be obtained analogously.
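To make the abstraction in Equation (2) concrete, here is a minimal NumPy sketch of one aggregation step over a user's first-hop item neighbors. The mean aggregator, the dense interaction matrix, and all shapes are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def gcn_user_layer(e_items, R, agg="mean"):
    """One abstract GCN step (Eq. 2): refine each user embedding by
    aggregating the layer-k embeddings of the user's item neighbors.

    e_items: (n, d) item embeddings at layer k
    R:       (m, n) binary interaction matrix; R[u, i] = 1 iff u interacted with i
    """
    if agg == "mean":
        deg = np.maximum(R.sum(axis=1, keepdims=True), 1)  # |N_u|, guard against /0
        return R @ e_items / deg                            # mean aggregator
    return R @ e_items                                      # plain weighted-sum aggregator

# toy usage: 4 users, 6 items, 8-dimensional embeddings
rng = np.random.default_rng(0)
R = (rng.random((4, 6)) < 0.5).astype(float)
e_u_next = gcn_user_layer(rng.normal(size=(6, 8)), R)
```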
LightGCN propagates user embeddings as

e_u^(k+1) = Σ_{i ∈ N_u} (1 / √(|N_u||N_i|)) e_i^(k)    (4)

LightGCN greatly simplifies NGCF by removing the activation function and the transformation matrix (common in GCN but unfavorable for CF). Besides, it uses a weighted sum in layer fusion to replace the self-connection in Equation (3) and the concatenation in the fusion layer. Compared to NGCF, LightGCN reduces the risk of model overfitting and achieves better performance.
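As a concrete reference, the following sketch renders the normalized propagation of Equation (4) in matrix form. It is a plain-NumPy illustration under an assumed dense interaction matrix R, not the authors' (or the official LightGCN) implementation.

```python
import numpy as np

def lightgcn_propagate(e_user, e_item, R):
    """One LightGCN layer (Eq. 4): e_u^(k+1) = sum_{i in N_u} e_i^(k) / sqrt(|N_u||N_i|).
    No transformation matrix, no nonlinearity. R is the (m, n) binary interaction matrix."""
    d_u = np.maximum(R.sum(axis=1), 1.0)               # |N_u| per user
    d_i = np.maximum(R.sum(axis=0), 1.0)               # |N_i| per item
    norm = R / np.sqrt(d_u[:, None] * d_i[None, :])    # 1/sqrt(|N_u||N_i|) on each edge
    return norm @ e_item, norm.T @ e_user              # refined user and item embeddings
```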
In this work, instead of utilizing the embedding propagation rule of NGCF or LightGCN directly, we design our own feature propagation layer. We borrow the idea of LightGCN to simplify the design of GCN, in a more reasonable manner.

4 PROPOSED MODEL

We first introduce the overall structure of our BiTGCF model in Section 4.1, and then detail the two core components of BiTGCF, feature propagation and feature transfer, in Sections 4.2 and 4.3, respectively. Finally, we introduce the model training in Section 4.4.

4.1 Architecture Overview

As depicted in Figure 2, the proposed model BiTGCF mainly includes three modules: (1) an embedding layer that offers the initialization of user embeddings and item embeddings; (2) a feature propagation and transfer module (with multiple layers) that refines the initial embeddings of users and items by in-domain feature propagation and inter-domain feature transfer; and (3) a prediction layer which concatenates the refined embeddings from different layers and outputs the probability that the given user-item pair is a positive interaction.

Embedding: This module maps the ID of a user u (an item i) into an embedding vector e_u^(0) ∈ ℝ^d (e_i^(0) ∈ ℝ^d), where d denotes the embedding size. For ID embedding, this module can also be seen as building a parameter matrix as an embedding look-up table, which will be optimized in an end-to-end manner. Specifically, for domain A,

e_{u_a}^(0) = P^⊤ x_{u_a},  e_i^(0) = Q^⊤ x_i    (5)

where P and Q are the learnable parameter matrices of users and items, respectively, and x_{u_a} and x_i are one-hot encodings of the IDs of user u ∈ U and item i ∈ I_A, respectively. Note that we use e_{u_a}^(0) and e_{u_b}^(0) to denote the embedding vectors of the same user u in D_A and D_B, respectively. Analogously, we can derive e_{u_b}^(0) and e_j^(0) for domain B, where j ∈ I_B.

Feature Propagation and Transfer: As shown in Figure 2, in this module, we feed [e_{u_a}^(0), e_i^(0), e_{u_b}^(0), e_j^(0)] through L graph convolution layers to refine the embeddings of users and items. This module consists of two components: feature propagation (of both users and items) within each domain, and feature transfer (of users only) between the two domains. We leverage the user-item interaction graphs to propagate and transfer embeddings as follows,

e_{u_a}^(k+1) = f_T^A( f_P^A(e_{u_a}^(k)), f_P^B(e_{u_b}^(k)) )
e_{u_b}^(k+1) = f_T^B( f_P^A(e_{u_a}^(k)), f_P^B(e_{u_b}^(k)) )
e_i^(k+1) = f_P^A(e_i^(k))
e_j^(k+1) = f_P^B(e_j^(k))    (6)

where e_{u_a}^(k) and e_i^(k) respectively denote the refined embeddings of u and i after k propagation layers in D_A, and e_{u_b}^(k) and e_j^(k) respectively denote the refined embeddings of u and j after k propagation layers in D_B; f_P^A(·) and f_P^B(·) respectively denote the feature propagation functions in D_A and D_B, which will be defined in Section 4.2; and f_T^A(·, ·) and f_T^B(·, ·) respectively denote the feature transfer functions in D_A and D_B, which will be defined in Section 4.3.

It is worth mentioning that in each layer, the feature transfer acts only on user features. Nevertheless, the refinement of item features can be achieved with the help of the high-order connectivity between the two domains through the feature propagation module.
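The following sketch strings the pieces of Equation (6) together into one propagation-and-transfer layer. It reuses a LightGCN-style aggregation as a stand-in for f_P (the paper's own propagation function in Section 4.2 differs in details) and leaves the transfer functions f_T^A and f_T^B as pluggable arguments, since their definitions come in Section 4.3.

```python
import numpy as np

def propagate(e_user, e_item, R):
    """Stand-in for f_P: normalized neighbor aggregation (see the
    LightGCN sketch above); the paper's actual f_P is more elaborate."""
    d_u = np.maximum(R.sum(axis=1), 1.0)
    d_i = np.maximum(R.sum(axis=0), 1.0)
    norm = R / np.sqrt(d_u[:, None] * d_i[None, :])
    return norm @ e_item, norm.T @ e_user

def bitgcf_layer(e_ua, e_ub, e_i, e_j, R_a, R_b, f_T_A, f_T_B):
    """One BiTGCF layer (Eq. 6): items are refined only in-domain, while
    the two propagated user embeddings are fused across domains by the
    transfer functions f_T^A and f_T^B."""
    p_ua, p_i = propagate(e_ua, e_i, R_a)   # f_P^A on users/items of D_A
    p_ub, p_j = propagate(e_ub, e_j, R_b)   # f_P^B on users/items of D_B
    return f_T_A(p_ua, p_ub), f_T_B(p_ua, p_ub), p_i, p_j
```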
[Figure 2: The overall architecture of BiTGCF. The legend marks user and item nodes, self-connections, join edges, edge weights, feature propagation layers, and the inner-product prediction.]
It is clear to see that Equation (12) adopts the idea of feature propagation on the graph, which forms the improved feature by aggregating the features from f_P^A(e_{u_a}^(k)) and f_P^B(e_{u_b}^(k)). In this way, we can regard C^(k) as the common features derived for u under both D_A and D_B. Moreover, the design of the weight factors l_{u_a} and l_{u_b} indicates that the more interaction data there is for u in D_A (or D_B), the more the features in D_A (or D_B) will contribute to the common features, and vice versa.

A^(k) and B^(k) are used to preserve the user's domain-specific features in D_A and D_B, respectively. We use the hyper-parameters λ_a and λ_b to control the retention ratio. For example, when λ_a = λ_b = 1.0, 100% of the user features in D_A and D_B are retained, so user features have the largest specificity in both domains. When λ_a = λ_b = 0.5, the specificity in users' features disappears and the same users have the same features in both domains; the transfer module in BiTGCF then becomes the same as that in existing CDCF methods.

Finally, we place equal emphasis on the obtained common features and domain-specific features to balance them, as shown in Equations (10) and (11). This operation, though simple, is very effective in keeping the model performance stable, as validated in our experimental study.
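Because Equations (10)-(12) themselves fall into a gap of this extraction, the sketch below only mirrors what the surrounding text states: interaction-count-based weights l_{u_a} and l_{u_b} form the common feature C^(k), λ_a and λ_b control how much domain-specific signal A^(k)/B^(k) is retained (with λ = 0.5 collapsing both domains to identical user features, and λ = 1.0 retaining them fully), and the two parts are balanced with equal emphasis. The exact formulas in the paper may differ.

```python
import numpy as np

def transfer_user(p_a, p_b, n_a, n_b, lam_a=0.8, lam_b=0.8):
    """Hedged sketch of the user feature transfer discussed around Eqs. (10)-(12).

    p_a, p_b: (m, d) user features after in-domain propagation in D_A / D_B
    n_a, n_b: (m,) per-user interaction counts in each domain
    lam_*:    retention ratio of domain-specific features (lambda_a, lambda_b)
    """
    denom = np.maximum(n_a + n_b, 1)[:, None]
    l_a, l_b = (n_a[:, None] / denom), (n_b[:, None] / denom)
    common = l_a * p_a + l_b * p_b              # C^(k): more data => more contribution
    spec_a = lam_a * p_a + (1 - lam_a) * p_b    # A^(k): lam=1 keeps D_A features fully,
    spec_b = lam_b * p_b + (1 - lam_b) * p_a    # B^(k): lam=0.5 erases specificity
    # equal emphasis on common and domain-specific parts (cf. Eqs. (10), (11))
    return 0.5 * (common + spec_a), 0.5 * (common + spec_b)
```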
4.4 Model Training

The deep model calculates gradients to update its parameters through the loss function. Therefore, a suitable loss function should not only prevent the model from falling into a local optimum but also accelerate model convergence. For recommender systems, two types of loss functions are widely used: point-wise, which focuses on predicting scores more accurately, and pair-wise [8], which focuses on learning ranks more accurately. In this paper, we consider the point-wise loss function and leave the pair-wise version to future work. The most commonly used point-wise loss is the squared loss (SE), but it is not suitable for implicit feedback. Hence, following several previous works, we employ the binary cross-entropy function as the loss function, which can be defined as,

L(r̂_ui, r_ui) = − Σ_{(u,i) ∈ R⁺ ∪ R⁻} [ r_ui log r̂_ui + (1 − r_ui) log(1 − r̂_ui) ] + λ‖Θ‖²₂    (17)

where R⁺ is the set of observed interactions, and R⁻ is a set of randomly sampled unobserved interactions. λ controls the L₂ regularization strength to prevent overfitting, and Θ = {E_u^0, E_i^0}, where E_u^0 (E_i^0) is the initial embedding matrix of all users (items). In order to improve the accuracy on both domains, we define the joint loss function as follows,

L_joint = min_{f_T^A, f_P^A, f_T^B, f_P^B} [ L(r̂^A_ui, r^A_ui) + L(r̂^B_uj, r^B_uj) ]    (18)
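A minimal NumPy rendering of Equations (17) and (18); the regularization weight and the epsilon guard inside the logarithm are illustrative choices, not values from the paper.

```python
import numpy as np

def bce(r_hat, r, eps=1e-8):
    """Point-wise binary cross-entropy over observed (r = 1) and sampled
    unobserved (r = 0) interactions, cf. Eq. (17) without the L2 term."""
    return -np.sum(r * np.log(r_hat + eps) + (1 - r) * np.log(1 - r_hat + eps))

def joint_loss(r_hat_a, r_a, r_hat_b, r_b, params, lam=1e-5):
    """Joint objective of Eq. (18): the two per-domain losses are summed and
    minimized together, plus L2 regularization on the initial embeddings."""
    l2 = sum(np.sum(w ** 2) for w in params)
    return bce(r_hat_a, r_a) + bce(r_hat_b, r_b) + lam * l2
```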
We adopt mini-batch Adam to optimize the model and update the parameters. Moreover, similar to NGCF, we introduce a dropout mechanism to prevent the neural network from overfitting. Specifically, we drop the messages being propagated in Equation (9) with a certain probability during training, and disable this operation during testing. Note that BiTGCF is a bi-directional transfer model in which the data from both domains participate in training at the same time, but the two domains are evaluated separately.

5 EXPERIMENT

We first describe the experimental setup in Section 5.1, and then compare the proposed model with state-of-the-art methods in Section 5.2. To justify the effectiveness of the transfer learning module, we study its adaptability in Section 5.3, as well as its performance under different data sparsity levels in Section 5.4. Finally, the impact factor analysis is presented in Section 5.5.

5.1 Experimental setup

5.1.1 Dataset. We evaluate our proposed model on real-world datasets from the Amazon dataset¹, including two couple datasets: Electronics (Elec for short) & Cell Phones and Accessories (Cell for short), and Sports and Outdoors (Sport for short) & Clothing, Shoes and Jewelry (Cloth for short). Moreover, in order to show that BiTGCF still has good transfer capability between domains with lower similarity, the cross pairings of the two groups, Sport & Cell and Elec & Cloth, are used as the third and fourth couple datasets. For the data in these four couple datasets, we first transform them into implicit data, where each entry is marked as 0 or 1, indicating whether the user has rated the item. Then, we filter the datasets to retain users with more than 5 ratings and items with more than 10 ratings, and extract the overlapping users of both domains. Table 1 summarizes the detailed statistics of the four couple datasets.

¹ http://jmcauley.ucsd.edu/data/amazon/

Table 1: Statistics of the datasets

| Dataset | #Users | #Items | #Interactions | Density |
| Elec | 3,325 | 39,463 | 118,879 | 0.091% |
| Cell | 3,325 | 18,462 | 53,732 | 0.088% |
| Sport | 9,928 | 32,310 | 102,540 | 0.032% |
| Cloth | 9,928 | 41,303 | 97,757 | 0.024% |
| Sport | 4,998 | 22,101 | 55,556 | 0.050% |
| Cell | 4,998 | 14,618 | 47,444 | 0.065% |
| Elec | 15,761 | 53,309 | 226,626 | 0.027% |
| Cloth | 15,761 | 51,865 | 136,844 | 0.017% |
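A hedged pandas sketch of the preprocessing just described; the column names and the single-pass (non-iterative) filtering are assumptions.

```python
import pandas as pd

def preprocess(df_a, df_b):
    """Binarize ratings into implicit feedback, filter low-activity users and
    items, and keep only the users shared by both domains.
    Assumes columns ['user', 'item', 'rating']; thresholds follow the text."""
    def to_implicit(df):
        out = df.copy()
        out["label"] = 1                     # any rating => positive implicit signal
        return out.drop(columns=["rating"])

    def filter_counts(df):
        df = df[df.groupby("user")["item"].transform("size") > 5]   # users: > 5 ratings
        df = df[df.groupby("item")["user"].transform("size") > 10]  # items: > 10 ratings
        return df

    a, b = filter_counts(to_implicit(df_a)), filter_counts(to_implicit(df_b))
    shared = set(a["user"]).intersection(b["user"])   # common users bridge the domains
    return a[a["user"].isin(shared)], b[b["user"].isin(shared)]
```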
5.1.2 Evaluation Protocol. We adopt the widely used leave-one-out evaluation method. Specifically, we take a random sample from each user's interaction history as the test set, and the remaining data are utilized for training. Then we randomly select 99 items from each user's non-interacted items to form the negative samples. The recommendation model predicts scores for the 100 records of the user (99 negative samples and 1 positive sample) and outputs the top-N items. We use the commonly used Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) to evaluate the ranking performance. For both measures, we truncate the ranked list at 10. Hence, HR measures whether the test item is present in the top-10 list, and NDCG measures the ranking quality by assigning higher scores to hits at top ranks.
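For reference, the per-user evaluation step under this protocol can be sketched as follows; placing the test item at index 0 of the candidate list is an assumed convention, not prescribed by the paper.

```python
import numpy as np

def hr_ndcg_at_10(scores, pos_index=0):
    """Rank 1 positive item against 99 sampled negatives (leave-one-out).

    scores: (100,) predicted scores for one user's candidate list, with the
    held-out test item at pos_index. Returns (HR@10, NDCG@10) for this user.
    """
    rank = int(np.sum(scores > scores[pos_index]))   # 0-based rank of the test item
    if rank < 10:
        return 1.0, 1.0 / np.log2(rank + 2)          # single relevant item: 1/log2(rank+2)
    return 0.0, 0.0
```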
Table 2: Performance comparison in terms of HR and NDCG. Best performance is in boldface and the best baselines are underlined (in the original layout). BPRMF+, MLP+, NGCF+ and LightGCN+ are single-domain methods; CMF, CDFM, CoNet and PPGN are cross-domain methods; GCF+ and BiTGCF are ours.

| Dataset | Metric | BPRMF+ | MLP+ | NGCF+ | LightGCN+ | CMF | CDFM | CoNet | PPGN | GCF+ | BiTGCF |
| Elec | HR | 0.3471 | 0.4448 | 0.4665 | 0.4701 | 0.3894 | 0.3998 | 0.4484 | 0.4337 | 0.5871 | 0.6036 |
| Elec | NDCG | 0.2224 | 0.2814 | 0.2969 | 0.2990 | 0.2457 | 0.2513 | 0.2890 | 0.2664 | 0.3682 | 0.3767 |
| Cell | HR | 0.4271 | 0.4848 | 0.4983 | 0.5185 | 0.4239 | 0.4117 | 0.4899 | 0.4386 | 0.6198 | 0.6571 |
| Cell | NDCG | 0.2919 | 0.3108 | 0.3358 | 0.3253 | 0.2631 | 0.2599 | 0.3102 | 0.2773 | 0.4119 | 0.4210 |
| Sport | HR | 0.2784 | 0.3660 | 0.4251 | 0.4623 | 0.3108 | 0.3196 | 0.3743 | 0.3484 | 0.5421 | 0.5346 |
| Sport | NDCG | 0.1778 | 0.2169 | 0.2681 | 0.2910 | 0.1837 | 0.1874 | 0.2254 | 0.2038 | 0.3414 | 0.3375 |
| Cloth | HR | 0.2250 | 0.3464 | 0.3723 | 0.4011 | 0.2930 | 0.2501 | 0.3442 | 0.3239 | 0.5101 | 0.5242 |
| Cloth | NDCG | 0.1348 | 0.2073 | 0.2359 | 0.2570 | 0.1612 | 0.1509 | 0.2090 | 0.1819 | 0.3067 | 0.3113 |
| Sport | HR | 0.2990 | 0.3663 | 0.3858 | 0.3978 | 0.3157 | 0.3345 | 0.3700 | 0.3316 | 0.5018 | 0.5442 |
| Sport | NDCG | 0.2025 | 0.2368 | 0.2527 | 0.2498 | 0.1930 | 0.2163 | 0.2392 | 0.2260 | 0.3122 | 0.3374 |
| Cell | HR | 0.3315 | 0.4764 | 0.4926 | 0.4970 | 0.3918 | 0.4574 | 0.4824 | 0.4418 | 0.5486 | 0.5654 |
| Cell | NDCG | 0.2211 | 0.3080 | 0.3173 | 0.3074 | 0.2286 | 0.1605 | 0.3134 | 0.2851 | 0.3566 | 0.3709 |
| Elec | HR | 0.2780 | 0.4877 | 0.5007 | 0.5086 | 0.4580 | 0.4106 | 0.4931 | 0.4633 | 0.5357 | 0.5359 |
| Elec | NDCG | 0.1761 | 0.3191 | 0.3235 | 0.3288 | 0.3032 | 0.2817 | 0.3220 | 0.2920 | 0.3549 | 0.3575 |
| Cloth | HR | 0.2262 | 0.3562 | 0.3632 | 0.3662 | 0.3196 | 0.3017 | 0.3461 | 0.3221 | 0.4003 | 0.4384 |
| Cloth | NDCG | 0.1371 | 0.2160 | 0.2238 | 0.2199 | 0.1907 | 0.1801 | 0.2015 | 0.1934 | 0.2461 | 0.2555 |
5.1.3 Compared Methods. We compare BiTGCF with both single-domain and cross-domain recommendation models. We leave out the comparison with DDTCDR [17] and DARec [34], because they both use an auto-encoder as a feature extractor for pre-training, which differs from the end-to-end form of the methods to be evaluated.

• BPRMF [26] is a classical single-domain model, which learns the user and item factors via matrix factorization and a pair-wise ranking loss.
• MLP [10] is a single-domain deep model, which uses deep neural networks to learn the matching function.
• NGCF [29] is a single-domain GCN based model. It first captures the high-order connectivity information in the embedding function by stacking multiple embedding propagation layers, and then concatenates the obtained embeddings and uses the inner product to make predictions.
• LightGCN [9] is also a single-domain GCN based model, evolved from NGCF. It simplifies the design of the feature propagation component by removing the non-linear activation and the transformation matrices. Moreover, it adopts a different layer combination strategy from NGCF.
• CMF [28] is a multi-relation learning approach which factorizes the matrices of domains A and B simultaneously by sharing the user latent factors. It is a shallow model, which first jointly learns on the two domains, and then optimizes the target domain.
• CDFM [20] is a cross-domain shallow model. It takes the user's interaction history from auxiliary domains as context to generate recommendations on the target domain with factorization machines.
• CoNet [11] is a cross-domain deep model, which transfers knowledge across domains by cross connections between the base networks. It jointly learns on two domains and optimizes both domains simultaneously. Note that our model BiTGCF is different from CoNet in that CoNet transfers user-item interaction features, while BiTGCF transfers user features.
• PPGN [35] is a cross-domain deep model, which fuses the interaction graphs of the two domains and shares the features of users learned from the joint interaction graph by stacking multiple graph convolution layers. Finally, it inputs the learned embeddings to the domain-specific MLP structure to learn the matching function.
• GCF is a degenerate version of BiTGCF without the feature transfer layer. It is a single-domain GCN based model.

Note that in order to exclude the influence of the loss function on the results, we change the loss function of NGCF and LightGCN from BPR to cross-entropy, so as to unify the loss function among all the deep-learning based methods. Moreover, in order to more accurately evaluate the effect of our transfer module, for the single-domain methods including BPRMF, MLP, NGCF, LightGCN and GCF, we use 'model-name+' to denote the same model trained with the mixed datasets as the training set. For example, given MLP+ and the couple dataset Elec&Cell, MLP+ will use Elec combined with Cell as the training set, and then test on Elec and Cell respectively.

5.1.4 Parameter Settings. For BPRMF, we use the BPRMF class in LightFM², a popular CF library, for training, and vary the number of epochs from 0 to 40 to get the best result. For CMF, we use a Python version written with reference to the original Matlab code, and the parameters are randomly initialized from the Gaussian N(0, 0.01). For CDFM, we process and encode the data according to the paper, then feed it to PyFM³ for training. For MLP⁴ and CoNet⁵ with deep structures, we maintain the optimal configurations in their papers: the hidden layers are configured as [64, 32, 16, 8] and the negative sampling ratio is set to 4. For NGCF⁶ and LightGCN⁷, we use the published source code and only change the loss function from BPR to cross-entropy. Moreover, we set the embedding propagation layers
as [64, 64, 64], the learning rate as 0.001, the mini-batch size as 1024, the negative sampling ratio as 4, and the message dropout ratio as 0.1 for NGCF and 0 for LightGCN, respectively. For PPGN, we use the source code provided by the authors and change the data pipeline. We also set the negative sampling ratio as 4, and tune the number of GCN layers from 3 to 5 as the paper states. For GCF and BiTGCF, we implement them in TensorFlow and use the same parameter settings as those of NGCF. We use the Xavier initializer to initialize the parameters of NGCF, LightGCN and our methods. Moreover, early stopping is performed if HR@10 on the test data does not increase for 5 successive epochs. Note that CoNet, PPGN [...] the performance of the single-domain methods is obtained by first training on the mixed datasets and then evaluating separately on each dataset.

² https://github.com/lyst/lightfm
³ https://github.com/coreylynch/pyFM
⁴ https://github.com/hexiangnan/neural_collaborative_filtering
⁵ http://home.cse.ust.hk/ghuac/
⁶ https://github.com/xiangwang1223/neural_graph_collaborative_filtering

[Figure 5: The applicability of the transfer module. Panels (a) HR and (b) NDCG on the higher-similarity pairs Elec_Cell, Cell_Elec, Sport_Cloth and Cloth_Sport; panels (c) HR and (d) NDCG on the lower-similarity pairs Elec_Cloth, Cloth_Elec, Sport_Cell and Cell_Sport. The x-axis enumerates the compared methods.]

5.2 Performance Comparison

Table 2 shows the summarized results of our experiments on the four couple datasets in terms of two metrics, HR@10 and NDCG@10.
Due to space concerns, the performance of the single-domain methods on the single datasets is not shown in Table 2. In fact, from our experiments, the performance of the single-domain models on the mixed datasets is in general better than that on the single datasets, and partial results regarding this point will be reported in Section 5.3. From Table 2, we have the following key observations:

• For single-domain methods, MLP+ consistently outperforms BPRMF+, demonstrating the importance of exploring the nonlinear interaction relations between users and items. GCN-based methods, including NGCF+, LightGCN+ and GCF+, consistently outperform MLP+. This validates the effect of mining high-order connectivities for better recommendation performance. The improvement of LightGCN+ over NGCF+ may come from the enhancement of its generalization ability, which is less prone to overfitting. Further, compared with NGCF+ and LightGCN+, GCF+ retains the inner product and the self-connection operation to appropriately increase the feature flow among nodes. This might be the reason that GCF+ gains superior performance over LightGCN+.
• Cross-domain methods vs. single-domain methods. The superior performance of CMF and CDFM over BPRMF+ provides evidence that, compared to directly mixing the two datasets, the knowledge transferred from the auxiliary domain does improve the performance on the target domain. The performance of CoNet is better than that of MLP+ in most cases, which validates the effect of the cross-connection unit; however, the improvement over directly mixing the two datasets is very limited. Moreover, we can see that NGCF beats CoNet on almost all the datasets. This demonstrates that a transfer learning mechanism, if not well designed, is not as effective as mining high-order connectivities for better embeddings. Surprisingly, PPGN is worse than NGCF and even worse than CoNet. This might be caused by the datasets, since PPGN has been shown to outperform CoNet on CD&Music and Book&Movie in [35]. We guess the poor performance of PPGN on datasets with large distribution gaps might come from the fact that it uses the same propagation layer on the joint graph of the two domains. BiTGCF achieves the best performance among the evaluated single-domain and cross-domain methods on all the datasets, which indicates that it is on the right track to exploit the high-order connections in-domain and inter-domain by using graph structures and to fuse common features with domain-specific features through the feature transfer module.

5.3 The Applicability of the Transfer Module

In order to verify the applicability of our transfer module, we incorporate the GCN-based models NGCF and LightGCN with our transfer module, and name the derived methods NGCF* and LightGCN*. Figure 5 shows the results. In Figures 5(a) and 5(b), the two domains of the two couple datasets (Elec&Cell, Sport&Cloth) have higher similarity, while in Figures 5(c) and 5(d), the two domains of the two couple datasets (Elec&Cloth, Sport&Cell) have lower similarity. In order to indicate the pairing information of the current dataset, the 'target'_'source' domain notation is adopted. For example, Elec_Cell means the Elec dataset with the participation of Cell.

Comparing the first point with the second point for each method, we can observe a growth trend in Figures 5(a) and 5(b) for all the methods in most cases, as well as a downward trend in Figures 5(c) and 5(d). The reason might be that the mixing of data from two similar domains increases the amount of training data, which promotes the training of the model and in turn improves the performance. On the contrary, when the two domains are not similar, simply mixing them may lead to model confusion, which makes the model difficult to train and thus degrades the performance.

Comparing the third point with the first two for each method, we can conclude that the methods with the transfer mechanism in general achieve the best results. This validates the effectiveness of our transfer module. In summary, compared with directly mixing datasets, knowledge transfer yields better performance.
[Table 3: Statistics of the user groups of Sport & Cell. Only one row of this table survives the extraction: G2 | 14 | 1,306 | 14,207 | 0.049%.]

Table 4: Results on Sport_Cell. The improvement is the result of BiTGCF relative to the best one from GCF and GCF+.

| Metric | Method | G1 | G2 | G3 | G4 |
| HR | GCF | 0.5081 | 0.5253 | 0.5840 | 0.6447 |
| HR | GCF+ | 0.4778 | 0.5061 | 0.5606 | 0.5935 |
| HR | BiTGCF | 0.5314 | 0.5330 | 0.5923 | 0.6374 |
| HR | Improv. | 4.59% | 1.47% | 1.42% | -1.13% |
| NDCG | GCF | 0.3061 | 0.3307 | 0.3747 | 0.4267 |
| NDCG | GCF+ | 0.3001 | 0.3125 | 0.3536 | 0.4035 |
| NDCG | BiTGCF | … | … | … | … |
| NDCG | Improv. | 4.87% | 1.03% | 1.54% | 1.75% |

Table 5: Results on Cell_Sport. The improvement is the result of BiTGCF relative to the best one from GCF and GCF+.

| Metric | Method | G1 | G2 | G3 | G4 |
| HR | GCF | … | … | … | … |
| HR | GCF+ | … | … | … | … |
| HR | BiTGCF | 0.5283 | 0.5720 | 0.6103 | 0.686 |
| HR | Improv. | 2.64% | 3.08% | 1.11% | -3.28% |
| NDCG | GCF | 0.3340 | 0.3530 | 0.4032 | 0.4910 |
| NDCG | GCF+ | 0.3290 | 0.3506 | 0.4015 | 0.4789 |
| NDCG | BiTGCF | 0.3419 | 0.3636 | 0.4251 | 0.4787 |
| NDCG | Improv. | 2.37% | 3% | 5.43% | -2.51% |

[Figure 6: Effect of Transfer Layer Number. HR and NDCG versus the number of transfer layers on Sport_Cell, Cell_Sport, Elec_Cloth and Cloth_Elec.]

[Figure 7: Effect of λ. HR and NDCG as λ ranges from 0.5 to 1.0 on all couple datasets.]
5.4 Performance of the Transfer Module w.r.t. Data Sparsity Levels

The number of user interactions is an important factor that affects the recommendation performance. In order to test the influence of data sparsity levels, we divide the dataset into groups with different sparsity according to the number of user interactions. Specifically, following the principle of minimizing the difference in the number of interactions between the groups, all users are divided into four groups G1, G2, G3 and G4 in order of increasing number of interactions. Table 3 shows the split results of Sport & Cell, in which users with no more than 8, 14, 30 and 323 interactions are assigned to G1, G2, G3 and G4, respectively. From Table 4, we can find that the improvements achieved in the first two groups (e.g., 4.59% and 1.47%) are more significant than those of the last two (e.g., 1.42% and -1.13%). The result indicates that the feature transfer module can help improve the recommendation performance for relatively inactive users (with fewer interacted items). A similar trend can be observed from Table 5.
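As a sketch of this grouping rule, assuming "minimizing the difference in the number of interactions between each group" means balancing each group's total interaction count (the paper's exact procedure may differ):

```python
import numpy as np

def split_into_groups(counts, n_groups=4):
    """Assign users (sorted by ascending interaction count) to G1..G4 so
    that each group carries roughly the same total number of interactions."""
    order = np.argsort(counts)                  # least-active users first
    target = counts.sum() / n_groups            # ideal interactions per group
    groups = np.zeros(len(counts), dtype=int)
    g, running = 0, 0.0
    for u in order:
        groups[u] = g
        running += counts[u]
        if running >= target * (g + 1) and g < n_groups - 1:
            g += 1                              # move on once this group is "full"
    return groups                               # group index 0..3 per user
```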
5.5 Impact Factor Analysis

5.5.1 Number of feature transfer layers. The feature transfer module is the core of BiTGCF, and its effect is verified by changing the number of transfer layers while fixing the number of feature propagation layers. In particular, we vary the layer number L in the range {0, 1, 2, 3}. More specifically, based on the single-domain model GCF (with three propagation layers but no transfer layers), transfer layers are added starting from the top layer. For example, when L = 1, one feature transfer component is added in the last feature propagation layer; when L = 2, two feature transfer components are added in the last two feature propagation layers.

From Figure 6, we can see that as L increases, the performance of BiTGCF also increases. The reason may be that the feature transfer module connects the different domains, enabling the model to mine higher-order relationships between nodes by stacking multiple feature propagation and transfer layers. Moreover, it can be seen from Figures 6(c) and 6(d) that when L = 3, the performance of BiTGCF degrades on some datasets. This indicates
that mining for higher-order connectivities is not always beneficial for performance improvement. However, when L = 0, the performance of BiTGCF is always the worst, which demonstrates the effect of our transfer module.

5.5.2 Hyper-parameters λ_a & λ_b. In this set of experiments, we analyze the influence of the hyper-parameters λ_a and λ_b on the performance. For simplicity, we always consider a single hyper-parameter λ = λ_a = λ_b. Figure 7 shows the results, with a step size of 0.1, ranging in [0.5, 1]. Figures 7(a) and 7(b) show the results of BiTGCF on Elec & Cell and Sport & Cloth, respectively. The overall best result is achieved when λ is around 0.5 and 0.6. The results on Sport & Cell and Elec & Cloth are shown in Figures 7(c) and 7(d), where the best result appears when λ = 0.8. The result implicitly reflects that, when the similarity between the two domains is low, it is necessary to retain more domain-specific features.

6 CONCLUSION

In this paper, we proposed BiTGCF for Top-N cross-domain recommendation by combining the idea of high-order feature propagation on graph structures with transfer learning. Inspired by NGCF and LightGCN, we designed a new feature propagation module, which simplifies the feature propagation in a more reasonable manner to reduce the risk of model over-fitting while still preserving some nonlinearity to boost the recommendation performance. Moreover, we are the first to consider the domain-specific features of different domains when refining user features, and we designed a simple but effective balancing mechanism to balance users' common features and domain-specific features. By combining inter-domain feature transfer with in-domain feature propagation, our model realizes a more efficient representation and transfer of users' common features and their integration with domain-specific features. Remarkable performance improvements on several benchmark datasets demonstrate the effectiveness of our BiTGCF model.

ACKNOWLEDGMENTS

The work was supported by the National Natural Science Foundation of China under Grant No. 61672252, and the Fundamental Research Funds for the Central Universities under Grant No. 2019kfyXKJC021.

REFERENCES

[1] Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, and Boguslaw Obara. 2019. Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study. Data Science and Engineering 4, 3 (2019), 269–289.
[2] Bin Cao, Nathan Nan Liu, and Qiang Yang. 2010. Transfer Learning for Collective Link Prediction in Multiple Heterogenous Domains. In Proceedings of ICML. Omnipress, 159–166.
[3] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In Proc. of SIGIR. ACM, 335–344.
[4] Lei Chen, Le Wu, Richang Hong, Kun Zhang, and Meng Wang. 2020. Revisiting Graph Based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach. In Proc. of AAAI. 27–34.
[5] Zhi-Hong Deng, Ling Huang, Chang-Dong Wang, Jian-Huang Lai, and Philip S. Yu. 2019. DeepCF: A Unified Framework of Representation Learning and Matching Function Learning in Recommender System. In Proceedings of AAAI. 61–68.
[6] Travis Ebesu, Bin Shen, and Yi Fang. 2018. Collaborative Memory Network for Recommendation Systems. In Proc. of SIGIR. 515–524.
[7] Sheng Gao, Hao Luo, Da Chen, Shantao Li, Patrick Gallinari, and Jun Guo. 2013. Cross-domain recommendation via cluster-level latent factor model. In Proceedings of ECML-PKDD. 161–176.
[8] Ruining He and Julian McAuley. 2016. VBPR: visual bayesian personalized ranking from implicit feedback. In Proceedings of Thirtieth AAAI. 144–150.
[9] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proc. of SIGIR. 639–648.
[10] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of WWW. 173–182.
[11] Guangneng Hu, Yu Zhang, and Qiang Yang. 2018. Conet: Collaborative cross networks for cross-domain recommendation. In Proceedings of CIKM. 667–676.
[12] Guangneng Hu, Yu Zhang, and Qiang Yang. 2019. Transfer Meets Hybrid: A Synthetic Approach for Cross-Domain Collaborative Filtering with Text. In Proceedings of WWW. ACM, 2822–2829.
[13] Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Can Zhu. 2013. Personalized recommendation via cross-domain triadic factorization. In Proc. of WWW. 595–606.
[14] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of ICLR (Poster).
[15] Bin Li, Qiang Yang, and Xiangyang Xue. 2009. Can movies and books collaborate? cross-domain collaborative filtering for sparsity reduction. In Proceedings of Twenty-First International Joint Conference on Artificial Intelligence. 2052–2057.
[16] Guohao Li, Matthias Müller, Ali K. Thabet, and Bernard Ghanem. 2019. DeepGCNs: Can GCNs Go As Deep As CNNs?. In Proceedings of ICCV. 9266–9275.
[17] Pan Li and Alexander Tuzhilin. 2020. DDTCDR: Deep Dual Transfer Cross Domain Recommendation. In Proc. of WSDM. ACM, Houston, USA, 331–339.
[18] Greg Linden, Brent Smith, and Jeremy York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing 1 (2003), 76–80.
[19] Jian Liu, Pengpeng Zhao, Yanchi Liu, Victor S. Sheng, Fuzhen Zhuang, Jiajie Xu, Xiaofang Zhou, and Hui Xiong. 2019. Deep Cross Networks with Aesthetic Preference for Cross-domain Recommendation. CoRR abs/1905.13030 (2019). http://arxiv.org/abs/1905.13030
[20] Babak Loni, Yue Shi, Martha A. Larson, and Alan Hanjalic. 2014. Cross-Domain Collaborative Filtering with Factorization Machines. In Proceedings of 36th European Conference on Advances in Information Retrieval Research. 656–661.
[21] Zhongqi Lu, Weike Pan, Evan Wei Xiang, Qiang Yang, Lili Zhao, and Erheng Zhong. 2013. Selective Transfer Learning for Cross Domain Recommendation. In Proceedings of ICDM. SIAM, 641–649.
[22] Zhongqi Lu, Yin Zhu, Sinno Jialin Pan, Evan Wei Xiang, Yujing Wang, and Qiang Yang. 2014. Source Free Transfer Learning for Text Classification. In Proceedings of AAAI. AAAI Press, 122–128.
[23] Jingwei Ma, Jiahui Wen, Mingyang Zhong, Weitong Chen, and Xue Li. 2019. MMM: Multi-source Multi-net Micro-video Recommendation with Clustered Hidden Item Representation Learning. Data Science and Engineering 4, 3 (2019), 240–253.
[24] Orly Moreno, Bracha Shapira, Lior Rokach, and Guy Shani. 2012. TALMUD: transfer learning for multiple domains. In Proceedings of CIKM. ACM, 425–434.
[25] ThaiBinh Nguyen and Atsuhiro Takasu. 2018. NPE: Neural Personalized Embedding for Collaborative Filtering. In Proceedings of IJCAI. 1583–1589.
[26] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proc. of UAI. 452–461.
[27] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proc. of WWW. 285–295.
[28] Ajit P. Singh and Geoffrey J. Gordon. 2008. Relational learning via collective matrix factorization. In Proceedings of SIGKDD. ACM, 650–658.
[29] Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proc. of the 42nd ACM SIGIR. 165–174.
[30] Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. 2019. Simplifying Graph Convolutional Networks. In Proceedings of ICML. 6861–6871.
[31] Yuexin Wu, Hanxiao Liu, and Yiming Yang. 2018. Graph Convolutional Matrix Completion for Bipartite Edge Prediction. In Proceedings of IC3K. 49–58.
[32] Feng Xue, Xiangnan He, Xiang Wang, Jiandong Xu, Kai Liu, and Richang Hong. 2019. Deep Item-based Collaborative Filtering for Top-N Recommendation. ACM Trans. Inf. Syst. 37, 3 (2019), 33:1–33:25.
[33] Hong-Jian Xue, Xinyu Dai, Jianbing Zhang, Shujian Huang, and Jiajun Chen. 2017. Deep Matrix Factorization Models for Recommender Systems. In Proceedings of IJCAI. 3203–3209.
[34] Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. DARec: Deep Domain Adaptation for Cross-Domain Recommendation via Transferring Rating Patterns. In Proceedings of IJCAI. 4227–4233.
[35] Cheng Zhao, Chenliang Li, and Cong Fu. 2019. Cross-Domain Recommendation via Preference Propagation GraphNet. In Proceedings of CIKM. 2165–2168.
[36] Lei Zheng, Chun-Ta Lu, Fei Jiang, Jiawei Zhang, and Philip S. Yu. 2018. Spectral collaborative filtering. In Proc. of ACM RecSys. 311–319.