
Transfer Learning for Collaborative Filtering via a Rating-Matrix Generative Model

Bin Li [email protected]
School of Computer Science, Fudan University, Shanghai 200433, China
Qiang Yang [email protected]
Dept. of Computer Science & Engineering, Hong Kong University of Science & Technology, Hong Kong, China
Xiangyang Xue [email protected]
School of Computer Science, Fudan University, Shanghai 200433, China

Abstract

Cross-domain collaborative filtering solves the sparsity problem by transferring rating knowledge across multiple domains. In this paper, we propose a rating-matrix generative model (RMGM) for effective cross-domain collaborative filtering. We first show that the relatedness across multiple rating matrices can be established by finding a shared implicit cluster-level rating matrix, which is next extended to a cluster-level rating model. Consequently, a rating matrix of any related task can be viewed as drawing a set of users and items from a user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. The combination of these two models gives the RMGM, which can be used to fill the missing ratings for both existing and new users. A major advantage of RMGM is that it can share the knowledge by pooling the rating data from multiple tasks even when the users and items of these tasks do not overlap. We evaluate the RMGM empirically on three real-world collaborative filtering data sets to show that RMGM can outperform the individual models trained separately.

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

1. Introduction

Collaborative filtering (CF) in recommender systems aims at predicting an active user's ratings on a set of items based on a collection of like-minded users' rating records on the same set of items. Various CF methods have been proposed in the last decade. For example, memory-based methods (Resnick et al., 1994; Sarwar et al., 2001) find K-nearest neighbors based on some similarity measure. Model-based methods (Hofmann & Puzicha, 1999; Pennock et al., 2000; Si & Jin, 2003) learn preference/rating models for similar users (and items). Matrix factorization methods (Srebro & Jaakkola, 2003) find a low-rank approximation for the rating matrix. Most of these methods are based on the available ratings in the given rating matrix. Thus, the performance of these methods largely depends on the density of the given rating matrix.

However, in real-world recommender systems, users can rate only a very limited number of items, so the rating matrix is often extremely sparse. As a result, the available rating data that can be used for K-NN search, probabilistic modeling, or matrix factorization are radically insufficient. The sparsity problem has become a major bottleneck for most CF methods.

To alleviate the sparsity problem in collaborative filtering, one promising approach is to pool together the rating data from multiple rating matrices in related domains for knowledge transfer and sharing. In the real world, many web sites for recommending similar items, e.g., movies, books, and music, are closely related. On one hand, since many of these items are literary and entertainment works, they should share some common properties (e.g., genre and style). On the other hand, since these web services are geared towards the general population, the users of these services, and the items they are interested in, should share some properties as well. However, much of the shared knowledge across multiple related domains may be well hidden, and few studies have been done to uncover this knowledge.
In this paper, we solve the problem of learning a rating-matrix generative model from a set of rating matrices in multiple related recommender systems (domains) for collaborative filtering. Our aim is to alleviate the sparsity problem in individual rating matrices by discovering what is common among them. We first show that the relatedness across multiple rating matrices can be established by sharing an implicit cluster-level rating matrix. Then, we extend the shared cluster-level rating matrix to a more general cluster-level rating model, which defines a rating function in terms of the latent user- and item-cluster variables. Consequently, a rating matrix of any related task can be viewed as drawing a set of users and items from a user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. The combination of these two models gives the rating-matrix generative model (RMGM). We also propose an algorithm for training the RMGM on the pooled rating data from multiple related rating matrices, as well as an algorithm for predicting the missing ratings for new users in different tasks. Experimental comparison is carried out on three real-world CF data sets. The results show that our proposed RMGM, learned from multiple CF tasks, can outperform the individual models trained separately.

The remainder of the paper is organized as follows. In Section 2, we introduce the problem setting for cross-domain collaborative filtering and the notations used in this paper. In Section 3, we describe how to establish the relatedness across multiple rating matrices via a shared cluster-level rating matrix. The RMGM is presented in Section 4, together with the training and prediction algorithms. Related work is discussed in Section 5. We experimentally validate the effectiveness of the RMGM for cross-domain collaborative filtering in Section 6 and conclude the paper in Section 7.

2. Problem Setting

Suppose that we are given Z rating matrices in related domains for collaborative filtering. In the z-th rating matrix, a set of users, U_z = \{u_1^{(z)}, \ldots, u_{n_z}^{(z)}\} \subset \mathcal{U}, make ratings on a set of items, V_z = \{v_1^{(z)}, \ldots, v_{m_z}^{(z)}\} \subset \mathcal{V}, where n_z and m_z denote the numbers of rows (users) and columns (items), respectively. The random variables u and v are assumed to be independent from each other. To consider the more difficult case, we assume that neither the user sets nor the item sets in the given rating matrices have intersections, i.e., \bigcap_z U_z = \emptyset and \bigcap_z V_z = \emptyset (in fact, there may exist intersections, but they are unobservable). The rating data in the z-th rating matrix is a set of triplets D_z = \{(u_1^{(z)}, v_1^{(z)}, r_1^{(z)}), \ldots, (u_{s_z}^{(z)}, v_{s_z}^{(z)}, r_{s_z}^{(z)})\}, where s_z is the number of available ratings in the z-th rating matrix. The ratings in \{D_1, \ldots, D_Z\} should be on the same rating scale \mathcal{R} (e.g., 1-5).

For model-based CF methods, a preference/rating model, e.g., the aspect model (Hofmann & Puzicha, 1999), can be trained on D_z for the z-th task. In our cross-domain collaborative filtering setting, we wish to train a rating-matrix generative model (RMGM) for all the given related tasks on the pooled rating data, namely, \bigcup_z D_z. Then, the z-th rating matrix can be viewed as drawing a set of users, U_z, and a set of items, V_z, from the learned RMGM. The missing values in the z-th rating matrix can be generated by the RMGM.

3. Cluster-Level Rating Matrix as Knowledge Sharing

To allow knowledge sharing across multiple rating matrices, we first investigate how to establish the relatedness among the given tasks. A difficulty is that no explicit correspondence among the user sets or the item sets in the given rating matrices can be exploited. However, some collaborative filtering tasks are somewhat related in certain aspects. Take movie-rating and book-rating web sites for example. On one hand, movies and books have a correspondence in genre. On the other hand, although the user sets are different from one another, they are subsets sampled from the same population (this assumption only holds for popular web sites) and should reflect similar social aspects (Coyle & Smyth, 2008).

The above observation suggests that, although we cannot find an explicit correspondence among individual users or items, we can establish a cluster-level rating-pattern representation as a "bridge" to connect all the related rating matrices. Figure 1 illustrates how the implicit relatedness among three artificially generated rating matrices is established via a cluster-level rating matrix. By permuting the rows and columns (which is equivalent to co-clustering) in each rating matrix, we can obtain three block rating matrices. Each block comprises a set of ratings provided by a user group on an item group. We can further reduce the block matrices to cluster-level rating matrices, in which each row corresponds to a user cluster and each column to an item cluster. The entries in the cluster-level rating matrices are the average ratings of the corresponding user-item co-clusters. The resulting cluster-level rating matrices reveal that the three rating matrices implicitly share a common 4 × 4 cluster-level rating-pattern representation.
Figure 1. Sharing cluster-level user-item rating patterns among three toy rating matrices in different domains. The missing values are denoted by '?'. After permuting the rows (users) and columns (items) in each rating matrix, it is revealed that the three rating matrices implicitly share a common 4 × 4 cluster-level rating matrix.
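To make the reduction step concrete, here is a minimal Python sketch (illustrative only; the function and argument names are ours, and numpy plus hard cluster assignments are assumed) that averages the observed ratings in each user-item co-cluster:

```python
import numpy as np

def cluster_level_matrix(R, user_cluster, item_cluster, K, L):
    """Reduce an n x m rating matrix to a K x L cluster-level rating matrix.

    R holds ratings with np.nan for missing entries; user_cluster (length n)
    and item_cluster (length m) give hard cluster assignments, as obtained by
    permuting rows and columns (i.e., co-clustering).
    """
    B = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            # All ratings given by user group k on item group l.
            block = R[np.ix_(user_cluster == k, item_cluster == l)]
            B[k, l] = np.nanmean(block)  # average over observed ratings only
    return B
```

Applied to any of the three permuted toy matrices, such a reduction would recover the same 4 × 4 cluster-level rating matrix.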

This toy example shows an ideal case in which the users and items in the same cluster behave exactly the same. In many real-world cases, since users may have multiple personalities and items may have multiple attributes, a user or an item can simultaneously belong to multiple clusters with different memberships. Thus, we need to introduce softness to the clustering models. Suppose there are K user clusters, \{c_U^{(1)}, \ldots, c_U^{(K)}\}, and L item clusters, \{c_V^{(1)}, \ldots, c_V^{(L)}\}, in the shared cluster-level rating patterns. The membership of a user-item pair (u, v) to a user-item co-cluster (c_U^{(k)}, c_V^{(l)}) is the joint posterior membership probability P(c_U^{(k)}, c_V^{(l)} | u, v). Furthermore, a user-item co-cluster can also have multiple ratings with different probabilities P(r | c_U^{(k)}, c_V^{(l)}). Then, we can define the rating function f_R(u, v) for a user u on an item v in terms of the two latent cluster variables c_U^{(k)} and c_V^{(l)}:

    f_R(u, v) = \sum_r r P(r | u, v)
              = \sum_r r \sum_{k,l} P(r | c_U^{(k)}, c_V^{(l)}) P(c_U^{(k)}, c_V^{(l)} | u, v)
              = \sum_r r \sum_{k,l} P(r | c_U^{(k)}, c_V^{(l)}) P(c_U^{(k)} | u) P(c_V^{(l)} | v),    (1)

where (1) is obtained based on the assumption that the random variables u and v are independent.

We can further rewrite (1) in matrix form:

    f_R(u, v) = p_u^\top B p_v,    \|p_u\|_1 = 1,  \|p_v\|_1 = 1,    (2)

where p_u \in \mathbb{R}^K and p_v \in \mathbb{R}^L are the user- and item-cluster membership vectors ([p_u]_k = P(c_U^{(k)} | u) and [p_v]_l = P(c_V^{(l)} | v)), and B is a K × L relaxed cluster-level rating matrix in which an entry can have multiple ratings with different probabilities:

    B_{kl} = \sum_r r P(r | c_U^{(k)}, c_V^{(l)}).    (3)

Eq. (2) implies that the relaxed cluster-level rating matrix B is a cluster-level rating model. In the next section, we focus on learning the user-item joint mixture model as well as the shared cluster-level rating model on the pooled rating data from multiple related tasks.
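As a small numerical illustration of Eq. (2) (not from the paper; the soft membership values below are invented), take the shared 4 × 4 matrix B from Figure 1 and two membership vectors:

```python
import numpy as np

# The toy 4 x 4 cluster-level rating matrix shared by the tasks in Figure 1.
B = np.array([[3., 1., 2., 1.],
              [2., 3., 3., 1.],
              [3., 2., 1., 2.],
              [1., 1., 2., 3.]])

# Soft memberships: [p_u]_k = P(c_U^(k) | u) and [p_v]_l = P(c_V^(l) | v);
# each sums to 1, so a user/item can belong to several clusters at once.
p_u = np.array([0.7, 0.1, 0.1, 0.1])
p_v = np.array([0.5, 0.5, 0.0, 0.0])

rating = p_u @ B @ p_v  # f_R(u, v) = p_u^T B p_v, Eq. (2)
print(rating)           # 2.0 for these illustrative memberships
```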
Figure 2. Each rating matrix can be viewed as drawing a set of users (horizontal straight lines) and items (vertical straight lines) from the same user-item joint mixture model (the joint probability of a user-item pair is indicated by gray scales), as well as drawing the corresponding ratings (the crossing points of the horizontal and vertical lines) from a shared cluster-level rating model (the figures denote the ratings most likely to be obtained in those co-clusters).

4. Rating-Matrix Generative Model

In order to extend the shared cluster-level rating matrix to a more general cluster-level rating model, we should first define a user-item bivariate probability histogram over \mathcal{U} \times \mathcal{V}. Let P_U(u) and P_V(v) denote the marginal distributions for users and items, respectively. The user-item bivariate probability histogram is a |\mathcal{U}| \times |\mathcal{V}| matrix, H, which is defined as the user-item joint distribution

    H_{uv} = P(u, v) = P_U(u) P_V(v).    (4)

Thus, the user-item pairs for all the given tasks can be drawn from H:

    (u_i^{(z)}, v_i^{(z)}) \sim \Pr(H),    (5)

for z = 1, \ldots, Z; i = 1, \ldots, s_z.

Based on the assumption that there are K clusters in \mathcal{U} and L clusters in \mathcal{V}, we can model the user and item marginal distributions as mixture models, in which each component corresponds to a latent user/item cluster:

    P_U(u) = \sum_k P(c_U^{(k)}) P(u | c_U^{(k)}),    (6)

    P_V(v) = \sum_l P(c_V^{(l)}) P(v | c_V^{(l)}),    (7)

where P(c_U^{(k)}) denotes the prior for the user cluster c_U^{(k)} and P(u | c_U^{(k)}) the conditional probability of a user u given the user cluster c_U^{(k)}. The user-item bivariate probability histogram (4) can be rewritten as

    H_{uv} = \sum_{k,l} P(c_U^{(k)}) P(c_V^{(l)}) P(u | c_U^{(k)}) P(v | c_V^{(l)}).    (8)

Then, the users and items can be drawn respectively from the user and item mixture models in terms of the two latent cluster variables:

    (u_i^{(z)}, v_i^{(z)}) \sim \sum_{k,l} P(c_U^{(k)}) P(c_V^{(l)}) P(u | c_U^{(k)}) P(v | c_V^{(l)}).    (9)

Eq. (9) defines the user-item joint mixture model. Furthermore, the ratings can also be drawn from the conditional distributions given the latent cluster variables:

    r_i^{(z)} \sim P(r | c_U^{(k)}, c_V^{(l)}).    (10)

Eq. (10) defines the cluster-level rating model.

Combining (9) and (10), we obtain the rating-matrix generative model (RMGM), which can generate rating matrices. Figure 2 illustrates the rating-matrix generating process on the three toy rating matrices. The 4 × 4 cluster-level rating matrix from Figure 1 is extended to a cluster-level rating model.
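The generative process of Eqs. (9) and (10) can be sketched in a few lines of Python (a hypothetical rendering under assumed parameter shapes, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_triplet(P_cu, P_cv, P_u_given_cu, P_v_given_cv, P_r_given_c, levels):
    """Draw one (user, item, rating) triplet from the RMGM.

    P_cu: (K,) user-cluster priors; P_cv: (L,) item-cluster priors.
    P_u_given_cu: (n, K), column k is P(u | c_U^(k)); P_v_given_cv: (m, L).
    P_r_given_c: (R, K, L), entry [r, k, l] is P(levels[r] | c_U^(k), c_V^(l)).
    """
    k = rng.choice(len(P_cu), p=P_cu)                            # latent user cluster
    l = rng.choice(len(P_cv), p=P_cv)                            # latent item cluster
    u = rng.choice(P_u_given_cu.shape[0], p=P_u_given_cu[:, k])  # user, Eq. (9)
    v = rng.choice(P_v_given_cv.shape[0], p=P_v_given_cv[:, l])  # item, Eq. (9)
    r = rng.choice(levels, p=P_r_given_c[:, k, l])               # rating, Eq. (10)
    return u, v, r
```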
Each rating matrix can thus be viewed as drawing a set of users U_z and items V_z from the user-item joint mixture model, as well as drawing the corresponding ratings for (U_z, V_z) from the cluster-level rating model. Generally speaking, each rating matrix can be viewed as drawing D_z from the RMGM.

The formulation of RMGM is similar to the flexible mixture model (FMM) (Si & Jin, 2003). The major difference is that RMGM can generate rating matrices for different CF tasks (recall that \bigcap_z U_z = \emptyset and \bigcap_z V_z = \emptyset, and that the sizes of the rating matrices also differ from one another). RMGM can be viewed as extending FMM to a multi-task version in which the user- and item-cluster variables are shared by and learned from multiple tasks. Furthermore, since the RMGM is trained on the pooled rating data from multiple tasks, the training and prediction algorithms for RMGM also differ from those for FMM.

4.1. Training the RMGM

In this section, we introduce how to train an RMGM on the pooled rating data \bigcup_z D_z. We need to learn five sets of model parameters in (9) and (10), i.e., P(c_U^{(k)}), P(c_V^{(l)}), P(u | c_U^{(k)}), P(v | c_V^{(l)}), and P(r | c_U^{(k)}, c_V^{(l)}), for k = 1, \ldots, K; l = 1, \ldots, L; u \in \bigcup_z U_z; v \in \bigcup_z V_z; and r \in \mathcal{R}.

We adopt the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) for RMGM training. In the E-step, the joint posterior probability of (c_U^{(k)}, c_V^{(l)}) given (u_i^{(z)}, v_i^{(z)}, r_i^{(z)}) can be computed using the five sets of model parameters:

    P(c_U^{(k)}, c_V^{(l)} | u_i^{(z)}, v_i^{(z)}, r_i^{(z)}) =
        \frac{P(u_i^{(z)}, c_U^{(k)}) P(v_i^{(z)}, c_V^{(l)}) P(r_i^{(z)} | c_U^{(k)}, c_V^{(l)})}
             {\sum_{p,q} P(u_i^{(z)}, c_U^{(p)}) P(v_i^{(z)}, c_V^{(q)}) P(r_i^{(z)} | c_U^{(p)}, c_V^{(q)})},    (11)

where P(u_i^{(z)}, c_U^{(k)}) = P(c_U^{(k)}) P(u_i^{(z)} | c_U^{(k)}) and P(v_i^{(z)}, c_V^{(l)}) = P(c_V^{(l)}) P(v_i^{(z)} | c_V^{(l)}).

In the M-step, the five sets of model parameters for the Z given tasks are updated as follows (writing P(k, l | j^{(z)}) as a shorthand for P(c_U^{(k)}, c_V^{(l)} | u_j^{(z)}, v_j^{(z)}, r_j^{(z)}) for simplicity):

    P(c_U^{(k)}) = \frac{\sum_z \sum_j \sum_l P(k, l | j^{(z)})}{\sum_z s_z},    (12)

    P(c_V^{(l)}) = \frac{\sum_z \sum_j \sum_k P(k, l | j^{(z)})}{\sum_z s_z},    (13)

    P(u_i^{(z)} | c_U^{(k)}) = \frac{\sum_l \sum_{j: u_j^{(z)} = u_i^{(z)}} P(k, l | j^{(z)})}{P(c_U^{(k)}) \sum_z s_z},    (14)

    P(v_i^{(z)} | c_V^{(l)}) = \frac{\sum_k \sum_{j: v_j^{(z)} = v_i^{(z)}} P(k, l | j^{(z)})}{P(c_V^{(l)}) \sum_z s_z},    (15)

    P(r | c_U^{(k)}, c_V^{(l)}) = \frac{\sum_z \sum_{j: r_j^{(z)} = r} P(k, l | j^{(z)})}{\sum_z \sum_j P(k, l | j^{(z)})}.    (16)

In Eqs. (12)-(16), all the parameters in terms of the two latent cluster variables are computed using the pooled rating data \bigcup_z D_z. By alternating the E-step and the M-step, an RMGM fit to a set of related CF tasks can be obtained. In particular, the user-item joint mixture model defined in (9) and the shared cluster-level rating model defined in (10) can be learned. A rating triplet (u_i^{(z)}, v_i^{(z)}, r_i^{(z)}) from any task can thus be viewed as being drawn from the RMGM.
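For concreteness, one possible vectorized implementation of a single EM iteration over the pooled triplets follows (a sketch under assumed array shapes, not the authors' implementation; the regularization discussed in Section 4.3 is omitted):

```python
import numpy as np

def em_step(users, items, ratings, P_cu, P_cv, P_u, P_v, P_r):
    """One EM iteration on the pooled triplets, following Eqs. (11)-(16).

    users/items: integer indices into the pooled user/item sets (length s);
    ratings: indices into the rating scale (length s).
    P_cu: (K,), P_cv: (L,), P_u: (n, K), P_v: (m, L), P_r: (R, K, L).
    """
    s = len(ratings)
    # E-step, Eq. (11): posterior over (k, l) for every observed triplet.
    post = (P_cu[None, :, None] * P_u[users][:, :, None] *
            P_cv[None, None, :] * P_v[items][:, None, :] *
            P_r[ratings])                                  # shape (s, K, L)
    post /= post.sum(axis=(1, 2), keepdims=True)
    # M-step, Eqs. (12)-(13): cluster priors.
    P_cu_new = post.sum(axis=(0, 2)) / s
    P_cv_new = post.sum(axis=(0, 1)) / s
    # Eqs. (14)-(15): normalizing each column to sum to 1 is equivalent to
    # dividing the accumulated posteriors by P(c) * sum_z s_z.
    P_u_new = np.zeros_like(P_u)
    np.add.at(P_u_new, users, post.sum(axis=2))
    P_u_new /= P_u_new.sum(axis=0, keepdims=True)
    P_v_new = np.zeros_like(P_v)
    np.add.at(P_v_new, items, post.sum(axis=1))
    P_v_new /= P_v_new.sum(axis=0, keepdims=True)
    # Eq. (16): rating distribution per user-item co-cluster.
    P_r_new = np.zeros_like(P_r)
    np.add.at(P_r_new, ratings, post)
    P_r_new /= P_r_new.sum(axis=0, keepdims=True)
    return P_cu_new, P_cv_new, P_u_new, P_v_new, P_r_new
```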
4.2. RMGM-Based Prediction

After training the RMGM, according to (1), the missing values in the Z given rating matrices can be generated by

    f_R(u_i^{(z)}, v_i^{(z)}) = \sum_r r \sum_{k,l} P(r | c_U^{(k)}, c_V^{(l)}) P(c_U^{(k)} | u_i^{(z)}) P(c_V^{(l)} | v_i^{(z)}),    (17)

where P(c_U^{(k)} | u_i^{(z)}) and P(c_V^{(l)} | v_i^{(z)}) can be computed from the learned parameters using Bayes' rule.

To predict the ratings on V_z for a new user u^{(z)} in the z-th task, we can solve a quadratic optimization problem to compute the user-cluster membership p_{u^{(z)}} \in \mathbb{R}^K for u^{(z)}, based on the given ratings r_{u^{(z)}} \in \{\mathcal{R}, 0\}^{m_z} (the unobserved ratings are set to 0):

    \min_{p_{u^{(z)}}} \| [B P_{V_z}]^\top p_{u^{(z)}} - r_{u^{(z)}} \|^2_{W_{u^{(z)}}}    s.t.  p_{u^{(z)}}^\top \mathbf{1} = 1.    (18)

In Eq. (18), P_{V_z} is an L × m_z item-cluster membership matrix, where [P_{V_z}]_{li} = P(c_V^{(l)} | v_i^{(z)}); W_{u^{(z)}} is an m_z × m_z diagonal matrix, where [W_{u^{(z)}}]_{ii} = 1 if [r_{u^{(z)}}]_i is given and [W_{u^{(z)}}]_{ii} = 0 otherwise. Here \|x\|_W denotes a weighted l_2-norm, \sqrt{x^\top W x}. The quadratic optimization problem (18) is very simple and can be solved by any quadratic solver. After obtaining the optimal user-cluster membership \hat{p}_{u^{(z)}} for u^{(z)}, the ratings of u^{(z)} on v_i^{(z)} can be predicted by

    f_R(u^{(z)}, v_i^{(z)}) = \hat{p}_{u^{(z)}}^\top B p_{v_i^{(z)}},    (19)

where p_{v_i^{(z)}} is the i-th column of P_{V_z}. Similarly, based on the learned parameters, we can also predict the ratings of all the existing users in the z-th task on a new item. Due to space limitations, we skip the details.
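A sketch of this new-user prediction step follows (scipy is assumed; the paper only requires some quadratic solver, so the SLSQP choice below is ours):

```python
import numpy as np
from scipy.optimize import minimize

def predict_new_user(B, P_Vz, r_u, observed):
    """Fit a new user's cluster membership via Eq. (18), then predict via Eq. (19).

    B: (K, L) cluster-level rating matrix; P_Vz: (L, m) item-cluster memberships;
    r_u: (m,) rating vector with 0 at unobserved positions; observed: boolean mask.
    """
    A = (B @ P_Vz).T                 # (m, K); predictions for user u are A @ p_u
    K = B.shape[0]

    def objective(p):
        resid = A @ p - r_u
        return np.sum(resid[observed] ** 2)  # weighted l2-norm, W = diag(observed)

    res = minimize(objective, np.full(K, 1.0 / K), method='SLSQP',
                   constraints=({'type': 'eq', 'fun': lambda p: p.sum() - 1.0},))
    p_hat = res.x                    # optimal membership, subject to p^T 1 = 1
    return p_hat, A @ p_hat          # Eq. (19): predicted ratings on all of V_z
```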
4.3. Implementation Details

Initialization: Since the optimization problem for RMGM training is non-convex, the initialization of the five sets of model parameters is crucial for reaching a good local maximum. We first select the densest rating matrix from the given tasks and simultaneously cluster the rows (users) and columns (items) of that matrix using orthogonal nonnegative matrix tri-factorization (Ding et al., 2006) (other co-clustering methods are also applicable). Based on the co-clustering results, we can coarsely estimate P(c_U^{(k)}), P(c_V^{(l)}), and P(r | c_U^{(k)}, c_V^{(l)}). We use random values to initialize P(u_i^{(z)} | c_U^{(k)}) and P(v_i^{(z)} | c_V^{(l)}). Note that the five sets of initialized parameters should be respectively normalized: \sum_k P(c_U^{(k)}) = 1, \sum_l P(c_V^{(l)}) = 1, \sum_r P(r | c_U^{(k)}, c_V^{(l)}) = 1, \sum_z \sum_i P(u_i^{(z)} | c_U^{(k)}) = 1, and \sum_z \sum_i P(v_i^{(z)} | c_V^{(l)}) = 1 (a sketch of this random initialization appears at the end of this subsection).

Regularization: To avoid unfavorable local maxima, we also impose regularization on the EM algorithm (Hofmann & Puzicha, 1998). We adopt the same strategy used in (Si & Jin, 2003) and skip the details for space limitations.

Model Selection: We need to set the numbers of user and item clusters, K and L, to start with. The cluster-level rating model B should be not only expressive enough to encode and compress various cluster-level user-item rating patterns but also compact enough to avoid over-fitting. In our empirical tests, we observed that the performance is rather stable when K and L are in the range [20, 50]. Thus, we simply set K = 20 and L = 20 in our experiments.
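One possible rendering of the random part of this initialization, with the normalizations stated above, is sketched below (the paper instead seeds the priors and the rating model from the co-clustering of the densest matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n, m, K, L, R):
    """Randomly initialize the five parameter sets with the required
    normalizations: priors sum to 1, and each conditional sums to 1 over
    its support (pooled users, pooled items, or rating levels)."""
    P_cu = np.full(K, 1.0 / K)                 # sum_k P(c_U^(k)) = 1
    P_cv = np.full(L, 1.0 / L)                 # sum_l P(c_V^(l)) = 1
    P_u = rng.random((n, K))
    P_u /= P_u.sum(axis=0, keepdims=True)      # sum over all pooled users = 1
    P_v = rng.random((m, L))
    P_v /= P_v.sum(axis=0, keepdims=True)      # sum over all pooled items = 1
    P_r = rng.random((R, K, L))
    P_r /= P_r.sum(axis=0, keepdims=True)      # sum_r P(r | c_U, c_V) = 1
    return P_cu, P_cv, P_u, P_v, P_r
```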
5. Related Works

The proposed cross-domain collaborative filtering belongs to multi-task learning. The earliest studies on multi-task learning are (Caruana, 1997; Baxter, 2000), which learn multiple tasks by sharing a hidden layer in a neural network. In our proposed RMGM method, each given rating matrix in the related domains can be generated by drawing a set of users and items, as well as the corresponding ratings, from the RMGM. In other words, each user/item in a given rating matrix is a linear combination of the prototypes for the user/item clusters (see Eq. (19)). The shared cluster-level rating model B is a two-sided feature representation for both users and items. This style of knowledge sharing is similar to feature-representation-based multi-task/transfer learning, such as (Jebara, 2004; Argyriou et al., 2007; Raina et al., 2007). These methods intend to find a common feature representation (usually a low-dimensional subspace) that is beneficial for the related tasks. A major difference from our work is that they learn a one-sided feature representation (in row space), while our method learns a two-sided feature representation (in both row and column spaces). Owing to this two-sided feature representation, RMGM can share knowledge across multiple tabular data sets from different domains.

Since RMGM is a mixture model, our method is also related to various model-based CF methods. The most similar one is the flexible mixture model (FMM) (Si & Jin, 2003), which simultaneously models users and items as mixture models in terms of two latent cluster variables. However, as pointed out in Section 4, our RMGM differs from FMM in both the training and prediction algorithms; moreover, the major difference is that RMGM is able to generate rating matrices in different domains. Several methods also aim at simultaneously clustering users and items for modeling rating patterns, such as the two-sided clustering model (Hofmann & Puzicha, 1999) and the co-clustering-based model (George & Merugu, 2005).

6. Experiments

In this section, we investigate whether CF performance can be improved by applying RMGM to extract the shared knowledge from multiple rating matrices in related domains. We compare our RMGM-based cross-domain collaborative filtering method to two baseline single-task methods. One is the well-known memory-based method using Pearson correlation coefficients (PCC) (Resnick et al., 1994), for which we search 20-nearest neighbors in our experiments. The other is the flexible mixture model (FMM) (Si & Jin, 2003), which can be viewed as a single-task version of RMGM. Since (Si & Jin, 2003) claims that FMM performs better than some well-known state-of-the-art model-based methods, we only compare our method to FMM. We aim to validate that sharing useful information by learning a common rating model for multiple related CF tasks obtains better performance than learning individual models for these tasks separately.

6.1. Data Sets

The following three real-world CF data sets are used for performance evaluation. Our method learns a shared model (RMGM) on the union of the rating data from these data sets, and the learned model is applicable to any of the tasks.

MovieLens¹: A movie rating data set comprising 100,000 ratings (scales 1-5) provided by 943 users on 1682 movies. We randomly select 500 users with more than 20 ratings and 1000 movies for experiments (rating ratio 4.33%).

¹ http://www.grouplens.org/node/73
EachMovie²: A movie rating data set comprising 2.8 million ratings (scales 1-6) provided by 72,916 users on 1628 movies. We randomly select 500 users with more than 20 ratings and 1000 movies for experiments (rating ratio 3.28%). For rating-scale consistency with the other tasks, we replace 6 with 5 in the rating matrix so that the rating scale runs from 1 to 5.

² http://www.cs.cmu.edu/~lebanon/IR-lab.htm

Book-Crossing³: A book rating data set comprising more than 1.1 million ratings (scales 1-10) provided by 278,858 users on 271,379 books. We randomly select 500 users and 1000 books with more than 16 ratings for experiments (rating ratio 2.78%). We also normalize the rating scale to 1 to 5.

³ http://www.informatik.uni-freiburg.de/~cziegler/BX/

6.2. Evaluation Protocol

We evaluate the performance of the compared methods under different configurations. The first 100, 200, and 300 users in the three rating matrices (each data set forms a 500 × 1000 rating matrix) are used for training, respectively, and the last 200 users for testing. For each test user, three different sizes of observed ratings (Given5, Given10, Given15) are provided for training, and the remaining ratings are used for evaluation. Note that in our experiments the given observed rating indices are randomly selected 10 times, so the results reported in Table 1 are averages over 10 splits.

The evaluation metric we adopt is the mean absolute error (MAE): (\sum_{i \in T} |r_i - \tilde{r}_i|) / |T|, where T denotes the set of test ratings, r_i is the ground truth, and \tilde{r}_i is the predicted rating. A smaller value of MAE means a better performance.

Table 1. MAE comparison on MovieLens (ML), EachMovie (EM), and Book-Crossing (BX).

Train   Method   Given5   Given10   Given15
ML100   PCC      0.930    0.908     0.895
        FMM      0.908    0.868     0.846
        RMGM     0.868    0.822     0.808
ML200   PCC      0.934    0.899     0.888
        FMM      0.890    0.863     0.847
        RMGM     0.859    0.821     0.806
ML300   PCC      0.935    0.896     0.888
        FMM      0.885    0.868     0.846
        RMGM     0.857    0.820     0.804
EM100   PCC      0.996    0.952     0.936
        FMM      0.969    0.937     0.924
        RMGM     0.942    0.908     0.895
EM200   PCC      0.983    0.943     0.930
        FMM      0.955    0.933     0.923
        RMGM     0.934    0.905     0.890
EM300   PCC      0.976    0.937     0.933
        FMM      0.952    0.930     0.924
        RMGM     0.934    0.906     0.890
BX100   PCC      0.617    0.599     0.600
        FMM      0.619    0.592     0.583
        RMGM     0.612    0.583     0.573
BX200   PCC      0.621    0.612     0.620
        FMM      0.617    0.602     0.596
        RMGM     0.615    0.591     0.583
BX300   PCC      0.621    0.619     0.630
        FMM      0.615    0.604     0.596
        RMGM     0.612    0.590     0.581
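A direct transcription of this metric (a trivial sketch using numpy):

```python
import numpy as np

def mae(r_true, r_pred):
    # Mean absolute error over the test set T: sum(|r_i - r~_i|) / |T|.
    return float(np.mean(np.abs(np.asarray(r_true) - np.asarray(r_pred))))
```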

6.3. Results

The comparison results on the three data sets are reported in Table 1. One can see that our method clearly outperforms the two baseline methods under all the testing configurations on all three data sets. FMM performs slightly better than PCC, which implies that model-based methods can benefit from sharing knowledge within user and item clusters. RMGM performs even better than FMM, which implies that clustering users and items across multiple related tasks can aggregate even more useful knowledge than clustering users and items in individual tasks. The overall experimental results validate that the proposed RMGM can indeed gain additional useful knowledge by pooling the rating data from multiple related CF tasks, making these tasks benefit from one another.

6.4. Discussion

Although the proposed method clearly outperforms the other compared methods on all three data sets, there still exists some room for further performance improvement. A crucial problem lies in an inherent property of the data sets: the users and items in the rating matrices may not always be groupable into high-quality clusters. We observe that the average ratings of the three data sets are far larger than the medians (given the median being 3, the average ratings are 3.64, 3.95, and 4.22 for the three data sets, respectively). This may be caused by the fact that the items with the most ratings are usually the most popular ones.
In other words, users are willing to rate their favorite items and to recommend them to others, but have little interest in rating the items they dislike. Given that no clear user and item groups can be discovered in these cases, it is hard to learn a good cluster-level rating model.

7. Conclusion

In this paper, we proposed a novel cross-domain collaborative filtering method based on the rating-matrix generative model (RMGM) for recommender systems. RMGM can share useful knowledge across multiple rating matrices in related domains to alleviate the sparsity problem in individual tasks. The knowledge is shared in the form of a latent cluster-level rating model, which is trained on the pooled rating data from multiple related rating matrices. Each rating matrix can thus be viewed as drawing a set of users and items from the user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. The experimental results validate that the proposed RMGM can indeed gain additional useful knowledge by pooling the rating data from multiple related tasks, making these tasks benefit from one another.

In our future work, we will 1) investigate how to statistically quantify the "relatedness" between rating matrices in different domains, and 2) consider an asymmetric problem setting in which knowledge can be transferred from a dense auxiliary rating matrix in one domain to a sparse target one in another domain.

Acknowledgments

Bin Li and Qiang Yang are supported by Hong Kong CERG Grant 621307; Bin Li and Xiangyang Xue are supported in part by the Shanghai Leading Academic Discipline Project (No. B114) and the NSF of China (No. 60873178).

References

Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. Advances in Neural Information Processing Systems 19 (pp. 41-48).

Baxter, J. (2000). A model of inductive bias learning. J. of Artificial Intelligence Research, 12, 149-198.

Caruana, R. A. (1997). Multitask learning. Machine Learning, 28, 41-75.

Coyle, M., & Smyth, B. (2008). Web search shared: Social aspects of a collaborative, community-based search network. Proc. of the Fifth Int'l Conf. on Adaptive Hypermedia and Adaptive Web-Based Systems (pp. 103-112).

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. of the Royal Statistical Society, B39, 1-38.

Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal nonnegative matrix tri-factorizations for clustering. Proc. of the 12th ACM SIGKDD Int'l Conf. (pp. 126-135).

George, T., & Merugu, S. (2005). A scalable collaborative filtering framework based on co-clustering. Proc. of the Fifth IEEE Int'l Conf. on Data Mining (pp. 625-628).

Hofmann, T., & Puzicha, J. (1998). Statistical models for co-occurrence data (Technical Report AIM-1625). Artificial Intelligence Laboratory, MIT.

Hofmann, T., & Puzicha, J. (1999). Latent class models for collaborative filtering. Proc. of the 16th Int'l Joint Conf. on Artificial Intelligence (pp. 688-693).

Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. Proc. of the 21st Int'l Conf. on Machine Learning (pp. 329-336).

Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence (pp. 473-480).

Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. Proc. of the 24th Int'l Conf. on Machine Learning (pp. 759-766).

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Proc. of the ACM Conf. on Computer Supported Cooperative Work (pp. 175-186).

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proc. of the 10th Int'l World Wide Web Conf. (pp. 285-295).

Si, L., & Jin, R. (2003). Flexible mixture model for collaborative filtering. Proc. of the 20th Int'l Conf. on Machine Learning (pp. 704-711).

Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. Proc. of the 20th Int'l Conf. on Machine Learning (pp. 720-727).
