Transfer Learning For Collaborative Filtering Via A Rating-Matrix Generative Model
Bin Li [email protected]
School of Computer Science, Fudan University, Shanghai 200433, China
Qiang Yang [email protected]
Dept. of Computer Science & Engineering, Hong Kong University of Science & Technology, Hong Kong, China
Xiangyang Xue [email protected]
School of Computer Science, Fudan University, Shanghai 200433, China
studies have been done to uncover this knowledge.

In this paper, we solve the problem of learning a rating-matrix generative model from a set of rating matrices in multiple related recommender systems (domains) for collaborative filtering. Our aim is to alleviate the sparsity problem in individual rating matrices by discovering what is common among them. We first show that the relatedness across multiple rating matrices can be established by sharing an implicit cluster-level rating matrix. Then, we extend the shared cluster-level rating matrix to a more general cluster-level rating model, which defines a rating function in terms of the latent user- and item-cluster variables. Consequently, a rating matrix of any related task can be viewed as drawing a set of users and items from a user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. The combination of these two models gives the rating-matrix generative model (RMGM). We also propose an algorithm for training the RMGM on the pooled rating data from multiple related rating matrices, as well as an algorithm for predicting the missing ratings for new users in different tasks. Experimental comparisons are carried out on three real-world CF data sets. The results show that the proposed RMGM, learned from multiple CF tasks, can outperform the individual models trained separately.

The remainder of the paper is organized as follows. In Section 2, we introduce the problem setting for cross-domain collaborative filtering and the notations used in this paper. In Section 3, we describe how to establish the relatedness across multiple rating matrices via a shared cluster-level rating matrix. The RMGM is presented in Section 4, together with the training and prediction algorithms. Related work is reviewed in Section 5. We experimentally validate the effectiveness of the RMGM for cross-domain collaborative filtering in Section 6 and conclude the paper in Section 7.

2. Problem Setting

Suppose that we are given $Z$ rating matrices in related domains for collaborative filtering. In the $z$-th rating matrix, a set of users, $\mathcal{U}_z = \{u_1^{(z)}, \ldots, u_{n_z}^{(z)}\} \subset \mathcal{U}$, rate a set of items, $\mathcal{V}_z = \{v_1^{(z)}, \ldots, v_{m_z}^{(z)}\} \subset \mathcal{V}$, where $n_z$ and $m_z$ denote the numbers of rows (users) and columns (items), respectively. The random variables $u$ and $v$ are assumed to be independent of each other. To consider the more difficult case, we assume that neither the user sets nor the item sets in the given rating matrices have intersections, i.e., $\bigcap_z \mathcal{U}_z = \emptyset$ and $\bigcap_z \mathcal{V}_z = \emptyset$ (in fact, intersections may exist, but they are unobservable). The rating data in the $z$-th rating matrix is a set of triplets $\mathcal{D}_z = \{(u_1^{(z)}, v_1^{(z)}, r_1^{(z)}), \ldots, (u_{s_z}^{(z)}, v_{s_z}^{(z)}, r_{s_z}^{(z)})\}$, where $s_z$ is the number of available ratings in the $z$-th rating matrix. The ratings in $\{\mathcal{D}_1, \ldots, \mathcal{D}_Z\}$ should be on the same rating scale $\mathcal{R}$ (e.g., 1–5).

For model-based CF methods, a preference/rating model, e.g., the aspect model (Hofmann & Puzicha, 1999), can be trained on $\mathcal{D}_z$ for the $z$-th task. In our cross-domain collaborative filtering setting, we wish to train a rating-matrix generative model (RMGM) for all the given related tasks on the pooled rating data, namely, $\bigcup_z \mathcal{D}_z$. Then, the $z$-th rating matrix can be viewed as drawing a set of users, $\mathcal{U}_z$, and a set of items, $\mathcal{V}_z$, from the learned RMGM. The missing values in the $z$-th rating matrix can be generated by the RMGM.
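For concreteness, the pooled training data can be represented as plain rating triplets tagged with their task index. The following is a minimal sketch of this representation (the variable names and toy triplets are illustrative assumptions, not from the paper):

```python
import numpy as np

# Each task z contributes a list of (user, item, rating) triplets D_z.
# User/item indices are task-local; no correspondence across tasks is
# assumed. Ratings share a common scale R (e.g., 1-5).
D = {
    0: [(0, 3, 4), (1, 0, 2), (2, 5, 5)],   # movie domain (toy data)
    1: [(0, 1, 3), (3, 2, 1)],              # book domain (toy data)
}

# Pooled rating data: union over tasks, keeping the task index z so that
# task-specific quantities such as s_z remain recoverable.
pooled = np.array([(z, u, v, r) for z, Dz in D.items() for (u, v, r) in Dz])
print(pooled.shape)  # (s_1 + ... + s_Z, 4)
```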
3. Cluster-Level Rating Matrix as Knowledge Sharing

To allow knowledge sharing across multiple rating matrices, we first investigate how to establish the relatedness among the given tasks. A difficulty is that no explicit correspondence among the user sets or the item sets in the given rating matrices can be exploited. However, some collaborative filtering tasks are somewhat related in certain aspects. Take movie-rating and book-rating web sites for example. On one hand, movies and books have a correspondence in genre. On the other hand, although the user sets differ from one another, they are subsets sampled from the same population (this assumption only holds for popular web sites) and should reflect similar social aspects (Coyle & Smyth, 2008).

The above observation suggests that, although we cannot find an explicit correspondence among individual users or items, we can establish a cluster-level rating-pattern representation as a "bridge" to connect all the related rating matrices. Figure 1 illustrates how the implicit relatedness among three artificially generated rating matrices is established via a cluster-level rating matrix. By permuting the rows and columns (which is equivalent to co-clustering) in each rating matrix, we can obtain three block rating matrices. Each block comprises a set of ratings provided by a user group on an item group. We can further reduce the block matrices to cluster-level rating matrices, in which each row corresponds to a user cluster and each column to an item cluster. The entries in the cluster-level rating matrices are the average ratings of the corresponding user-item co-clusters. The resulting cluster-level rating matrices reveal that the three rating matrices implicitly share a common $4 \times 4$ cluster-level rating-pattern representation.
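As a sketch of this reduction step, given hard user- and item-cluster assignments, the cluster-level rating matrix can be computed by averaging the observed ratings in each user-item co-cluster (a minimal illustration; the assignment vectors are assumed inputs, e.g., from any co-clustering method):

```python
import numpy as np

def cluster_level_matrix(triplets, user_cluster, item_cluster, K, L):
    """Average observed ratings within each user-item co-cluster.

    triplets: array of (u, v, r) rows; user_cluster/item_cluster map
    user/item indices to cluster indices 0..K-1 / 0..L-1.
    """
    sums = np.zeros((K, L))
    counts = np.zeros((K, L))
    for u, v, r in triplets:
        k, l = user_cluster[int(u)], item_cluster[int(v)]
        sums[k, l] += r
        counts[k, l] += 1
    # Leave co-clusters with no observed ratings at 0.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```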
Figure 1. Sharing cluster-level user-item rating patterns among three toy rating matrices in different domains. The missing
values are denoted by ‘?’. After permuting the rows (users) and columns (items) in each rating matrix, it is revealed that
the three rating matrices implicitly share a common 4 × 4 cluster-level rating matrix.
This toy example shows an ideal case in which the users and items in the same cluster behave exactly the same. In many real-world cases, since users may have multiple personalities and items may have multiple attributes, a user or an item can simultaneously belong to multiple clusters with different memberships. Thus, we need to introduce softness to the clustering models. Suppose there are $K$ user clusters, $\{c_U^{(1)}, \ldots, c_U^{(K)}\}$, and $L$ item clusters, $\{c_V^{(1)}, \ldots, c_V^{(L)}\}$, in the shared cluster-level rating patterns. The membership of a user-item pair $(u, v)$ to a user-item co-cluster $(c_U^{(k)}, c_V^{(l)})$ is the joint posterior membership probability $P(c_U^{(k)}, c_V^{(l)} \mid u, v)$. Furthermore, a user-item co-cluster can also have multiple ratings with different probabilities $P(r \mid c_U^{(k)}, c_V^{(l)})$. Then, we can define the rating function $f_R(u, v)$ for a user $u$ on an item $v$ in terms of the two latent cluster variables $c_U^{(k)}$ and $c_V^{(l)}$:

$$f_R(u, v) = \sum_r r\, P(r \mid u, v) = \sum_r r \sum_{k,l} P(r \mid c_U^{(k)}, c_V^{(l)})\, P(c_U^{(k)}, c_V^{(l)} \mid u, v) = \sum_r r \sum_{k,l} P(r \mid c_U^{(k)}, c_V^{(l)})\, P(c_U^{(k)} \mid u)\, P(c_V^{(l)} \mid v), \quad (1)$$

where (1) is obtained based on the assumption that the random variables $u$ and $v$ are independent.

We can further rewrite (1) in matrix form:

$$f_R(u, v) = \mathbf{p}_u^\top \mathbf{B}\, \mathbf{p}_v, \qquad \|\mathbf{p}_u\|_1 = 1, \quad \|\mathbf{p}_v\|_1 = 1, \quad (2)$$

where $\mathbf{p}_u \in \mathbb{R}^K$ and $\mathbf{p}_v \in \mathbb{R}^L$ are the user- and item-cluster membership vectors ($[\mathbf{p}_u]_k = P(c_U^{(k)} \mid u)$ and $[\mathbf{p}_v]_l = P(c_V^{(l)} \mid v)$), and $\mathbf{B}$ is a $K \times L$ relaxed cluster-level rating matrix in which an entry can have multiple ratings with different probabilities:

$$B_{kl} = \sum_r r\, P(r \mid c_U^{(k)}, c_V^{(l)}). \quad (3)$$

Eq. (2) implies that the relaxed cluster-level rating matrix $\mathbf{B}$ is a cluster-level rating model. In the next section, we focus on learning the user-item joint mixture model as well as the shared cluster-level rating model on the pooled rating data from multiple related tasks.
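To make Eqs. (2) and (3) concrete, here is a minimal numpy sketch of the cluster-level rating model: $\mathbf{B}$ is assembled from the rating distributions per co-cluster, and a rating is predicted as a bilinear form in the membership vectors (the sizes and random arrays are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 4, 4                       # user / item cluster counts (toy sizes)
ratings = np.arange(1, 6)         # rating scale R = {1, ..., 5}

# P_r[k, l, :] = P(r | c_U^(k), c_V^(l)); each distribution sums to 1.
P_r = rng.dirichlet(np.ones(len(ratings)), size=(K, L))   # (K, L, 5)

# Eq. (3): B_kl = sum_r r * P(r | c_U^(k), c_V^(l))
B = P_r @ ratings                                          # (K, L)

# Eq. (2): f_R(u, v) = p_u^T B p_v with simplex-normalized memberships
p_u = rng.dirichlet(np.ones(K))   # [p_u]_k = P(c_U^(k) | u)
p_v = rng.dirichlet(np.ones(L))   # [p_v]_l = P(c_V^(l) | v)
predicted_rating = p_u @ B @ p_v
```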
4. Rating-Matrix Generative Model
Figure 2. Each rating matrix can be viewed as drawing a set of users (horizontal straight lines) and items (vertical straight lines) from the same user-item joint mixture model (the joint probability of a user-item pair is indicated by gray scales) as well as drawing the corresponding ratings (the crossing points of the horizontal and vertical lines) from a shared cluster-level rating model (the numbers denote the ratings most likely to be obtained in those co-clusters).
In order to extend the shared cluster-level rating matrix to a more general cluster-level rating model, we should first define a user-item bivariate probability histogram over $\mathcal{U} \times \mathcal{V}$. Let $P_U(u)$ and $P_V(v)$ denote the marginal distributions for users and items, respectively. The user-item bivariate probability histogram is a $|\mathcal{U}| \times |\mathcal{V}|$ matrix, $\mathbf{H}$, which is defined as the user-item joint distribution

$$H_{uv} = P(u, v) = P_U(u)\, P_V(v). \quad (4)$$

Thus, the user-item pairs for all the given tasks can be drawn from $\mathbf{H}$:

$$(u_i^{(z)}, v_i^{(z)}) \sim \Pr(\mathbf{H}), \quad (5)$$

for $z = 1, \ldots, Z$; $i = 1, \ldots, s_z$.

Based on the assumption that there are $K$ clusters in $\mathcal{U}$ and $L$ clusters in $\mathcal{V}$, we can model the user and item marginal distributions in the form of mixture models, in which each component corresponds to a latent user/item cluster:

$$P_U(u) = \sum_k P(c_U^{(k)})\, P(u \mid c_U^{(k)}), \quad (6)$$
$$P_V(v) = \sum_l P(c_V^{(l)})\, P(v \mid c_V^{(l)}), \quad (7)$$

where $P(c_U^{(k)})$ denotes the prior for the user cluster $c_U^{(k)}$ and $P(u \mid c_U^{(k)})$ the conditional probability of a user $u$ given the user cluster $c_U^{(k)}$. The user-item bivariate probability histogram (4) can be rewritten as

$$H_{uv} = \sum_{k,l} P(c_U^{(k)})\, P(c_V^{(l)})\, P(u \mid c_U^{(k)})\, P(v \mid c_V^{(l)}). \quad (8)$$

Then, the users and items can be drawn respectively from the user and item mixture models in terms of the two latent cluster variables:

$$(u_i^{(z)}, v_i^{(z)}) \sim \sum_{k,l} P(c_U^{(k)})\, P(c_V^{(l)})\, P(u \mid c_U^{(k)})\, P(v \mid c_V^{(l)}). \quad (9)$$

Eq. (9) defines the user-item joint mixture model. Furthermore, the ratings can also be drawn from the conditional distributions given the latent cluster variables:

$$r_i^{(z)} \sim P(r \mid c_U^{(k)}, c_V^{(l)}). \quad (10)$$

Eq. (10) defines the cluster-level rating model.
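The generative process in Eqs. (9)–(10) can be sketched as ancestral sampling: first draw a co-cluster pair, then a user and an item from the cluster-conditional distributions, then a rating from the co-cluster's rating distribution. A minimal illustration under assumed parameter arrays:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_triplet(prior_u, prior_v, P_u_given_c, P_v_given_c, P_r_given_c, ratings):
    """One ancestral sample from the RMGM-style generative process.

    prior_u: (K,) priors P(c_U); prior_v: (L,) priors P(c_V);
    P_u_given_c: (K, n_users) rows of P(u|c_U); P_v_given_c: (L, n_items);
    P_r_given_c: (K, L, |R|) rating distributions per co-cluster.
    """
    k = rng.choice(len(prior_u), p=prior_u)                      # latent user cluster
    l = rng.choice(len(prior_v), p=prior_v)                      # latent item cluster
    u = rng.choice(P_u_given_c.shape[1], p=P_u_given_c[k])       # Eq. (9), user part
    v = rng.choice(P_v_given_c.shape[1], p=P_v_given_c[l])       # Eq. (9), item part
    r = ratings[rng.choice(len(ratings), p=P_r_given_c[k, l])]   # Eq. (10)
    return u, v, r
```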
Combining (9) and (10), we obtain the rating-matrix generative model (RMGM), which can generate rating matrices. Figure 2 illustrates the rating-matrix generating process on the three toy rating matrices. The $4 \times 4$ cluster-level rating matrix from Figure 1 is extended to a cluster-level rating model. Each rating matrix can thus be viewed as drawing a set of users $\mathcal{U}_z$ and items $\mathcal{V}_z$ from the user-item joint mixture model as well as drawing the corresponding ratings for $(\mathcal{U}_z, \mathcal{V}_z)$ from the cluster-level rating model. Generally speaking, each rating matrix can be viewed as drawing $\mathcal{D}_z$ from the RMGM.

The formulation of RMGM is similar to the flexible mixture model (FMM) (Si & Jin, 2003). The major difference is that RMGM can generate rating matrices for different CF tasks (recall that $\bigcap_z \mathcal{U}_z = \emptyset$ and $\bigcap_z \mathcal{V}_z = \emptyset$, and the sizes of the rating matrices also differ from one another). RMGM can be viewed as extending FMM to a multi-task version such that the user- and item-cluster variables are shared by and learned from multiple tasks. Furthermore, since the RMGM is trained on the pooled rating data from multiple tasks, the training and prediction algorithms for RMGM also differ from those for FMM.

4.1. Training the RMGM

In this section, we introduce how to train an RMGM on the pooled rating data $\bigcup_z \mathcal{D}_z$. We need to learn five sets of model parameters in (9) and (10), i.e., $P(c_U^{(k)})$, $P(c_V^{(l)})$, $P(u \mid c_U^{(k)})$, $P(v \mid c_V^{(l)})$, and $P(r \mid c_U^{(k)}, c_V^{(l)})$, for $k = 1, \ldots, K$; $l = 1, \ldots, L$; $u \in \bigcup_z \mathcal{U}_z$; $v \in \bigcup_z \mathcal{V}_z$; and $r \in \mathcal{R}$.

We adopt the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) for RMGM training. In the E-step, the joint posterior probability of $(c_U^{(k)}, c_V^{(l)})$ given $(u_i^{(z)}, v_i^{(z)}, r_i^{(z)})$ is computed using the five sets of model parameters:

$$P(c_U^{(k)}, c_V^{(l)} \mid u_i^{(z)}, v_i^{(z)}, r_i^{(z)}) = \frac{P(u_i^{(z)}, c_U^{(k)})\, P(v_i^{(z)}, c_V^{(l)})\, P(r_i^{(z)} \mid c_U^{(k)}, c_V^{(l)})}{\sum_{p,q} P(u_i^{(z)}, c_U^{(p)})\, P(v_i^{(z)}, c_V^{(q)})\, P(r_i^{(z)} \mid c_U^{(p)}, c_V^{(q)})}, \quad (11)$$

where $P(u_i^{(z)}, c_U^{(k)}) = P(c_U^{(k)})\, P(u_i^{(z)} \mid c_U^{(k)})$ and $P(v_i^{(z)}, c_V^{(l)}) = P(c_V^{(l)})\, P(v_i^{(z)} \mid c_V^{(l)})$.

In the M-step, the five sets of model parameters for the $Z$ given tasks are updated as follows (let $P(k, l \mid j^{(z)})$ be a shorthand for $P(c_U^{(k)}, c_V^{(l)} \mid u_j^{(z)}, v_j^{(z)}, r_j^{(z)})$ for simplicity):

$$P(c_U^{(k)}) = \frac{\sum_z \sum_j \sum_l P(k, l \mid j^{(z)})}{\sum_z s_z}, \quad (12)$$
$$P(c_V^{(l)}) = \frac{\sum_z \sum_j \sum_k P(k, l \mid j^{(z)})}{\sum_z s_z}, \quad (13)$$
$$P(u_i^{(z)} \mid c_U^{(k)}) = \frac{\sum_l \sum_{j: u_j^{(z)} = u_i^{(z)}} P(k, l \mid j^{(z)})}{P(c_U^{(k)}) \sum_z s_z}, \quad (14)$$
$$P(v_i^{(z)} \mid c_V^{(l)}) = \frac{\sum_k \sum_{j: v_j^{(z)} = v_i^{(z)}} P(k, l \mid j^{(z)})}{P(c_V^{(l)}) \sum_z s_z}, \quad (15)$$
$$P(r \mid c_U^{(k)}, c_V^{(l)}) = \frac{\sum_z \sum_{j: r_j^{(z)} = r} P(k, l \mid j^{(z)})}{\sum_z \sum_j P(k, l \mid j^{(z)})}. \quad (16)$$

In Eqs. (12)–(16), all the parameters in terms of the two latent cluster variables are computed using the pooled rating data $\bigcup_z \mathcal{D}_z$. By alternating the E-step and the M-step, an RMGM fit to a set of related CF tasks can be obtained. In particular, the user-item joint mixture model defined in (9) and the shared cluster-level rating model defined in (10) can be learned. A rating triplet $(u_i^{(z)}, v_i^{(z)}, r_i^{(z)})$ from any task can thus be viewed as a draw from the RMGM.
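A compact numpy sketch of one EM iteration under this parameterization might look as follows. The array layouts, the global re-indexing of users and items across tasks, and the rating indices `r_idx` into the scale are assumptions made for this illustration, not the authors' implementation:

```python
import numpy as np

def em_step(pooled, P_cu, P_cv, P_u, P_v, P_r):
    """One EM iteration for the five RMGM parameter sets (Eqs. 11-16).

    pooled: (s, 3) int array of (u, v, r_idx) over all tasks, with users
    and items re-indexed globally; P_cu: (K,); P_cv: (L,);
    P_u: (K, n_users); P_v: (L, n_items); P_r: (K, L, n_ratings).
    """
    s = len(pooled)
    u, v, r = pooled[:, 0], pooled[:, 1], pooled[:, 2]
    # E-step, Eq. (11): post[i, k, l] = P(k, l | i-th triplet)
    joint_u = (P_cu[:, None] * P_u)[:, u].T                 # (s, K)
    joint_v = (P_cv[:, None] * P_v)[:, v].T                 # (s, L)
    post = joint_u[:, :, None] * joint_v[:, None, :] * P_r[:, :, r].transpose(2, 0, 1)
    post /= post.sum(axis=(1, 2), keepdims=True)
    # M-step, Eqs. (12)-(16)
    P_cu_new = post.sum(axis=(0, 2)) / s                    # Eq. (12)
    P_cv_new = post.sum(axis=(0, 1)) / s                    # Eq. (13)
    P_u_new = np.zeros_like(P_u)
    np.add.at(P_u_new.T, u, post.sum(axis=2))               # group by user, sum over l
    P_u_new /= P_cu_new[:, None] * s                        # Eq. (14)
    P_v_new = np.zeros_like(P_v)
    np.add.at(P_v_new.T, v, post.sum(axis=1))               # group by item, sum over k
    P_v_new /= P_cv_new[:, None] * s                        # Eq. (15)
    P_r_new = np.zeros_like(P_r)
    np.add.at(P_r_new.transpose(2, 0, 1), r, post)          # group posteriors by rating
    P_r_new /= post.sum(axis=0)[:, :, None]                 # Eq. (16)
    return P_cu_new, P_cv_new, P_u_new, P_v_new, P_r_new
```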
4.2. RMGM-Based Prediction

After training the RMGM, according to (1), the missing values in the $Z$ given rating matrices can be generated by

$$f_R(u_i^{(z)}, v_i^{(z)}) = \sum_r r \sum_{k,l} P(r \mid c_U^{(k)}, c_V^{(l)})\, P(c_U^{(k)} \mid u_i^{(z)})\, P(c_V^{(l)} \mid v_i^{(z)}), \quad (17)$$

where $P(c_U^{(k)} \mid u_i^{(z)})$ and $P(c_V^{(l)} \mid v_i^{(z)})$ can be computed from the learned parameters by the Bayes rule.

To predict the ratings on $\mathcal{V}_z$ for a new user $u^{(z)}$ in the $z$-th task, we can solve a quadratic optimization problem to compute the user-cluster membership, $\mathbf{p}_{u^{(z)}} \in \mathbb{R}^K$, for $u^{(z)}$ based on the given ratings $\mathbf{r}_{u^{(z)}} \in \{\mathcal{R}, 0\}^{m_z}$ (the unobserved ratings are set to 0):

$$\min_{\mathbf{p}_{u^{(z)}}} \left\| [\mathbf{B} \mathbf{P}_{\mathcal{V}_z}]^\top \mathbf{p}_{u^{(z)}} - \mathbf{r}_{u^{(z)}} \right\|_{\mathbf{W}_{u^{(z)}}}^2 \quad \text{s.t.} \quad \|\mathbf{p}_{u^{(z)}}\|_1 = 1. \quad (18)$$

In Eq. (18), $\mathbf{P}_{\mathcal{V}_z}$ is an $L \times m_z$ item-cluster membership matrix, where $[\mathbf{P}_{\mathcal{V}_z}]_{li} = P(c_V^{(l)} \mid v_i^{(z)})$; $\mathbf{W}_{u^{(z)}}$ is an $m_z \times m_z$ diagonal matrix, where $[\mathbf{W}_{u^{(z)}}]_{ii} = 1$ if $[\mathbf{r}_{u^{(z)}}]_i$ is given and $[\mathbf{W}_{u^{(z)}}]_{ii} = 0$ otherwise. Here $\|\mathbf{x}\|_{\mathbf{W}}$ denotes a weighted $\ell_2$-norm, $\sqrt{\mathbf{x}^\top \mathbf{W} \mathbf{x}}$. The quadratic optimization problem (18) is simple and can be solved by any quadratic solver. After obtaining the optimal user-cluster membership $\hat{\mathbf{p}}_{u^{(z)}}$ for $u^{(z)}$, the ratings of $u^{(z)}$ on $v_i^{(z)}$ can be predicted by

$$f_R(u^{(z)}, v_i^{(z)}) = \hat{\mathbf{p}}_{u^{(z)}}^\top \mathbf{B}\, \mathbf{p}_{v_i^{(z)}}, \quad (19)$$

where $\mathbf{p}_{v_i^{(z)}}$ is the $i$-th column of $\mathbf{P}_{\mathcal{V}_z}$. Similarly, based on the learned parameters, we can also predict the ratings of all the existing users in the $z$-th task on a new item. Due to space limitations, we skip the details.
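One way to solve the simplex-constrained least-squares problem (18) is with a generic constrained optimizer. The sketch below is an illustration under assumptions, not the authors' solver: it uses SciPy's SLSQP and additionally bounds the memberships to [0, 1], consistent with their probabilistic interpretation:

```python
import numpy as np
from scipy.optimize import minimize

def new_user_membership(B, P_V, r_obs, mask):
    """Solve Eq. (18): fit a simplex-constrained membership p for a new user.

    B: (K, L) cluster-level rating matrix; P_V: (L, m) item-cluster
    memberships; r_obs: (m,) ratings with 0 at unobserved entries;
    mask: (m,) boolean, True where a rating is observed.
    """
    K = B.shape[0]
    A = (B @ P_V).T                    # (m, K); maps p to predicted ratings
    Aw, rw = A[mask], r_obs[mask]      # the diagonal W keeps observed rows only

    def objective(p):
        resid = Aw @ p - rw
        return resid @ resid           # squared weighted l2-norm

    res = minimize(objective, np.full(K, 1.0 / K), method="SLSQP",
                   bounds=[(0.0, 1.0)] * K,
                   constraints={"type": "eq", "fun": lambda p: p.sum() - 1.0})
    return res.x

# Eq. (19): predicted ratings of the new user on all items in task z
# p_hat = new_user_membership(B, P_V, r_obs, mask)
# predictions = p_hat @ B @ P_V
```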
4.3. Implementation Details

Initialization: Since the optimization problem for RMGM training is non-convex, the initialization of the five sets of model parameters is crucial for finding a better local maximum. We first select the densest rating matrix from the given tasks and simultaneously cluster the rows (users) and columns (items) of that matrix using orthogonal nonnegative matrix tri-factorization (Ding et al., 2006) (other co-clustering methods are also applicable). Based on the co-clustering results, we can coarsely estimate $P(c_U^{(k)})$, $P(c_V^{(l)})$, and $P(r \mid c_U^{(k)}, c_V^{(l)})$. We use random values for initializing $P(u_i^{(z)} \mid c_U^{(k)})$ and $P(v_i^{(z)} \mid c_V^{(l)})$. Note that the five sets of initialized parameters should be respectively normalized: $\sum_k P(c_U^{(k)}) = 1$, $\sum_l P(c_V^{(l)}) = 1$, $\sum_r P(r \mid c_U^{(k)}, c_V^{(l)}) = 1$, $\sum_z \sum_i P(u_i^{(z)} \mid c_U^{(k)}) = 1$, and $\sum_z \sum_i P(v_i^{(z)} \mid c_V^{(l)}) = 1$.
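A minimal sketch of the random part of this initialization, with the required normalizations, might look as follows (the shapes are assumptions; in practice the tri-factorization-based estimates would replace the random priors and rating distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_random_params(K, L, n_users, n_items, n_ratings):
    """Randomly initialize the five parameter sets so that priors and
    rating distributions sum to 1, and the conditionals P(u|c_U), P(v|c_V)
    each sum to 1 over all pooled users/items."""
    P_cu = rng.dirichlet(np.ones(K))                      # sum_k P(c_U^(k)) = 1
    P_cv = rng.dirichlet(np.ones(L))                      # sum_l P(c_V^(l)) = 1
    P_r = rng.dirichlet(np.ones(n_ratings), size=(K, L))  # (K, L, R), sums to 1 over r
    P_u = rng.random((K, n_users))
    P_u /= P_u.sum(axis=1, keepdims=True)                 # sum over pooled users = 1
    P_v = rng.random((L, n_items))
    P_v /= P_v.sum(axis=1, keepdims=True)                 # sum over pooled items = 1
    return P_cu, P_cv, P_u, P_v, P_r
```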
Regularization: In order to avoid unfavorable local maxima, we also impose regularization on the EM algorithm (Hofmann & Puzicha, 1998). We adopt the same strategy used in (Si & Jin, 2003) and skip the details for space limitations.

Model Selection: We need to set the numbers of user and item clusters, $K$ and $L$, to start with. The cluster-level rating model $\mathbf{B}$ should be not only expressive enough to encode and compress various cluster-level user-item rating patterns but also compact enough to avoid over-fitting. In our empirical tests, we observed that the performance is rather stable when $K$ and $L$ are in the range $[20, 50]$. Thus, we simply set $K = 20$ and $L = 20$ in our experiments.

5. Related Works

The proposed cross-domain collaborative filtering method belongs to multi-task learning. The earliest studies on multi-task learning are (Caruana, 1997; Baxter, 2000), which learn multiple tasks by sharing a hidden layer in a neural network. In our proposed RMGM method, each given rating matrix in the related domains can be generated by drawing a set of users and items, as well as the corresponding ratings, from the RMGM. In other words, each user/item in a given rating matrix is a linear combination of the prototypes for the user/item clusters (see Eq. (19)). The shared cluster-level rating model $\mathbf{B}$ is a two-sided feature representation for both users and items. This fashion of knowledge sharing is similar to feature-representation-based multi-task/transfer learning, such as (Jebara, 2004; Argyriou et al., 2007; Raina et al., 2007). These methods intend to find a common feature representation (usually a low-dimensional subspace) that is beneficial for the related tasks. A major difference from our work is that these methods learn a one-sided feature representation (in row space) while our method learns a two-sided feature representation (in both row and column spaces). Owing to such a two-sided feature representation, RMGM can share knowledge across multiple tabular data sets from different domains.

Since RMGM is a mixture model, our method is also related to various model-based CF methods. The most similar one is the flexible mixture model (FMM) (Si & Jin, 2003), which simultaneously models users and items with mixture models in terms of two latent cluster variables. However, as pointed out in Section 4, our RMGM differs from FMM in both the training and prediction algorithms; moreover, the major difference is that RMGM is able to generate rating matrices in different domains. Several methods also aim at simultaneously clustering users and items for modeling rating patterns, such as the two-sided clustering model (Hofmann & Puzicha, 1999) and the co-clustering-based model (George & Merugu, 2005).

6. Experiments

In this section, we investigate whether CF performance can be improved by applying RMGM to extract the shared knowledge from multiple rating matrices in related domains. We compare our RMGM-based cross-domain collaborative filtering method to two baseline single-task methods. One is the well-known memory-based method using Pearson correlation coefficients (PCC) (Resnick et al., 1994), for which we search the 20 nearest neighbors in our experiments. The other is the flexible mixture model (FMM) (Si & Jin, 2003), which can be viewed as a single-task version of RMGM. Since (Si & Jin, 2003) reports that FMM performs better than several well-known state-of-the-art model-based methods, we only compare our method to FMM. We aim to validate that sharing useful information by learning a common rating model for multiple related CF tasks can achieve better performance than learning individual models for these tasks separately.

6.1. Data Sets

The following three real-world CF data sets are used for performance evaluation. Our method learns a shared model (RMGM) on the union of the rating data from these data sets, and the learned model is applicable to each task.

MovieLens¹: A movie rating data set comprising 100,000 ratings (scale 1–5) provided by 943 users on 1,682 movies. We randomly select 500 users with more than 20 ratings and 1,000 movies for our experiments (rating ratio 4.33%).

¹http://www.grouplens.org/node/73
are willing to rate their favorite items and to recommend them to others, but have little interest in rating the items they dislike. Given that no clear user and item groups can be discovered in these cases, it is hard to learn a good cluster-level rating model.

7. Conclusion

In this paper, we proposed a novel cross-domain collaborative filtering method based on the rating-matrix generative model (RMGM) for recommender systems. RMGM can share useful knowledge across multiple rating matrices in related domains to alleviate the sparsity problems in the individual tasks. The knowledge is shared in the form of a latent cluster-level rating model, which is trained on the pooled rating data from multiple related rating matrices. Each rating matrix can thus be viewed as drawing a set of users and items from the user-item joint mixture model as well as drawing the corresponding ratings from the cluster-level rating model. The experimental results have validated that the proposed RMGM can indeed gain additional useful knowledge by pooling the rating data from multiple related tasks, making these tasks benefit from one another.

In our future work, we will 1) investigate how to statistically quantify the "relatedness" between rating matrices in different domains, and 2) consider an asymmetric problem setting where knowledge can be transferred from a dense auxiliary rating matrix in one domain to a sparse target rating matrix in another domain.

Acknowledgments

Bin Li and Qiang Yang are supported by Hong Kong CERG Grant 621307; Bin Li and Xiangyang Xue are supported in part by the Shanghai Leading Academic Discipline Project (No. B114) and the NSF of China (No. 60873178).

References

Argyriou, A., Evgeniou, T., & Pontil, M. (2007). Multi-task feature learning. Advances in Neural Information Processing Systems 19 (pp. 41–48).

Baxter, J. (2000). A model of inductive bias learning. J. of Artificial Intelligence Research, 12, 149–198.

Caruana, R. A. (1997). Multitask learning. Machine Learning, 28, 41–75.

Coyle, M., & Smyth, B. (2008). Web search shared: Social aspects of a collaborative, community-based search network. Proc. of the Fifth Int'l Conf. on Adaptive Hypermedia and Adaptive Web-Based Systems (pp. 103–112).

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. of the Royal Statistical Society, B39, 1–38.

Ding, C., Li, T., Peng, W., & Park, H. (2006). Orthogonal nonnegative matrix tri-factorizations for clustering. Proc. of the 12th ACM SIGKDD Int'l Conf. (pp. 126–135).

George, T., & Merugu, S. (2005). A scalable collaborative filtering framework based on co-clustering. Proc. of the Fifth IEEE Int'l Conf. on Data Mining (pp. 625–628).

Hofmann, T., & Puzicha, J. (1998). Statistical models for co-occurrence data (Technical Report AIM-1625). Artificial Intelligence Laboratory, MIT.

Hofmann, T., & Puzicha, J. (1999). Latent class models for collaborative filtering. Proc. of the 16th Int'l Joint Conf. on Artificial Intelligence (pp. 688–693).

Jebara, T. (2004). Multi-task feature and kernel selection for SVMs. Proc. of the 21st Int'l Conf. on Machine Learning (pp. 329–336).

Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach. Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence (pp. 473–480).

Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: Transfer learning from unlabeled data. Proc. of the 24th Int'l Conf. on Machine Learning (pp. 759–766).

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. Proc. of the ACM Conf. on Computer Supported Cooperative Work (pp. 175–186).

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proc. of the 10th Int'l World Wide Web Conf. (pp. 285–295).

Si, L., & Jin, R. (2003). Flexible mixture model for collaborative filtering. Proc. of the 20th Int'l Conf. on Machine Learning (pp. 704–711).

Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. Proc. of the 20th Int'l Conf. on Machine Learning (pp. 720–727).