Neural Collaborative Filtering
Xiangnan He et al.
이현성
• Preliminary
• Model description
• Experiment
• Discussion and conclusion
Preliminary
Linearity

• $f(ax) = a f(x)$, i.e. $\frac{df(x)}{dx} = c$ (a constant)
  ex) $f = ax$ (a plane passing through the origin)
• $f(ax_1, \dots, ax_k) = f(a\mathbf{x}) = a f(x_1, \dots, x_k)$, i.e. $\frac{\partial f}{\partial x_k} = c_k$ (a constant)
  ex) $f(a, b) = \sum_i w_i a_i b_i$ (a linear combination, the inner product)

Any other function, such as $f = x^2$ or $f(x_1, x_2) = \frac{x_1}{|x_2|} + \frac{x_2}{|x_1|}$, is a non-linear function (mapping).
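As a quick worked check, scaling the input of $f = x^2$ shows it violates the homogeneity property above:

$$f(ax) = (ax)^2 = a^2 x^2 = a^2 f(x) \neq a f(x) \quad \text{for } a \neq 0, 1.$$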
User/Item embedding

• $V^T x = v_i$, where $x$ is a one-hot vector and $v_i$ is the $i$-th row vector of $V$
• Like embedding one-hot words, users and items can be represented as dense vectors
• $V :=$ a $U \times k$ matrix, where $U$ is the number of users
• $u_i :=$ a length-$U$ vector, all 0 except that the $i$-th element is 1
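As a sanity check, the lookup $V^T x$ with a one-hot $x$ is just row selection; a minimal numpy sketch (all names and sizes are illustrative, not from the paper):

```python
import numpy as np

U, k = 4, 3                    # number of users, latent dimension (toy sizes)
V = np.random.randn(U, k)      # embedding matrix: one length-k row per user

x = np.zeros(U)
x[1] = 1.0                     # one-hot vector for the 2nd user
v = V.T @ x                    # V^T x ...
assert np.allclose(v, V[1])    # ... picks out the 2nd row of V
```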
Model description
Description of the model
• Represent users and items as dense vectors via an embedding model
  • This can be thought of as latent factorization (as in SVD) or as the embedding commonly used in MLPs
  • Embedded vector, latent vector, factorized vector…
Two models are proposed in the paper:
• Linear model (Generalized Matrix Factorization, GMF)
  • a linear combination of the user and item vectors
• Non-linear model (Multilayer Perceptron, MLP)
  • a multi-layer perceptron over the user and item vectors
How to represent a user or item as a latent vector?

User   U1  U2  U3  U4
u_1     1   0   0   0
u_2     0   1   0   0

$P :=$ a $|U| \times k$ matrix, where $U$ is the set of users.
$P^T u_1 = p_1$, where $p_1$ is the first row vector of matrix $P$.
We can also represent items as dense vectors by an $|I| \times k$ matrix, where $I$ is the set of items.
Neural collaborative filtering framework
• Input: $x_j := (u_j, i_j)$
• Output: $\hat{y}_{ui} :=$ prediction $\in [0, 1]$
• Objective: $J = \mathrm{logloss}(y_{ui}, \hat{y}_{ui})$
Description of input and output
• User vector $u_j = (0, 0, \dots, 1, 0, 0, \dots, 0)^T$ of length $|U|$ is a one-hot encoded vector
• Item vector $i_j = (0, 0, \dots, 1, 0, 0, \dots, 0)^T$ of length $|I|$ is a one-hot encoded vector
  where $U$, $I$ are the sets of users and items, respectively.
• Target $y_{ui}$: whether user $u$ has interacted with item $i$; $y_{ui} \in \{0, 1\}$
• $\hat{y}_{ui}$: the probability that user $u$ will interact with item $i$; $\hat{y}_{ui} \in [0, 1]$
• Objective: $\operatorname{argmin}_\Theta J(\Theta, U, I) := \sum_{(u,i) \in (U,I)} \mathrm{logloss}(y_{ui}, \hat{y}_{ui})$
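A minimal sketch of this pointwise log-loss objective, assuming numpy arrays of binary targets and predicted probabilities (names are illustrative):

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """y: 0/1 interaction targets; y_hat: predicted probabilities in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```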
Generalized Matrix Factorization
• Let the latent vector of length $k$ for user $u$ be $p_{uk}$
• Let the latent vector of length $k$ for item $i$ be $q_{ik}$

$$\phi(p_{uk}, q_{ik}) = \sum_{t=0}^{k} h_t \, p_{uk}(t) \, q_{ik}(t)$$

where $h_t$ is the weight of the element-wise multiplication between $p_{uk}$ and $q_{ik}$.

Output:

$$\hat{y}_{ui} = \sigma(\phi(p_{uk}, q_{ik})) = \sigma\!\left(\sum_{t=0}^{k} h_t \, p_{uk}(t) \, q_{ik}(t)\right)$$

where $\sigma$ is the sigmoid function.
[Figure: the Generalized Matrix Factorization layer — element-wise product of the user and item embeddings, followed by the weighted output layer]
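A minimal PyTorch sketch of GMF matching the equations above (the class and field names are my own, not from the paper):

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    def __init__(self, num_users, num_items, k):
        super().__init__()
        self.P = nn.Embedding(num_users, k)   # user latent vectors p_u
        self.Q = nn.Embedding(num_items, k)   # item latent vectors q_i
        self.h = nn.Linear(k, 1, bias=False)  # weights h_t over the element-wise product

    def forward(self, u, i):
        prod = self.P(u) * self.Q(i)          # element-wise product p_u ⊙ q_i
        return torch.sigmoid(self.h(prod))    # ŷ_ui = σ(h^T (p_u ⊙ q_i))
```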
Multilayer Perceptron model
• Let the latent vector of length $k$ for user $u$ be $p_{uk}$
• Let the latent vector of length $k$ for item $i$ be $q_{ik}$

$$z_1 = \mathrm{concat}(p_{uk}, q_{ik})$$
$$\phi_2(z_1) = \alpha_2(W_2^T z_1 + b_2)$$
$$\dots$$
$$\phi_L(z_{L-1}) = \alpha_L(W_L^T z_{L-1} + b_L)$$

Output:

$$\hat{y}_{ui} = \sigma(h^T \phi_L(z_{L-1}))$$

where $W_l$, $b_l$, and $\alpha_l$ denote the weight matrix, bias vector, and activation of layer $l$. The ReLU function is used as the activation.
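A minimal PyTorch sketch of this MLP tower (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_users, num_items, k, hidden=(64, 32, 16)):
        super().__init__()
        self.P = nn.Embedding(num_users, k)
        self.Q = nn.Embedding(num_items, k)
        layers, dim = [], 2 * k                              # z_1 = concat(p_u, q_i)
        for h_dim in hidden:
            layers += [nn.Linear(dim, h_dim), nn.ReLU()]     # φ_l(z) = ReLU(W_l^T z + b_l)
            dim = h_dim
        self.tower = nn.Sequential(*layers)
        self.h = nn.Linear(dim, 1, bias=False)

    def forward(self, u, i):
        z1 = torch.cat([self.P(u), self.Q(i)], dim=-1)
        return torch.sigmoid(self.h(self.tower(z1)))         # ŷ_ui = σ(h^T φ_L(z_{L-1}))
```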
Neural network model (NeuMF)
• It is the combination of the two models explained before.

For GMF:

$$\phi^{GMF} := p_{uk} \odot q_{ik}$$

the element-wise product between $p_{uk}$ and $q_{ik}$ (the weighting $h$ now moves to the fused output layer).

For MLP:

$$\phi^{MLP} := \phi_L(z_{L-1}) = \alpha_L(W_L^T z_{L-1} + b_L)$$

Output:

$$\hat{y}_{ui} = \sigma(h^T \mathrm{concat}(\phi^{GMF}, \phi^{MLP}))$$
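A minimal NeuMF fusion sketch, with separate embeddings per branch as in the paper (layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, k=32, hidden=(64, 32, 16)):
        super().__init__()
        # separate embeddings for each branch
        self.P_gmf = nn.Embedding(num_users, k)
        self.Q_gmf = nn.Embedding(num_items, k)
        self.P_mlp = nn.Embedding(num_users, k)
        self.Q_mlp = nn.Embedding(num_items, k)
        layers, dim = [], 2 * k
        for h_dim in hidden:
            layers += [nn.Linear(dim, h_dim), nn.ReLU()]
            dim = h_dim
        self.tower = nn.Sequential(*layers)
        self.h = nn.Linear(k + dim, 1, bias=False)   # h over concat(φ_GMF, φ_MLP)

    def forward(self, u, i):
        phi_gmf = self.P_gmf(u) * self.Q_gmf(i)      # element-wise product branch
        phi_mlp = self.tower(torch.cat([self.P_mlp(u), self.Q_mlp(i)], dim=-1))
        return torch.sigmoid(self.h(torch.cat([phi_gmf, phi_mlp], dim=-1)))
```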
Pretraining
• Since the proposed NeuMF is an ensemble of GMF and MLP, we can initialize NeuMF with the pretrained GMF and MLP models.

$$\hat{y}^{MLP}_{ui} = \sigma\!\left((h^{MLP})^T \phi_L(z_{L-1})\right)$$

$$\hat{y}^{GMF}_{ui} = \sigma\!\left(\sum_{t=0}^{k} h^{GMF}_t \, p_{uk}(t) \, q_{ik}(t)\right)$$

• We can also reuse $h^{MLP}$ and $h^{GMF}$ in the ensemble model.
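A sketch of this initialization, reusing the GMF, MLP, and NeuMF sketches above; the α trade-off between the two pretrained h vectors follows the paper, while all field names are my own:

```python
def init_from_pretrained(neumf, gmf, mlp, alpha=0.5):
    # copy the pretrained embeddings into the corresponding branches
    neumf.P_gmf.weight.data.copy_(gmf.P.weight)
    neumf.Q_gmf.weight.data.copy_(gmf.Q.weight)
    neumf.P_mlp.weight.data.copy_(mlp.P.weight)
    neumf.Q_mlp.weight.data.copy_(mlp.Q.weight)
    # copy the pretrained MLP tower layer by layer
    for dst, src in zip(neumf.tower, mlp.tower):
        if isinstance(dst, nn.Linear):
            dst.weight.data.copy_(src.weight)
            dst.bias.data.copy_(src.bias)
    # h is the concatenation of the two pretrained output weights, traded off by α
    neumf.h.weight.data.copy_(torch.cat([alpha * gmf.h.weight,
                                         (1 - alpha) * mlp.h.weight], dim=1))
```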
Experiment
Evaluation protocols
Leave-one-out evaluation algorithm:
1) Sample 100 items that the user has not interacted with.
2) Rank these 100 items together with the test item and make a top-K list.
3) Check whether the test item (the user's last interaction) is in the top-K (e.g. top-10) list.

User  Item
U1    I1
U1    I3
U1    I2
U1    I5
U1    I5
U1    I4
U1    I6

The first six interactions are used as the training set; the last interaction (I6) is used as the test item.
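A minimal sketch of this protocol for one user (function and variable names are illustrative, not from the paper):

```python
import random

def rank_candidates(score_fn, user, test_item, all_items, train_items, K=10):
    # 1) sample 100 items the user has not interacted with
    unseen = [it for it in all_items if it not in train_items and it != test_item]
    negatives = random.sample(unseen, 100)
    # 2) rank the 100 negatives together with the held-out test item
    candidates = negatives + [test_item]
    candidates.sort(key=lambda it: -score_fn(user, it))
    # 3) keep the top-K list; the metrics below check membership and position
    return candidates[:K]
```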
Evaluation metrics
Hit Ratio:

$$HR@K = \frac{\#\{\text{users s.t. the test item} \in \text{top-}K \text{ list}\}}{\#\text{users}}$$

Normalized Discounted Cumulative Gain:

$$NDCG@K = \begin{cases} \dfrac{\log 2}{\log(i + 1)} & \text{if the test item is at position } i \text{ of the top-}K \text{ list} \\ 0 & \text{otherwise} \end{cases}$$
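Per-user versions of both metrics, given the top-K list from the sketch above (averaging over all users gives the reported figures):

```python
import math

def hit_ratio(top_k, test_item):
    return 1.0 if test_item in top_k else 0.0

def ndcg(top_k, test_item):
    if test_item not in top_k:
        return 0.0
    i = top_k.index(test_item) + 1      # 1-indexed rank of the test item
    return math.log(2) / math.log(i + 1)
```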
Increasing latent vector size

[Charts: NDCG and hit ratio of GMF as the latent vector size increases over 8, 16, 32, 64]
Hit ratio/NDCG for top-K

[Charts: Hit Ratio and NDCG at K = 5, 10, 15, 20, 25, 30 (epoch = 25, latent vector size = 32), comparing MF, MLP (2–3 layers), and NeuMF (MF + MLP)]
Evaluation 2
• MovieLens 1M data
• 6040 users, 3953 items
• Explicit ratings on a five-point scale [1, 2, 3, 4, 5]
• 5-fold cross validation

Five-fold averages   NeuMF (15 iters)   SVD++ (100 iters, avg. of two)   SVD (100 iters)   KNN
RMSE                 0.8590             0.8436                           0.8741            0.8948
MAE                  0.6728             0.6677                           0.6860            0.7062