Neural Collaborative Filtering
Xiangnan He et al.
이현성
• Preliminary
• Model description
• Experiment
• Discussion and conclusion
Preliminary
Linearity

• $f(ax) = a f(x)$, i.e. $\frac{df(x)}{dx} = c$ (a constant)
  ex) $f = ax$ (a plane passing through the origin)
• $f(ax_1, \dots, ax_k) = f(a\mathbf{x}) = a f(x_1, \dots, x_k)$, i.e. $\frac{\partial f}{\partial x_k} = c_k$ (a constant)
  ex) $f(a, b) = \sum_i w_i a_i b_i$ (a linear combination, the inner product)

Any other function, such as $f = x^2$ or $f(x_1, x_2) = \frac{x_1}{|x_2|} + \frac{x_2}{|x_1|}$, is a non-linear function (mapping).
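As a quick worked check, scaling the input of $f = x^2$ shows it violates the homogeneity property above:

$$f(ax) = (ax)^2 = a^2 x^2 = a^2 f(x) \neq a f(x) \quad \text{for } a \neq 0, 1.$$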
User/Item embedding

• $V^T x = v_i$, where $x$ is a one-hot vector and $v_i$ is the $i$-th row vector of $V$
• Like embedding one-hot words, users and items can be represented as dense vectors
• $V :=$ a $U \times k$ matrix, where $U$ is the number of users
• $u_i :=$ a length-$U$ vector, all 0 except that the $i$-th element is 1
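As a sanity check, the lookup $V^T x$ with a one-hot $x$ is just row selection; a minimal numpy sketch (all names and sizes are illustrative, not from the paper):

```python
import numpy as np

U, k = 4, 3                    # number of users, latent dimension (toy sizes)
V = np.random.randn(U, k)      # embedding matrix: one length-k row per user

x = np.zeros(U)
x[1] = 1.0                     # one-hot vector for the 2nd user
v = V.T @ x                    # V^T x ...
assert np.allclose(v, V[1])    # ... picks out the 2nd row of V
```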
Model description
Description of the model
• Represent users and items as dense vectors via an embedding model
  • This can be thought of as latent factorization (as in SVD) or as the embedding commonly used in MLPs
  • Embedded vector, latent vector, factorized vector…
Two models are proposed in the paper:
• Linear model (Generalized Matrix Factorization, GMF)
  • a linear combination of the user and item vectors
• Non-linear model (Multilayer Perceptron, MLP)
  • a multi-layer perceptron over the user and item vectors
How to represent a user or item as a latent vector?

User   U1  U2  U3  U4
u_1     1   0   0   0
u_2     0   1   0   0

$P :=$ a $|U| \times k$ matrix, where $U$ is the set of users.
$P^T u_1 = p_1$, where $p_1$ is the first row vector of matrix $P$.
We can also represent items as dense vectors by an $|I| \times k$ matrix, where $I$ is the set of items.
Neural collaborative filtering framework
• Input: $x_j := (u_j, i_j)$
• Output: $\hat{y}_{ui} :=$ prediction $\in [0, 1]$
• Objective: $J = \mathrm{logloss}(y_{ui}, \hat{y}_{ui})$
Description of input and output
• User vector $u_j = (0, 0, \dots, 1, 0, 0, \dots, 0)^T$ of length $|U|$ is a one-hot encoded vector
• Item vector $i_j = (0, 0, \dots, 1, 0, 0, \dots, 0)^T$ of length $|I|$ is a one-hot encoded vector
  where $U$, $I$ are the sets of users and items, respectively.
• Target $y_{ui}$: whether user $u$ has interacted with item $i$; $y_{ui} \in \{0, 1\}$
• $\hat{y}_{ui}$: the probability that user $u$ will interact with item $i$; $\hat{y}_{ui} \in [0, 1]$
• Objective: $\operatorname{argmin}_\Theta J(\Theta, U, I) := \sum_{(u,i) \in (U,I)} \mathrm{logloss}(y_{ui}, \hat{y}_{ui})$
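A minimal sketch of this pointwise log-loss objective, assuming numpy arrays of binary targets and predicted probabilities (names are illustrative):

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """y: 0/1 interaction targets; y_hat: predicted probabilities in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```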
Generalized Matrix Factorization
• Let the latent vector of length $k$ for user $u$ be $p_{uk}$
• Let the latent vector of length $k$ for item $i$ be $q_{ik}$

$$\phi(p_{uk}, q_{ik}) = \sum_{t=0}^{k} h_t \, p_{uk}(t) \, q_{ik}(t)$$

where $h_t$ is the weight of the element-wise multiplication between $p_{uk}$ and $q_{ik}$.

Output:

$$\hat{y}_{ui} = \sigma(\phi(p_{uk}, q_{ik})) = \sigma\!\left(\sum_{t=0}^{k} h_t \, p_{uk}(t) \, q_{ik}(t)\right)$$

where $\sigma$ is the sigmoid function.
[Figure: the Generalized Matrix Factorization layer — element-wise product of the user and item embeddings, followed by the weighted output layer]
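A minimal PyTorch sketch of GMF matching the equations above (the class and field names are my own, not from the paper):

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    def __init__(self, num_users, num_items, k):
        super().__init__()
        self.P = nn.Embedding(num_users, k)   # user latent vectors p_u
        self.Q = nn.Embedding(num_items, k)   # item latent vectors q_i
        self.h = nn.Linear(k, 1, bias=False)  # weights h_t over the element-wise product

    def forward(self, u, i):
        prod = self.P(u) * self.Q(i)          # element-wise product p_u ⊙ q_i
        return torch.sigmoid(self.h(prod))    # ŷ_ui = σ(h^T (p_u ⊙ q_i))
```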
Multilayer Perceptron model
• Let the latent vector of length $k$ for user $u$ be $p_{uk}$
• Let the latent vector of length $k$ for item $i$ be $q_{ik}$

$$z_1 = \mathrm{concat}(p_{uk}, q_{ik})$$
$$\phi_2(z_1) = \alpha_2(W_2^T z_1 + b_2)$$
$$\dots$$
$$\phi_L(z_{L-1}) = \alpha_L(W_L^T z_{L-1} + b_L)$$

Output:

$$\hat{y}_{ui} = \sigma(h^T \phi_L(z_{L-1}))$$

where $W_l$, $b_l$, and $\alpha_l$ denote the weight matrix, bias vector, and activation of layer $l$. The ReLU function is used as the activation.
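A minimal PyTorch sketch of this MLP tower (layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, num_users, num_items, k, hidden=(64, 32, 16)):
        super().__init__()
        self.P = nn.Embedding(num_users, k)
        self.Q = nn.Embedding(num_items, k)
        layers, dim = [], 2 * k                              # z_1 = concat(p_u, q_i)
        for h_dim in hidden:
            layers += [nn.Linear(dim, h_dim), nn.ReLU()]     # φ_l(z) = ReLU(W_l^T z + b_l)
            dim = h_dim
        self.tower = nn.Sequential(*layers)
        self.h = nn.Linear(dim, 1, bias=False)

    def forward(self, u, i):
        z1 = torch.cat([self.P(u), self.Q(i)], dim=-1)
        return torch.sigmoid(self.h(self.tower(z1)))         # ŷ_ui = σ(h^T φ_L(z_{L-1}))
```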
Neural network model (NeuMF)
• It is the combination of the two models explained before.

For GMF:

$$\phi^{GMF} := p_{uk} \odot q_{ik}$$

the element-wise product between $p_{uk}$ and $q_{ik}$ (the weighting $h$ now moves to the fused output layer).

For MLP:

$$\phi^{MLP} := \phi_L(z_{L-1}) = \alpha_L(W_L^T z_{L-1} + b_L)$$

Output:

$$\hat{y}_{ui} = \sigma(h^T \mathrm{concat}(\phi^{GMF}, \phi^{MLP}))$$
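A minimal NeuMF fusion sketch, with separate embeddings per branch as in the paper (layer sizes and names are illustrative):

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, k=32, hidden=(64, 32, 16)):
        super().__init__()
        # separate embeddings for each branch
        self.P_gmf = nn.Embedding(num_users, k)
        self.Q_gmf = nn.Embedding(num_items, k)
        self.P_mlp = nn.Embedding(num_users, k)
        self.Q_mlp = nn.Embedding(num_items, k)
        layers, dim = [], 2 * k
        for h_dim in hidden:
            layers += [nn.Linear(dim, h_dim), nn.ReLU()]
            dim = h_dim
        self.tower = nn.Sequential(*layers)
        self.h = nn.Linear(k + dim, 1, bias=False)   # h over concat(φ_GMF, φ_MLP)

    def forward(self, u, i):
        phi_gmf = self.P_gmf(u) * self.Q_gmf(i)      # element-wise product branch
        phi_mlp = self.tower(torch.cat([self.P_mlp(u), self.Q_mlp(i)], dim=-1))
        return torch.sigmoid(self.h(torch.cat([phi_gmf, phi_mlp], dim=-1)))
```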
Pretraining
• Since the proposed NeuMF is an ensemble of GMF and MLP, we can initialize NeuMF with the pretrained GMF and MLP models.

$$\hat{y}^{MLP}_{ui} = \sigma\!\left((h^{MLP})^T \phi_L(z_{L-1})\right)$$

$$\hat{y}^{GMF}_{ui} = \sigma\!\left(\sum_{t=0}^{k} h^{GMF}_t \, p_{uk}(t) \, q_{ik}(t)\right)$$

• We can also reuse $h^{MLP}$ and $h^{GMF}$ in the ensemble model.
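A sketch of this initialization, reusing the GMF, MLP, and NeuMF sketches above; the α trade-off between the two pretrained h vectors follows the paper, while all field names are my own:

```python
def init_from_pretrained(neumf, gmf, mlp, alpha=0.5):
    # copy the pretrained embeddings into the corresponding branches
    neumf.P_gmf.weight.data.copy_(gmf.P.weight)
    neumf.Q_gmf.weight.data.copy_(gmf.Q.weight)
    neumf.P_mlp.weight.data.copy_(mlp.P.weight)
    neumf.Q_mlp.weight.data.copy_(mlp.Q.weight)
    # copy the pretrained MLP tower layer by layer
    for dst, src in zip(neumf.tower, mlp.tower):
        if isinstance(dst, nn.Linear):
            dst.weight.data.copy_(src.weight)
            dst.bias.data.copy_(src.bias)
    # h is the concatenation of the two pretrained output weights, traded off by α
    neumf.h.weight.data.copy_(torch.cat([alpha * gmf.h.weight,
                                         (1 - alpha) * mlp.h.weight], dim=1))
```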
Experiment
Evaluation protocols
Leave-one-out evaluation algorithm:
1) Sample 100 items that the user has not interacted with.
2) Rank these 100 items together with the test item and make a top-K list.
3) Check whether the test item (the user's last interaction) is in the top-K (e.g. top-10) list.

User  Item
U1    I1
U1    I3
U1    I2
U1    I5
U1    I5
U1    I4
U1    I6

The first six interactions are used as the training set; the last interaction (I6) is used as the test item.
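A minimal sketch of this protocol for one user (function and variable names are illustrative, not from the paper):

```python
import random

def rank_candidates(score_fn, user, test_item, all_items, train_items, K=10):
    # 1) sample 100 items the user has not interacted with
    unseen = [it for it in all_items if it not in train_items and it != test_item]
    negatives = random.sample(unseen, 100)
    # 2) rank the 100 negatives together with the held-out test item
    candidates = negatives + [test_item]
    candidates.sort(key=lambda it: -score_fn(user, it))
    # 3) keep the top-K list; the metrics below check membership and position
    return candidates[:K]
```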
Evaluation metrics
Hit Ratio:

$$HR@K = \frac{\#\{\text{users s.t. the test item} \in \text{top-}K \text{ list}\}}{\#\text{users}}$$

Normalized Discounted Cumulative Gain:

$$NDCG@K = \begin{cases} \dfrac{\log 2}{\log(i + 1)} & \text{if the test item is at position } i \text{ of the top-}K \text{ list} \\ 0 & \text{otherwise} \end{cases}$$
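Per-user versions of both metrics, given the top-K list from the sketch above (averaging over all users gives the reported figures):

```python
import math

def hit_ratio(top_k, test_item):
    return 1.0 if test_item in top_k else 0.0

def ndcg(top_k, test_item):
    if test_item not in top_k:
        return 0.0
    i = top_k.index(test_item) + 1      # 1-indexed rank of the test item
    return math.log(2) / math.log(i + 1)
```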
Increasing latent vector size

[Charts: NDCG and hit ratio of GMF as the latent vector size increases over 8, 16, 32, 64]
Hit ratio/NDCG for top-K

[Charts: Hit Ratio and NDCG at K = 5, 10, 15, 20, 25, 30 (epoch = 25, latent vector size = 32), comparing MF, MLP (2–3 layers), and NeuMF (MF + MLP)]
Evaluation 2
• MovieLens 1M data
• 6040 users, 3953 items
• Explicit ratings on a five-point scale [1, 2, 3, 4, 5]
• 5-fold cross validation

Five-fold averages   NeuMF (15 iters)   SVD++ (100 iters, avg. of two)   SVD (100 iters)   KNN
RMSE                 0.8590             0.8436                           0.8741            0.8948
MAE                  0.6728             0.6677                           0.6860            0.7062