Learning Module #01
Date: 22.05.2025
User-Based Collaborative Filtering
Example of User-Based Collaborative Filtering for Product Recommendations
This example demonstrates how to build a user-based collaborative
filtering (CF) system for product recommendations using Python. We'll
use a small synthetic dataset for clarity.
1. Dataset Example
Consider an e-commerce dataset where users rate products (1-5 scale).
Here's a sample user-item rating matrix:
User   Product A   Product B   Product C   Product D
U1     5           3           4           1
U2     3           1           2           3
U3     4           3           4           3
U4     3           3           1           5
U5     1           5           5           2
2. Key Steps in User-Based CF
(1) Calculate User Similarity (Cosine Similarity)
We measure how similar users are based on their ratings.
Cosine Similarity Formula:
$$\text{sim}(u,v) = \frac{\sum_i r_{u,i} \cdot r_{v,i}}{\sqrt{\sum_i r_{u,i}^2} \cdot \sqrt{\sum_i r_{v,i}^2}}$$
Example:
Calculate similarity between U1 and U3:
Both rated Products A, B, C, D.
Dot product: (5×4)+(3×3)+(4×4)+(1×3) = 20 + 9 + 16 + 3 = 48
Magnitude of U1: √(5² + 3² + 4² + 1²) = √(25+9+16+1) = √51 ≈ 7.14
Magnitude of U3: √(4² + 3² + 4² + 3²) = √(16+9+16+9) = √50 ≈ 7.07
Cosine Similarity = 48 / (7.14 × 7.07) ≈ 0.95 (Highly similar)
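The hand calculation above can be checked with a few lines of NumPy. This is a minimal sketch; the helper name `cosine_sim` is an illustrative assumption, not part of any library.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D rating vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product divided by the product of the two vector magnitudes
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

u1 = [5, 3, 4, 1]  # U1's ratings for Products A-D
u3 = [4, 3, 4, 3]  # U3's ratings for Products A-D

sim_u1_u3 = cosine_sim(u1, u3)
print(f"sim(U1, U3) = {sim_u1_u3:.2f}")  # prints: sim(U1, U3) = 0.95
```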
(2) Find Nearest Neighbors
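For a target user (e.g., U5), compute the cosine similarity with every other user and pick the top-2 most similar users:
Sim(U5, U1) = 42 / (√55 × √51) ≈ 0.79
Sim(U5, U2) = 24 / (√55 × √23) ≈ 0.67
Sim(U5, U3) = 45 / (√55 × √50) ≈ 0.86
Sim(U5, U4) = 33 / (√55 × √44) ≈ 0.67
Top-2 neighbors for U5: U3 (0.86), U1 (0.79)
(3) Predict Ratings for Unseen Products
The sample matrix has no missing entries, so to illustrate the mechanics we treat U5’s rating for Product A as unknown and predict it from the neighbors’ ratings. (U5’s actual entry for Product A is 1, and the similarities above are still computed from all four columns.)
$$\text{Prediction} = \frac{\sum_{v} \text{sim}(u,v) \times r_{v,\text{Product A}}}{\sum_{v} |\text{sim}(u,v)|}$$
Here the sums run over the top-2 neighbors v of the target user u.
U3’s rating for Product A = 4, similarity = 0.86
U1’s rating for Product A = 5, similarity = 0.79
Prediction = (0.86×4 + 0.79×5) / (0.86 + 0.79) ≈ 4.5
(4) Generate Recommendations
Predict ratings for the products treated as unseen by U5:
o Product A: 4.5
o Product C: 4.0 (calculated similarly; both neighbors rated it 4)
Recommend the highest-predicted products: Product A (4.5) > Product C (4.0)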
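Steps (2) to (4) can be sketched end to end on the sample matrix. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names `cosine_matrix`, `top_k_neighbors` and `predict_rating` are hypothetical, and because the matrix has no empty cells the sketch simply treats Products A and C as unseen for U5 while still computing similarities from all four columns.

```python
import numpy as np

ratings = np.array([
    [5, 3, 4, 1],  # U1
    [3, 1, 2, 3],  # U2
    [4, 3, 4, 3],  # U3
    [3, 3, 1, 5],  # U4
    [1, 5, 5, 2],  # U5
], dtype=float)

def cosine_matrix(r):
    """Pairwise cosine similarity between the rows of r."""
    unit = r / np.linalg.norm(r, axis=1, keepdims=True)
    return unit @ unit.T

def top_k_neighbors(sim_row, user, k):
    """Indices of the k users most similar to `user`, excluding `user`."""
    order = np.argsort(sim_row)[::-1]  # most similar first
    return [int(v) for v in order if v != user][:k]

def predict_rating(r, sim, user, item, k=2):
    """Similarity-weighted average of the neighbors' ratings for `item`."""
    neighbors = top_k_neighbors(sim[user], user, k)
    weights = sim[user, neighbors]
    return float(weights @ r[neighbors, item] / np.abs(weights).sum())

sim = cosine_matrix(ratings)
target = 4  # U5
held_out = {"Product A": 0, "Product C": 2}  # assumption: treated as unseen
predictions = {name: predict_rating(ratings, sim, target, col)
               for name, col in held_out.items()}

# Rank the candidate products by predicted rating, highest first
ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

The neighbor filter drops the target user by index rather than by position, so it does not depend on the self-similarity sorting first.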
3. Python Implementation
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-item ratings matrix (rows: users U1-U5, columns: Products A-D)
ratings = np.array([
    [5, 3, 4, 1],
    [3, 1, 2, 3],
    [4, 3, 4, 3],
    [3, 3, 1, 5],
    [1, 5, 5, 2]
])

# Compute user-user similarity matrix
user_sim = cosine_similarity(ratings)

# Example: find the top-2 users most similar to U5 (index 4)
target_user = 4
# Sort by descending similarity; position 0 is U5 itself (similarity 1.0), so skip it
similar_users = np.argsort(user_sim[target_user])[::-1][1:3]

# Predict U5's rating for Product A (index 0) as a similarity-weighted average
numerator = sum(user_sim[target_user, u] * ratings[u, 0] for u in similar_users)
denominator = sum(abs(user_sim[target_user, u]) for u in similar_users)
predicted_rating = numerator / denominator
print(f"Predicted rating for U5 on Product A: {predicted_rating:.2f}")
Output:
Predicted rating for U5 on Product A: 4.48
4. Practical Considerations
1. Cold Start Problem: New users and items have no ratings, so no similarities can be computed for them. Solutions:
   o Hybrid models (combine CF with content-based filtering).
   o Use demographic data for new users.
2. Scalability: User-based CF is computationally expensive for large datasets, because every user is compared with every other user and the cost grows quadratically with the number of users.
   o Switch to item-based CF or matrix factorization (e.g., SVD).
3. Real-World Adjustments:
   o Incorporate implicit feedback (clicks, cart additions).
   o Apply weighting (e.g., time decay so that recent purchases count more).
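As a rough illustration of the item-based alternative mentioned under Scalability, the sketch below compares columns (products) instead of rows (users) and predicts a rating from the user's own ratings on the most similar items. The helper name `predict_item_based` and the choice k=2 are assumptions made for this example. As before, U5's rating for Product A is treated as unknown only for illustration.

```python
import numpy as np

ratings = np.array([
    [5, 3, 4, 1],  # U1
    [3, 1, 2, 3],  # U2
    [4, 3, 4, 3],  # U3
    [3, 3, 1, 5],  # U4
    [1, 5, 5, 2],  # U5
], dtype=float)

# Item-item cosine similarity: normalize each column, then take dot products
cols = ratings.T
unit = cols / np.linalg.norm(cols, axis=1, keepdims=True)
item_sim = unit @ unit.T

def predict_item_based(r, sim, user, item, k=2):
    """Weighted average of `user`'s ratings on the k items most similar to `item`."""
    order = np.argsort(sim[item])[::-1]  # most similar items first
    neighbors = [int(j) for j in order if j != item][:k]
    weights = sim[item, neighbors]
    return float(weights @ r[user, neighbors] / np.abs(weights).sum())

pred = predict_item_based(ratings, item_sim, user=4, item=0)
print(f"Item-based prediction for U5 on Product A: {pred:.2f}")
```

With many more users than products, the item-item matrix is much smaller than the user-user one, which is one common reason for preferring this variant.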
Conclusion
This example illustrates how user-based collaborative filtering works:
1. Compute user similarities.
2. Find nearest neighbors.
3. Predict ratings for unseen products.
4. Recommend top-scoring items.
For production systems, consider libraries like:
o Surprise (a Python scikit for building and evaluating recommenders on explicit ratings)
o LightFM (hybrid recommendation models)