Introduction to Recommender Systems
Sukomal Pal
IIT (BHU)
[email protected]
Some problems we face online
▪ Which digital camera should I buy?
▪ What is the best holiday for me and my family?
▪ Which is the best investment for supporting the education of my children?
▪ Which movie should I rent?
▪ Which web sites will I find interesting?
▪ Which book should I buy for my next vacation?
▪ Which degree and university are the best for my future?
Introduction
● Systems for recommending items (e.g., books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences
● Objectives:
– To propose objects that fit the user's needs/wishes
– To sell services (site visits) or goods
● Many search engines and online stores provide recommendations (e.g., Google Play Store, Amazon, Netflix, YouTube)
● Recommenders have been shown to substantially increase clicks (and sales)
Problem domain
Purpose and success criteria (1)
▪Different perspectives/aspects
–Depends on domain and purpose
–No holistic evaluation scenario exists
▪Retrieval perspective
–Reduce search costs
–Provide "correct" proposals
–Users know in advance what they want
▪Recommendation perspective
–Serendipity – identify items from the Long Tail
–Users did not know about existence
When does a RS do its job well?
▪"Recommend widely
unknown items that users
might actually like!"
Recommend items ▪20% of items
from the long tail
accumulate 74% of all
positive ratings
▪Items rated > 3 in
MovieLens 100K dataset
Purpose and success criteria (2)
▪Prediction perspective
–Predict to what degree users like an item
–Most popular evaluation scenario in research
▪Interaction perspective
–Give users a "good feeling"
–Educate users about the product domain
–Convince/persuade users - explain
▪Finally, conversion perspective
–Commercial situations
–Increase "hit", "clickthrough", "lookers to bookers" rates
–Optimize sales margins and profit
Paradigms of recommender systems
▪ Recommender systems reduce information overload by estimating relevance
▪ Personalized recommendations
▪ Content-based: "Show me more of the same as what I've liked"
▪ Collaborative: "Tell me what's popular among my peers"
▪ Knowledge-based: "Tell me what fits based on my needs"
▪ Hybrid: combinations of various inputs and/or composition of different mechanisms
Collaborative Filtering (CF)
▪ The most prominent approach to generate recommendations
– used by large, commercial e-commerce sites
– well-understood, various algorithms and variations exist
– applicable in many domains (books, movies, DVDs, ...)
▪ Approach
– use the "wisdom of the crowd" to recommend items
▪ Basic assumption and idea
– Users give ratings to catalog items (implicitly or explicitly)
– Customers who had similar tastes in the past will have similar tastes in the future
Pure CF Approaches
▪ Input
– Only a matrix of given user–item ratings
▪ Output types
– A (numerical) prediction indicating to what degree the current user will like
or dislike a certain item
– A top-N list of recommended items
User-based nearest-neighbor collaborative filtering (1)
▪ The basic technique
– Given an "active user" (Alice) and an item i not yet rated by Alice
– Find a set of peers who liked the same items as Alice in the past and who have rated item i
– Use, e.g., the average of their ratings to predict whether Alice will like item i
– Repeat for all unseen items and recommend the best-rated ones
User-based nearest-neighbor collaborative filtering (2)
▪ Example
– A database of ratings of the current user, Alice, and some other users is
given:
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
– Determine whether Alice will like or dislike Item5, which Alice has not yet
rated or seen
User-based nearest-neighbor collaborative filtering (3)
▪ Some first questions
– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the neighbors' ratings?
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Measuring user similarity (1)
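The similarity measure used here and on the next slide is Pearson correlation. For users $a$ and $b$, with $P$ the set of items rated by both, $r_{a,p}$ user $a$'s rating for item $p$, and $\bar{r}_a$ user $a$'s average rating:

$$
\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
$$

The value ranges from −1 (opposite rating behavior) to +1 (perfect agreement).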
Measuring user similarity (2)
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4     sim = 0.00
User4     1      5      5      2      1     sim = -0.79
Pearson correlation
▪ Takes differences in rating behavior into account
▪ Works well in usual domains, compared with alternative measures
– such as cosine similarity
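A minimal sketch of this computation on the running example (plain Python; the data layout and function names are mine, not from the slides; means are taken over co-rated items only):

```python
import math

# Ratings from the running example; None marks a missing rating.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    den = (math.sqrt(sum((a - mean_u) ** 2 for a, _ in pairs))
           * math.sqrt(sum((b - mean_v) ** 2 for _, b in pairs)))
    return num / den if den else 0.0

for user in ["User1", "User2", "User3", "User4"]:
    print(user, round(pearson(ratings["Alice"], ratings[user]), 2))
# -> User1 0.85, User2 0.71, User3 0.0, User4 -0.79
# (the slides round 0.71 down to 0.70)
```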
Making predictions
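A standard user-based prediction function, consistent with the example above, combines the active user's average rating with the similarity-weighted, mean-centered ratings of the neighborhood $N$ (some variants use $|\mathrm{sim}(a,b)|$ in the denominator):

$$
\mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)}
$$

Mean-centering accounts for the fact that neighbors may use the rating scale differently.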
Improving the metrics / prediction function
▪ Not all neighbor ratings might be equally "valuable"
– Agreement on commonly liked items is less informative than agreement on controversial items
– Possible solution: Give more weight to items that have a higher variance
▪ Value of number of co-rated items
– Use "significance weighting", by e.g., linearly reducing the weight when the
number of co-rated items is low
▪ Case amplification
– Intuition: Give more weight to "very similar" neighbors, i.e., where the
similarity value is close to 1.
▪ Neighborhood selection
– Use similarity threshold or fixed number of neighbors
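A minimal sketch of two of these ideas, significance weighting and case amplification (the cutoff of 50 co-rated items and the exponent 2.5 are common values from the literature, not numbers given on the slide):

```python
def significance_weighted(sim, n_co_rated, cutoff=50):
    """Linearly devalue a similarity that rests on few co-rated items."""
    return sim * min(n_co_rated, cutoff) / cutoff

def case_amplified(sim, rho=2.5):
    """Amplify similarities close to 1, damp the rest (sign-preserving)."""
    return sim * abs(sim) ** (rho - 1)

print(round(significance_weighted(0.9, 3), 3))   # 0.054: strong sim, weak evidence
print(round(case_amplified(0.9), 2))             # 0.77
print(round(case_amplified(0.3), 2))             # 0.05
```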
Memory-based and model-based approaches
▪ User-based CF is said to be "memory-based"
– the rating matrix is directly used to find neighbors / make predictions
– does not scale for most real-world scenarios
– large e-commerce sites have tens of millions of customers and millions of items
▪ Model-based approaches
– based on an offline pre-processing or "model-learning" phase
– at run-time, only the learned model is used to make predictions
– models are updated / re-trained periodically
– large variety of techniques used
– model-building and updating can be computationally expensive
– item-based CF is an example of a model-based approach
Item-based collaborative filtering
▪ Basic idea:
– Use the similarity between items (and not users) to make predictions
▪ Example:
– Look for items that are similar to Item5
– Take Alice's ratings for these items to predict the rating for Item5
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
The cosine similarity measure
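The standard measure behind this slide treats the ratings of two items as vectors $\vec{a}$ and $\vec{b}$:

$$
\mathrm{sim}(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}
$$

With only positive ratings the value lies between 0 and 1. Unlike Pearson correlation, plain cosine does not adjust for each user's average rating level; the mean-centered variant a few slides below addresses this.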
Making predictions
▪ A common prediction function:
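One standard form of this function weights user $u$'s own past ratings by item-item similarity (notation as in the user-based formulas above):

$$
\mathrm{pred}(u,p) = \frac{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p) \cdot r_{u,i}}{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p)}
$$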
▪ The neighborhood is typically also limited to a specific size
▪ Not all neighbors are taken into account for the prediction
▪ An analysis of the MovieLens dataset indicates that "in most real-world
situations, a neighborhood of 20 to 50 neighbors seems reasonable"
(Herlocker et al. 2002)
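A minimal sketch of the full item-based computation on the running example, using plain cosine between item columns over the four other users (the held-out setup and all names are mine):

```python
import math

# Item vectors over User1..User4; Alice's row is held out because her
# Item5 rating is the value we want to predict.
items = {
    "Item1": [3, 4, 3, 1],
    "Item2": [1, 3, 3, 5],
    "Item3": [2, 4, 1, 5],
    "Item4": [3, 3, 5, 2],
    "Item5": [3, 5, 4, 1],
}
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}

def cosine(u, v):
    """Plain (non-adjusted) cosine between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Similarity of each item Alice rated to the target Item5, then the
# similarity-weighted average of her own ratings (prediction function above).
sims = {i: cosine(items[i], items["Item5"]) for i in alice}
pred = sum(sims[i] * alice[i] for i in alice) / sum(sims.values())
print(round(pred, 2))   # -> 4.08: Alice is predicted to like Item5
```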
USER-USER: User-based Algorithm
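Tying the user-based pieces together, a minimal end-to-end sketch (it reuses `ratings` and `pearson` from the Pearson sketch above; the choice of k = 2 and of full-profile user means is mine):

```python
def predict(active, item, k=2):
    """User-based kNN prediction: the active user's own mean plus the
    similarity-weighted, mean-centered ratings of the k most similar
    users who have rated the item."""
    mean = {u: sum(r for r in rs if r is not None) /
               sum(r is not None for r in rs)
            for u, rs in ratings.items()}
    peers = sorted(((pearson(ratings[active], ratings[u]), u)
                    for u in ratings
                    if u != active and ratings[u][item] is not None),
                   reverse=True)[:k]
    num = sum(s * (ratings[u][item] - mean[u]) for s, u in peers)
    return mean[active] + num / sum(s for s, _ in peers)

print(round(predict("Alice", 4), 2))   # -> 4.87 for Item5
```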
ITEM-ITEM: Mean-centering for Cosine Similarity computation
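Mean-centering here means subtracting each user's average rating before computing the cosine (the "adjusted cosine" measure), so that differences in individual rating scales do not distort item similarity. With $U$ the set of users who rated both items $a$ and $b$:

$$
\mathrm{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
$$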
Predicted ratings
[Figure: predicted ratings computed from raw ratings vs. from mean-centered ratings]
Pre-processing for item-based filtering
▪ Item-based filtering does not solve the scalability problem itself
▪ Pre-processing approach by Amazon.com (in 2003)
– Calculate all pair-wise item similarities in advance
– The neighborhood to be used at run-time is typically rather small, because only items
are taken into account which the user has rated
– Item similarities are supposed to be more stable than user similarities
▪ Memory requirements
– Up to N² pair-wise similarities to be memorized (N = number of items) in theory
– In practice, this is significantly lower (items with no co-ratings)
– Further reductions possible
▪ Minimum threshold for co-ratings
▪ Limit the neighborhood size (might affect recommendation accuracy)
More on ratings – Explicit ratings
▪ Probably the most precise ratings
▪ Most commonly used (1 to 5, 1 to 7 Likert response scales)
▪ Research topics
– Optimal granularity of scale; indications that a 10-point scale is better accepted in the movie domain
– An even more fine-grained scale was chosen in the joke recommender discussed by Goldberg et al.
(2001), where a continuous scale (from −10 to +10) and a graphical input bar were used
▪ No precision loss from the discretization
▪ User preferences can be captured at a finer granularity
▪ Users actually "like" the graphical interaction method
– Multidimensional ratings (multiple ratings per movie such as ratings for actors and sound)
▪ Main problems
– Users not always willing to rate many items
▪ number of available ratings could be too small → sparse rating matrices → poor recommendation
quality
– How to stimulate users to rate more items?
More on ratings – Implicit ratings
▪ Typically collected by the web shop or application in which the recommender system is
embedded
▪ When a customer buys an item, for instance, many recommender systems interpret this
behavior as a positive rating
▪ Clicks, page views, time spent on some page, demo downloads …
▪ Implicit ratings can be collected constantly and do not require additional effort from the user
▪ Main problem
– One cannot be sure whether the user behavior is correctly interpreted
– For example, a user might not like all the books he or she has bought; the user also might
have bought a book for someone else
▪ Implicit ratings can be used in addition to explicit ones; question of correctness of
interpretation
Data sparsity problems
▪ Cold start problem
– How to recommend new items? What to recommend to new users?
▪ Straightforward approaches
– Ask/force users to rate a set of items
– Use another method (e.g., content-based, demographic or simply non-personalized) in
the initial phase
– Default voting: assign default values to items that only one of the two users to be
compared has rated (Breese et al. 1998)
▪ Alternatives
– Use better algorithms (beyond nearest-neighbor approaches)
– Example:
▪ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be
too small to make good predictions
▪ Assume "transitivity" of neighborhoods
Example algorithms for sparse datasets
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     sim = 0.85
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
▪ Recursive prediction: Alice's closest neighbor User1 (sim = 0.85) has not rated Item5 either, so first predict User1's rating for Item5 from User1's own neighbors, then use that predicted rating in Alice's prediction
Graph-based methods (1)
▪ "Spreading activation" (Huang et al. 2004)
– Exploit the supposed "transitivity" of customer tastes and thereby augment the matrix
with additional information
– Assume that we are looking for a recommendation for User1
– When using a standard CF approach, User2 will be considered a peer for User1 because
they both bought Item2 and Item4
– Thus Item3 will be recommended to User1 because the nearest neighbor, User2, also
bought or liked it
Graph-based methods (2)
▪ "Spreading activation" (Huang et al. 2004)
– In a standard user-based or item-based CF approach, paths of length 3 will be
considered – that is, Item3 is relevant for User1 because there exists a three-step path
(User1–Item2–User2–Item3) between them
– Because the number of such paths of length 3 is small in sparse rating databases, the
idea is to also consider longer paths (indirect associations) to compute
recommendations
– Using path length 5, for instance
Graph-based methods (3)
▪ "Spreading activation" (Huang et al. 2004)
– Idea: Use paths of lengths > 3 to recommend items
– Length 3: Recommend Item3 to User1
– Length 5: Item1 also recommendable
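A minimal sketch of the path idea on a toy purchase graph shaped like the slide's example (the data and all names are illustrative, not the slide's actual figure):

```python
# Bipartite purchase data: user -> set of bought items (illustrative).
bought = {
    "User1": {"Item2", "Item4"},
    "User2": {"Item2", "Item3", "Item4"},
    "User3": {"Item1", "Item3"},
}

def reachable_items(user, max_len):
    """Items reachable from `user` via alternating user-item paths of
    odd length <= max_len, excluding items the user already bought."""
    frontier = {user}
    seen, recs = set(), set()
    for _ in range(0, max_len, 2):      # each pass extends paths by 2
        items = set().union(*(bought[u] for u in frontier)) - seen
        recs |= items - bought[user]
        seen |= items
        frontier = {u for u in bought if bought[u] & items}
    return recs

print(reachable_items("User1", 3))   # -> {'Item3'}  (via User2)
print(reachable_items("User1", 5))   # -> {'Item3', 'Item1'}  (Item1 needs length 5)
```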
Collaborative Filtering Issues
▪ Pros:
– well-understood, works well in some domains, no knowledge engineering required
▪ Cons:
– requires user community, sparsity problems, no integration of other knowledge sources, no
explanation of results
▪ What is the best CF method?
– In which situation and which domain? Inconsistent findings; always the same domains and
data sets; differences between methods are often very small (1/100)
▪ How to evaluate the prediction quality?
– MAE / RMSE: What does an MAE of 0.7 actually mean?
– Serendipity (novelty and surprising effect of recommendations)
▪ Not yet fully understood
▪ What about multi-dimensional ratings?
The Google News personalization engine
Google News portal (1)
▪ Aggregates news articles from several thousand sources
▪ Displays them to signed-in users in a personalized way
▪ Collaborative recommendation approach based on
– the click history of the active user and
– the history of the larger community
▪ Main challenges
– Vast number of articles and users
– Generate recommendation list in real time (at most one second)
– Constant stream of new items
– Immediately react to user interaction
▪ Significant efforts with respect to algorithms, engineering, and
parallelization are required
Google News portal (2)
▪ Pure memory-based approaches are not directly applicable and for model-
based approaches, the problem of continuous model updates must be
solved
▪ A combination of model- and memory-based techniques is used
▪ Model-based part: Two clustering techniques are used
– Probabilistic Latent Semantic Indexing (PLSI) as proposed by (Hofmann 2004)
– MinHash as a hashing method
▪ Memory-based part: Analyze story co-visits for dealing with new users
▪ Google's MapReduce technique is used for parallelization in order to make
computation scalable
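A toy sketch of the MinHash idea used for clustering users by click history (illustrative only; the production system draws many such hashes and computes them in parallel via MapReduce):

```python
import hashlib

def minhash(clicked, seed):
    """Return the story id minimizing a seeded hash. Two users receive
    the same value with probability equal to the Jaccard similarity of
    their click sets -- the property MinHash clustering exploits."""
    return min(clicked,
               key=lambda s: hashlib.md5(f"{seed}:{s}".encode()).hexdigest())

def cluster_id(clicked, n_hashes=3):
    """Concatenate several independent MinHash values into a cluster key."""
    return tuple(minhash(clicked, seed) for seed in range(n_hashes))

u1 = {"story1", "story2", "story3"}
u2 = {"story2", "story3", "story4"}   # Jaccard overlap of 0.5 with u1
print(cluster_id(u1), cluster_id(u2))
```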
Discussion & summary
▪ In contrast to collaborative approaches, content-based techniques do not require a user community in order to work
▪ The presented approaches aim to learn a model of the user's interests based on explicit or implicit feedback
– Deriving implicit feedback from user behavior can be problematic
▪ Evaluations show that good recommendation accuracy can be achieved with the help of machine learning techniques
– These techniques do not require a user community
▪ Danger exists that recommendation lists contain too many similar items
– All learning techniques require a certain amount of training data
– Some learning methods tend to overfit the training data
▪ Pure content-based systems are rarely found in commercial environments