Introduction to Recommender Systems
Sukomal Pal
IIT (BHU)
[email protected]
Some problems we face online
▪ Which digital camera should I buy?
▪ What is the best holiday for me and my family?
▪ Which is the best investment for supporting the education of my children?
▪ Which movie should I rent?
▪ Which web sites will I find interesting?
▪ Which book should I buy for my next vacation?
▪ Which degree and university are the best for my future?
Introduction
● Systems for recommending items (e.g., books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences
● Objectives:
– To propose objects that fit the user's needs/wishes
– To sell services (site visits) or goods
● Many search engines and online stores provide recommendations (e.g., Google Play Store, Amazon, Netflix, YouTube)
● Recommenders have been shown to substantially increase clicks (and sales)
Problem domain
Purpose and success criteria (1)
▪Different perspectives/aspects
–Depends on domain and purpose
–No holistic evaluation scenario exists
▪Retrieval perspective
–Reduce search costs
–Provide "correct" proposals
–Users know in advance what they want
▪Recommendation perspective
–Serendipity – identify items from the Long Tail
–Users did not know about existence
When does a RS do its job well?
▪"Recommend widely
unknown items that users
might actually like!"
Recommend items ▪20% of items
from the long tail
accumulate 74% of all
positive ratings
▪Items rated > 3 in
MovieLens 100K dataset
Purpose and success criteria (2)
▪Prediction perspective
–Predict to what degree users like an item
–Most popular evaluation scenario in research
▪Interaction perspective
–Give users a "good feeling"
–Educate users about the product domain
–Convince/persuade users - explain
▪Finally, conversion perspective
–Commercial situations
–Increase "hit", "clickthrough", "lookers to bookers" rates
–Optimize sales margins and profit
Paradigms of recommender systems
▪ Recommender systems reduce information overload by estimating relevance
▪ Personalized recommendations
▪ Content-based: "Show me more of the same as what I've liked"
▪ Collaborative: "Tell me what's popular among my peers"
▪ Knowledge-based: "Tell me what fits based on my needs"
▪ Hybrid: combinations of various inputs and/or composition of different mechanisms
Collaborative Filtering (CF)
▪ The most prominent approach to generate recommendations
– used by large, commercial e-commerce sites
– well-understood, various algorithms and variations exist
– applicable in many domains (books, movies, DVDs, ...)
▪ Approach
– use the "wisdom of the crowd" to recommend items
▪ Basic assumption and idea
– Users give ratings to catalog items (implicitly or explicitly)
– Customers who had similar tastes in the past will have similar tastes in the future
Pure CF Approaches
▪ Input
– Only a matrix of given user–item ratings
▪ Output types
– A (numerical) prediction indicating to what degree the current user will like
or dislike a certain item
– A top-N list of recommended items
User-based nearest-neighbor collaborative filtering (1)
▪ The basic technique
– Given an "active user" (Alice) and an item i not yet rated by Alice
– Find a set of peers who liked the same items as Alice in the past and who have rated item i
– Use, e.g., the average of their ratings to predict whether Alice will like item i
– Repeat for all unseen items and recommend the best-rated ones
User-based nearest-neighbor collaborative filtering (2)
▪ Example
– A database of ratings of the current user, Alice, and some other users is
given:
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
– Determine whether Alice will like or dislike Item5, which Alice has not yet
rated or seen
User-based nearest-neighbor collaborative filtering (3)
▪ Some first questions
– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the neighbors' ratings?
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Measuring user similarity (1)
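The similarity measure used here and on the next slide is Pearson correlation. For users $a$ and $b$, with $P$ the set of items rated by both, $r_{a,p}$ user $a$'s rating for item $p$, and $\bar{r}_a$ user $a$'s average rating:

$$
\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}
$$

The value ranges from −1 (opposite rating behavior) to +1 (perfect agreement).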
Measuring user similarity (2)
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4     sim = 0.00
User4     1      5      5      2      1     sim = -0.79
Pearson correlation
▪ Takes differences in rating behavior into account
▪ Works well in usual domains, compared with alternative measures
– such as cosine similarity
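A minimal sketch of this computation on the running example (plain Python; the data layout and function names are mine, not from the slides; means are taken over co-rated items only):

```python
import math

# Ratings from the running example; None marks a missing rating.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    den = (math.sqrt(sum((a - mean_u) ** 2 for a, _ in pairs))
           * math.sqrt(sum((b - mean_v) ** 2 for _, b in pairs)))
    return num / den if den else 0.0

for user in ["User1", "User2", "User3", "User4"]:
    print(user, round(pearson(ratings["Alice"], ratings[user]), 2))
# -> User1 0.85, User2 0.71, User3 0.0, User4 -0.79
# (the slides round 0.71 down to 0.70)
```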
Making predictions
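A standard user-based prediction function, consistent with the example above, combines the active user's average rating with the similarity-weighted, mean-centered ratings of the neighborhood $N$ (some variants use $|\mathrm{sim}(a,b)|$ in the denominator):

$$
\mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)}
$$

Mean-centering accounts for the fact that neighbors may use the rating scale differently.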
Improving the metrics / prediction function
▪ Not all neighbor ratings might be equally "valuable"
– Agreement on commonly liked items is less informative than agreement on controversial items
– Possible solution: Give more weight to items that have a higher variance
▪ Value of number of co-rated items
– Use "significance weighting", by e.g., linearly reducing the weight when the
number of co-rated items is low
▪ Case amplification
– Intuition: Give more weight to "very similar" neighbors, i.e., where the
similarity value is close to 1.
▪ Neighborhood selection
– Use similarity threshold or fixed number of neighbors
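A minimal sketch of two of these ideas, significance weighting and case amplification (the cutoff of 50 co-rated items and the exponent 2.5 are common values from the literature, not numbers given on the slide):

```python
def significance_weighted(sim, n_co_rated, cutoff=50):
    """Linearly devalue a similarity that rests on few co-rated items."""
    return sim * min(n_co_rated, cutoff) / cutoff

def case_amplified(sim, rho=2.5):
    """Amplify similarities close to 1, damp the rest (sign-preserving)."""
    return sim * abs(sim) ** (rho - 1)

print(round(significance_weighted(0.9, 3), 3))   # 0.054: strong sim, weak evidence
print(round(case_amplified(0.9), 2))             # 0.77
print(round(case_amplified(0.3), 2))             # 0.05
```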
Memory-based and model-based approaches
▪ User-based CF is said to be "memory-based"
– the rating matrix is directly used to find neighbors / make predictions
– does not scale for most real-world scenarios
– large e-commerce sites have tens of millions of customers and millions of items
▪ Model-based approaches
– based on an offline pre-processing or "model-learning" phase
– at run-time, only the learned model is used to make predictions
– models are updated / re-trained periodically
– large variety of techniques used
– model-building and updating can be computationally expensive
– item-based CF is an example of a model-based approach
Item-based collaborative filtering
▪ Basic idea:
– Use the similarity between items (and not users) to make predictions
▪ Example:
– Look for items that are similar to Item5
– Take Alice's ratings for these items to predict the rating for Item5
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
The cosine similarity measure
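The standard measure behind this slide treats the ratings of two items as vectors $\vec{a}$ and $\vec{b}$:

$$
\mathrm{sim}(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}|\,|\vec{b}|}
$$

With only positive ratings the value lies between 0 and 1. Unlike Pearson correlation, plain cosine does not adjust for each user's average rating level; the mean-centered variant a few slides below addresses this.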
Making predictions
▪ A common prediction function:
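One standard form of this function weights user $u$'s own past ratings by item-item similarity (notation as in the user-based formulas above):

$$
\mathrm{pred}(u,p) = \frac{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p) \cdot r_{u,i}}{\sum_{i \in \mathrm{ratedItems}(u)} \mathrm{sim}(i,p)}
$$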
▪ The neighborhood is typically also limited to a specific size
▪ Not all neighbors are taken into account for the prediction
▪ An analysis of the MovieLens dataset indicates that "in most real-world
situations, a neighborhood of 20 to 50 neighbors seems reasonable"
(Herlocker et al. 2002)
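A minimal sketch of the full item-based computation on the running example, using plain cosine between item columns over the four other users (the held-out setup and all names are mine):

```python
import math

# Item vectors over User1..User4; Alice's row is held out because her
# Item5 rating is the value we want to predict.
items = {
    "Item1": [3, 4, 3, 1],
    "Item2": [1, 3, 3, 5],
    "Item3": [2, 4, 1, 5],
    "Item4": [3, 3, 5, 2],
    "Item5": [3, 5, 4, 1],
}
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}

def cosine(u, v):
    """Plain (non-adjusted) cosine between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Similarity of each item Alice rated to the target Item5, then the
# similarity-weighted average of her own ratings (prediction function above).
sims = {i: cosine(items[i], items["Item5"]) for i in alice}
pred = sum(sims[i] * alice[i] for i in alice) / sum(sims.values())
print(round(pred, 2))   # -> 4.08: Alice is predicted to like Item5
```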
USER-USER: User-based Algorithm
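Tying the user-based pieces together, a minimal end-to-end sketch (it reuses `ratings` and `pearson` from the Pearson sketch above; the choice of k = 2 and of full-profile user means is mine):

```python
def predict(active, item, k=2):
    """User-based kNN prediction: the active user's own mean plus the
    similarity-weighted, mean-centered ratings of the k most similar
    users who have rated the item."""
    mean = {u: sum(r for r in rs if r is not None) /
               sum(r is not None for r in rs)
            for u, rs in ratings.items()}
    peers = sorted(((pearson(ratings[active], ratings[u]), u)
                    for u in ratings
                    if u != active and ratings[u][item] is not None),
                   reverse=True)[:k]
    num = sum(s * (ratings[u][item] - mean[u]) for s, u in peers)
    return mean[active] + num / sum(s for s, _ in peers)

print(round(predict("Alice", 4), 2))   # -> 4.87 for Item5
```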
ITEM-ITEM: Mean-centering for Cosine Similarity computation
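Mean-centering here means subtracting each user's average rating before computing the cosine (the "adjusted cosine" measure), so that differences in individual rating scales do not distort item similarity. With $U$ the set of users who rated both items $a$ and $b$:

$$
\mathrm{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}
$$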
Predicted ratings
[Figure: predicted ratings computed from raw ratings vs. from mean-centered ratings]
Pre-processing for item-based filtering
▪ Item-based filtering does not solve the scalability problem itself
▪ Pre-processing approach by Amazon.com (in 2003)
– Calculate all pair-wise item similarities in advance
– The neighborhood to be used at run-time is typically rather small, because only items
are taken into account which the user has rated
– Item similarities are supposed to be more stable than user similarities
▪ Memory requirements
– Up to N² pair-wise similarities to be memorized (N = number of items) in theory
– In practice, this is significantly lower (items with no co-ratings)
– Further reductions possible
▪ Minimum threshold for co-ratings
▪ Limit the neighborhood size (might affect recommendation accuracy)
More on ratings – Explicit ratings
▪ Probably the most precise ratings
▪ Most commonly used (1 to 5, 1 to 7 Likert response scales)
▪ Research topics
– Optimal granularity of scale; indications that a 10-point scale is better accepted in the movie domain
– An even more fine-grained scale was chosen in the joke recommender discussed by Goldberg et al.
(2001), where a continuous scale (from −10 to +10) and a graphical input bar were used
▪ No precision loss from the discretization
▪ User preferences can be captured at a finer granularity
▪ Users actually "like" the graphical interaction method
– Multidimensional ratings (multiple ratings per movie such as ratings for actors and sound)
▪ Main problems
– Users not always willing to rate many items
▪ number of available ratings could be too small → sparse rating matrices → poor recommendation
quality
– How to stimulate users to rate more items?
More on ratings – Implicit ratings
▪ Typically collected by the web shop or application in which the recommender system is
embedded
▪ When a customer buys an item, for instance, many recommender systems interpret this
behavior as a positive rating
▪ Clicks, page views, time spent on some page, demo downloads …
▪ Implicit ratings can be collected constantly and do not require additional effort from the user
▪ Main problem
– One cannot be sure whether the user behavior is correctly interpreted
– For example, a user might not like all the books he or she has bought; the user also might
have bought a book for someone else
▪ Implicit ratings can be used in addition to explicit ones; question of correctness of
interpretation
Data sparsity problems
▪ Cold start problem
– How to recommend new items? What to recommend to new users?
▪ Straightforward approaches
– Ask/force users to rate a set of items
– Use another method (e.g., content-based, demographic or simply non-personalized) in
the initial phase
– Default voting: assign default values to items that only one of the two users to be
compared has rated (Breese et al. 1998)
▪ Alternatives
– Use better algorithms (beyond nearest-neighbor approaches)
– Example:
▪ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be
too small to make good predictions
▪ Assume "transitivity" of neighborhoods
Example algorithms for sparse datasets
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     sim = 0.85
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
▪ Recursive prediction: Alice's closest neighbor User1 (sim = 0.85) has not rated Item5 either, so first predict User1's rating for Item5 from User1's own neighbors, then use that predicted rating in Alice's prediction
Graph-based methods (1)
▪ "Spreading activation" (Huang et al. 2004)
– Exploit the supposed "transitivity" of customer tastes and thereby augment the matrix
with additional information
– Assume that we are looking for a recommendation for User1
– When using a standard CF approach, User2 will be considered a peer for User1 because
they both bought Item2 and Item4
– Thus Item3 will be recommended to User1 because the nearest neighbor, User2, also
bought or liked it
Graph-based methods (2)
▪ "Spreading activation" (Huang et al. 2004)
– In a standard user-based or item-based CF approach, paths of length 3 will be
considered – that is, Item3 is relevant for User1 because there exists a three-step path
(User1–Item2–User2–Item3) between them
– Because the number of such paths of length 3 is small in sparse rating databases, the
idea is to also consider longer paths (indirect associations) to compute
recommendations
– Using path length 5, for instance
Graph-based methods (3)
▪ "Spreading activation" (Huang et al. 2004)
– Idea: Use paths of lengths > 3 to recommend items
– Length 3: Recommend Item3 to User1
– Length 5: Item1 also recommendable
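A minimal sketch of the path idea on a toy purchase graph shaped like the slide's example (the data and all names are illustrative, not the slide's actual figure):

```python
# Bipartite purchase data: user -> set of bought items (illustrative).
bought = {
    "User1": {"Item2", "Item4"},
    "User2": {"Item2", "Item3", "Item4"},
    "User3": {"Item1", "Item3"},
}

def reachable_items(user, max_len):
    """Items reachable from `user` via alternating user-item paths of
    odd length <= max_len, excluding items the user already bought."""
    frontier = {user}
    seen, recs = set(), set()
    for _ in range(0, max_len, 2):      # each pass extends paths by 2
        items = set().union(*(bought[u] for u in frontier)) - seen
        recs |= items - bought[user]
        seen |= items
        frontier = {u for u in bought if bought[u] & items}
    return recs

print(reachable_items("User1", 3))   # -> {'Item3'}  (via User2)
print(reachable_items("User1", 5))   # -> {'Item3', 'Item1'}  (Item1 needs length 5)
```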
Collaborative Filtering Issues
▪ Pros:
– well-understood, works well in some domains, no knowledge engineering required
▪ Cons:
– requires user community, sparsity problems, no integration of other knowledge sources, no
explanation of results
▪ What is the best CF method?
– In which situation and which domain? Inconsistent findings; always the same domains and
data sets; differences between methods are often very small (1/100)
▪ How to evaluate the prediction quality?
– MAE / RMSE: What does an MAE of 0.7 actually mean?
– Serendipity (novelty and surprising effect of recommendations)
▪ Not yet fully understood
▪ What about multi-dimensional ratings?
The Google News personalization engine
Google News portal (1)
▪ Aggregates news articles from several thousand sources
▪ Displays them to signed-in users in a personalized way
▪ Collaborative recommendation approach based on
– the click history of the active user and
– the history of the larger community
▪ Main challenges
– Vast number of articles and users
– Generate recommendation list in real time (at most one second)
– Constant stream of new items
– Immediately react to user interaction
▪ Significant efforts with respect to algorithms, engineering, and
parallelization are required
Google News portal (2)
▪ Pure memory-based approaches are not directly applicable and for model-
based approaches, the problem of continuous model updates must be
solved
▪ A combination of model- and memory-based techniques is used
▪ Model-based part: Two clustering techniques are used
– Probabilistic Latent Semantic Indexing (PLSI) as proposed by (Hofmann 2004)
– MinHash as a hashing method
▪ Memory-based part: Analyze story co-visits for dealing with new users
▪ Google's MapReduce technique is used for parallelization in order to make
computation scalable
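A toy sketch of the MinHash idea used for clustering users by click history (illustrative only; the production system draws many such hashes and computes them in parallel via MapReduce):

```python
import hashlib

def minhash(clicked, seed):
    """Return the story id minimizing a seeded hash. Two users receive
    the same value with probability equal to the Jaccard similarity of
    their click sets -- the property MinHash clustering exploits."""
    return min(clicked,
               key=lambda s: hashlib.md5(f"{seed}:{s}".encode()).hexdigest())

def cluster_id(clicked, n_hashes=3):
    """Concatenate several independent MinHash values into a cluster key."""
    return tuple(minhash(clicked, seed) for seed in range(n_hashes))

u1 = {"story1", "story2", "story3"}
u2 = {"story2", "story3", "story4"}   # Jaccard overlap of 0.5 with u1
print(cluster_id(u1), cluster_id(u2))
```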
Discussion & summary
▪ In contrast to collaborative approaches, content-based techniques do not require a user community in order to work
▪ The presented approaches aim to learn a model of the user's interests based on explicit or implicit feedback
– Deriving implicit feedback from user behavior can be problematic
▪ Evaluations show that good recommendation accuracy can be achieved with the help of machine learning techniques
– These techniques do not require a user community
▪ Danger exists that recommendation lists contain too many similar items
– All learning techniques require a certain amount of training data
– Some learning methods tend to overfit the training data
▪ Pure content-based systems are rarely found in commercial environments