
Rec Sys CF

The document introduces recommender systems, which suggest items to users based on their preferences and behaviors, aiming to enhance user experience and increase sales. It discusses various paradigms such as collaborative filtering, content-based, and hybrid approaches, highlighting their strengths and challenges, including data sparsity and cold start problems. The document also covers algorithmic techniques and evaluation criteria for the effectiveness of these systems.

Uploaded by

Vivek Gurjar
Copyright
© All Rights Reserved

Introduction to Recommender Systems

Sukomal Pal
IIT (BHU)
Some problems we face online
Which digital camera should I buy?
What is the best holiday for me and my family?
Which is the best investment for supporting the education of my children?
Which movie should I rent?
Which web sites will I find interesting?
Which book should I buy for my next vacation?
Which degree and university are the best for my future?
Introduction
●Systems for recommending items (e.g., books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.
●Objectives:
– To propose objects fitting the user's needs/wishes
– To sell services (site visits) or goods
●Many search engines and online stores provide recommendations (e.g., Google Play Store, Amazon, Netflix, YouTube)
●Recommenders have been shown to substantially increase clicks (and sales)
Problem domain

- 5-
Purpose and success criteria (1)

▪Different perspectives/aspects
–Depends on domain and purpose
–No holistic evaluation scenario exists

▪Retrieval perspective
–Reduce search costs
–Provide "correct" proposals
–Users know in advance what they want

▪Recommendation perspective
–Serendipity – identify items from the Long Tail
–Users did not know about existence

- 6-
When does a RS do its job well?

▪"Recommend widely unknown items that users might actually like!"

▪20% of items accumulate 74% of all positive ratings

▪Items rated > 3 in MovieLens 100K dataset

Recommend items from the long tail

- 7-
Purpose and success criteria (2)

▪Prediction perspective
–Predict to what degree users like an item
–Most popular evaluation scenario in research

▪Interaction perspective
–Give users a "good feeling"
–Educate users about the product domain
–Convince/persuade users - explain

▪Finally, conversion perspective


–Commercial situations
–Increase "hit", "clickthrough", "lookers to bookers" rates
–Optimize sales margins and profit

- 8-
Recommender Systems

- 9-
Recommender systems

- 10 -
Paradigms of recommender systems

Recommender systems reduce information overload by estimating relevance

- 11 -
Paradigms of recommender systems

Personalized recommendations

- 12 -
Paradigms of recommender systems

Content-based: "Show me more of what I've liked"

- 13 -
Paradigms of recommender systems

Collaborative: "Tell me what's popular among my peers"

- 14 -
Paradigms of recommender systems

Knowledge-based: "Tell me what fits based on my needs"

- 15 -
Paradigms of recommender systems

Hybrid: combinations of various inputs and/or composition of different mechanisms

- 16 -
- 17 -
Collaborative Filtering (CF)

▪ The most prominent approach to generate recommendations
– used by large, commercial e-commerce sites
– well-understood, various algorithms and variations exist
– applicable in many domains (books, movies, DVDs, ...)

▪ Approach
– use the "wisdom of the crowd" to recommend items

▪ Basic assumption and idea
– Users give ratings to catalog items (implicitly or explicitly)
– Customers who had similar tastes in the past will have similar tastes in the future

- 19 -
Pure CF Approaches

▪ Input
– Only a matrix of given user–item ratings

▪ Output types
– A (numerical) prediction indicating to what degree the current user will like or dislike a certain item
– A top-N list of recommended items

- 20 -
User-based nearest-neighbor collaborative filtering (1)

- 21 -
User-based nearest-neighbor collaborative filtering (2)

▪ Example
– A database of ratings of the current user, Alice, and some other users is given:

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

– Determine whether Alice will like or dislike Item5, which Alice has not yet rated or seen

- 22 -
User-based nearest-neighbor collaborative filtering (3)

▪ Some first questions
– How do we measure similarity?
– How many neighbors should we consider?
– How do we generate a prediction from the neighbors' ratings?

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

- 23 -
Measuring user similarity (1)

- 24 -
Measuring user similarity (2)

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     Sim = 0.85
User2     4      3      4      3      5     Sim = 0.70
User3     3      3      1      5      4     Sim = 0.00
User4     1      5      5      2      1     Sim = -0.79
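
The similarity values for this example can be recomputed with a short script. This is a minimal sketch, assuming that similarity means the Pearson correlation and that each user's mean is taken over the items both users have rated (Item1 to Item4 here); the `ratings` layout and the function name `pearson` are illustrative choices, not part of the slides.

```python
from math import sqrt

# Ratings from the example table; None marks Alice's unknown rating for Item5.
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    den_a = sqrt(sum((x - mean_a) ** 2 for x, _ in pairs))
    den_b = sqrt(sum((y - mean_b) ** 2 for _, y in pairs))
    den = den_a * den_b
    # A user with constant ratings has zero variance; treat that as "no signal".
    return num / den if den else 0.0


similarities = {
    user: pearson(ratings["Alice"], row)
    for user, row in ratings.items()
    if user != "Alice"
}

for user, sim in similarities.items():
    print(f"{user}: {sim:.2f}")
```

Under these assumptions User1 comes out as Alice's closest neighbor and User4 as negatively correlated; small differences in the last digit depend on whether values are rounded or truncated.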

- 25 -
Pearson correlation

▪ Takes differences in rating behavior into account

▪ Works well in usual domains, compared with alternative measures
– such as cosine similarity

- 26 -
Making predictions
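
One common way to turn neighbor ratings into a prediction is a mean-centered, similarity-weighted average: start from the active user's own mean rating and add the neighbors' deviations from their means, weighted by similarity. The sketch below assumes that form, uses the two most similar users as the neighborhood, and computes each neighbor's mean over all of that neighbor's ratings; `predict`, `k`, and the data layout are illustrative assumptions.

```python
from math import sqrt

ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}


def mean(values):
    """Mean of the known (non-None) ratings."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)


def pearson(a, b):
    """Pearson correlation over co-rated items."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = sqrt(sum((x - ma) ** 2 for x, _ in pairs)) * sqrt(
        sum((y - mb) ** 2 for _, y in pairs)
    )
    return num / den if den else 0.0


def predict(active, item, k=2):
    """Predict `active`'s rating for column index `item` from the k nearest raters."""
    candidates = [
        (pearson(ratings[active], row), row)
        for user, row in ratings.items()
        if user != active and row[item] is not None
    ]
    # Keep the k most similar users, and only those with positive similarity.
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    neighbors = [(s, row) for s, row in neighbors if s > 0]
    base = mean(ratings[active])
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return base  # no usable neighbors: fall back to the user's own mean
    return base + sum(s * (row[item] - mean(row)) for s, row in neighbors) / norm


prediction = predict("Alice", 4)  # index 4 is Item5
print(round(prediction, 2))
```

With User1 and User2 as neighbors, the prediction for Item5 lands well above Alice's average rating of 4, which suggests she would like the item.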

- 27 -
Improving the metrics / prediction function

▪ Not all neighbor ratings might be equally "valuable"
– Agreement on commonly liked items is less informative than agreement on controversial items
– Possible solution: Give more weight to items that have a higher variance

▪ Value of number of co-rated items
– Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
▪ Case amplification
– Intuition: Give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1
▪ Neighborhood selection
– Use similarity threshold or fixed number of neighbors
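
The adjustments listed above can be sketched as small functions applied to a raw similarity value. The concrete parameter values (`threshold=50` co-rated items, amplification exponent `rho=2.5`) are assumptions chosen for illustration; the slides do not fix them, and suitable values depend on the data set.

```python
def significance_weight(sim, n_corated, threshold=50):
    """Linearly shrink the similarity when fewer than `threshold` items are co-rated."""
    return sim * min(n_corated, threshold) / threshold


def case_amplify(sim, rho=2.5):
    """Keep similarities near +/-1 almost unchanged and shrink weaker ones toward 0."""
    return sim * abs(sim) ** (rho - 1)


def select_neighbors(sims, k=None, min_sim=None):
    """Keep users above a similarity threshold and/or only the k most similar ones.

    `sims` maps a user id to that user's similarity with the active user.
    """
    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
    if min_sim is not None:
        ranked = [(user, s) for user, s in ranked if s >= min_sim]
    if k is not None:
        ranked = ranked[:k]
    return dict(ranked)


example = {"User1": 0.85, "User2": 0.70, "User3": 0.00, "User4": -0.79}
print(select_neighbors(example, k=2))
print(select_neighbors(example, min_sim=0.8))
```

Significance weighting and case amplification both rescale the weight used in the prediction; neighborhood selection decides which users contribute at all.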

- 28 -
Memory-based and model-based approaches

▪ User-based CF is said to be "memory-based"
– the rating matrix is directly used to find neighbors / make predictions
– does not scale for most real-world scenarios
– large e-commerce sites have tens of millions of customers and millions of items

▪ Model-based approaches
– based on an offline pre-processing or "model-learning" phase
– at run-time, only the learned model is used to make predictions
– models are updated / re-trained periodically
– large variety of techniques used
– model-building and updating can be computationally expensive
– item-based CF is an example of a model-based approach

- 29 -
Item-based collaborative filtering

▪ Basic idea:
– Use the similarity between items (and not users) to make predictions

▪ Example:
– Look for items that are similar to Item5
– Take Alice's ratings for these items to predict the rating for Item5

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

- 30 -
The cosine similarity measure
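
For item-based CF, each item can be viewed as a vector of the ratings it received, and two items are compared by the cosine of the angle between their vectors. The sketch below computes the plain cosine similarity between Item5 and Item1 from the example table, and also an "adjusted" variant that first subtracts each user's mean rating. Leaving Alice out (she has not rated Item5) and taking each user's mean over all five items are assumptions of this sketch.

```python
from math import sqrt

# Rows are User1..User4 from the example table; columns are Item1..Item5.
R = [
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
]


def cosine(u, v):
    """Cosine of the angle between two equally long rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def column(matrix, j):
    """Rating vector of item j across all users."""
    return [row[j] for row in matrix]


def mean_center_rows(matrix):
    """Subtract each user's mean rating from that user's ratings."""
    return [[x - sum(row) / len(row) for x in row] for row in matrix]


raw_sim = cosine(column(R, 4), column(R, 0))
centered = mean_center_rows(R)
adjusted_sim = cosine(column(centered, 4), column(centered, 0))

print(f"cosine(Item5, Item1)          = {raw_sim:.2f}")
print(f"adjusted cosine(Item5, Item1) = {adjusted_sim:.2f}")
```

The raw cosine is close to 1 for almost any pair of items because all ratings are positive; mean-centering spreads the values out and allows negative similarities.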

- 31 -
Making predictions

▪ A common prediction function:

▪ Neighborhood size is typically also limited to a specific size
▪ Not all neighbors are taken into account for the prediction
▪ An analysis of the MovieLens dataset indicates that "in most real-world situations, a neighborhood of 20 to 50 neighbors seems reasonable" (Herlocker et al. 2002)
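
A frequently used item-based prediction function is the similarity-weighted average of the active user's own ratings for the items most similar to the target item. The sketch below assumes that form, uses adjusted cosine similarity computed from User1 to User4, and keeps only the `k` most similar items with positive similarity; all names and the choice `k=2` are illustrative.

```python
from math import sqrt

alice = [5, 3, 4, 4, None]  # Alice's ratings for Item1..Item5
others = [                  # User1..User4
    [3, 1, 2, 3, 3],
    [4, 3, 4, 3, 5],
    [3, 3, 1, 5, 4],
    [1, 5, 5, 2, 1],
]

# Mean-center every user's ratings once, so items can be compared fairly.
centered = [[x - sum(row) / len(row) for x in row] for row in others]


def adjusted_cosine(i, j):
    """Cosine similarity of item columns i and j on mean-centered ratings."""
    u = [row[i] for row in centered]
    v = [row[j] for row in centered]
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def predict_item_based(user_ratings, target, k=2):
    """Weighted average of the user's ratings on the k items most similar to `target`."""
    sims = [
        (adjusted_cosine(target, j), rating)
        for j, rating in enumerate(user_ratings)
        if j != target and rating is not None
    ]
    neighbors = [(s, r) for s, r in sorted(sims, reverse=True)[:k] if s > 0]
    norm = sum(s for s, _ in neighbors)
    if norm == 0:
        return None  # no similar item that the user has rated
    return sum(s * r for s, r in neighbors) / norm


prediction = predict_item_based(alice, target=4)
print(round(prediction, 2))
```

Here the two neighbors are Item1 and Item4, which Alice rated 5 and 4, so the prediction for Item5 falls between those two values.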

- 32 -
USER-USER: User-based Algorithm

- 33 -
ITEM-ITEM: Mean-centering for Cosine Similarity computation

- 34 -
Predicted ratings

Panels: using raw ratings; using mean-centered ratings

- 35 -
Pre-processing for item-based filtering
▪ Item-based filtering does not solve the scalability problem by itself
▪ Pre-processing approach by Amazon.com (in 2003)
– Calculate all pair-wise item similarities in advance
– The neighborhood to be used at run-time is typically rather small, because only items which the user has rated are taken into account
– Item similarities are supposed to be more stable than user similarities

▪ Memory requirements
– Up to N² pair-wise similarities to be memorized (N = number of items) in theory
– In practice, this is significantly lower (items with no co-ratings)
– Further reductions possible
  ▪ Minimum threshold for co-ratings
  ▪ Limit the neighborhood size (might affect recommendation accuracy)
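
The offline step can be sketched as a function that precomputes item-item similarities once and stores only a short neighbor list per item. This is an illustrative sketch, not Amazon's actual implementation: the function name `build_item_model`, the adjusted-cosine measure, and the default values of `min_corated` and `max_neighbors` are assumptions.

```python
from itertools import combinations
from math import sqrt


def build_item_model(ratings, min_corated=2, max_neighbors=20):
    """Precompute item neighbor lists from {user: {item: rating}}.

    Returns {item: [(neighbor_item, similarity), ...]}, most similar first.
    """
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    # Invert to {item: {user: mean-centered rating}}.
    by_item = {}
    for user, items in ratings.items():
        for item, value in items.items():
            by_item.setdefault(item, {})[user] = value - means[user]

    model = {item: [] for item in by_item}
    for a, b in combinations(sorted(by_item), 2):
        common = by_item[a].keys() & by_item[b].keys()
        if len(common) < min_corated:
            continue  # too few co-ratings: the pair is never stored
        dot = sum(by_item[a][u] * by_item[b][u] for u in common)
        norm = sqrt(sum(by_item[a][u] ** 2 for u in common)) * sqrt(
            sum(by_item[b][u] ** 2 for u in common)
        )
        if norm == 0:
            continue
        sim = dot / norm
        model[a].append((b, sim))
        model[b].append((a, sim))

    # Cap the neighborhood size to bound memory use.
    for item in model:
        model[item] = sorted(model[item], key=lambda p: p[1], reverse=True)[:max_neighbors]
    return model


names = ["Item1", "Item2", "Item3", "Item4", "Item5"]
table = {
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}
ratings = {user: dict(zip(names, row)) for user, row in table.items()}

model = build_item_model(ratings)
print(model["Item5"][0])  # most similar item to Item5 and its similarity
```

At run-time only `model` is consulted, so the cost of a prediction depends on the neighbor-list length rather than on the number of users.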

- 36 -
More on ratings – Explicit ratings
▪ Probably the most precise ratings

▪ Most commonly used (1 to 5, 1 to 7 Likert response scales)

▪ Research topics
– Optimal granularity of scale; there are indications that a 10-point scale is better accepted in the movie domain
– An even more fine-grained scale was chosen in the joke recommender discussed by Goldberg et al. (2001), where a continuous scale (from −10 to +10) and a graphical input bar were used
  ▪ No precision loss from the discretization
  ▪ User preferences can be captured at a finer granularity
  ▪ Users actually "like" the graphical interaction method
– Multidimensional ratings (multiple ratings per movie such as ratings for actors and sound)

▪ Main problems
– Users are not always willing to rate many items
  ▪ number of available ratings could be too small → sparse rating matrices → poor recommendation quality
– How to stimulate users to rate more items?

- 37 -
More on ratings – Implicit ratings
▪ Typically collected by the web shop or application in which the recommender system is embedded
▪ When a customer buys an item, for instance, many recommender systems interpret this behavior as a positive rating
▪ Clicks, page views, time spent on some page, demo downloads …
▪ Implicit ratings can be collected constantly and do not require additional efforts from the side of the user
▪ Main problem
– One cannot be sure whether the user behavior is correctly interpreted
– For example, a user might not like all the books he or she has bought; the user also might have bought a book for someone else
▪ Implicit ratings can be used in addition to explicit ones; question of correctness of interpretation
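
As a minimal illustration of the purchase case, the sketch below turns a log of purchase events into a binary user-item matrix in which every purchase counts as a positive rating of 1. The event log and the function name `implicit_matrix` are hypothetical; the code also makes the interpretation problem visible, since a gift purchase is recorded exactly like a purchase for oneself.

```python
def implicit_matrix(purchases):
    """Map (user, item) purchase events to {user: {item: 1}}."""
    matrix = {}
    for user, item in purchases:
        # Repeated purchases of the same item collapse into a single positive signal.
        matrix.setdefault(user, {})[item] = 1
    return matrix


# Hypothetical purchase log: (user, item) pairs.
log = [
    ("Alice", "Book1"),
    ("Alice", "Book2"),
    ("User1", "Book2"),
    ("Alice", "Book1"),
]

matrix = implicit_matrix(log)
print(matrix)
```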

- 38 -
Data sparsity problems

▪ Cold start problem
– How to recommend new items? What to recommend to new users?

▪ Straightforward approaches
– Ask/force users to rate a set of items
– Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
– Default voting: assign default values to items that only one of the two users to be compared has rated (Breese et al. 1998)
▪ Alternatives
– Use better algorithms (beyond nearest-neighbor approaches)
– Example:
  ▪ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
  ▪ Assume "transitivity" of neighborhoods
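
Default voting can be sketched as a Pearson correlation computed over the union of the two users' rated items, where a missing rating is replaced by a default value. This is a simplified illustration of the idea rather than the exact formulation of Breese et al.; the default of 3.0 (the midpoint of a 1 to 5 scale) and the function name are assumptions.

```python
from math import sqrt


def pearson_default_voting(a, b, default=3.0):
    """Pearson correlation over items rated by either user.

    `a` and `b` map item ids to ratings; an item rated by only one of the
    two users is assigned `default` for the other user.
    """
    items = sorted(a.keys() | b.keys())
    if len(items) < 2:
        return 0.0
    xs = [a.get(i, default) for i in items]
    ys = [b.get(i, default) for i in items]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den else 0.0


# Only one item is co-rated, so plain Pearson has nothing to work with here.
user_a = {"i1": 5, "i2": 1}
user_b = {"i1": 4, "i3": 2}
sim = pearson_default_voting(user_a, user_b)
print(sim)
```

The benefit is that two users with very few co-rated items still get a usable similarity; the cost is that the default value injects ratings nobody actually gave.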

- 39 -
Example algorithms for sparse datasets

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     (sim = 0.85 with Alice; predict rating for User1)
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
- 40 -
Graph-based methods (1)
▪ "Spreading activation" (Huang et al. 2004)
– Exploit the supposed "transitivity" of customer tastes and thereby augment the matrix with additional information
– Assume that we are looking for a recommendation for User1
– When using a standard CF approach, User2 will be considered a peer for User1 because they both bought Item2 and Item4
– Thus Item3 will be recommended to User1 because the nearest neighbor, User2, also bought or liked it

- 41 -
Graph-based methods (2)
▪ "Spreading activation" (Huang et al. 2004)
– In a standard user-based or item-based CF approach, paths of length 3 will be considered – that is, Item3 is relevant for User1 because there exists a three-step path (User1–Item2–User2–Item3) between them
– Because the number of such paths of length 3 is small in sparse rating databases, the idea is to also consider longer paths (indirect associations) to compute recommendations
– Using path length 5, for instance

- 42 -
Graph-based methods (3)

▪ "Spreading activation" (Huang et al. 2004)
– Idea: Use paths of lengths > 3 to recommend items
– Length 3: Recommend Item3 to User1
– Length 5: Item1 also recommendable
- 43 -
Collaborative Filtering Issues
▪ Pros:
– well-understood, works well in some domains, no knowledge engineering required

▪ Cons:
– requires user community, sparsity problems, no integration of other knowledge sources, no explanation of results

▪ What is the best CF method?
– In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)

▪ How to evaluate the prediction quality?
– MAE / RMSE: What does an MAE of 0.7 actually mean?
– Serendipity (novelty and surprising effect of recommendations)
  ▪ Not yet fully understood

▪ What about multi-dimensional ratings?
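
For reference, MAE is the mean absolute difference between predicted and actual ratings, and RMSE is the square root of the mean squared difference, so RMSE penalizes large errors more strongly. An MAE of 0.7 therefore means that predictions are off by 0.7 rating points on average. The sketch below uses made-up ratings purely for illustration.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


# Illustrative values only: three predictions and the ratings users actually gave.
predicted = [4, 3, 5]
actual = [5, 3, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```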

- 44 -
The Google News personalization engine

- 45 -
Google News portal (1)

▪ Aggregates news articles from several thousand sources
▪ Displays them to signed-in users in a personalized way
▪ Collaborative recommendation approach based on
– the click history of the active user and
– the history of the larger community

▪ Main challenges
– Vast number of articles and users
– Generate recommendation list in real time (at most one second)
– Constant stream of new items
– Immediately react to user interaction
▪ Significant efforts with respect to algorithms, engineering, and parallelization are required

- 46 -
Google News portal (2)
▪ Pure memory-based approaches are not directly applicable and for model-based approaches, the problem of continuous model updates must be solved
▪ A combination of model- and memory-based techniques is used
▪ Model-based part: Two clustering techniques are used
– Probabilistic Latent Semantic Indexing (PLSI) as proposed by (Hofmann 2004)
– MinHash as a hashing method

▪ Memory-based part: Analyze story co-visits for dealing with new users
▪ Google's MapReduce technique is used for parallelization in order to make computation scalable
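
The MinHash idea can be illustrated in a few lines: each user's click history is a set of story ids, and the fraction of positions in which two MinHash signatures agree estimates the Jaccard overlap of the two sets. This is a generic sketch of the technique, not the Google News implementation; seeded SHA-256 hashing, the signature length of 64, and the story ids are assumptions.

```python
import hashlib


def _hash(seed, value):
    """Deterministic 64-bit hash of `value` under the hash function number `seed`."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(clicked, num_hashes=64):
    """MinHash signature of a non-empty set of clicked story ids."""
    if not clicked:
        raise ValueError("click history must not be empty")
    return [min(_hash(seed, story) for story in clicked) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Share of signature positions that agree; approximates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical click histories.
alice = {"story1", "story2", "story3", "story4"}
bob = {"story2", "story3", "story4", "story5"}
carol = {"story8", "story9"}

sig = {name: minhash_signature(s) for name, s in [("alice", alice), ("bob", bob), ("carol", carol)]}
print(estimated_jaccard(sig["alice"], sig["bob"]))
print(estimated_jaccard(sig["alice"], sig["carol"]))
```

Users whose signatures agree in the same positions can be hashed into the same cluster, which avoids comparing every pair of users directly.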

- 47 -
Discussion & summary

▪ In contrast to collaborative approaches, content-based techniques do not require a user community in order to work
▪ Presented approaches aim to learn a model of the user's interests and preferences based on explicit or implicit feedback
– Deriving implicit feedback from user behavior can be problematic

▪ Evaluations show that good recommendation accuracy can be achieved with the help of machine learning techniques
– These techniques do not require a user community

▪ Danger exists that recommendation lists contain too many similar items
– All learning techniques require a certain amount of training data
– Some learning methods tend to overfit the training data

▪ Pure content-based systems are rarely found in commercial environments

- 50 -
Reference

Charu C. Aggarwal, Recommender Systems: The Textbook, Springer, Switzerland, 2016.

- 51 -
