RS Part 1

Uploaded by

subhan Bashir

Part-I Recommender Systems

Theory and Implementation

Dr. Mustansar Ali Ghazanfar



CN7030 2324 (T2) Machine Learning on Big Data


Learning Outcomes

By the end of this lecture on Recommender Systems, you will be able to:

▪ Define key concepts in Recommender Systems.

▪ Differentiate Collaborative Filtering, Content-Based Filtering, Demographic-Based Systems, and Hybrid Systems.

▪ Explain dimensionality-reduction-based recommendation.

▪ Implement a recommender in PySpark (lab).


Agenda

❑ Introduction to Recommender Systems (RS)
❑ Types of Recommender Systems (by methodology used)
❑ Collaborative filtering systems
❑ Content-based filtering systems
❑ Demographic-based systems
❑ Hybrid systems
❑ Dimensionality reduction
❑ Conclusion
Collective Intelligence

❑ Wisdom of the crowd
❑ Some pattern emerges
❑ Some intelligence is found that was not visible at the individual level.
❑ Many applications
❑ Collaborative filtering
❑ Social media analysis
❑ Stock market prediction
❑ Tag cloud analysis
(Satnam Alag)
Recommender Systems (RS)
❑ Information filtering systems that recommend items based on a model of user preferences.
❑ Examples
Book Recommender

[Figure: book recommender. Labels: Red Mars, Foundation, Jurassic Park, Lost World, 2001, Neuromancer, 2010, Difference Engine, Machine Learning, User Profile.]
User and Item profiles

❑ A user profile is made of the ratings (about items) that the user has provided
❑ An item profile is made of the ratings given by other users and the item's content features
❑ A database of ratings exists, called the training set
❑ The user × item matrix is called the user-item rating matrix
❑ It is a very sparse matrix
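
As a minimal sketch of the profiles described above, the snippet below builds user and item profiles from a handful of rating triples and measures how sparse the resulting user-item matrix is. The triples, the variable names and the dictionary layout are illustrative assumptions, not data from the lecture.

```python
from collections import defaultdict

# Hypothetical training set of (user, item, rating) triples.
ratings = [
    ("User1", "Item1", 4), ("User1", "Item3", 1), ("User1", "Item6", 5),
    ("User2", "Item1", 5), ("User2", "Item2", 5), ("User2", "Item4", 1),
    ("User3", "Item2", 4), ("User3", "Item5", 1),
]

user_profiles = defaultdict(dict)  # user -> {item: rating}
item_profiles = defaultdict(dict)  # item -> {user: rating}
for user, item, rating in ratings:
    user_profiles[user][item] = rating
    item_profiles[item][user] = rating

# Sparsity = fraction of user-item cells that hold no rating.
n_cells = len(user_profiles) * len(item_profiles)
sparsity = 1 - len(ratings) / n_cells
print(f"{len(user_profiles)} users x {len(item_profiles)} items, "
      f"sparsity = {sparsity:.2f}")
```

Storing only the observed ratings, as the dictionaries do here, is one common way to avoid materialising the mostly empty matrix.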
Collaborative Filtering

❑ Maintain a database of many users’ ratings of a variety of items.
❑ For a given user, find other similar users whose ratings strongly correlate with the current user.
❑ Recommend items rated highly by these similar users, but not rated by the current user.
❑ Almost all existing commercial recommenders use this approach (e.g. Amazon).
Collaborative Filtering

[Figure: collaborative filtering. Labels: Active User, Community, Rating, Correlation Match, Neighbours, Aggregate Votes, Prediction.]
Collaborative Filtering

[Figure: collaborative filtering. Labels: User Database, Active User, Correlation Match, Extract Recommendations.]
User-Based CF

Item1 Item2 Item3 Item4 Item5 Item6
User1 4 1 5
x (unknown rating to be predicted)
User2 5 5 5 1
User3 4 4 4 1
User4 3 3 5
User-Based CF

Item1 Item2 Item3 Item4 Item5 Item6
User1 4 1 5
x (unknown rating to be predicted)
User2 5 5 5 1
User3 4 4 4 1
User4 3 3 5

Sim(u1, u2) ∈ [-1, +1] → 0.5
Sim(u1, u3) ∈ [-1, +1] → -0.1
Sim(u1, u4) ∈ [-1, +1] → 0.3

Sum(|sim|) = 0.5 + 0.1 + 0.3 = 0.9


User-Based CF
Item1 Item2 Item3 Item4 Item5 Item6
User1 4 1 5
x (unknown rating to be predicted)
User2 5 5 5 1
User3 4 4 4 1
User4 3 3 5

Prediction = {(Sim(u1,u2) * rating_User2)
+ (Sim(u1,u3) * rating_User3)
+ (Sim(u1,u4) * rating_User4)} / Sum(|sim|)

Prediction = {(0.5 * 5) + (-0.1 * 4) + (0.3 * 3)} / 0.9
= 3.0 / 0.9
≈ 3.33
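
The weighted-sum prediction can be checked with a few lines of Python. This is only a sketch: the function name `predict_rating` and the list-of-pairs input are assumptions made for illustration, while the similarities (0.5, -0.1, 0.3) and neighbour ratings (5, 4, 3) are the ones from the slide. The numerator is 2.5 - 0.4 + 0.9 = 3.0 and the sum of absolute similarities is 0.9, so the result is about 3.33.

```python
def predict_rating(neighbours):
    """Similarity-weighted average of neighbour ratings.

    neighbours: list of (similarity, rating) pairs, one per neighbour.
    The denominator uses absolute similarities so that negative
    correlations do not shrink or flip the normaliser.
    """
    numerator = sum(sim * rating for sim, rating in neighbours)
    denominator = sum(abs(sim) for sim, _ in neighbours)
    if denominator == 0:
        return None  # no usable neighbours
    return numerator / denominator


# (similarity to User1, neighbour's rating of the target item)
neighbours = [(0.5, 5), (-0.1, 4), (0.3, 3)]
prediction = predict_rating(neighbours)
print(round(prediction, 2))  # 3.33
```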
User-Based CF

Item1 Item2 Item3 Item4 Item5 Item6
User1 4 2 1 5
User2 5 5 5 1
x (unknown rating to be predicted)
User3 4 4 4 1
User4 3 3 5
Item-Based CF

Item1 Item2 Item3 Item4 Item5 Item6
User1 4 1 5
x (unknown rating to be predicted)
User2 5 5 5 1 1
User3 4 4 4 1
User4 3 3 5
Item-Based CF

U1 U2 U3 U4 U5 U6
I1 4 1 5
x (unknown rating to be predicted)
I2 5 5 5 1 1
I3 4 4 4 1
I4 3 3 5

Sim(I1, I2) ∈ [-1, +1] → 0.4
Sim(I1, I3) ∈ [-1, +1] → -0.1
Sim(I1, I4) ∈ [-1, +1] → 0.3

Sum(|sim|) = 0.4 + 0.1 + 0.3 = 0.8


Item-Based CF

U1 U2 U3 U4 U5 U6
I1 4 1 5
x (unknown rating to be predicted)
I2 5 5 5 1 1
I3 4 4 4 1
I4 3 3 5

Prediction = {(Sim(I1,I2) * rating_User3_I2)
+ (Sim(I1,I3) * rating_User3_I3)
+ (Sim(I1,I4) * rating_User3_I4)} / Sum(|sim|)

Prediction = {(0.4 * 5) + (-0.1 * 4) + (0.3 * 3)} / 0.8
= 2.5 / 0.8
= 3.125
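
To show the item-based procedure end to end, the sketch below computes item-item similarities from a small rating table and then predicts one missing rating. Everything in it is an assumption for illustration: the toy ratings, the function names, and the use of cosine similarity over co-rated users (the slides do not say which measure produced their similarity values).

```python
from math import sqrt

# Hypothetical ratings, stored per item: item -> {user: rating}.
item_ratings = {
    "I1": {"U1": 4, "U2": 5, "U4": 3},
    "I2": {"U1": 5, "U2": 5, "U3": 5, "U4": 1},
    "I3": {"U1": 4, "U3": 4, "U4": 4},
    "I4": {"U2": 3, "U3": 3, "U4": 5},
}


def cosine(a, b):
    """Cosine similarity of two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict_item_based(user, target, item_ratings):
    """Weight the user's own ratings of other items by item similarity."""
    numerator = denominator = 0.0
    for item, ratings in item_ratings.items():
        if item == target or user not in ratings:
            continue
        sim = cosine(item_ratings[target], ratings)
        numerator += sim * ratings[user]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# U3 has not rated I1; predict it from U3's ratings of I2, I3 and I4.
prediction = predict_item_based("U3", "I1", item_ratings)
print(round(prediction, 2))
```

Because the neighbours here are items rather than users, the ratings in the weighted sum all come from the active user, which is the key difference from user-based CF.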
What is similarity?
Find similar users, Similarity measures.

Vector Similarity

Pearson Correlation
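
The two measures named above can be sketched as follows. This is a minimal illustration under stated assumptions: profiles are dictionaries of item ratings, both measures use only co-rated items, and the Pearson means are taken over those co-rated items (some formulations use each user's overall mean instead). The function names and the two sample users are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Vector (cosine) similarity over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation over co-rated items, in the range [-1, +1]."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / sqrt(var_a * var_b)


# Two hypothetical users with opposite tastes.
alice = {"Item1": 5, "Item2": 3, "Item3": 1}
bob = {"Item1": 1, "Item2": 3, "Item3": 5}
print(round(cosine_similarity(alice, bob), 3))
print(round(pearson_correlation(alice, bob), 3))
```

For these two users the cosine value is still positive, since all ratings are positive numbers, whereas Pearson subtracts each user's mean first and reports a perfect negative correlation.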
Cold start scenarios

❑ New user cold start scenario, where the user is new and we don’t have any data about them
❑ New item cold start scenario, where the item is new and no one has rated it
❑ New system cold start scenario, where the system is new, so the problem is how to bootstrap it
Types of Collaborative Filtering (CF)

❑ Memory-based approaches
▪ User-based CF
▪ Item-based CF

❑ Model-based approaches
▪ Clustering
▪ Matrix factorization (MF)
▪ Singular value decomposition (SVD), etc.
Advantages with Collaborative Filtering

▪ Simple algorithm
▪ Uses the wisdom of the crowd
▪ Can recommend ‘out of the box’ items
▪ High quality recommendations (high accuracy)
Problems with Collaborative Filtering
▪ Cold Start: There needs to be enough other users already in the system to find a match.
▪ Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.
▪ First Rater: Cannot recommend an item that has not been previously rated.
▪ New items
▪ Popularity Bias: Cannot recommend items to someone with unique tastes.
▪ Gray-Sheep Users (i.e. outlier problem)
Content-Based Recommending
(CBR)
▪ Recommendations are based on information on the content of items rather than on other users’ opinions.
▪ Uses a machine learning algorithm to induce a profile of the user’s preferences from examples based on a featural description of content.
Advantages of CBR
▪ No cold-start or sparsity problems.
▪ Able to recommend to users with unique tastes.
▪ Able to recommend new and unpopular items.
▪ Can provide explanations of recommended items by listing content-features that caused an item to be recommended.


Disadvantages of CBR
▪ Requires content that can be encoded as meaningful features (problem for multimedia data).
▪ Users’ tastes must be represented as a learnable function of these content features.
▪ Low quality recommendations (low accuracy)
▪ Unable to exploit quality judgments of other users.
Content-Based Recommending vs Collaborative Filtering

Content-Based Filtering                         | Collaborative Filtering
Info about the content of items                 | Info about the similar users (or items)
Features & keywords about the item              | Ratings & past history of the user
Train machine learning classifiers and predict  | Use (social) collective intelligence to predict
Feature Extraction for CBF
❑ Identify specific pieces of information in an unstructured or semi-structured textual document.
❑ Steps [9]
▪ Tokenization
▪ Stop word removal
▪ Normalization
▪ Stemming
▪ Indexing
▪ Weighting schemes
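
The steps listed above can be sketched in plain Python as follows. Several parts are stand-ins chosen for brevity and are assumptions rather than the lecture's method: the stop-word list is a tiny sample, `stem` is a crude suffix stripper (a real system would use a proper stemmer such as Porter's), and TF-IDF is used as one example of a weighting scheme. The sample documents are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "of", "and", "in", "to"}  # tiny sample list


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Normalization (lowercasing) and tokenization in one pass.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stop word removal, then stemming.
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    term_lists = [preprocess(d) for d in documents]
    # Indexing: term -> number of documents that contain it.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(documents)
    weighted = []
    for terms in term_lists:
        counts = Counter(terms)
        weighted.append({
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


docs = [
    "The dinosaurs escaped in the park",
    "A robot escapes the space station",
    "Dinosaurs and robots",
]
weights = tf_idf(docs)
print(preprocess(docs[0]))
print(weights[0])
```

Terms that occur in fewer documents receive a higher weight, so in the first document "park" outweighs the stem shared with the third document.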
Ontology-Based Recommendations in CBF
❑ Represent user and item profiles by an ontology
❑ Ontology inferences (is-a, etc.)
❑ Claimed benefits
▪ Accuracy
▪ Overcome some problems
▪ Coverage
Ratings-based profile representation
Relevance ratings
Statistical techniques find useful correlations
Demographic-Based RS
❑ Cluster users/items based on demographic features
❑ Demographic features:
▪ User
✓ age, gender, post code, etc.
▪ Item
✓ Movie → horror, romantic, etc.
Integration Framework

[Figure: integration framework. Labels: F_Ratings Correlation, F_Demo Correlation, Feature Correlation, Target Item, Neighbour Item, No. of neighbours, Active User, Active User’s Rating, Prediction.]
SVD-Based Recommendations

❑ Matrix factorization based technique
❑ SVD decomposes a matrix into three factors

R = U.S.V’, with dimensions (m × n) = (m × r).(r × r).(r × n)
Rk = Uk.Sk.Vk’, with dimensions (m × n) = (m × k).(k × k).(k × n)

Rk = Uk.Sk.Vk’ ≈ R (rank-k approximation)

This reduced matrix can be used to predict ratings that were not present in the original matrix.
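
A rank-k approximation of this kind can be sketched with NumPy's `numpy.linalg.svd`. The rating matrix below is hypothetical, and one step is an added assumption: plain SVD needs a complete matrix, so each missing entry is first filled with its item's mean observed rating. Other imputation choices are possible and would change the predictions.

```python
import numpy as np

# Hypothetical 4 users x 5 items rating matrix; 0 marks a missing rating.
R = np.array([
    [5, 4, 0, 1, 1],
    [4, 0, 5, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 1, 4, 5],
], dtype=float)

observed = R > 0
# Assumption: impute missing entries with the item (column) mean.
item_means = R.sum(axis=0) / observed.sum(axis=0)
filled = np.where(observed, R, item_means)

# Thin SVD: filled = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)

k = 2  # number of singular values to keep
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Entries of R_k at the originally missing positions act as predictions.
print(np.round(R_k, 2))
print("predicted rating for user 0, item 2:", round(float(R_k[0, 2]), 2))
```

Keeping only the k largest singular values discards the weakest directions in the data, which is what makes the reconstruction a smoothed estimate rather than a copy of the filled matrix.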
Clustering-Based Recommendations

❑ Slow convergence?
❑ Centroid selection?
❑ Cluster sizes?
❑ Online, one pass
Things you need to know

❑ Collective intelligence
❑ Recommender systems
❑ Collaborative filtering
❑ Content-based filtering
❑ Demographic-based systems
❑ Hybrid systems
❑ SVD
❑ Clustering
That’s all, folks!
Online Resources

❑ Coursera: https://www.coursera.org/course/recsys
❑ Robust, scalable, and practical algorithms for recommender systems, Mustansar Ali Ghazanfar, http://eprints.soton.ac.uk/343761/
❑ Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, Adomavicius, G.
How Amazon Makes Business
References and Online Resources

▪ LinkedIn: https://www.linkedin.com/learning/comptia-data-plus-da0-001-cert-prep-domain-3-0-data-analysis/data-analysis?u=56744289

▪ Interview Tips: https://www.linkedin.com/advice/1/what-best-ways-prepare-data-analysis-interviews
