11 Association Rules Mining and Recommendation Systems

How To Make The Best Use Of Live Sessions

• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session

• All participants are muted by default to avoid background noise; the instructor will unmute you if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class

• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic

• Raise a ticket through your LMS in case of any queries. Our dedicated support team is available 24 x 7 for your assistance

• Your feedback is much appreciated. Please share feedback after each class to help us enhance your learning experience

Copyright © edureka and/or its affiliates. All rights reserved.


Course Outline

Introduction to Python
Sequences and File Operations
Deep Dive - Functions, OOPS, Modules, Errors and Exceptions
Introduction to Numpy, Pandas and Matplotlib
Data Manipulation
Introduction to Machine Learning with Python
Supervised Learning - I
Dimensionality Reduction
Supervised Learning - II
Unsupervised Learning
Association Rules Mining and Recommendation Systems
Reinforcement Learning
Time Series Analysis
Model Selection and Boosting



Association Rule Mining and
Recommendation Systems
Topics
The topics covered in this module are:

▪ Association Rule Mining

▪ Apriori Algorithm

▪ Recommendation Engines

▪ Building a Recommender System



Objectives
After completing this module, you should be able to:
▪ Define Association Rules
▪ Understand Apriori Algorithm
▪ Define Recommendation Engine
▪ Discuss types of Recommendation Engines
❖ Collaborative Filtering
❖ Content-Based Filtering
▪ Illustrate steps to build Recommendation Engines



Association Rule Mining



Association Rule Mining
▪ Association rule mining is a method for discovering interesting relations between variables in large databases

▪ It looks for patterns which state that when one event occurs, another event occurs in parallel with a certain probability

Example: customers who purchase a keyboard have a 60% likelihood of also purchasing a mouse for their PC



Association Rule Mining
An example of an association rule is given below:

X → Y

It means that if a person buys item X, then they will also buy item Y.

Let's dive a bit deeper into association rules.



Association Rule Mining: Parameters
Association rule mining relies on the following parameters:

Support: the fraction of transactions that contain both item X and item Y

Confidence: how often items X and Y occur together, given the number of times X occurs

Lift: the strength of a rule over the random co-occurrence of X and Y


Calculating Support, Confidence & Lift

Support(X → Y) = no. of transactions containing both X and Y / total number of transactions = P(X ∩ Y)

Confidence(X → Y) = no. of transactions containing both X and Y / no. of transactions containing X = P(Y | X) = Support(X → Y) / Support(X)

Lift(X → Y) = Confidence(X → Y) / Support(Y) = Support(X → Y) / (Support(X) × Support(Y))

Goal: Find all rules with user-specified minimum support (minsup) and minimum confidence (minconf)



Association Rule Mining
▪ Let's take an example.
▪ Suppose we have five transactions T1,T2,T3,T4,T5 as given below:
T1 : A, B, C
T2 : A, C, D
T3 : B, C, D
T4 : A, D, E
T5 : B, C, E
▪ Here,
❖ A,B,C,D,E are items in a store, I = {A,B,C,D,E}
❖ Set of all transactions T = {T1,T2,T3,T4,T5}
❖ Each transaction t is a set of items, t ⊆ I



Association Rule Mining
▪ Suppose you made some association rules from our transaction database, as given below:

A → D
C → A
A → C
B & C → D

▪ Now we can find the support, confidence and lift for these rules using the formulas explained earlier:

Rule       Support  Confidence  Lift
A → D      2/5      2/3         10/9
C → A      2/5      2/4         5/6
A → C      2/5      2/3         5/6
B & C → D  1/5      1/3         5/9



Now, let's understand how the Apriori algorithm is used for generating association rules.



Apriori Algorithm
The Apriori algorithm uses frequent itemsets to generate association rules, based on the principle:

“A subset of a frequent itemset must also be a frequent itemset”

Note: A frequent itemset is an itemset whose support value is greater than a threshold value.



Let’s understand
Apriori with an
example



Apriori Algorithm
Consider the following transaction dataset. We are using a threshold value = 2 (minimum support count = 2) for this example:

TID   Items
100   1, 3, 4
200   2, 3, 5
300   1, 2, 3, 5
400   2, 5
500   1, 3, 5

The first step is to build a list of itemsets of size one using this dataset.



Apriori Algorithm – First Iteration
A list of itemsets of size one is made, and their support values are calculated.

CI1:
Itemset  Support
{1}      3
{2}      3
{3}      4
{4}      1
{5}      4

Since our threshold value is 2, any itemset with support less than 2 is omitted, giving FI1:

Itemset  Support
{1}      3
{2}      3
{3}      4
{5}      4



Apriori Algorithm – Second Iteration
▪ In this iteration we extend the length of our itemsets by 1, i.e. k = k + 1

▪ All combinations of the itemsets in FI1 are used in this iteration

CI2:
Itemset  Support
{1,2}    1
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3

Omitting itemsets with support below 2 gives FI2:

Itemset  Support
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3



Apriori Algorithm – Third Iteration
▪ In this iteration we again extend the length of our itemsets. All combinations of the itemsets in FI2 are used

CI3:
Itemset
{1,2,3}
{1,2,5}
{1,3,5}
{2,3,5}

Before finding the support values, we will do some pruning of the candidate set.



Apriori Algorithm – Pruning
▪ After the combinations are made, divide each itemset into its subsets to check whether there is any subset whose support you haven't calculated yet (i.e. that is not in FI2)

CI3 itemset  2-item subsets       All in FI2?
{1,2,3}      {1,2},{1,3},{2,3}    No
{1,2,5}      {1,2},{1,5},{2,5}    No
{1,3,5}      {1,5},{1,3},{3,5}    Yes
{2,3,5}      {2,3},{2,5},{3,5}    Yes

FI2 (for reference): {1,3}: 3, {1,5}: 2, {2,3}: 2, {2,5}: 3, {3,5}: 3

If any subset of an itemset is not in FI2, remove that itemset. Here {1,2} is not in FI2, so {1,2,3} and {1,2,5} are pruned.



Apriori Algorithm – Fourth Iteration
▪ Using the itemsets of CI3 that survived pruning, compute their supports (giving FI3) and create the next candidate set CI4

FI3:
Itemset  Support
{1,3,5}  2
{2,3,5}  2

CI4:
Itemset    Support
{1,2,3,5}  1

Since the support of the itemset in CI4 is less than 2, we stop and return to the previous frequent itemsets, i.e. FI3



Apriori Algorithm – Subset Creation
▪ Now you have the list of frequent itemsets:

FI3:
Itemset  Support
{1,3,5}  2
{2,3,5}  2

Let's assume the minimum confidence value is 60%.

▪ Using this, you generate all non-empty proper subsets for each frequent itemset:
❖ For I = {1,3,5}, the subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
❖ For I = {2,3,5}, the subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}

▪ For every subset S of I, you output the rule
❖ S → (I - S) (meaning S recommends I - S)
❖ if support(I) / support(S) >= the min_conf value
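The whole iterate-and-prune procedure above can be sketched in a few lines of Python. This is a minimal illustration (not an optimized Apriori implementation) using the transaction dataset from this example:

```python
from itertools import combinations

# Transactions from the example (TIDs 100-500)
transactions = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
    {1, 3, 5},
]
min_support_count = 2

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

# CI1 -> FI1: frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if count({i}) >= min_support_count}]

# Grow itemsets one item at a time, pruning candidates that contain
# an infrequent subset, until no candidate survives
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == len(a) + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, len(c) - 1))}
    frequent.append({c for c in candidates if count(c) >= min_support_count})

fi3 = frequent[2]  # the frequent 3-itemsets
print(sorted(sorted(s) for s in fi3))  # [[1, 3, 5], [2, 3, 5]]
```

Running this reproduces the worked example: FI1 drops {4}, FI2 drops {1,2}, the subset check prunes {1,2,3} and {1,2,5} from CI3, and the loop stops once the only 4-item candidate falls below the support threshold.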



Apriori Algorithm – Applying Rules
▪ Now we will apply our rules to the itemsets of FI3:
1. {1,3,5}
❖ Rule 1: {1,3} → ({1,3,5} - {1,3}), i.e. 1 & 3 → 5
   Confidence = support(1,3,5)/support(1,3) = 2/3 = 66.66% > 60%, so Rule 1 is selected
❖ Rule 2: {1,5} → ({1,3,5} - {1,5}), i.e. 1 & 5 → 3
   Confidence = support(1,3,5)/support(1,5) = 2/2 = 100% > 60%, so Rule 2 is selected
❖ Rule 3: {3,5} → ({1,3,5} - {3,5}), i.e. 3 & 5 → 1
   Confidence = support(1,3,5)/support(3,5) = 2/3 = 66.66% > 60%, so Rule 3 is selected



Apriori Algorithm – Applying Rules
▪ Continuing with the itemsets of FI3:
1. {1,3,5}
❖ Rule 4: {1} → ({1,3,5} - {1}), i.e. 1 → 3 & 5
   Confidence = support(1,3,5)/support(1) = 2/3 = 66.66% > 60%, so Rule 4 is selected
❖ Rule 5: {3} → ({1,3,5} - {3}), i.e. 3 → 1 & 5
   Confidence = support(1,3,5)/support(3) = 2/4 = 50% < 60%, so Rule 5 is rejected
❖ Rule 6: {5} → ({1,3,5} - {5}), i.e. 5 → 1 & 3
   Confidence = support(1,3,5)/support(5) = 2/4 = 50% < 60%, so Rule 6 is rejected
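The six rule checks above can be automated. Below is a minimal sketch that enumerates every non-empty proper subset S of a frequent itemset and keeps the rule S → (I − S) when its confidence clears the 60% threshold; the counts come from the same transaction table:

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_conf = 0.6

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)

def rules_from(I):
    """Yield (S, I - S, confidence) for every rule that clears min_conf."""
    I = set(I)
    for r in range(1, len(I)):
        for S in combinations(sorted(I), r):
            conf = count(I) / count(S)
            if conf >= min_conf:
                yield set(S), I - set(S), conf

selected = list(rules_from({1, 3, 5}))
for S, rest, conf in selected:
    print(sorted(S), '->', sorted(rest), round(conf, 4))
```

For {1,3,5} this yields exactly the four rules selected above (Rules 1-4) and rejects Rules 5 and 6.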



Now let’s learn how
association rules are
used in Market Basket
Analysis in Python



Market Basket Analysis
We will be using the following online transactional data of a retail store for generating association rules
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 01-12-2010 08:26 2.55 17850 United Kingdom
536365 71053 WHITE METAL LANTERN 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84406B CREAM CUPID HEARTS COAT HANGER 8 01-12-2010 08:26 2.75 17850 United Kingdom
536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 22752 SET 7 BABUSHKA NESTING BOXES 2 01-12-2010 08:26 7.65 17850 United Kingdom
536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6 01-12-2010 08:26 4.25 17850 United Kingdom
536366 22633 HAND WARMER UNION JACK 6 01-12-2010 08:28 1.85 17850 United Kingdom
536366 22632 HAND WARMER RED POLKA DOT 6 01-12-2010 08:28 1.85 17850 United Kingdom
536367 84879 ASSORTED COLOUR BIRD ORNAMENT 32 01-12-2010 08:34 1.69 13047 United Kingdom
536367 22745 POPPY'S PLAYHOUSE BEDROOM 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22748 POPPY'S PLAYHOUSE KITCHEN 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22749 FELTCRAFT PRINCESS CHARLOTTE DOLL 8 01-12-2010 08:34 3.75 13047 United Kingdom
536367 22310 IVORY KNITTED MUG COSY 6 01-12-2010 08:34 1.65 13047 United Kingdom

Data can be downloaded from the LMS



Market Basket Analysis: Step1
First, import the pandas and MLxtend libraries and read the data

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df = pd.read_excel('Online_Retail.xlsx')
df.head()



Market Basket Analysis: Step 2
In this step, you will be doing:
▪ Data clean up which includes removing spaces from some of the descriptions
▪ Drop the rows that don’t have invoice numbers and remove the credit transactions
df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]
df



Market Basket Analysis: Step 3
▪ After the clean-up, we need to consolidate the items into one transaction per row, with one column per product
▪ To keep the dataset small, we only look at sales for France
basket = (df[df['Country'] =="France"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
basket



Market Basket Analysis: Step 4
▪ There are a lot of zeros in the data, but we also need to make sure any positive value is converted to 1 and anything less than or equal to 0 is set to 0

def encode_units(x):
if x <= 0:
return 0
if x >= 1:
return 1
basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
basket_sets



Market Basket Analysis: Step 4 (O/P)
Now, you have structured the data properly



Market Basket Analysis: Step 5
In this step, you will:
▪ Generate frequent itemsets that have a support of at least 7% (a threshold chosen low enough to return a useful number of itemsets for this small dataset)
▪ Generate the rules with their corresponding support, confidence and lift

frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()



Market Basket Analysis: Step 5 (O/P)

Observations:
▪ A few rules have a high lift value, which means they occur more frequently than would be expected given the number of transaction and product combinations
▪ In most cases the confidence is high as well



Market Basket Analysis: Step 6
▪ Filter the dataframe using standard pandas code, for a large lift (6) and high confidence (.8)

rules[ (rules['lift'] >= 6) & (rules['confidence'] >= 0.8) ]



For association rules, the granularity lies at the transaction level. They use transactions as the central entity and hence do not provide user-specific insights.

For that, we will use Recommendation Engines.



Recommendation Engines
▪ A recommendation engine (sometimes referred to as a recommender system) is a tool that predicts what a user may or may not like among a list of given items

▪ It helps users discover products or content that they may not come across otherwise

This makes recommendation engines an integral part of websites and services such as Facebook, YouTube, Amazon, and more.



Recommendation Engine Types
Recommendation engines typically work in one of two ways:

User-based (collaborative) filtering: builds a model from a user's past behavior as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in.

Content-based filtering: utilizes a series of discrete characteristics of an item in order to recommend additional items with similar properties.

It is also possible to combine both of these methods to build a much more robust recommendation engine (a hybrid recommender system).



Hybrid Recommender System – Example
A hybrid recommender system is based on both user-based filtering (UBF) and content-based filtering (CBF).

Netflix nowadays uses hybrid recommender systems for movie/series recommendations to its users.



User-Based Collaborative Filtering (UBCF)
▪ This algorithm searches a large group of people and finds a smaller set of users with tastes similar to yours

▪ It looks at other things they like and combines them to create a ranked list of suggestions

▪ Several measures have been used for user similarity or item similarity:
❖ k-nearest neighbors (k-NN)
❖ Pearson correlation

It does not rely on machine-analyzable content, and is therefore capable of accurately recommending complex items such as drinks without requiring an "understanding" of the item itself.
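As a sketch of the second similarity measure mentioned above, Pearson correlation between two users can be computed over the items both have rated. The users and ratings below are hypothetical examples, not from the slides:

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0  # not enough co-rated items to correlate
    a = [u[i] for i in common]
    b = [v[i] for i in common]
    n = len(common)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

# Hypothetical user -> {movie: rating} profiles
alice = {'Inside Out': 5, 'Avengers': 3, 'Minions': 4}
bob = {'Inside Out': 4, 'Avengers': 2, 'Minions': 3}
print(round(pearson(alice, bob), 2))  # 1.0 -- bob's ratings track alice's exactly
```

A value near +1 marks users with similar tastes (good neighbors for recommendations), while a value near −1 marks users with opposite tastes.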



UBCF Working: Step 1
Consider an example of movie recommendation.

Suppose Sarah has just watched the movie Inside Out. Let's see how the recommendation engine works and which movies it thinks she would like to see next.

First step:

1. Generate a list of users who have seen the same movies


UBCF Working: Step 2
2. Here we have four users who have watched the following movies (one column per movie; the first column is Inside Out):

John    Yes  Yes  Yes  Yes
Dave    No   Yes  No   Yes
Stuart  Yes  No   Yes  No
Sam     No   No   Yes  Yes


UBCF Working: Step 3
Now, we find users similar to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??



UBCF Working: Step 4
4. Based on the data we have, John and Stuart have also watched the movie Inside Out, so they are similar to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??



UBCF Working: Step 5
▪ Using the data of similar users, we can see that the movie Avengers gets the most votes, so it is recommended to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??

1 vote 2 votes 1 vote
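The five steps above reduce to a simple vote tally over similar users. In the sketch below, 'Inside Out' and 'Avengers' are named in the example, while 'Movie B' and 'Movie D' are placeholders for the two posters the slide does not name:

```python
# Watch history from the slide: one set of watched movies per user
history = {
    'John':   {'Inside Out', 'Movie B', 'Avengers', 'Movie D'},
    'Dave':   {'Movie B', 'Movie D'},
    'Stuart': {'Inside Out', 'Avengers'},
    'Sam':    {'Avengers', 'Movie D'},
}
sarah_watched = {'Inside Out'}

# Steps 3-4: users similar to Sarah = users who share a watched movie
similar = [u for u, seen in history.items() if sarah_watched & seen]

# Step 5: tally votes for movies Sarah has not yet seen
votes = {}
for u in similar:
    for m in history[u] - sarah_watched:
        votes[m] = votes.get(m, 0) + 1

recommendation = max(votes, key=votes.get)
print(similar)         # ['John', 'Stuart']
print(votes)
print(recommendation)  # Avengers
```

A real system would weight each neighbor's votes by a similarity score (e.g. the Pearson correlation sketched earlier) rather than counting them equally.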



Pros & Cons of User-based Filtering
Pros:
▪ Data not a constraint: works in a consumer-item scenario without any user or item feature data
▪ Easy to comprehend: the overall mathematical logic is easy to explain
▪ Differentiated output: more differentiated output than association rules

Cons:
▪ Cold start: needs enough users or items to find a match; does not work for a new user or item
▪ Sparsity: the user/ratings matrix is sparse, so it is hard to find users that have rated the same items
▪ Popularity bias: tends to recommend popular items; cannot recommend items to someone with unique taste


Content Based Filtering



Content Based Filtering
01 Has the content as the central entity

02 Works with data that the user provides, either explicitly (ratings) or implicitly (clicking on a link, purchase history)

03 Based on that data, a user profile is generated to make suggestions to the user

04 As the user provides more and more input, the engine's accuracy increases


Content Based Filtering – An Example

If Sam buys DUFF consumer merchandise, content-based filtering considers the DUFF beer can as an entity and recommends other DUFF merchandise, such as a tee shirt, to the buyer.



CBF Working: Step 1
Consider the same example of movie recommendation.

Suppose we have watched the movie Inside Out. Let's see how the recommendation engine works and which movies it thinks we would like to see.

1. Generate a list of features about the movies, such as actors, directors, themes, etc.



CBF Working: Step 2
2. Compare the feature column of each movie with the column of the movie Inside Out and see which of them match

Animated Yes Yes No No


Marvel No No Yes Yes

Super Villain No Yes Yes Yes


IMDB rating 8+ Yes No Yes No
Comedy Yes Yes No Yes



CBF Working: Step 3
3. The column with the most matches is that of Minions, so the system will recommend it

Animated        Yes  Yes  No   No
Marvel          No   No   Yes  Yes
Super Villain   No   Yes  Yes  Yes
IMDB rating 8+  Yes  No   Yes  No
Comedy          Yes  Yes  No   Yes

Matches with Inside Out (first column) for the remaining three movies: 3, 1, 1
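The matching step can be sketched as a comparison of boolean feature vectors. 'Inside Out' and 'Minions' are named in the example; 'Movie C' and 'Movie D' are placeholders for the unnamed movies, with the Yes/No values taken from the table above:

```python
features = ['Animated', 'Marvel', 'Super Villain', 'IMDB rating 8+', 'Comedy']

# One boolean vector per movie, in the order of `features` (True = Yes)
profiles = {
    'Inside Out': [True, False, False, True, True],
    'Minions': [True, False, True, False, True],
    'Movie C': [False, True, True, True, False],
    'Movie D': [False, True, True, False, True],
}

seen = 'Inside Out'
# Count feature agreements between the watched movie and each candidate
matches = {
    title: sum(a == b for a, b in zip(profiles[seen], vec))
    for title, vec in profiles.items() if title != seen
}
recommendation = max(matches, key=matches.get)
print(matches)         # {'Minions': 3, 'Movie C': 1, 'Movie D': 1}
print(recommendation)  # Minions
```

Counting raw agreements is the simplest choice; real systems typically use a proper vector similarity (cosine, Jaccard) over weighted features.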
Pros & Cons of Content Based Filtering
Pros:
▪ Only user data: no need for data on other users
▪ No differentiation problem: able to recommend to users with unique tastes
▪ No first-rater problem: able to recommend new and unpopular items

Cons:
▪ Finding important features is hard: e.g. for movies and images, which features are the important ones?
▪ Over-specialization: never recommends items outside the user's content profile
▪ No quality judgements: unable to exploit the quality judgements of other users


Use-Case: E-Commerce Sites
Many of the largest commerce Web sites are already using
recommender systems to help their customers find products to
purchase

▪ The products can be recommended based on the top overall


sellers on a site or based on an analysis of the past buying
behavior of the customer as a prediction for future buying
behavior



Use-Case : Social Networks
Social networking sites employ recommendation systems to provide better user experiences.

▪ Facebook and LinkedIn focus on link recommendation, where friend recommendations are presented to users
▪ Most friend-suggestion mechanisms rely on pre-existing user relationships to pick friend candidates



Building a Recommender System



Use Case: Scenario
▪ Consider the ratings dataset below, containing data on: UserID, MovieID, Rating and Timestamp
▪ Each line of this file represents one rating of one movie by one user, in the format: UserID::MovieID::Rating::Timestamp
▪ Ratings are made on a 1-5 star scale

UserID: ID of the user
MovieID: ID of the movie
Timestamp: seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

Sample rows:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457

Data can be downloaded from the LMS



Use Case: Tasks To Do
1. Predict recommendations based on user-movie collaborative filtering
2. Estimate the user-movie model validation using root mean squared error
3. Predict recommendations based on movie-movie collaborative filtering
4. Estimate the movie-movie model validation using root mean squared error


Use Case Solution: Step 1
Load the ‘Ratings’ movie data into pandas with labels

import pandas as pd

df = pd.read_csv('Recommend.csv',
                 names=['user_id', 'movie_id', 'rating', 'timestamp'])
df



Use Case Solution: Step 2
Declare number of users and movies and create a train test split of 75/25

from sklearn.model_selection import train_test_split

n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.25)

The data here now gets split as train data and test data, such that the train data is 75% of
the total data



Use Case Solution: Step 3
Populate the train matrix (user_id x movie_id), containing ratings such that [user_id index, movie_id index] =
given rating

import numpy as np

train_data_matrix = np.zeros((n_users, n_movies))
for line in train_data.itertuples():
    # [user_id index, movie_id index] = given rating
    # (assumes user and movie IDs are contiguous, starting at 1)
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
train_data_matrix



Use Case Solution: Step 4
Populate the test matrix (user_id x movie_id), containing ratings such that [user_id index, movie_id index] =
given rating

test_data_matrix = np.zeros((n_users, n_movies))
for line in test_data.itertuples():
    # [user_id index, movie_id index] = given rating
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix



Use Case Solution: Step 5
Creates cosine similarity matrices for users and movies and predict a user-movie recommendation model (based
on difference from mean rating as it’s a better indicator than absolute rating)
from sklearn.metrics.pairwise import pairwise_distances

# Note: with metric='cosine', pairwise_distances returns cosine *distance*
# (1 - cosine similarity)
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
movie_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')

mean_user_rating = train_data_matrix.mean(axis=1)[:, np.newaxis]
ratings_diff = train_data_matrix - mean_user_rating
user_pred = (mean_user_rating + user_similarity.dot(ratings_diff)
             / np.array([np.abs(user_similarity).sum(axis=1)]).T)
user_pred



Use Case Solution: Step 6
Predict the same for the movie based recommendation model (based on difference from mean rating as it’s a
better indicator than absolute rating)
movie_pred = (train_data_matrix.dot(movie_similarity)
              / np.array([np.abs(movie_similarity).sum(axis=1)]))
movie_pred



Use Case Solution: Step 7
Define a root mean squared error (RMSE) function to check the validity of the user-based and movie-based
recommendation model

from math import sqrt
from sklearn.metrics import mean_squared_error

def rmse(pred, test):
    # Compare only the entries that actually have ratings in the test set
    pred = pred[test.nonzero()].flatten()
    test = test[test.nonzero()].flatten()
    return sqrt(mean_squared_error(pred, test))



Use Case Solution: Step 8
Pass the user-based model that you have recently created into the rmse function

rmse(user_pred, test_data_matrix)

The RMSE obtained is approximately 3.1205. The lower the RMSE, the better the model; this value serves as a baseline against which the movie-based model can be compared.



Use Case Solution: Step 9
Pass the movie-based model that you have recently created into the rmse function

rmse(movie_pred, test_data_matrix)

The RMSE obtained is approximately 3.4470, slightly higher than the user-based model's, so on this split the user-based model performs marginally better.



Summary
▪ Association Rule Mining

▪ Support, Confidence & Lift Evaluation

▪ Apriori Algorithm

▪ Implementing Market Basket Analysis

▪ Recommendation Engines


