11 Association Rules Mining and Recommendation Systems
▪ Apriori Algorithm
▪ Recommendation Engines
For a rule X → Y:

▪ Support: the fraction of transactions that contain both X and Y
  Support(X → Y) = (no. of transactions containing both X and Y) / (total no. of transactions) = P(X ∩ Y)
▪ Confidence: how often X and Y occur together, given the number of times X occurs
  Confidence(X → Y) = (no. of transactions containing both X and Y) / (no. of transactions containing X) = P(Y | X)
▪ Lift: the strength of a rule over the random co-occurrence of X and Y
  Lift(X → Y) = Confidence(X → Y) / Support(Y)
Goal: Find all rules with user-specified minimum support (minsup) and minimum confidence (minconf)
▪ A → D
▪ C → A
▪ A → C
▪ B & C → D
▪ Now we can find the support, confidence and lift for these rules using the formulas explained earlier.
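As a sketch, the three metrics can be computed directly from a transaction list (the transactions below are made up for illustration; they are not from the slide's dataset):

```python
# Hypothetical transactions, invented for illustration
transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]

def support(itemset, transactions):
    # fraction of transactions that contain every item in `itemset`
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    # support(X and Y together) / support(X)
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    # confidence(X -> Y) relative to the baseline support of Y
    return confidence(x, y, transactions) / support(y, transactions)

print(support({"A", "C"}, transactions))       # 0.5
print(confidence({"A"}, {"C"}, transactions))  # 1.0
print(lift({"A"}, {"C"}, transactions))        # 1.0 / 0.75 = 1.33...
```

A lift above 1, as for A → C here, means the two items co-occur more often than chance would suggest.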
Apriori property: every non-empty subset of a frequent itemset is itself a frequent itemset
Minimum support count = 2

  TID   Items
  100   1, 3, 4
  200   2, 3, 5
  300   1, 2, 3, 5
  400   2, 5
Since our threshold (the minimum support count) is 2, any itemset with a support count below 2 is omitted.
If any subset of one of these itemsets is not present in FI2, remove that itemset
  TID   Items
  100   1, 3, 4
  200   2, 3, 5
  300   1, 2, 3, 5
  400   2, 5
  500   1, 3, 5

FI3:
  Itemset    Support
  {1,3,5}    2
  {2,3,5}    2

CI4:
  Itemset      Support
  {1,2,3,5}    1
Since the support count of CI4 is less than 2, you stop and return to the previous frequent itemsets, i.e. FI3
▪ Using these, you generate all non-empty subsets of each frequent itemset:
❖ For I = {1,3,5}, subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
❖ For I = {2,3,5}, subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}
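The subset enumeration and rule generation above can be sketched in a few lines over the five example transactions (the minconf threshold of 0.6 is an assumed value for illustration):

```python
from itertools import combinations

# The five example transactions from the slide
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_conf = 0.6  # assumed minconf threshold, chosen for illustration

def support_count(itemset):
    # number of transactions containing every item in `itemset`
    return sum(itemset <= t for t in transactions)

# For each frequent 3-itemset I, emit every rule S -> (I - S)
# whose confidence meets the threshold
for itemset in ({1, 3, 5}, {2, 3, 5}):
    for r in range(1, len(itemset)):
        for antecedent in map(set, combinations(itemset, r)):
            conf = support_count(itemset) / support_count(antecedent)
            if conf >= min_conf:
                print(sorted(antecedent), "->", sorted(itemset - antecedent),
                      f"confidence={conf:.2f}")
```

For example, the rule {1,5} → {3} has confidence 2/2 = 1.0 and is kept, while {3} → {1,5} has confidence 2/4 = 0.5 and is pruned.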
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df = pd.read_excel('Online_Retail.xlsx')
df.head()

# Build the invoice x product basket matrix (this grouping step is assumed;
# the slide uses `basket` without showing how it was created)
basket = (df.groupby(['InvoiceNo', 'Description'])['Quantity']
            .sum().unstack().fillna(0))

def encode_units(x):
    # one-hot encode quantities: bought (1) or not bought (0)
    if x <= 0:
        return 0
    return 1

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
basket_sets

# Mine frequent itemsets and rules (threshold values are illustrative)
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules
Observations:
▪ A few rules have a high lift value, which means those items occur together more frequently than would be
expected given the number of transactions and product combinations
▪ In most cases the confidence is high as well
▪ Recommendation engines help users discover products or content that they might not come across otherwise

It is also possible to combine both of these methods to build a much more robust recommendation
engine (a hybrid recommender system)
▪ It looks at the other items similar users like and combines them to create a ranked list of suggestions
▪ Many algorithms have been used to measure user similarity or item similarity:
❖ k-nearest neighbour (k-NN) approach
❖ Pearson correlation
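As a minimal sketch of the Pearson approach, the similarity of two users can be computed from their rating vectors (the ratings below are invented for illustration):

```python
import numpy as np

# Hypothetical ratings by two users on the same five movies (1-5 scale)
user_a = np.array([5, 4, 1, 2, 5])
user_b = np.array([4, 5, 2, 1, 4])

# Pearson correlation: covariance of the two rating vectors divided by
# the product of their standard deviations
similarity = np.corrcoef(user_a, user_b)[0, 1]
print(similarity)  # about 0.80: the users have broadly similar tastes
```

Values near +1 indicate similar tastes, near 0 no relationship, and near -1 opposite tastes; collaborative filtering weights each neighbour's ratings by such a score.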
It does not rely on machine-analyzable content, and is therefore capable of accurately recommending complex
items such as drinks without requiring an "understanding" of the item itself
Suppose Sarah has just watched the movie Inside Out. Let's see how
the recommendation engine works and which movies it thinks she
would like to see next
▪ Easy to comprehend: the overall mathematical logic is easy to explain
▪ Differentiated output: produces more differentiated output than association rules
▪ Cold start: needs enough users or items to find a match; does not work for a new user or item
If Sam buys DUFF consumer merchandise, content-based filtering considers the DUFF beer
can as an entity and recommends other DUFF merchandise, such as a tee shirt, to the
buyer
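A minimal sketch of this idea: represent each item by a feature vector and recommend the items most similar to the one purchased. The items, features, and weights below are all hypothetical (brand is weighted more heavily than category, so the shared DUFF brand dominates):

```python
import numpy as np

# Hypothetical item feature vectors: [brand=DUFF (weight 2), drink, apparel]
items = {
    "DUFF beer can":  np.array([2, 1, 0]),
    "DUFF tee shirt": np.array([2, 0, 1]),
    "generic soda":   np.array([0, 1, 0]),
}

def cosine(u, v):
    # cosine similarity between two feature vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

bought = items["DUFF beer can"]
scores = {name: cosine(bought, vec) for name, vec in items.items()
          if name != "DUFF beer can"}

# Rank candidate items by similarity to the purchased one
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 2))
```

With these weights the DUFF tee shirt (similarity 0.8) outranks the generic soda (about 0.45), matching the intuition in the example.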
Suppose we have watched the movie Inside Out. Let's see how the
recommendation engine works and which movies it thinks we would
like to see next
▪ No differentiation problem: able to recommend to users with unique tastes
▪ No first-rater problem: able to recommend new and unpopular items
▪ Finding important features: finding the appropriate features is hard, e.g. for movies and images, which features are important?
▪ Overspecialization: never recommends items outside the user's content profile
▪ No good judgements: unable to exploit the quality judgements of other users
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('Recommend.csv',
                 names=['user_id', 'movie_id', 'rating', 'timestamp'])
df

n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]

train_data, test_data = train_test_split(df, test_size=0.25)

The data now gets split into train and test sets, with the train data making up 75% of
the total data
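The slide jumps from the split straight to the rmse calls, so `user_pred`, `movie_pred`, `test_data_matrix` and `rmse` appear undefined. The sketch below is one common way to fill that gap (similarity-weighted averaging over a cosine-similarity matrix); the helper implementations and the tiny synthetic ratings are assumptions, not the original code:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.metrics import mean_squared_error

# Tiny synthetic ratings standing in for the train/test split above;
# in the slide's pipeline these come from train_test_split on Recommend.csv
train_data = pd.DataFrame({'user_id': [1, 1, 2, 3],
                           'movie_id': [1, 2, 2, 3],
                           'rating': [5, 3, 4, 2]})
test_data = pd.DataFrame({'user_id': [2, 3],
                          'movie_id': [1, 2],
                          'rating': [4, 1]})
n_users, n_movies = 3, 3

def build_matrix(data, n_users, n_movies):
    # user-item rating matrix: users as rows, movies as columns
    m = np.zeros((n_users, n_movies))
    for row in data.itertuples():
        m[row.user_id - 1, row.movie_id - 1] = row.rating
    return m

def predict(ratings, similarity, kind='user'):
    # similarity-weighted average of the observed ratings
    if kind == 'user':
        return similarity @ ratings / np.abs(similarity).sum(axis=1, keepdims=True)
    return ratings @ similarity / np.abs(similarity).sum(axis=1)

def rmse(pred, actual):
    # evaluate only on the entries that actually hold a rating
    mask = actual.nonzero()
    return np.sqrt(mean_squared_error(pred[mask], actual[mask]))

train_data_matrix = build_matrix(train_data, n_users, n_movies)
test_data_matrix = build_matrix(test_data, n_users, n_movies)

# cosine similarity between users (rows) and between movies (columns)
user_similarity = 1 - pairwise_distances(train_data_matrix, metric='cosine')
movie_similarity = 1 - pairwise_distances(train_data_matrix.T, metric='cosine')

user_pred = predict(train_data_matrix, user_similarity, kind='user')
movie_pred = predict(train_data_matrix, movie_similarity, kind='item')
print(rmse(user_pred, test_data_matrix), rmse(movie_pred, test_data_matrix))
```

The two predictions differ only in which axis the similarity is computed over: user-based averages over similar users, item-based over similar movies.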
rmse(user_pred, test_data_matrix)

The error obtained for user-based filtering is 3.1205151270317386, which is low enough to
conclude that the model performs well

rmse(movie_pred, test_data_matrix)

The error obtained for item-based filtering is 3.447038130496627, which is likewise low enough to
conclude that the model performs well
▪ Apriori Algorithm
▪ Recommendation Engines