11 Association Rules Mining and Recommendation Systems

How To Make The Best Use Of Live Sessions

• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session

• All participants are muted by default to avoid background noise; the instructor will unmute you if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class

• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic

• Raise a ticket through your LMS in case of any queries. Our dedicated support team is available 24 x 7 for your assistance

• Your feedback is much appreciated. Please share feedback after each class to help us enhance your learning experience

Copyright © edureka and/or its affiliates. All rights reserved.


Course Outline

Introduction to Python
Sequences and File Operations
Deep Dive - Functions, OOPS, Modules, Errors and Exceptions
Introduction to Numpy, Pandas and Matplotlib
Data Manipulation
Introduction to Machine Learning with Python
Supervised Learning - I
Dimensionality Reduction
Supervised Learning - II
Unsupervised Learning
Association Rules Mining and Recommendation Systems
Reinforcement Learning
Time Series Analysis
Model Selection and Boosting



Association Rule Mining and
Recommendation Systems
Topics
The topics covered in this module are:

▪ Association Rule Mining

▪ Apriori Algorithm

▪ Recommendation Engines

▪ Building a Recommender System



Objectives
After completing this module, you should be able to:
▪ Define Association Rules
▪ Understand Apriori Algorithm
▪ Define Recommendation Engine
▪ Discuss types of Recommendation Engines
❖ Collaborative Filtering
❖ Content-Based Filtering
▪ Illustrate steps to build Recommendation Engines



Association Rule Mining



Association Rule Mining
▪ Association rule mining is a method for discovering interesting relations between variables in large databases

▪ It looks for patterns which state that when one event occurs, another event occurs in parallel with a certain probability

Example: customers who purchase a keyboard have a 60% likelihood of also purchasing a mouse for their PC



Association Rule Mining
An example of an association rule is given below:

X → Y

It means that if a person buys item X, then they will also buy item Y.

Let's dive a bit deeper into association rules.



Association Rule Mining: Parameters
Association rule mining relies on the following parameters:

Support: the fraction of transactions that contain both item X and item Y

Confidence: how often items X and Y occur together, given the number of times X occurs

Lift: the strength of a rule over the random co-occurrence of X and Y


Calculating Support, Confidence & Lift

Support(X → Y) = no. of transactions containing both X and Y / total number of transactions = P(X ∩ Y)

Confidence(X → Y) = no. of transactions containing both X and Y / no. of transactions containing X = P(Y | X) = Support(X → Y) / Support(X)

Lift(X → Y) = Confidence(X → Y) / Support(Y) = Support(X → Y) / (Support(X) × Support(Y))

Goal: Find all rules with user-specified minimum support (minsup) and minimum confidence (minconf)



Association Rule Mining
▪ Let's take an example.
▪ Suppose we have five transactions T1,T2,T3,T4,T5 as given below:
T1 : A, B, C
T2 : A, C, D
T3 : B, C, D
T4 : A, D, E
T5 : B, C, E
▪ Here,
❖ A,B,C,D,E are items in a store, I = {A,B,C,D,E}
❖ Set of all transactions T = {T1,T2,T3,T4,T5}
❖ Each transaction t is a set of items, t ⊆ I



Association Rule Mining
▪ Suppose you made some association rules from our transaction database, as given below:

A → D
C → A
A → C
B & C → D

▪ Now we can find the support, confidence and lift for these rules using the formulas explained earlier:

Rule       Support  Confidence  Lift
A → D      2/5      2/3         10/9
C → A      2/5      2/4         5/6
A → C      2/5      2/3         5/6
B & C → D  1/5      1/3         5/9



Now, let's understand how the Apriori algorithm is used for generating association rules.



Apriori Algorithm
The Apriori algorithm uses frequent itemsets to generate association rules, based on the principle:

“A subset of a frequent itemset must also be a frequent itemset”

Note: A frequent itemset is an itemset whose support value is greater than a threshold value.



Let’s understand
Apriori with an
example



Apriori Algorithm
Consider the following transaction dataset. We are using a threshold value = 2 (minimum support count = 2) for this example:

TID   Items
100   1, 3, 4
200   2, 3, 5
300   1, 2, 3, 5
400   2, 5
500   1, 3, 5

The first step is to build a list of itemsets of size one using this dataset.



Apriori Algorithm – First Iteration
A list of itemsets of size one is made, and their support values are calculated.

CI1:
Itemset  Support
{1}      3
{2}      3
{3}      4
{4}      1
{5}      4

Since our threshold value is 2, any itemset with support less than 2 is omitted, giving FI1:

Itemset  Support
{1}      3
{2}      3
{3}      4
{5}      4



Apriori Algorithm – Second Iteration
▪ In this iteration we extend the length of our itemsets by 1, i.e. k = k + 1

▪ All combinations of the itemsets in FI1 are used in this iteration

CI2:
Itemset  Support
{1,2}    1
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3

Omitting itemsets with support below 2 gives FI2:

Itemset  Support
{1,3}    3
{1,5}    2
{2,3}    2
{2,5}    3
{3,5}    3



Apriori Algorithm – Third Iteration
▪ In this iteration we again extend the length of our itemsets. All combinations of the itemsets in FI2 are used

CI3:
Itemset
{1,2,3}
{1,2,5}
{1,3,5}
{2,3,5}

Before finding the support values, we will do some pruning of the candidate set.



Apriori Algorithm – Pruning
▪ After the combinations are made, divide each itemset into its subsets to check whether there is any subset whose support you haven't calculated yet (i.e. that is not in FI2)

CI3 itemset  2-item subsets       All in FI2?
{1,2,3}      {1,2},{1,3},{2,3}    No
{1,2,5}      {1,2},{1,5},{2,5}    No
{1,3,5}      {1,5},{1,3},{3,5}    Yes
{2,3,5}      {2,3},{2,5},{3,5}    Yes

FI2 (for reference): {1,3}: 3, {1,5}: 2, {2,3}: 2, {2,5}: 3, {3,5}: 3

If any subset of an itemset is not in FI2, remove that itemset. Here {1,2} is not in FI2, so {1,2,3} and {1,2,5} are pruned.



Apriori Algorithm – Fourth Iteration
▪ Using the itemsets of CI3 that survived pruning, compute their supports (giving FI3) and create the next candidate set CI4

FI3:
Itemset  Support
{1,3,5}  2
{2,3,5}  2

CI4:
Itemset    Support
{1,2,3,5}  1

Since the support of the itemset in CI4 is less than 2, we stop and return to the previous frequent itemsets, i.e. FI3



Apriori Algorithm – Subset Creation
▪ Now you have the list of frequent itemsets:

FI3:
Itemset  Support
{1,3,5}  2
{2,3,5}  2

Let's assume the minimum confidence value is 60%.

▪ Using this, you generate all non-empty proper subsets for each frequent itemset:
❖ For I = {1,3,5}, the subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
❖ For I = {2,3,5}, the subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}

▪ For every subset S of I, you output the rule
❖ S → (I - S) (meaning S recommends I - S)
❖ if support(I) / support(S) >= the min_conf value
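The whole iterate-and-prune procedure above can be sketched in a few lines of Python. This is a minimal illustration (not an optimized Apriori implementation) using the transaction dataset from this example:

```python
from itertools import combinations

# Transactions from the example (TIDs 100-500)
transactions = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
    {1, 3, 5},
]
min_support_count = 2

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

# CI1 -> FI1: frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if count({i}) >= min_support_count}]

# Grow itemsets one item at a time, pruning candidates that contain
# an infrequent subset, until no candidate survives
while frequent[-1]:
    prev = frequent[-1]
    candidates = {a | b for a in prev for b in prev if len(a | b) == len(a) + 1}
    candidates = {c for c in candidates
                  if all(frozenset(s) in prev for s in combinations(c, len(c) - 1))}
    frequent.append({c for c in candidates if count(c) >= min_support_count})

fi3 = frequent[2]  # the frequent 3-itemsets
print(sorted(sorted(s) for s in fi3))  # [[1, 3, 5], [2, 3, 5]]
```

Running this reproduces the worked example: FI1 drops {4}, FI2 drops {1,2}, the subset check prunes {1,2,3} and {1,2,5} from CI3, and the loop stops once the only 4-item candidate falls below the support threshold.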



Apriori Algorithm – Applying Rules
▪ Now we will apply our rules to the itemsets of FI3:
1. {1,3,5}
❖ Rule 1: {1,3} → ({1,3,5} - {1,3}), i.e. 1 & 3 → 5
   Confidence = support(1,3,5)/support(1,3) = 2/3 = 66.66% > 60%, so Rule 1 is selected
❖ Rule 2: {1,5} → ({1,3,5} - {1,5}), i.e. 1 & 5 → 3
   Confidence = support(1,3,5)/support(1,5) = 2/2 = 100% > 60%, so Rule 2 is selected
❖ Rule 3: {3,5} → ({1,3,5} - {3,5}), i.e. 3 & 5 → 1
   Confidence = support(1,3,5)/support(3,5) = 2/3 = 66.66% > 60%, so Rule 3 is selected



Apriori Algorithm – Applying Rules
▪ Continuing with the itemsets of FI3:
1. {1,3,5}
❖ Rule 4: {1} → ({1,3,5} - {1}), i.e. 1 → 3 & 5
   Confidence = support(1,3,5)/support(1) = 2/3 = 66.66% > 60%, so Rule 4 is selected
❖ Rule 5: {3} → ({1,3,5} - {3}), i.e. 3 → 1 & 5
   Confidence = support(1,3,5)/support(3) = 2/4 = 50% < 60%, so Rule 5 is rejected
❖ Rule 6: {5} → ({1,3,5} - {5}), i.e. 5 → 1 & 3
   Confidence = support(1,3,5)/support(5) = 2/4 = 50% < 60%, so Rule 6 is rejected
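The six rule checks above can be automated. Below is a minimal sketch that enumerates every non-empty proper subset S of a frequent itemset and keeps the rule S → (I − S) when its confidence clears the 60% threshold; the counts come from the same transaction table:

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
min_conf = 0.6

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)

def rules_from(I):
    """Yield (S, I - S, confidence) for every rule that clears min_conf."""
    I = set(I)
    for r in range(1, len(I)):
        for S in combinations(sorted(I), r):
            conf = count(I) / count(S)
            if conf >= min_conf:
                yield set(S), I - set(S), conf

selected = list(rules_from({1, 3, 5}))
for S, rest, conf in selected:
    print(sorted(S), '->', sorted(rest), round(conf, 4))
```

For {1,3,5} this yields exactly the four rules selected above (Rules 1-4) and rejects Rules 5 and 6.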



Now let’s learn how
association rules are
used in Market Basket
Analysis in Python



Market Basket Analysis
We will be using the following online transactional data of a retail store for generating association rules
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 01-12-2010 08:26 2.55 17850 United Kingdom
536365 71053 WHITE METAL LANTERN 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84406B CREAM CUPID HEARTS COAT HANGER 8 01-12-2010 08:26 2.75 17850 United Kingdom
536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 01-12-2010 08:26 3.39 17850 United Kingdom
536365 22752 SET 7 BABUSHKA NESTING BOXES 2 01-12-2010 08:26 7.65 17850 United Kingdom
536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6 01-12-2010 08:26 4.25 17850 United Kingdom
536366 22633 HAND WARMER UNION JACK 6 01-12-2010 08:28 1.85 17850 United Kingdom
536366 22632 HAND WARMER RED POLKA DOT 6 01-12-2010 08:28 1.85 17850 United Kingdom
536367 84879 ASSORTED COLOUR BIRD ORNAMENT 32 01-12-2010 08:34 1.69 13047 United Kingdom
536367 22745 POPPY'S PLAYHOUSE BEDROOM 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22748 POPPY'S PLAYHOUSE KITCHEN 6 01-12-2010 08:34 2.1 13047 United Kingdom
536367 22749 FELTCRAFT PRINCESS CHARLOTTE DOLL 8 01-12-2010 08:34 3.75 13047 United Kingdom
536367 22310 IVORY KNITTED MUG COSY 6 01-12-2010 08:34 1.65 13047 United Kingdom

Data can be downloaded from the LMS



Market Basket Analysis: Step1
First, import the pandas and MLxtend libraries and read the data

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df = pd.read_excel('Online_Retail.xlsx')
df.head()



Market Basket Analysis: Step 2
In this step, you will be doing:
▪ Data clean up which includes removing spaces from some of the descriptions
▪ Drop the rows that don’t have invoice numbers and remove the credit transactions
df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]
df



Market Basket Analysis: Step 3
▪ After the clean-up, we need to consolidate the items into one transaction per row, with one column per product
▪ To keep the dataset small, we only look at sales for France
basket = (df[df['Country'] =="France"]
.groupby(['InvoiceNo', 'Description'])['Quantity']
.sum().unstack().reset_index().fillna(0)
.set_index('InvoiceNo'))
basket



Market Basket Analysis: Step 4
▪ There are a lot of zeros in the data, but we also need to make sure any positive value is converted to 1 and anything less than or equal to 0 is set to 0

def encode_units(x):
if x <= 0:
return 0
if x >= 1:
return 1
basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
basket_sets



Market Basket Analysis: Step 4 (O/P)
Now, you have structured the data properly



Market Basket Analysis: Step 5
In this step, you will:
▪ Generate frequent itemsets that have a support of at least 7% (a threshold chosen low enough to return a useful number of itemsets for this small dataset)
▪ Generate the rules with their corresponding support, confidence and lift

frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()



Market Basket Analysis: Step 5 (O/P)

Observations:
▪ A few rules have a high lift value, which means they occur more frequently than would be expected given the number of transaction and product combinations
▪ In most cases the confidence is high as well



Market Basket Analysis: Step 6
▪ Filter the dataframe using standard pandas code, for a large lift (6) and high confidence (.8)

rules[ (rules['lift'] >= 6) & (rules['confidence'] >= 0.8) ]



For association rules, the granularity lies at the transaction level. They use transactions as the central entity and hence do not provide user-specific insights.

For that, we will use Recommendation Engines.



Recommendation Engines
▪ A recommendation engine (sometimes referred to as a recommender system) is a tool that predicts what a user may or may not like among a list of given items

▪ It helps users discover products or content that they may not come across otherwise

This makes recommendation engines an integral part of websites and services such as Facebook, YouTube, Amazon, and more.



Recommendation Engine Types
Recommendation engines typically work in one of two ways:

User-based (collaborative) filtering: builds a model from a user's past behavior as well as similar decisions made by other users. This model is then used to predict items that the user may have an interest in.

Content-based filtering: utilizes a series of discrete characteristics of an item in order to recommend additional items with similar properties.

It is also possible to combine both of these methods to build a much more robust recommendation engine (a hybrid recommender system).



Hybrid Recommender System – Example
A hybrid recommender system is based on both user-based filtering (UBF) and content-based filtering (CBF).

Netflix nowadays uses hybrid recommender systems for movie/series recommendations to its users.



User-Based Collaborative Filtering (UBCF)
▪ This algorithm searches a large group of people and finds a smaller set of users with tastes similar to yours

▪ It looks at other things they like and combines them to create a ranked list of suggestions

▪ Several measures have been used for user similarity or item similarity:
❖ k-nearest neighbors (k-NN)
❖ Pearson correlation

It does not rely on machine-analyzable content, and is therefore capable of accurately recommending complex items such as drinks without requiring an "understanding" of the item itself.
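As a sketch of the second similarity measure mentioned above, Pearson correlation between two users can be computed over the items both have rated. The users and ratings below are hypothetical examples, not from the slides:

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0  # not enough co-rated items to correlate
    a = [u[i] for i in common]
    b = [v[i] for i in common]
    n = len(common)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a)) * sqrt(sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

# Hypothetical user -> {movie: rating} profiles
alice = {'Inside Out': 5, 'Avengers': 3, 'Minions': 4}
bob = {'Inside Out': 4, 'Avengers': 2, 'Minions': 3}
print(round(pearson(alice, bob), 2))  # 1.0 -- bob's ratings track alice's exactly
```

A value near +1 marks users with similar tastes (good neighbors for recommendations), while a value near −1 marks users with opposite tastes.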



UBCF Working: Step 1
Consider an example of movie recommendation.

Suppose Sarah has just watched the movie Inside Out. Let's see how the recommendation engine works and which movies it thinks she would like to see next.

First step:

1. Generate a list of users who have seen the same movies


UBCF Working: Step 2
2. Here we have four users who have watched the following movies (one column per movie; the first column is Inside Out):

John    Yes  Yes  Yes  Yes
Dave    No   Yes  No   Yes
Stuart  Yes  No   Yes  No
Sam     No   No   Yes  Yes


UBCF Working: Step 3
Now, we find users similar to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??



UBCF Working: Step 4
4. Based on the data we have, John and Stuart have also watched the movie Inside Out, so they are similar to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??



UBCF Working: Step 5
▪ Using the data of similar users, we can see that the movie Avengers gets the most votes, so it is recommended to Sarah

John Yes Yes Yes Yes


Dave No Yes No Yes

Stuart Yes No Yes No


Sam No No Yes Yes
Sarah Yes ?? ?? ??

1 vote 2 votes 1 vote
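The five steps above reduce to a simple vote tally over similar users. In the sketch below, 'Inside Out' and 'Avengers' are named in the example, while 'Movie B' and 'Movie D' are placeholders for the two posters the slide does not name:

```python
# Watch history from the slide: one set of watched movies per user
history = {
    'John':   {'Inside Out', 'Movie B', 'Avengers', 'Movie D'},
    'Dave':   {'Movie B', 'Movie D'},
    'Stuart': {'Inside Out', 'Avengers'},
    'Sam':    {'Avengers', 'Movie D'},
}
sarah_watched = {'Inside Out'}

# Steps 3-4: users similar to Sarah = users who share a watched movie
similar = [u for u, seen in history.items() if sarah_watched & seen]

# Step 5: tally votes for movies Sarah has not yet seen
votes = {}
for u in similar:
    for m in history[u] - sarah_watched:
        votes[m] = votes.get(m, 0) + 1

recommendation = max(votes, key=votes.get)
print(similar)         # ['John', 'Stuart']
print(votes)
print(recommendation)  # Avengers
```

A real system would weight each neighbor's votes by a similarity score (e.g. the Pearson correlation sketched earlier) rather than counting them equally.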



Pros & Cons of User-based Filtering
Pros:
▪ Data not a constraint: works in a consumer-item scenario without any user or item feature data
▪ Easy to comprehend: the overall mathematical logic is easy to explain
▪ Differentiated output: more differentiated output than association rules

Cons:
▪ Cold start: needs enough users or items to find a match; does not work for a new user or item
▪ Sparsity: the user/ratings matrix is sparse, so it is hard to find users that have rated the same items
▪ Popularity bias: tends to recommend popular items; cannot recommend items to someone with unique taste


Content Based Filtering



Content Based Filtering
01 Has the content as the central entity

02 Works with data that the user provides, either explicitly (ratings) or implicitly (clicking on a link, purchase history)

03 Based on that data, a user profile is generated to make suggestions to the user

04 As the user provides more and more input, the engine's accuracy increases


Content Based Filtering – An Example

If Sam buys DUFF consumer merchandise, content-based filtering considers the DUFF beer can as an entity and recommends other DUFF merchandise, such as a tee shirt, to the buyer.



CBF Working: Step 1
Consider the same example of movie recommendation.

Suppose we have watched the movie Inside Out. Let's see how the recommendation engine works and which movies it thinks we would like to see.

1. Generate a list of features about the movies, such as actors, directors, themes, etc.



CBF Working: Step 2
2. Compare the feature column of each movie with the column of the movie Inside Out and see which of them match

Animated Yes Yes No No


Marvel No No Yes Yes

Super Villain No Yes Yes Yes


IMDB rating 8+ Yes No Yes No
Comedy Yes Yes No Yes



CBF Working: Step 3
3. The column with the most matches is that of Minions, so the system will recommend it

Animated        Yes  Yes  No   No
Marvel          No   No   Yes  Yes
Super Villain   No   Yes  Yes  Yes
IMDB rating 8+  Yes  No   Yes  No
Comedy          Yes  Yes  No   Yes

Matches with Inside Out (first column) for the remaining three movies: 3, 1, 1
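The matching step can be sketched as a comparison of boolean feature vectors. 'Inside Out' and 'Minions' are named in the example; 'Movie C' and 'Movie D' are placeholders for the unnamed movies, with the Yes/No values taken from the table above:

```python
features = ['Animated', 'Marvel', 'Super Villain', 'IMDB rating 8+', 'Comedy']

# One boolean vector per movie, in the order of `features` (True = Yes)
profiles = {
    'Inside Out': [True, False, False, True, True],
    'Minions': [True, False, True, False, True],
    'Movie C': [False, True, True, True, False],
    'Movie D': [False, True, True, False, True],
}

seen = 'Inside Out'
# Count feature agreements between the watched movie and each candidate
matches = {
    title: sum(a == b for a, b in zip(profiles[seen], vec))
    for title, vec in profiles.items() if title != seen
}
recommendation = max(matches, key=matches.get)
print(matches)         # {'Minions': 3, 'Movie C': 1, 'Movie D': 1}
print(recommendation)  # Minions
```

Counting raw agreements is the simplest choice; real systems typically use a proper vector similarity (cosine, Jaccard) over weighted features.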
Pros & Cons of Content Based Filtering
Pros:
▪ Only user data: no need for data on other users
▪ No differentiation problem: able to recommend to users with unique tastes
▪ No first-rater problem: able to recommend new and unpopular items

Cons:
▪ Finding important features is hard: e.g. for movies and images, which features are the important ones?
▪ Over-specialization: never recommends items outside the user's content profile
▪ No quality judgements: unable to exploit the quality judgements of other users


Use-Case: E-Commerce Sites
Many of the largest commerce Web sites are already using
recommender systems to help their customers find products to
purchase

▪ The products can be recommended based on the top overall


sellers on a site or based on an analysis of the past buying
behavior of the customer as a prediction for future buying
behavior



Use-Case : Social Networks
Social networking sites employ recommendation systems to provide better user experiences.

▪ Facebook and LinkedIn focus on link recommendation, where friend recommendations are presented to users
▪ Most friend-suggestion mechanisms rely on pre-existing user relationships to pick friend candidates



Building a Recommender System



Use Case: Scenario
▪ Consider the ratings dataset below, containing data on: UserID, MovieID, Rating and Timestamp
▪ Each line of this file represents one rating of one movie by one user, in the format: UserID::MovieID::Rating::Timestamp
▪ Ratings are made on a 1-5 star scale

UserID: ID of the user
MovieID: ID of the movie
Timestamp: seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

Sample rows:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457

Data can be downloaded from the LMS



Use Case: Tasks To Do
1. Predict recommendations based on user-movie collaborative filtering
2. Estimate the user-movie model validation using root mean squared error
3. Predict recommendations based on movie-movie collaborative filtering
4. Estimate the movie-movie model validation using root mean squared error


Use Case Solution: Step 1
Load the ‘Ratings’ movie data into pandas with labels

import pandas as pd

df = pd.read_csv('Recommend.csv',
                 names=['user_id', 'movie_id', 'rating', 'timestamp'])
df



Use Case Solution: Step 2
Declare number of users and movies and create a train test split of 75/25

from sklearn.model_selection import train_test_split

n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.25)

The data here now gets split as train data and test data, such that the train data is 75% of
the total data



Use Case Solution: Step 3
Populate the train matrix (user_id x movie_id), containing ratings such that [user_id index, movie_id index] =
given rating

import numpy as np

train_data_matrix = np.zeros((n_users, n_movies))
for line in train_data.itertuples():
    # [user_id index, movie_id index] = given rating
    # (assumes user and movie IDs are contiguous, starting at 1)
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
train_data_matrix



Use Case Solution: Step 4
Populate the test matrix (user_id x movie_id), containing ratings such that [user_id index, movie_id index] =
given rating

test_data_matrix = np.zeros((n_users, n_movies))
for line in test_data.itertuples():
    # [user_id index, movie_id index] = given rating
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix



Use Case Solution: Step 5
Creates cosine similarity matrices for users and movies and predict a user-movie recommendation model (based
on difference from mean rating as it’s a better indicator than absolute rating)
from sklearn.metrics.pairwise import pairwise_distances

# Note: with metric='cosine', pairwise_distances returns cosine *distance*
# (1 - cosine similarity)
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
movie_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')

mean_user_rating = train_data_matrix.mean(axis=1)[:, np.newaxis]
ratings_diff = train_data_matrix - mean_user_rating
user_pred = (mean_user_rating + user_similarity.dot(ratings_diff)
             / np.array([np.abs(user_similarity).sum(axis=1)]).T)
user_pred



Use Case Solution: Step 6
Predict the same for the movie based recommendation model (based on difference from mean rating as it’s a
better indicator than absolute rating)
movie_pred = (train_data_matrix.dot(movie_similarity)
              / np.array([np.abs(movie_similarity).sum(axis=1)]))
movie_pred



Use Case Solution: Step 7
Define a root mean squared error (RMSE) function to check the validity of the user-based and movie-based
recommendation model

from math import sqrt
from sklearn.metrics import mean_squared_error

def rmse(pred, test):
    # Compare only the entries that actually have ratings in the test set
    pred = pred[test.nonzero()].flatten()
    test = test[test.nonzero()].flatten()
    return sqrt(mean_squared_error(pred, test))



Use Case Solution: Step 8
Pass the user-based model that you have recently created into the rmse function

rmse(user_pred, test_data_matrix)

The RMSE obtained is approximately 3.1205. The lower the RMSE, the better the model; this value serves as a baseline against which the movie-based model can be compared.



Use Case Solution: Step 9
Pass the movie-based model that you have recently created into the rmse function

rmse(movie_pred, test_data_matrix)

The RMSE obtained is approximately 3.4470, slightly higher than the user-based model's, so on this split the user-based model performs marginally better.



Summary
▪ Association Rule Mining

▪ Support, Confidence & Lift Evaluation

▪ Apriori Algorithm

▪ Implementing Market Basket Analysis

▪ Recommendation Engines


