Experiment No. 9

The document analyzes a retail transaction dataset using association rule mining. It loads and cleans the data, one-hot encodes the transactions, mines frequent itemsets with the Apriori algorithm, and derives association rules from them. Key steps include transforming the transactional data, finding itemsets that meet a minimum support threshold, and generating rules scored with metrics such as confidence and lift.
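The pipeline described above can be sketched on toy data without mlxtend. This is a minimal pure-Python stand-in for frequent-pair mining, with illustrative item names, not the experiment's actual code:

```python
# Pure-Python sketch of frequent-pair mining on toy transactions;
# a stand-in for mlxtend's apriori, limited to 2-item sets
from itertools import combinations

transactions = [
    {'Bread', 'Wine', 'Eggs'},
    {'Bread', 'Cheese', 'Wine'},
    {'Cheese', 'Wine'},
]
min_support = 0.5
n = len(transactions)

# Count how many transactions contain each 2-item combination
pair_counts = {}
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

# Keep pairs whose support (fraction of transactions) meets the threshold
frequent_pairs = {p: c / n for p, c in pair_counts.items() if c / n >= min_support}
print(frequent_pairs)
```

The real Apriori algorithm extends this idea iteratively: only itemsets whose subsets are all frequent are counted at the next size, which prunes the search space.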

# Importing the libraries

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt

data = pd.read_csv("retail_dataset.csv")
data.head()

0 1 2 3 4 5 6
0 Bread Wine Eggs Meat Cheese Pencil Diaper
1 Bread Cheese Meat Diaper Wine Milk Pencil
2 Cheese Meat Eggs Milk Wine NaN NaN
3 Cheese Meat Eggs Milk Wine NaN NaN
4 Meat Pencil Wine NaN NaN NaN NaN

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 315 entries, 0 to 314
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 315 non-null object
1 1 285 non-null object
2 2 245 non-null object
3 3 187 non-null object
4 4 133 non-null object
5 5 71 non-null object
6 6 41 non-null object
dtypes: object(7)
memory usage: 17.4+ KB

data.isnull().sum()

0 0
1 30
2 70
3 128
4 182
5 244
6 274
dtype: int64

print(data.shape[1])

7

types = data.dtypes
print(types)
0 object
1 object
2 object
3 object
4 object
5 object
6 object
dtype: object

# List the distinct items appearing in column '0'
items = data['0'].unique()
items

array(['Bread', 'Cheese', 'Meat', 'Eggs', 'Wine', 'Bagel', 'Pencil',
       'Diaper', 'Milk'], dtype=object)

# Build one transaction per row; note that str() turns NaN cells into the
# literal string 'nan', which has to be removed after encoding
transactions = []
for i in range(0, data.shape[0]):
    transactions.append([str(data.values[i, j]) for j in range(0, 7)])

transactions
print(i)

314
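A side effect worth noting: applying str() to a missing value yields the string 'nan', which is why an extra 'nan' column appears after encoding. A quick check:

```python
# str() applied to a missing (NaN) cell yields the string 'nan',
# so empty cells survive into the transaction lists as a fake item
placeholder = str(float('nan'))
print(placeholder)  # prints: nan
```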

from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
te_ary

array([[False,  True,  True, ...,  True,  True, False],
       [False,  True,  True, ...,  True,  True, False],
       [False, False,  True, ..., False,  True,  True],
       ...,
       [False,  True,  True, ...,  True,  True, False],
       [False, False,  True, ..., False, False,  True],
       [ True,  True, False, ..., False,  True,  True]])

df = pd.DataFrame(te_ary, columns=te.columns_)
print(df)

     Bagel  Bread  Cheese  Diaper   Eggs   Meat   Milk  Pencil   Wine    nan
0    False   True    True    True   True   True  False    True   True  False
1    False   True    True    True  False   True   True    True   True  False
2    False  False    True   False   True   True   True   False   True   True
3    False  False    True   False   True   True   True   False   True   True
4    False  False   False   False  False   True  False    True   True   True
..     ...    ...     ...     ...    ...    ...    ...     ...    ...    ...
310  False   True    True   False   True  False  False   False  False   True
311  False  False   False   False  False   True   True    True  False   True
312  False   True    True    True   True   True  False    True   True  False
313  False  False    True   False  False   True  False   False  False   True
314   True   True   False   False   True   True  False   False   True   True

[315 rows x 10 columns]
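What TransactionEncoder does here can be reproduced with plain Python on a toy list (illustrative item names, not the retail data):

```python
# Minimal one-hot encoding of transactions, mimicking TransactionEncoder:
# one boolean column per distinct item, one row per transaction
transactions = [['Bread', 'Milk'], ['Bread', 'Eggs'], ['Milk']]

# Column order: sorted set of all distinct items, like te.columns_
columns = sorted({item for t in transactions for item in t})

# Each row marks which items the transaction contains
matrix = [[item in t for item in columns] for t in transactions]
print(columns)
print(matrix)
```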

# Keep only the real item columns, discarding the 'nan' column created by
# the placeholder strings for missing values
# (equivalently: df = df.drop(columns=['nan']))
df = df[['Bagel','Bread','Cheese','Diaper','Eggs','Meat','Milk','Pencil','Wine']]


print(df)

Bagel Bread Cheese Diaper Eggs Meat Milk Pencil Wine


0 False True True True True True False True True
1 False True True True False True True True True
2 False False True False True True True False True
3 False False True False True True True False True
4 False False False False False True False True True
.. ... ... ... ... ... ... ... ... ...
310 False True True False True False False False False
311 False False False False False True True True False
312 False True True True True True False True True
313 False False True False False True False False False
314 True True False False True True False False True

[315 rows x 9 columns]

# print(df)
freq_items = apriori(df, min_support=0.2, use_colnames=True)
freq_items

support itemsets
0 0.425397 (Bagel)
1 0.504762 (Bread)
2 0.501587 (Cheese)
3 0.406349 (Diaper)
4 0.438095 (Eggs)
5 0.476190 (Meat)
6 0.501587 (Milk)
7 0.361905 (Pencil)
8 0.438095 (Wine)
9 0.279365 (Bread, Bagel)
10 0.225397 (Milk, Bagel)
11 0.238095 (Bread, Cheese)
12 0.231746 (Bread, Diaper)
13 0.206349 (Meat, Bread)
14 0.279365 (Milk, Bread)
15 0.200000 (Pencil, Bread)
16 0.244444 (Bread, Wine)
17 0.200000 (Cheese, Diaper)
18 0.298413 (Cheese, Eggs)
19 0.323810 (Meat, Cheese)
20 0.304762 (Milk, Cheese)
21 0.200000 (Pencil, Cheese)
22 0.269841 (Cheese, Wine)
23 0.234921 (Diaper, Wine)
24 0.266667 (Meat, Eggs)
25 0.244444 (Milk, Eggs)
26 0.241270 (Eggs, Wine)
27 0.244444 (Meat, Milk)
28 0.250794 (Meat, Wine)
29 0.219048 (Milk, Wine)
30 0.200000 (Pencil, Wine)
31 0.215873 (Meat, Cheese, Eggs)
32 0.203175 (Meat, Cheese, Milk)
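The support values above are simply fractions of rows. For instance, the support of a two-item set can be computed directly from the boolean frame; a sketch on toy data, not the actual retail_dataset.csv:

```python
import pandas as pd

# Toy boolean frame: support({Meat, Cheese}) is the fraction of rows
# where both columns are True
df = pd.DataFrame({
    'Meat':   [True, True, False, True],
    'Cheese': [True, False, False, True],
})
support = (df['Meat'] & df['Cheese']).mean()
print(support)  # 2 of 4 rows -> 0.5
```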

# Generate association rules scored by support, confidence, and lift
rules = association_rules(freq_items, metric="confidence", min_threshold=0.6)
rules

       antecedents     consequents  antecedent support  consequent support  \
0          (Bagel)         (Bread)            0.425397            0.504762
1           (Eggs)        (Cheese)            0.438095            0.501587
2           (Meat)        (Cheese)            0.476190            0.501587
3         (Cheese)          (Meat)            0.501587            0.476190
4           (Milk)        (Cheese)            0.501587            0.501587
5         (Cheese)          (Milk)            0.501587            0.501587
6           (Wine)        (Cheese)            0.438095            0.501587
7           (Eggs)          (Meat)            0.438095            0.476190
8   (Meat, Cheese)          (Eggs)            0.323810            0.438095
9     (Meat, Eggs)        (Cheese)            0.266667            0.501587
10  (Cheese, Eggs)          (Meat)            0.298413            0.476190
11  (Meat, Cheese)          (Milk)            0.323810            0.501587
12    (Meat, Milk)        (Cheese)            0.244444            0.501587
13  (Milk, Cheese)          (Meat)            0.304762            0.476190

     support  confidence      lift  leverage  conviction  zhangs_metric
0   0.279365    0.656716  1.301042  0.064641    1.442650       0.402687
1   0.298413    0.681159  1.358008  0.078670    1.563203       0.469167
2   0.323810    0.680000  1.355696  0.084958    1.557540       0.500891
3   0.323810    0.645570  1.355696  0.084958    1.477891       0.526414
4   0.304762    0.607595  1.211344  0.053172    1.270148       0.350053
5   0.304762    0.607595  1.211344  0.053172    1.270148       0.350053
6   0.269841    0.615942  1.227986  0.050098    1.297754       0.330409
7   0.266667    0.608696  1.278261  0.058050    1.338624       0.387409
8   0.215873    0.666667  1.521739  0.074014    1.685714       0.507042
9   0.215873    0.809524  1.613924  0.082116    2.616667       0.518717
10  0.215873    0.723404  1.519149  0.073772    1.893773       0.487091
11  0.203175    0.627451  1.250931  0.040756    1.337845       0.296655
12  0.203175    0.831169  1.657077  0.080564    2.952137       0.524816
13  0.203175    0.666667  1.400000  0.058050    1.571429       0.410959

list(rules)

['antecedents',
'consequents',
'antecedent support',
'consequent support',
'support',
'confidence',
'lift',
'leverage',
'conviction',
'zhangs_metric']

print(len(rules))

14
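The rule metrics follow directly from the supports. Taking the (Bagel) → (Bread) row of the table as an example:

```python
# Recompute confidence and lift for (Bagel) -> (Bread) from its supports
sup_ab = 0.279365   # support({Bagel, Bread})
sup_a = 0.425397    # antecedent support, support({Bagel})
sup_b = 0.504762    # consequent support, support({Bread})

confidence = sup_ab / sup_a   # P(Bread | Bagel)
lift = confidence / sup_b     # confidence relative to the baseline P(Bread)
print(round(confidence, 6), round(lift, 6))
```

A lift above 1 means the antecedent raises the chance of the consequent beyond its baseline frequency, which is what makes the rule interesting.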

freq_items['length'] = freq_items['itemsets'].apply(lambda x: len(x))


freq_items

support itemsets length
0 0.425397 (Bagel) 1
1 0.504762 (Bread) 1
2 0.501587 (Cheese) 1
3 0.406349 (Diaper) 1
4 0.438095 (Eggs) 1
5 0.476190 (Meat) 1
6 0.501587 (Milk) 1
7 0.361905 (Pencil) 1
8 0.438095 (Wine) 1
9 0.279365 (Bread, Bagel) 2
10 0.225397 (Milk, Bagel) 2
11 0.238095 (Bread, Cheese) 2
12 0.231746 (Bread, Diaper) 2
13 0.206349 (Meat, Bread) 2
14 0.279365 (Milk, Bread) 2
15 0.200000 (Pencil, Bread) 2
16 0.244444 (Bread, Wine) 2
17 0.200000 (Cheese, Diaper) 2
18 0.298413 (Cheese, Eggs) 2
19 0.323810 (Meat, Cheese) 2
20 0.304762 (Milk, Cheese) 2
21 0.200000 (Pencil, Cheese) 2
22 0.269841 (Cheese, Wine) 2
23 0.234921 (Diaper, Wine) 2
24 0.266667 (Meat, Eggs) 2
25 0.244444 (Milk, Eggs) 2
26 0.241270 (Eggs, Wine) 2
27 0.244444 (Meat, Milk) 2
28 0.250794 (Meat, Wine) 2
29 0.219048 (Milk, Wine) 2
30 0.200000 (Pencil, Wine) 2
31 0.215873 (Meat, Cheese, Eggs) 3
32 0.203175 (Meat, Cheese, Milk) 3

freq_items[(freq_items['length'] == 2) & (freq_items['support'] >= 0.3)]

support itemsets length
19 0.323810 (Meat, Cheese) 2
20 0.304762 (Milk, Cheese) 2

plt.scatter(rules['support'], rules['confidence'], alpha=0.5)
plt.xlabel('support')
plt.ylabel('confidence')
plt.title('Support vs Confidence')
plt.show()

plt.scatter(rules['support'], rules['lift'], alpha=0.5)
plt.xlabel('support')
plt.ylabel('lift')
plt.title('Support vs lift')
plt.show()
# Save the encoded transaction array to disk for later reuse
import pickle
filename = "model9.sav"
pickle.dump(te_ary, open(filename, "wb"))
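The saved file can be read back with pickle.load. A self-contained round-trip, using a temporary path and illustrative data rather than the experiment's te_ary:

```python
import os
import pickle
import tempfile

# Round-trip a small array-like object, mirroring the save step above
obj = [[True, False], [False, True]]
path = os.path.join(tempfile.gettempdir(), "model9_demo.sav")
with open(path, "wb") as f:
    pickle.dump(obj, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == obj)  # True
```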
