Data Science for Business
Lecture 6 – Other Data Science Tasks
and Techniques
Assoc. Prof. Pham Quoc Trung
[email protected]
Data Mining Tasks and Machine Learning
Unsupervised Learning: Association Analysis
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Transaction Database
Transaction ID Items bought
T01 A, B, D
T02 A, C, D
T03 B, C, D, E
T04 A, B, D
T05 A, B, C, E
T06 A, C
T07 B, C, D
T08 B, D
T09 A, C, E
T10 B, D
Association Analysis
Association Analysis:
Mining Frequent Patterns,
Association and Correlations
• Association Analysis
• Mining Frequent Patterns
• Association and Correlations
• Apriori Algorithm
Source: Han & Kamber (2006)
Market Basket Analysis
Source: Han & Kamber (2006)
Association Rule Mining
• Apriori Algorithm
Raw Transaction Data (Transaction No: SKUs / Item Nos):
1: 1, 2, 3, 4
2: 2, 3, 4
3: 2, 3
4: 1, 2, 4
5: 1, 2, 3, 4
6: 2, 4

One-item Itemsets (Itemset: Support):
1: 3
2: 6
3: 4
4: 5

Two-item Itemsets (Itemset: Support):
1, 2: 3
1, 3: 2
1, 4: 3
2, 3: 4
2, 4: 5
3, 4: 3

Three-item Itemsets (Itemset: Support):
1, 2, 4: 3
2, 3, 4: 3
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• A very popular DM method in business
• Finds interesting relationships (affinities) between
variables (items or events)
• Part of machine learning family
• Employs unsupervised learning
• There is no output variable
• Also known as market basket analysis
• Often used as an example to describe DM to
ordinary people, such as the famous “relationship
between diapers and beers!”
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus
protection software also bought an extended service plan 70
percent of the time.”
• How do you use such a pattern/knowledge?
• Put the items next to each other for ease of finding
• Promote the items as a package (do not put one on sale if the other(s)
are on sale)
• Place items far apart from each other so that the customer has to
walk the aisles to search for them, and in doing so potentially sees
and buys other items
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• Representative applications of association rule mining include
• In business: cross-marketing, cross-selling, store design,
catalog design, e-commerce site design, optimization of
online advertising, product pricing, and sales/promotion
configuration
• In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes and
their functions (to be used in genomics projects)…
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• Are all association rules interesting and useful?
A Generic Rule: X → Y [S%, C%]
X, Y: products and/or services
X: Left-hand-side (LHS)
Y: Right-hand-side (RHS)
S: Support: how often X and Y go together
C: Confidence: how often Y goes together with X
Example: {Laptop Computer, Antivirus Software} → {Extended Service Plan} [30%, 70%]
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• Algorithms are available for generating association rules
• Apriori
• Eclat
• FP-Growth
• + Derivatives and hybrids of the three
• The algorithms help identify the frequent itemsets, which are then
converted to association rules
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Association Rule Mining
• Apriori Algorithm
• Finds subsets that are common to at least a minimum number of the itemsets
• Uses a bottom-up approach
• Frequent subsets are extended one item at a time (the size of frequent subsets increases from one-item subsets to two-item subsets, then three-item subsets, and so on), and
• Groups of candidates at each level are tested against the data for minimum support
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
Basic Concepts: Frequent Patterns and Association
Rules
Transaction-id Items bought
10 A, B, D
20 A, C, D
30 A, D, E
40 B, E, F
50 B, C, D, E, F

• Itemset X = {x1, …, xk}
• Find all the rules X → Y with minimum support and confidence
• support, s: probability that a transaction contains X ∪ Y
• confidence, c: conditional probability that a transaction having X also contains Y

(Figure: Venn diagram of customers who buy beer, buy diapers, or buy both)

Let sup_min = 50%, conf_min = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
A → D (support = 3/5 = 60%, confidence = 3/3 = 100%)
D → A (support = 3/5 = 60%, confidence = 3/4 = 75%)
Source: Han & Kamber (2006)
Market basket analysis
• Example
• Which groups or sets of items are customers likely to purchase on a given trip to
the store?
• Association Rule
• Computer → antivirus_software
[support = 2%; confidence = 60%]
• A support of 2% means that 2% of all the transactions under analysis show that computer
and antivirus software are purchased together.
• A confidence of 60% means that 60% of the customers who purchased a computer also
bought the software.
Source: Han & Kamber (2006)
Association rules
• Association rules are considered interesting if they satisfy both
• a minimum support threshold and
• a minimum confidence threshold.
Source: Han & Kamber (2006)
Frequent Itemsets, Closed Itemsets, and Association Rules

Support (A → B) = P(A ∪ B)
Confidence (A → B) = P(B|A)
Source: Han & Kamber (2006)
Support (A → B) = P(A ∪ B)
Confidence (A → B) = P(B|A)
• The notation P(A ∪ B) indicates the probability that a transaction contains the union of set A and set B
• (i.e., it contains every item in A and in B).
• This should not be confused with P(A or B), which indicates the
probability that a transaction contains either A or B.
Source: Han & Kamber (2006)
Does diaper purchase predict beer purchase?
• Contingency tables

DEPENDENT (yes):
              Beer
              Yes   No
No diapers     6    94   (100)
Diapers       40    60   (100)

INDEPENDENT (no predictability):
              Beer
              Yes   No
No diapers    23    77
Diapers       23    77
Source: Dickey (2012) http://www4.stat.ncsu.edu/~dickey/SAScode/Encore_2012.ppt
Support (A → B) = P(A ∪ B)
Confidence (A → B) = P(B|A)
Conf (A → B) = Supp (A ∪ B) / Supp (A)
Lift (A → B) = Supp (A ∪ B) / (Supp (A) × Supp (B))
Lift (Correlation):
Lift (A → B) = Confidence (A → B) / Support (B)
Source: Dickey (2012) http://www4.stat.ncsu.edu/~dickey/SAScode/Encore_2012.ppt
Lift
Lift = Confidence / Expected Confidence if Independent
Saving \ Checking:   No      Yes     Total
No                   500     3,500   4,000
Yes                  1,000   5,000   6,000
Total                1,500   8,500   10,000

SVG → CHKG: expected confidence = 8,500/10,000 = 85% if independent
Observed confidence = 5,000/6,000 ≈ 83%
Lift = 83/85 < 1
Savings account holders are actually LESS likely than others to have a checking account!
Source: Dickey (2012) http://www4.stat.ncsu.edu/~dickey/SAScode/Encore_2012.ppt
Support & Confidence
Five transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Rule         Support   Confidence
A → D        2/5       2/3
C → A        2/5       2/4
A → C        2/5       2/3
B & C → D    1/5       1/3
Source: SAS Enterprise Miner Course Notes, 2014, SAS
Support & Confidence & Lift
Saving Account \ Checking Account:   No      Yes     Total
No                                   500     3,500   4,000
Yes                                  1,000   5,000   6,000
Total                                                10,000

Support (SVG → CK) = 50% = 5,000/10,000
Confidence (SVG → CK) = 83% = 5,000/6,000
Expected Confidence (SVG → CK) = 85% = 8,500/10,000
Lift (SVG → CK) = Confidence / Expected Confidence = 0.83/0.85 < 1
Source: SAS Enterprise Miner Course Notes, 2014, SAS
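A minimal Python sketch of this calculation (an illustration written for these notes, not code from the SAS course), recomputing support, confidence, expected confidence, and lift from the counts in the table above:

```python
# Counts taken from the 2x2 savings/checking table above.
n_total = 10_000
n_saving = 6_000      # savings account holders (rule LHS)
n_checking = 8_500    # checking account holders (rule RHS)
n_both = 5_000        # hold both accounts

support = n_both / n_total                   # 5,000/10,000 = 50%
confidence = n_both / n_saving               # 5,000/6,000  ≈ 83%
expected_confidence = n_checking / n_total   # 8,500/10,000 = 85%
lift = confidence / expected_confidence      # ≈ 0.98 < 1

print(f"Support: {support:.0%}, Confidence: {confidence:.1%}")
print(f"Expected confidence: {expected_confidence:.0%}, Lift: {lift:.2f}")
```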
Support (A→B)
Confidence (A→B)
Expected Confidence (A→B)
Lift (A→B)
Support (A → B) = P(A ∪ B) = Count(A & B) / Count(Total)
Confidence (A → B) = P(B|A) = Supp (A ∪ B) / Supp (A) = Count(A & B) / Count(A)
Expected Confidence (A → B) = Support (B) = Count(B) / Count(Total)
Lift (A → B) = Confidence (A → B) / Expected Confidence (A → B)
Lift (Correlation):
Lift (A → B) = Confidence (A → B) / Support (B) = Supp (A ∪ B) / (Supp (A) × Supp (B))
Lift (A→B)
• Lift (A → B)
= Confidence (A → B) / Expected Confidence (A → B)
= Confidence (A → B) / Support (B)
= (Supp (A & B) / Supp (A)) / Supp (B)
= Supp (A & B) / (Supp (A) × Supp (B))
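To make these definitions concrete, here is a small, self-contained Python sketch (an illustration for these notes, not from the source slides) that computes support, confidence, and lift directly from a transaction list, demonstrated on the five baskets from the Support & Confidence slide above:

```python
# Sketch: support, confidence, and lift computed directly from a
# transaction list, following the count-based definitions above.
def support(itemset, transactions):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# The five transactions from the Support & Confidence slide.
transactions = [{'A','B','C'}, {'A','C','D'}, {'B','C','D'},
                {'A','D','E'}, {'B','C','E'}]

print(support({'A','D'}, transactions))        # 2/5 = 0.4
print(confidence({'A'}, {'D'}, transactions))  # 2/3 ≈ 0.667
print(lift({'A'}, {'D'}, transactions))        # (2/3) / (3/5) ≈ 1.11
```

A lift above 1 here means A and D co-occur more often than expected if they were independent.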
Minimum Support and
Minimum Confidence
• Rules that satisfy both a minimum support threshold (min_sup) and a
minimum confidence threshold (min_conf) are called strong.
• By convention, support and confidence values are written as percentages between 0% and 100% rather than as probabilities between 0 and 1.0.
Source: Han & Kamber (2006)
K-itemset
• itemset
• A set of items is referred to as an itemset.
• K-itemset
• An itemset that contains k items is a k-itemset.
• Example:
• The set {computer, antivirus software} is a 2-itemset.
Source: Han & Kamber (2006)
Absolute Support and
Relative Support
• Absolute Support
• The occurrence frequency of an itemset is the number of transactions that contain the itemset
• also called the frequency, support count, or count of the itemset
• Ex: an itemset contained in 3 transactions has an absolute support (count) of 3
• Relative support
• the fraction of transactions that contain the itemset
• Ex: a count of 3 out of 5 transactions is a relative support of 60%
Source: Han & Kamber (2006)
Frequent Itemset
• If the relative support of an itemset I satisfies a prespecified
minimum support threshold, then I is a frequent itemset.
• i.e., the absolute support of I satisfies the corresponding minimum support
count threshold
• The set of frequent k-itemsets is commonly denoted by Lk
Source: Han & Kamber (2006)
Confidence
• The confidence of rule A → B can be easily derived from the support counts of A and A ∪ B.
• Once the support counts of A, B, and A ∪ B are found, it is straightforward to derive the corresponding association rules A → B and B → A and check whether they are strong.
• Thus the problem of mining association rules can be reduced to that of mining frequent itemsets.
Source: Han & Kamber (2006)
Association rule mining:
Two-step process
1. Find all frequent itemsets
• By definition, each of these itemsets will occur at least as frequently as a
predetermined minimum support count, min_sup.
2. Generate strong association rules from the frequent itemsets
• By definition, these rules must satisfy minimum support and minimum
confidence.
Source: Han & Kamber (2006)
Efficient and Scalable
Frequent Itemset Mining Methods
• The Apriori Algorithm
• Finding Frequent Itemsets Using Candidate Generation
Source: Han & Kamber (2006)
Apriori Algorithm
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant
in 1994 for mining frequent itemsets for Boolean association rules.
• The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall see in the following.
Source: Han & Kamber (2006)
Apriori Algorithm
• Apriori employs an iterative approach known as a level-wise
search, where k-itemsets are used to explore (k+1)-itemsets.
• First, the set of frequent 1-itemsets is found by scanning the
database to accumulate the count for each item, and
collecting those items that satisfy minimum support. The
resulting set is denoted L1.
• Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
• The finding of each Lk requires one full scan of the database.
Source: Han & Kamber (2006)
Apriori Algorithm
• To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.
• Apriori property
• All nonempty subsets of a frequent itemset must also be frequent.
Source: Han & Kamber (2006)
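The Apriori property is what makes candidate generation cheap: before counting, any candidate k-itemset with an infrequent (k−1)-subset can be discarded. The following Python sketch of this join-and-prune step is illustrative (not from Han & Kamber); the L2 shown is taken from the worked example that follows:

```python
from itertools import combinations

# Sketch of the Apriori candidate-generation step: join pairs of
# frequent (k-1)-itemsets, then prune every candidate that has an
# infrequent (k-1)-subset (the Apriori property).
def apriori_gen(prev_frequent, k):
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                    frozenset(s) in prev_frequent
                    for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# L2 from the worked example below ({D, E} is not frequent).
L2 = {frozenset(p) for p in [('A','B'), ('A','C'), ('A','D'), ('A','E'),
                             ('B','C'), ('B','D'), ('B','E'),
                             ('C','D'), ('C','E')]}
for c in sorted(sorted(c) for c in apriori_gen(L2, 3)):
    print(c)
# {A,D,E}, {B,D,E}, {C,D,E} are pruned because {D,E} is not in L2.
```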
Apriori algorithm
(1) Frequent Itemsets
(2) Association Rules
Transaction Database
Transaction ID Items bought
T01 A, B, D
T02 A, C, D
T03 B, C, D, E
T04 A, B, D
T05 A, B, C, E
T06 A, C
T07 B, C, D
T08 B, D
T09 A, C, E
T10 B, D
Table 1 shows a database with 10 transactions.
Let minimum support = 20% and minimum confidence = 80%.
Please use the Apriori algorithm to generate association rules
from frequent itemsets.
Table 1: Transaction Database
Transaction ID Items bought
T01 A, B, D
T02 A, C, D
T03 B, C, D, E
T04 A, B, D
T05 A, B, C, E
T06 A, C
T07 B, C, D
T08 B, D
T09 A, C, E
T10 B, D
Apriori Algorithm Step 1-1: C1 → L1
(Transaction database as in Table 1)

Minimum support = 20% = 2/10, so Min. Support Count = 2

C1 (Itemset: Support Count):
A: 6
B: 7
C: 6
D: 7
E: 3

L1 (candidates meeting Min. Support Count = 2):
A: 6
B: 7
C: 6
D: 7
E: 3
Apriori Algorithm Step 1-2: C2 → L2
(Transaction database as in Table 1; Min. Support Count = 2)

C2 (Itemset: Support Count):
A, B: 3
A, C: 4
A, D: 3
A, E: 2
B, C: 3
B, D: 6
B, E: 2
C, D: 3
C, E: 3
D, E: 1

L2 (candidates meeting Min. Support Count = 2):
A, B: 3
A, C: 4
A, D: 3
A, E: 2
B, C: 3
B, D: 6
B, E: 2
C, D: 3
C, E: 3
Apriori Algorithm Step 1-3: C3 → L3
(Transaction database as in Table 1; Min. Support Count = 2)

C3 (Itemset: Support Count):
A, B, C: 1
A, B, D: 2
A, B, E: 1
A, C, D: 1
A, C, E: 2
B, C, D: 2
B, C, E: 2

L3 (candidates meeting Min. Support Count = 2):
A, B, D: 2
A, C, E: 2
B, C, D: 2
B, C, E: 2
Generating Association Rules Step 2-1
(Transaction database as in Table 1; L1 and L2 as above; minimum confidence = 80%)

Association rules generated from L2 (rule: confidence):
A → B: 3/6        B → A: 3/7
A → C: 4/6        C → A: 4/6
A → D: 3/6        D → A: 3/7
A → E: 2/6        E → A: 2/3
B → C: 3/7        C → B: 3/6
B → D: 6/7 = 85.7% *   D → B: 6/7 = 85.7% *
B → E: 2/7        E → B: 2/3
C → D: 3/6        D → C: 3/7
C → E: 3/6        E → C: 3/3 = 100% *

(* rules meeting the 80% minimum confidence)
Generating Association Rules Step 2-2
(Transaction database as in Table 1; L1, L2, and L3 as above; minimum confidence = 80%)

Association rules generated from L3 (rule: confidence):
A → BD: 2/6       B → CD: 2/7
B → AD: 2/7       C → BD: 2/6
D → AB: 2/7       D → BC: 2/7
AB → D: 2/3       BC → D: 2/3
AD → B: 2/3       BD → C: 2/6
BD → A: 2/6       CD → B: 2/3
A → CE: 2/6       B → CE: 2/7
C → AE: 2/6       C → BE: 2/6
E → AC: 2/3       E → BC: 2/3
AC → E: 2/4       BC → E: 2/3
AE → C: 2/2 = 100% *   BE → C: 2/2 = 100% *
CE → A: 2/3       CE → B: 2/3

(* rules meeting the 80% minimum confidence)
Frequent Itemsets and Association Rules
(Transaction database as in Table 1)

L1 (Itemset: Support Count): A: 6, B: 7, C: 6, D: 7, E: 3
L2 (Itemset: Support Count): A,B: 3; A,C: 4; A,D: 3; A,E: 2; B,C: 3; B,D: 6; B,E: 2; C,D: 3; C,E: 3
L3 (Itemset: Support Count): A,B,D: 2; A,C,E: 2; B,C,D: 2; B,C,E: 2

minimum support = 20%
minimum confidence = 80%

Association Rules:
B → D (60%, 85.7%) (Sup.: 6/10, Conf.: 6/7)
D → B (60%, 85.7%) (Sup.: 6/10, Conf.: 6/7)
E → C (30%, 100%) (Sup.: 3/10, Conf.: 3/3)
AE → C (20%, 100%) (Sup.: 2/10, Conf.: 2/2)
BE → C (20%, 100%) (Sup.: 2/10, Conf.: 2/2)
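For reference, a compact, self-contained Python sketch (an illustrative implementation written for these notes, not taken from the lecture) that runs the full level-wise Apriori search on the Table 1 database and prints exactly the five strong rules above:

```python
from itertools import combinations

# Illustrative end-to-end Apriori run on the Table 1 database
# (minimum support = 20%, minimum confidence = 80%).
transactions = [{'A','B','D'}, {'A','C','D'}, {'B','C','D','E'},
                {'A','B','D'}, {'A','B','C','E'}, {'A','C'},
                {'B','C','D'}, {'B','D'}, {'A','C','E'}, {'B','D'}]
min_sup, min_conf = 0.2, 0.8
n = len(transactions)

def sup(itemset):
    # Relative support: fraction of transactions containing the itemset.
    return sum(itemset <= t for t in transactions) / n

# Level-wise search: C1 -> L1, then C2 -> L2, and so on.
frequent = {}
level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
k = 1
while level:
    level = [c for c in level if sup(c) >= min_sup]
    frequent.update({c: sup(c) for c in level})
    k += 1
    level = {a | b for a in level for b in level if len(a | b) == k}

# Generate strong rules by splitting each frequent itemset into LHS -> RHS.
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = s / frequent[lhs]   # every subset of a frequent
            if conf >= min_conf:       # itemset is itself frequent
                print(sorted(lhs), '->', sorted(itemset - lhs),
                      f'(sup {s:.0%}, conf {conf:.1%})')
# Prints the five strong rules: B->D and D->B (85.7%),
# E->C, AE->C, and BE->C (100%), matching the slides.
```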
Co-occurrences and Associations
• Complexity control:
• Support of the association
• Let's say that we require rules to apply to at least 0.01% of all transactions
• Confidence or strength of the rule
• Let's say that we require that 5% or more of the time, a buyer of A also buys B
• Measuring surprise:
• Lift(A, B) = p(A, B) / (p(A) × p(B))
Example: Beer and Lottery Tickets
• We operate a small convenience store where people buy groceries,
liquor, lottery tickets, etc. We estimate that:
• 30% of all transactions involve beer,
• 40% of all transactions involve lottery tickets,
• and 20% of the transactions include both beer and lottery tickets.
Example: Beer and Lottery Tickets
• If the two products are unrelated:
• p(beer) × p(lottery tickets) = 0.3 × 0.4 = 0.12
• Otherwise:
• Lift(beer, lottery tickets) = 0.2 / 0.12 ≈ 1.67
• Leverage(beer, lottery tickets) = 0.2 − 0.12 = 0.08
• Support(beer, lottery tickets) = 20%
• Strength(beer, lottery tickets) = p(lottery tickets | beer) = 0.2 / 0.3 ≈ 67%
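The same arithmetic as a short Python sketch (illustrative only, using the estimated probabilities from the slide):

```python
# Surprise measures for the beer / lottery-ticket example.
p_beer, p_lottery, p_both = 0.30, 0.40, 0.20

lift = p_both / (p_beer * p_lottery)      # 0.2 / 0.12 ≈ 1.67
leverage = p_both - p_beer * p_lottery    # 0.2 - 0.12 = 0.08
support = p_both                          # 20%
strength = p_both / p_beer                # p(lottery | beer) ≈ 67%

print(f"Lift: {lift:.2f}, Leverage: {leverage:.2f}, "
      f"Support: {support:.0%}, Strength: {strength:.0%}")
```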
Profiling: Finding Typical Behavior
• Profiling attempts to characterize the typical behavior of an
individual, group, or population
• Profiling can essentially involve clustering, if there are subgroups of
the population with different behaviors
Link Prediction and Social Recommendation
• Sometimes, instead of predicting a property (target value) of a data
item, it is more useful to predict connections between data items
• A common example of this is predicting that a link should exist
between two individuals
• Link prediction can also estimate the strength of a link
Data Reduction and Latent Information
• There is a trade-off between the insight or manageability gained and the information lost
Latent Information and Movie
Recommendation
Bias, Variance, and Ensemble Methods
• The errors a model makes can be characterized by three factors:
• 1. Inherent randomness,
• 2. Bias, and
• 3. Variance.
Thanks!
Q&A