Mining Association Rules
Md Tabrez Nafis
Department of Computer Science & Engineering
JAMIA HAMDARD, New Delhi
Association Rules
Mining Association Rules between Sets of Items in Large Databases (R. Agrawal, T. Imielinski & A. Swami), 1993.
Fast Algorithms for Mining Association Rules (R. Agrawal & R. Srikant), 1994.
What are Association Rules?
Study of “what goes with what”
“Customers who bought X also bought Y”
What symptoms go with what diagnosis
Transaction-based or event-based
Also called “market basket analysis” and
“affinity analysis”
Originated with study of customer
transactions databases to determine
associations among items purchased
Association rule mining
It is an important data mining model studied
extensively by the database and data mining
community.
Assume all data are categorical.
Initially used for Market Basket Analysis to find
how items purchased by customers are related.
Used in many recommender
systems
Basket Data
Retail organizations, e.g., supermarkets,
collect and store massive amounts of sales
data, called basket data.
A record consists of
transaction date
items bought
Or, basket data may consist of items bought
by a customer over a period.
Generating Rules
Terms
“IF” part = antecedent
“THEN” part = consequent
“Item set” = the items (e.g., products)
comprising the antecedent or consequent
Antecedent and consequent are disjoint
(i.e., have no items in common)
Tiny Example: Phone Faceplates
Many Rules are Possible
For example: Transaction 1 supports several
rules, such as
“If red, then white” (“If a red faceplate is
purchased, then so is a white one”)
“If white, then red”
“If red and white, then green”
+ several more
Frequent Item Sets
Ideally, we want to create all possible
combinations of items
Problem: computation time grows
exponentially as # items increases
Solution: consider only “frequent item
sets”
Criterion for frequent: support
Support
Support = # (or percent) of transactions that
include both the antecedent and the
consequent
Example: support for the item set {red,
white} is 4 out of 10 transactions, or 40%
Example Association Rule
90% of transactions that purchase bread and
butter also purchase milk
Antecedent: bread and butter
Consequent: milk
Confidence factor: 90%
Association rules
Support
Every association rule has a support and a confidence.
“The support is the percentage of transactions that demonstrate the rule.”
Example: Database with transactions (customer_# : item_a1, item_a2, …)
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.
support {8,12} = 2 (or 50%: 2 of 4 customers)
support {1, 5} = 1 (or 25%: 1 of 4 customers)
support {1} = 3 (or 75%: 3 of 4 customers)
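A minimal Python sketch of this support computation (the function name and set-based representation are our own, for illustration only):

def support(itemset, transactions):
    # Number of transactions that contain every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

transactions = [{1, 3, 5},
                {1, 8, 14, 17, 12},
                {4, 6, 8, 12, 9, 104},
                {2, 1, 8}]

print(support({8, 12}, transactions))  # 2 (50%: 2 of 4 customers)
print(support({1, 5}, transactions))   # 1 (25%)
print(support({1}, transactions))      # 3 (75%)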
2. Association rules
Support
An itemset is called frequent if its support is equal to or
greater than an agreed-upon minimal value – the support
threshold.
Adding to the previous example:
if the threshold is 50%,
then the itemsets {8,12} and {1} are called frequent.
2. Association rules
Confidence
Every association rule has a support and a confidence.
An association rule is of the form: X => Y
X => Y: if someone buys X, they also buy Y
The confidence is the conditional probability that, given X
is present in a transaction, Y will also be present.
Confidence measure, by definition:
confidence(X => Y) = support(X ∪ Y) / support(X)
2. Association rules
Confidence
We should only consider rules derived from
itemsets with high support, and that also have
high confidence.
“A rule with low confidence is not meaningful.”
Rules don’t explain anything, they just point out
hard facts in data volumes.
3. Example
Example: Database with transactions ( customer_# : item_a1, item_a2, … )
1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.
Conf ( {5} => {8} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
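The same computation as a small Python sketch (reusing the illustrative `support` helper from the earlier snippet; the database is the 10-transaction example above):

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    # conf(X => Y) = support(X ∪ Y) / support(X)
    return support(x | y, transactions) / support(x, transactions)

transactions = [{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
                {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}]

print(confidence({5}, {8}, transactions))  # 4/5 = 0.8, or 80%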
Conf ( {5} => {8} ) ? 80% Done. Conf ( {8} => {5} ) ?
supp({5}) = 5 , supp({8}) = 7 , supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
Conf ( {5} => {8} ) ? 80% Done.
Conf ( {8} => {5} ) ? 57% Done.
Rule ( {5} => {8} ) is more meaningful than
rule ( {8} => {5} ).
Conf ( {9} => {3} ) ?
supp({9}) = 1 , supp({3}) = 4 , supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
Not really: the confidence is 100%, but the rule is
supported by only one transaction, so it is not meaningful.
The model: data
I = {i1, i2, …, im}: a set of items.
Transaction t :
t is a set of items, with t ⊆ I.
Transaction Database T: a set of transactions
T = {t1, t2, …, tn}.
Transaction data:
supermarket data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
… …
tn: {biscuit, eggs, milk}
Concepts:
An item: an item/article in a basket
I: the set of all items sold in the store
A transaction: items purchased in a basket; it may
have TID (transaction ID)
A transactional dataset: A set of transactions
The model: rules
A transaction t contains X, a set of items
(itemset) in I, if X ⊆ t.
An association rule is an implication of the
form:
X => Y, where X, Y ⊆ I, and X ∩ Y = ∅
An itemset is a set of items.
E.g., X = {milk, bread, cereal} is an itemset.
A k-itemset is an itemset with k items.
E.g., {milk, bread, cereal} is a 3-itemset
Support and Confidence
Support count: The support count of an
itemset X, denoted by X.count, in a data set
T is the number of transactions in T that
contain X. Assume T has n transactions.
Then,
support = (X ∪ Y).count / n
confidence = (X ∪ Y).count / X.count
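For example, with X = {5} and Y = {8} in the earlier 10-transaction database: (X ∪ Y).count = 4, X.count = 5, and n = 10, so support = 4/10 = 40% and confidence = 4/5 = 80%.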
Goal and key features
Goal: Find all rules that satisfy the user-
specified minimum support (minsup) and
minimum confidence (minconf).
Key Features
Completeness: find all rules.
No target item(s) on the right-hand-side
Mining with data on hard disk (not in memory)
Transaction data representation
A simplistic view of shopping baskets:
some important information is not considered, e.g.,
the quantity of each item purchased and
the price paid.
Many mining algorithms
There are a large number of them!!
They use different strategies and data structures.
Their resulting sets of rules are all the same.
Given a transaction data set T, a minimum support and
a minimum confidence, the set of association rules existing in
T is uniquely determined.
Any algorithm should find the same set of rules
although their computational efficiencies and
memory requirements may be different.
We study only one: the Apriori Algorithm
The Apriori Algorithm
The name, Apriori, is based on the fact that the algorithm
uses prior knowledge of frequent itemset properties
Apriori employs an iterative approach known as a level-wise
search, where k-itemsets are used to explore (k+1)-
itemsets
The first pass determines the frequent 1-itemsets
denoted L1
Each subsequent pass k consists of two phases
First, the frequent itemsets Lk-1 are used to generate the
candidate itemsets Ck
Next, the database is scanned and the support of candidates in
Ck is counted
The frequent itemsets Lk are determined
Generating Frequent Item Sets
For k products…
1. User sets a minimum support criterion
2. Next, generate list of one-item sets that
meet the support criterion
3. Use the list of one-item sets to generate
list of two-item sets that meet the support
criterion
4. Use list of two-item sets to generate list of
three-item sets
5. Continue up through k-item sets
The Apriori algorithm
Probably the best known algorithm
Two steps:
Find all itemsets that have minimum support
(frequent itemsets, also called large itemsets).
Use frequent itemsets to generate rules.
E.g., a frequent itemset
{Chicken, Clothes, Milk} [sup = 3/7]
and one rule from the frequent itemset
Clothes => Milk, Chicken [sup = 3/7, conf = 3/3]
The Apriori Algorithm — Example
Minimum support = 50% (i.e., a count of 2 in the 4 transactions).

Database D:
TID 100: 1, 3, 4
TID 200: 2, 3, 5
TID 300: 1, 2, 3, 5
TID 400: 2, 5

Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1: {1}:2, {2}:3, {3}:3, {5}:3

C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
Scan D → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
L2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2

C3: {2,3,5}
Scan D → C3: {2,3,5}:2
L3: {2,3,5}:2
Picture of A-Priori
[Figure: in Pass 1, the item counts are taken and the frequent items identified; in Pass 2, only the pairs of frequent items are counted.]
A-Priori Using Triangular Matrix
Why would we even mention the infrequent items?
[Figure: Pass 1 takes the item counts; the frequent items are renumbered consecutively (old # → new #, with infrequent items getting no new number), so that in Pass 2 the counts of pairs of frequent items fit in a compact triangular matrix.]
Frequent Triples, Etc.
For each size of itemsets k, we construct two
sets of k-sets (sets of size k):
Ck = candidate k-sets = those that might be
frequent sets (support > s) based on information
from the pass for itemsets of size k – 1.
Lk = the set of truly frequent k-sets.
[Figure: C1 → filter → L1 → construct → C2 → filter → L2 → construct → C3 → … In the first pass, count all the items; in the second pass, count all pairs of items from L1 (the frequent items); then go on to triples, and so on.]
Step 1: Mining all frequent itemsets
A frequent itemset is an itemset whose support
is ≥ minsup.
Key idea: The apriori property (downward
closure property): any subsets of a frequent
itemset are also frequent itemsets
[Figure: the lattice of itemsets over A, B, C, D. Level 1: A, B, C, D; level 2: AB, AC, AD, BC, BD, CD; level 3: ABC, ABD, ACD, BCD.]
The Algorithm
Iterative algorithm (also called level-wise search):
Find all 1-item frequent itemsets; then all 2-item
frequent itemsets, and so on.
In each iteration k, only consider itemsets that
contain some frequent (k-1)-itemset.
Find frequent itemsets of size 1: F1
From k = 2:
Ck = candidates of size k: those itemsets of size k
that could be frequent, given Fk-1
Fk = those itemsets in Ck that are actually frequent,
Fk ⊆ Ck (needs one scan of the database).
Example – minsup = 0.5: finding the frequent itemsets.
Dataset T:
T100: 1, 3, 4
T200: 2, 3, 5
T300: 1, 2, 3, 5
T400: 2, 5

(itemset:count)
1. Scan T → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
   F1: {1}:2, {2}:3, {3}:3, {5}:3
   C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. Scan T → C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
   F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
   C3: {2,3,5}
3. Scan T → C3: {2,3,5}:2
   F3: {2,3,5}
Details: the algorithm
Algorithm Apriori(T)
  C1 ← init-pass(T);
  F1 ← {f | f ∈ C1, f.count/n ≥ minsup};  // n: no. of transactions in T
  for (k = 2; Fk-1 ≠ ∅; k++) do
    Ck ← candidate-gen(Fk-1);
    for each transaction t ∈ T do
      for each candidate c ∈ Ck do
        if c is contained in t then
          c.count++;
      end
    end
    Fk ← {c ∈ Ck | c.count/n ≥ minsup}
  end
  return F ← ∪k Fk;
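A runnable Python sketch of this algorithm (our own illustrative code, not from the paper; the join here simply unions pairs of frequent (k-1)-itemsets and keeps a candidate only if all its (k-1)-subsets are frequent, a simpler variant of the candidate-gen function described next):

from itertools import combinations

def apriori(transactions, minsup):
    # Returns {frozenset: count} of all frequent itemsets (support ratio >= minsup).
    n = len(transactions)
    counts = {}                                   # pass 1: count 1-itemsets
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    F = {s: c for s, c in counts.items() if c / n >= minsup}
    frequent = dict(F)
    k = 2
    while F:
        prev = list(F)
        candidates = set()                        # join + prune
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                c = prev[i] | prev[j]
                if len(c) == k and all(frozenset(s) in F
                                       for s in combinations(c, k - 1)):
                    candidates.add(c)
        counts = {c: 0 for c in candidates}       # one scan of the database
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        F = {s: c for s, c in counts.items() if c / n >= minsup}
        frequent.update(F)
        k += 1
    return frequent

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori(T, 0.5))   # includes frozenset({2, 3, 5}): 2, matching F3 above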
Apriori candidate generation
The candidate-gen function takes Fk-1 and
returns a superset (called the candidates)
of the set of all frequent k-itemsets. It has
two steps
join step: Generate all possible candidate
itemsets Ck of length k
prune step: Remove those candidates in Ck
that cannot be frequent.
Candidate-gen function
Function candidate-gen(Fk-1)
  Ck ← ∅;
  forall f1, f2 ∈ Fk-1
      with f1 = {i1, … , ik-2, ik-1}
      and f2 = {i1, … , ik-2, i'k-1}
      and ik-1 < i'k-1 do
    c ← {i1, …, ik-1, i'k-1};  // join f1 and f2
    Ck ← Ck ∪ {c};
    for each (k-1)-subset s of c do
      if (s ∉ Fk-1) then
        delete c from Ck;  // prune
    end
  end
  return Ck;
An example
F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4},
{1, 3, 5}, {2, 3, 4}}
After join
C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
After pruning:
C4 = {{1, 2, 3, 4}}
because {1, 4, 5} is not in F3 ({1, 3, 4, 5} is removed)
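A Python sketch of candidate-gen on this example (illustrative code; it applies the prune inline instead of inserting and then deleting, which yields the same Ck):

from itertools import combinations

def candidate_gen(F_prev, k):
    # F_prev: the frequent (k-1)-itemsets, as a set of frozensets.
    Ck = set()
    for f1 in F_prev:
        for f2 in F_prev:
            s1, s2 = sorted(f1), sorted(f2)
            # Join step: f1 and f2 share the first k-2 items.
            if s1[:-1] == s2[:-1] and s1[-1] < s2[-1]:
                c = f1 | f2
                # Prune step: every (k-1)-subset of c must be frequent.
                if all(frozenset(s) in F_prev for s in combinations(c, k - 1)):
                    Ck.add(c)
    return Ck

F3 = {frozenset(s) for s in [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]}
print(candidate_gen(F3, 4))
# {frozenset({1, 2, 3, 4})} -- {1,3,4,5} is formed by the join
# but pruned, because {1,4,5} is not in F3.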
Step 2: Generating rules from frequent itemsets
Frequent itemsets → association rules:
one more step is needed to generate
association rules.
For each frequent itemset X,
for each proper nonempty subset A of X:
let B = X − A;
A => B is an association rule if
confidence(A => B) ≥ minconf, where
support(A => B) = support(A ∪ B) = support(X), and
confidence(A => B) = support(A ∪ B) / support(A).
Generating rules: an example
Suppose {2,3,4} is frequent, with sup=50%
Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with
sup=50%, 50%, 75%, 75%, 75%, 75% respectively
These generate the following association rules:
{2,3} => {4}, confidence = 100%
{2,4} => {3}, confidence = 100%
{3,4} => {2}, confidence = 67%
{2} => {3,4}, confidence = 67%
{3} => {2,4}, confidence = 67%
{4} => {2,3}, confidence = 67%
All rules have support = 50%
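A Python sketch of this rule-generation step (illustrative; the support values below are the percentages from this example, supplied as a lookup table since they were already recorded during itemset generation):

from itertools import combinations

def gen_rules(itemset, sup, minconf):
    # Generate all rules A => B (with B = X - A) from one frequent itemset X.
    rules = []
    X = frozenset(itemset)
    for r in range(1, len(X)):                    # proper nonempty subsets A
        for A in map(frozenset, combinations(X, r)):
            conf = sup[X] / sup[A]                # conf(A => B) = sup(X) / sup(A)
            if conf >= minconf:
                rules.append((set(A), set(X - A), conf))
    return rules

sup = {frozenset({2, 3, 4}): 0.50,
       frozenset({2, 3}): 0.50, frozenset({2, 4}): 0.50, frozenset({3, 4}): 0.75,
       frozenset({2}): 0.75, frozenset({3}): 0.75, frozenset({4}): 0.75}

for A, B, conf in gen_rules({2, 3, 4}, sup, minconf=0.6):
    print(A, "=>", B, f"confidence={conf:.0%}")
# Prints the six rules above: two at 100%, four at 67%.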
Generating rules: summary
To recap, in order to obtain A => B, we need
to have support(A ∪ B) and support(A).
All the required information for confidence
computation has already been recorded in
itemset generation. No need to see the data
T any more.
This step is not as time-consuming as
frequent itemsets generation.
On Apriori Algorithm
Seems to be very expensive
Level-wise search
K = the size of the largest itemset
It makes at most K passes over data
In practice, K is bounded (around 10).
The algorithm is very fast. Under some conditions,
all rules can be found in linear time.
Scales up to large data sets.
More on association rule mining
Clearly the space of all association rules is
exponential.
The mining exploits sparseness of data, and
high minimum support and high minimum
confidence values.
Still, it always produces a huge number of
rules, thousands, tens of thousands, millions,
...
Advantage of the Apriori Algorithm
(Summary)
The Apriori Algorithm is complete: it finds every
frequent itemset (and hence every rule) above the thresholds.
Limitations of the Apriori Algorithm
(Summary)
Needs several iterations over the data.
Candidate generation can be
extremely slow (pairs, triplets, etc.).
Uses a uniform minimum support
threshold.
Difficulty finding rarely occurring events.
Alternative methods (other than Apriori)
can address this by using a non-uniform
minimum support threshold.
Contd…
The counting method iterates through all of
the transactions each time.
Constant items (items present in nearly every
transaction) make the algorithm a lot heavier.
Huge memory consumption
End