Data Mining:
Association
Mining Association Rules
in Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
What Is Association Mining?
Association rule mining:
Finding frequent patterns, associations,
correlations, or causal structures among sets of
items or objects in transaction databases,
relational databases, and other information
repositories.
Applications:
Basket data analysis, cross-marketing, catalog
design, loss-leader analysis, clustering,
classification, etc.
Example:
Computer=>antivirus_software [support=2%, confidence=60%]
A support of 2% for the above rule
means that 2% of all the
transactions under analysis show
that computer and antivirus software
are purchased together.
A confidence of 60% means that
60% of the customers who
purchased a computer also bought
antivirus software.
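For reference, the two measures can be written compactly; this is the standard formulation (not on the original slide), where A ∪ B denotes the itemset containing both A and B:

\[
\mathrm{support}(A \Rightarrow B) = P(A \cup B), \qquad
\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}
\]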
Association Rule: Basic Concepts
Given: (1) database of transactions, (2) each
transaction is a list of items (purchased by a
customer in a visit)
Find: all rules that correlate the presence of one set
of items with that of another set of items
E.g., 98% of people who purchase tires and auto
accessories also get automotive services done
Association Rule Mining: A
Road Map
Boolean vs. quantitative associations (Based
on the types of values handled)
buys(x, "SQLServer") ^ buys(x, "DMBook")
⇒ buys(x, "DBMiner") [0.2%, 60%]  (Boolean)
age(x, "30..39") ^ income(x, "42..48K")
⇒ buys(x, "PC") [1%, 75%]  (quantitative)
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Mining Association Rules—An Example
Min. support 50%, Min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

For rule A ⇒ C:
    support = support({A,C}) = 50%
    confidence = support({A,C}) / support({A}) = 66.6%
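The numbers above can be reproduced with a few lines of Python; this is a minimal sketch added for illustration, with variable and function names of my own choosing:

```python
# Support/confidence check for rule A => C on the four example transactions.
transactions = [
    {"A", "B", "C"},   # TID 2000
    {"A", "C"},        # TID 1000
    {"A", "D"},        # TID 4000
    {"B", "E", "F"},   # TID 5000
]

def support(itemset, db):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

sup_ac = support({"A", "C"}, transactions)          # 0.5   -> 50%
conf_a_c = sup_ac / support({"A"}, transactions)    # 0.666 -> 66.6%
print(f"support(A=>C) = {sup_ac:.0%}, confidence(A=>C) = {conf_a_c:.1%}")
```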
The Apriori principle:
Any subset of a frequent itemset must be frequent.
Mining Frequent
Itemsets: the Key Step
Find the frequent itemsets: the sets of
items that have minimum support
A subset of a frequent itemset must also be a
frequent itemset
i.e., if {A, B} is a frequent itemset, both {A} and
{B} must be frequent itemsets
Iteratively find frequent itemsets with
cardinality from 1 to k (k-itemset)
Use the frequent itemsets to generate
association rules.
The Apriori Algorithm
Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not
frequent cannot be a subset of a frequent k-
itemset
Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
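The pseudo-code maps fairly directly onto Python. The sketch below is a straightforward, unoptimized rendering under the assumption that transactions are plain sets of items; function and variable names are mine, not from the slides, and candidate generation follows the join-and-prune idea described on the following slides.

```python
from itertools import combinations

def apriori(db, min_support_count):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    db: iterable of transactions, each a set of items.
    min_support_count: absolute minimum support (number of transactions).
    """
    db = [set(t) for t in db]

    # L1: frequent 1-itemsets
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 1
    while current:
        # Join step: union pairs of frequent k-itemsets that share k-1 items,
        # keeping only candidates of size k+1.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1:
                # Prune step: every k-subset of the candidate must be frequent.
                if all(frozenset(sub) in current for sub in combinations(union, k)):
                    candidates.add(union)

        # Count the candidates contained in each transaction.
        cand_counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1

        current = {c: n for c, n in cand_counts.items() if n >= min_support_count}
        frequent.update(current)
        k += 1

    return frequent
```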
The Apriori Algorithm — Example
Database D:
    TID   Items
    100   1 3 4
    200   2 3 5
    300   1 2 3 5
    400   2 5

C1 (after scanning D):
    {1}:2   {2}:3   {3}:3   {4}:1   {5}:3

L1 (min. support count = 2):
    {1}:2   {2}:3   {3}:3   {5}:3

C2 (generated from L1): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
C2 counts (after scanning D):
    {1 2}:1   {1 3}:2   {1 5}:1   {2 3}:2   {2 5}:3   {3 5}:2

L2:
    {1 3}:2   {2 3}:2   {2 5}:3   {3 5}:2

C3 (generated from L2): {2 3 5}
L3 (after scanning D): {2 3 5}:2
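As a sanity check, running the apriori sketch from the previous slide on this database with an absolute support count of 2 reproduces the tables (the helper name comes from that sketch, not from the slides):

```python
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(db, min_support_count=2)   # apriori() from the sketch above
# e.g. result[frozenset({2, 3, 5})] == 2, and frozenset({4}) is absent
print(sorted((sorted(s), c) for s, c in result.items()))
```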
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an
order
Step 1: self-joining Lk-1
insert into Ck
select p.item1, p.item2, …, p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1=q.item1, …, p.itemk-2=q.itemk-2,
p.itemk-1 < q.itemk-1
Step 2: pruning
forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in Lk-1) then delete c from Ck
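A direct Python rendering of the two steps, under the assumption that each (k-1)-itemset is stored as a sorted tuple so the SQL-style prefix join applies; the function name apriori_gen is mine:

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Generate C_k from L_{k-1}, given as a set of sorted tuples of length k-1."""
    L_prev = set(L_prev)
    C = set()
    # Step 1: self-join -- pair (k-1)-itemsets that agree on their first k-2 items.
    for p in L_prev:
        for q in L_prev:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                C.add(p + (q[-1],))
    # Step 2: prune -- drop c if any (k-1)-subset of c is not in L_{k-1}.
    return {c for c in C
            if all(s in L_prev for s in combinations(c, len(c) - 1))}
```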
How to Count Supports of
Candidates?
Why is counting the supports of candidates a problem?
The total number of candidates can be huge
One transaction may contain many
candidates
Method:
Candidate itemsets are stored in a hash-tree
Leaf node of hash-tree contains a list of
itemsets and counts
Interior node contains a hash table
Subset function: finds all the candidates contained in a transaction
Example of Generating
Candidates
L3={abc, abd, acd, ace, bcd}
Self-joining: L3*L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4={abcd}
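Feeding this example into the apriori_gen sketch from the previous slide (items as single-character strings) reproduces the result:

```python
L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")}
print(apriori_gen(L3))   # {('a', 'b', 'c', 'd')} -- acde is pruned since ade is not in L3
```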
Methods to Improve Apriori’s
Efficiency
Hash-based itemset counting: A k-itemset whose
corresponding hashing bucket count is below the
threshold cannot be frequent
Transaction reduction: A transaction that does not contain
any frequent k-itemset is useless in subsequent scans
Partitioning: Any itemset that is potentially frequent in DB
must be frequent in at least one of the partitions of DB
Sampling: mining on a subset of given data, lower support
threshold + a method to determine the completeness
Dynamic itemset counting: add new candidate itemsets
only when all of their subsets are estimated to be frequent
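As a concrete illustration of one of these ideas, transaction reduction, here is a minimal sketch of my own (not from the slides): after the k-th pass, a transaction containing no frequent k-itemset can be dropped from subsequent scans, since it cannot contain any frequent (k+1)-itemset either.

```python
def reduce_transactions(db, frequent_k):
    """Keep only transactions that contain at least one frequent k-itemset.

    db: list of transactions (sets of items).
    frequent_k: iterable of frequent k-itemsets (frozensets) from the last pass.
    """
    frequent_k = list(frequent_k)
    return [t for t in db if any(s <= t for s in frequent_k)]
```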
Visualization of Association Rule Using Plane Graph
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Multiple-Level Association
Rules
Items often form a hierarchy, e.g.:
    Food → milk, bread
    milk → skim, 2%;  bread → wheat, white
    (brand-level items such as Fraser and Sunset sit at the lowest level)
Items at the lower levels are expected to have lower support.
Rules regarding itemsets at appropriate levels could be quite useful.
The transaction database can be encoded based on dimensions and levels:

    TID   Items
    T1    {111, 121, 211, 221}
    T2    {111, 211, 222, 323}
    T3    {112, 122, 221, 411}
    T4    {111, 121}
    T5    {111, 122, 211, 221, 413}

We can explore shared multi-level mining.
Mining Multi-Level
Associations
A top_down, progressive deepening approach:
First find high-level strong rules:
milk ⇒ bread [20%, 60%].
Then find their lower-level "weaker" rules:
2% milk ⇒ wheat bread [6%, 50%].
Variations in mining multiple-level association rules:
Level-crossed association rules:
2% milk ⇒ Wonder wheat bread
Association rules with multiple, alternative
hierarchies:
2% milk ⇒ Wonder bread
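One way to see the progressive-deepening idea in code, using the level-encoded transactions from the earlier slide. The sketch assumes the usual reading of the codes (first digit = category, second = content, third = brand), which is my interpretation rather than something stated on the slides: truncate each item code to the desired level, then run an ordinary frequent-itemset miner (such as the apriori sketch earlier) on the generalized transactions.

```python
def generalize(db, level):
    """Map each encoded item (e.g. '112') to its ancestor at `level` digits."""
    return [{item[:level] for item in t} for t in db]

db = [{"111", "121", "211", "221"},
      {"111", "211", "222", "323"},
      {"112", "122", "221", "411"},
      {"111", "121"},
      {"111", "122", "211", "221", "413"}]

level1 = generalize(db, 1)   # e.g. first transaction becomes {'1', '2'}
level2 = generalize(db, 2)   # e.g. first transaction becomes {'11', '12', '21', '22'}
# Illustrative thresholds only:
# frequent_l1 = apriori(level1, min_support_count=3)   # mine high-level rules first
# frequent_l2 = apriori(level2, min_support_count=2)   # then deepen with reduced support
```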
Multi-level Association: Uniform
Support vs. Reduced Support
Uniform Support: the same minimum support for all
levels
+ One minimum support threshold. No need to examine
itemsets containing any item whose ancestors do not have
minimum support.
– Lower-level items do not occur as frequently. If the support threshold is
set too high ⇒ miss low-level associations
set too low ⇒ generate too many high-level associations
Reduced Support: reduced minimum support at
lower levels
There are 4 search strategies:
Level-by-level independent
Level-cross filtering by k-itemset
Level-cross filtering by single item
Controlled level-cross filtering by single item
Uniform Support
Multi-level mining with uniform support
Level 1 (min_sup = 5%):  Milk [support = 10%]
Level 2 (min_sup = 5%):  2% Milk [support = 6%]   Skim Milk [support = 4%]
Reduced Support
Multi-level mining with reduced support
Level 1 (min_sup = 5%):  Milk [support = 10%]
Level 2 (min_sup = 3%):  2% Milk [support = 6%]   Skim Milk [support = 4%]
Multi-level Association:
Redundancy Filtering
Some rules may be redundant due to
“ancestor” relationships between items.
Example
milk ⇒ wheat bread [support = 8%, confidence = 70%]
2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
We say the first rule is an ancestor of the
second rule.
A rule is redundant if its support is close to
the "expected" value, based on the rule's ancestor.
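A small worked check of the redundancy test, under the illustrative assumption (not stated on the slide) that 2% milk accounts for roughly a quarter of all milk sold:

```python
# Expected support of the descendant rule, derived from its ancestor:
ancestor_support = 0.08        # milk => wheat bread [8%]
share_of_2pct_milk = 0.25      # assumed share of "2% milk" within "milk"

expected_support = ancestor_support * share_of_2pct_milk   # 0.02 -> 2%
actual_support = 0.02                                       # from the slide

# Actual support (2%) is close to the expected 2%, and the confidences
# (72% vs. 70%) are also close, so the second rule adds little information.
print(expected_support, actual_support)
```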
Multi-Level Mining:
Progressive Deepening
A top-down, progressive deepening
approach:
First mine high-level frequent items:
milk (15%), bread (10%)
Then mine their lower-level "weaker" frequent itemsets:
2% milk (5%), wheat bread (4%)
Different min_support thresholds across
multiple levels lead to different algorithms:
If adopting the same min_support across multiple levels,
then toss t if any of t's ancestors is infrequent.
If adopting reduced min_support at lower levels,
then examine only those descendants whose ancestors are frequent.
Progressive Refinement of
Data Mining Quality
Why progressive refinement?
Mining operator can be expensive or cheap, fine
or rough
Trade speed with quality: step-by-step
refinement.
Superset coverage property:
Preserve all the positive answers: allow a
false positive but not a false negative.
Two- or multi-step mining:
First apply a rough/cheap operator (superset coverage),
then apply an expensive/accurate operator on the reduced candidate set.
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Multi-Dimensional
Association: Concepts
Single-dimensional rules:
buys(X, "milk") ⇒ buys(X, "bread")
Multi-dimensional rules: ≥ 2 dimensions or predicates
Inter-dimension association rules (no repeated predicates):
age(X, "19-25") ^ occupation(X, "student") ⇒ buys(X, "coke")
Hybrid-dimension association rules (repeated predicates):
age(X, "19-25") ^ buys(X, "popcorn") ⇒ buys(X, "coke")
Categorical Attributes
finite number of possible values, no ordering among values
Quantitative Attributes
numeric, implicit ordering among values
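Quantitative attributes are commonly handled by discretizing them into ranges so that each (attribute, value) pair becomes a Boolean item; the sketch below illustrates this, with bin boundaries and attribute names chosen by me for illustration only.

```python
def encode_record(record):
    """Turn one relational record into a set of (attribute, value) items."""
    items = set()
    # Categorical attribute: use the value directly.
    items.add(("occupation", record["occupation"]))
    # Quantitative attribute: map to a predefined range (static discretization).
    age = record["age"]
    if age < 19:
        age_bin = "<19"
    elif age <= 25:
        age_bin = "19-25"
    else:
        age_bin = ">25"
    items.add(("age", age_bin))
    items.add(("buys", record["buys"]))
    return items

# e.g. encode_record({"age": 22, "occupation": "student", "buys": "coke"})
# -> {("age", "19-25"), ("occupation", "student"), ("buys", "coke")}
```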
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Interestingness
Measurements
Objective measures
Two popular measurements:
support; and
confidence
Subjective measures (Silberschatz &
Tuzhilin, KDD95)
A rule (pattern) is interesting if
it is unexpected (surprising to the user);
and/or
actionable (the user can do something
with it)
Criticism of Support and
Confidence
Example 1: (Aggarwal & Yu, PODS98)
Among 5000 students
3000 play basketball
3750 eat cereal
2000 both play basketball and eat cereal
play basketball ⇒ eat cereal [40%, 66.7%] is misleading,
because the overall percentage of students eating cereal
is 75%, which is higher than 66.7%.
play basketball ⇒ not eat cereal [20%, 33.3%] is far more
accurate, although with lower support and confidence.
                basketball   not basketball   sum(row)
cereal              2000             1750        3750
not cereal          1000              250        1250
sum(col.)           3000             2000        5000
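The negative correlation behind this criticism can be made explicit with the correlation (lift) measure introduced on the following slides; the arithmetic below follows directly from the contingency table above:

\[
\frac{P(\mathrm{basketball} \wedge \mathrm{cereal})}{P(\mathrm{basketball})\,P(\mathrm{cereal})}
= \frac{2000/5000}{(3000/5000)\,(3750/5000)}
= \frac{0.40}{0.60 \times 0.75} \approx 0.89 < 1
\]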
Criticism of Support and
Confidence (Cont.)
Example 2:

    X  1 1 1 1 0 0 0 0
    Y  1 1 0 0 0 0 0 0
    Z  0 1 1 1 1 1 1 1

X and Y are positively correlated; X and Z are negatively related,
yet the support and confidence of X ⇒ Z dominate:

    Rule    Support   Confidence
    X ⇒ Y   25%       50%
    X ⇒ Z   37.50%    75%

We need a measure of dependent or correlated events A, B:

    corr(A, B) = P(A ^ B) / (P(A) · P(B))
Other Interestingness Measures:
Interest
Interest (correlation, lift):

    Interest(A, B) = P(A ^ B) / (P(A) · P(B))

takes both P(A) and P(B) into consideration.
P(A ^ B) = P(A) · P(B) if A and B are independent events.
A and B are negatively correlated if the value is less than 1;
otherwise A and B are positively correlated.

    X  1 1 1 1 0 0 0 0
    Y  1 1 0 0 0 0 0 0
    Z  0 1 1 1 1 1 1 1

    Itemset   Support   Interest
    X,Y       25%       2
    X,Z       37.50%    0.9
    Y,Z       12.50%    0.57
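A quick check of the table, treating each row above as eight Boolean observations (function and variable names are mine):

```python
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]

def interest(a, b):
    """P(A ^ B) / (P(A) * P(B)) over the eight observations."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x and y for x, y in zip(a, b)) / n
    return p_ab / (p_a * p_b)

print(round(interest(X, Y), 2))   # 2.0
print(round(interest(X, Z), 2))   # ~0.86 (rounded to 0.9 on the slide)
print(round(interest(Y, Z), 2))   # ~0.57
```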
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary
Constraint-Based Mining
Interactive, exploratory mining of gigabytes of data?
Could it be real? Making good use of constraints!
What kinds of constraints can be used in mining?
Knowledge type constraint: classification, association,
etc.
Data constraint: SQL-like queries
Find product pairs sold together in Vancouver in
Dec.’98.
Dimension/level constraints:
in relevance to region, price, brand, customer
category.
Rule constraints
small sales (price < $10) triggers big sales (sum >
$200).
Interestingness constraints:
strong rules (min_support ≥ 3%, min_confidence ≥ 60%).
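As an illustration of how a rule constraint like the one above might be checked, here is a minimal post-filter sketch; the price table and the rule representation are assumptions of mine, not from the slides.

```python
# Hypothetical price table (dollars), for illustration only.
price = {"pen": 2, "notebook": 5, "printer": 150, "scanner": 80}

def satisfies_rule_constraint(lhs, rhs):
    """Check: every LHS item is a small sale (price < $10)
    and the RHS items together form a big sale (sum > $200)."""
    return (all(price[i] < 10 for i in lhs)
            and sum(price[i] for i in rhs) > 200)

# e.g. satisfies_rule_constraint({"pen", "notebook"}, {"printer", "scanner"}) -> True
```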
Mining Association Rules in
Large Databases
Association rule mining
Mining single-dimensional Boolean association
rules from transactional databases
Mining multilevel association rules from
transactional databases
Mining multidimensional association rules from
transactional databases and data warehouse
From association mining to correlation analysis
Constraint-based association mining
Summary