
Data Mining

Chapter 5
Association Analysis: Basic Concepts

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar



Association Rule Mining

 Given a set of transactions, find rules that will predict the
  occurrence of an item based on the occurrences of other items in
  the transaction

Market-Basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Tea, Eggs
3    Milk, Diaper, Tea, Coke
4    Bread, Milk, Diaper, Tea
5    Bread, Milk, Diaper, Coke

Example of Association Rules:
  {Diaper} → {Tea},
  {Milk, Bread} → {Eggs, Coke},
  {Tea, Bread} → {Milk}

Implication means co-occurrence, not causality!



Definition: Frequent Itemset
 Itemset
  – A collection of one or more items
     Example: {Milk, Bread, Diaper}
  – k-itemset
     An itemset that contains k items

 Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2

 Support
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5

 Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Tea, Eggs
3    Milk, Diaper, Tea, Coke
4    Bread, Milk, Diaper, Tea
5    Bread, Milk, Diaper, Coke



Definition: Association Rule
 Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Tea}

 Rule Evaluation Metrics
  – Support (s)
     Fraction of transactions that contain both X and Y
  – Confidence (c)
     Measures how often items in Y appear in transactions that contain X

TID  Items
1    Bread, Milk
2    Bread, Diaper, Tea, Eggs
3    Milk, Diaper, Tea, Coke
4    Bread, Milk, Diaper, Tea
5    Bread, Milk, Diaper, Coke

Example: {Milk, Diaper} → {Tea}

  s = σ(Milk, Diaper, Tea) / |T| = 2/5 = 0.4

  c = σ(Milk, Diaper, Tea) / σ(Milk, Diaper) = 2/3 = 0.67
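As a concrete check of these two metrics, here is a minimal Python sketch (added for illustration, not part of the original slides) that computes s and c for the rule {Milk, Diaper} → {Tea} over the five transactions above; the helper name support_count is just illustrative.

# Support and confidence of {Milk, Diaper} -> {Tea} over the market-basket data
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Tea", "Eggs"},
    {"Milk", "Diaper", "Tea", "Coke"},
    {"Bread", "Milk", "Diaper", "Tea"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Tea"}
s = support_count(X | Y, transactions) / len(transactions)               # 2/5 = 0.4
c = support_count(X | Y, transactions) / support_count(X, transactions)  # 2/3 ≈ 0.67
print(f"s = {s:.2f}, c = {c:.2f}")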
Association Rule Mining Task

 Given a set of transactions T, the goal of association rule mining
  is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

 Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf
thresholds
 Computationally prohibitive!



Computational Complexity
 Given d unique items:
  – Total number of itemsets = 2^d
  – Total number of possible association rules:

    R = \sum_{k=1}^{d-1} \binom{d}{k} \sum_{j=1}^{d-k} \binom{d-k}{j}
      = 3^d - 2^{d+1} + 1

    If d = 6, R = 602 rules
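The closed form can be sanity-checked against the double sum directly; this small Python sketch (an added illustration, not slide material) prints 602 twice for d = 6.

from math import comb

def total_rules(d):
    # Sum over antecedent size k and consequent size j drawn from the remaining items
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

d = 6
print(total_rules(d), 3**d - 2**(d + 1) + 1)   # both print 602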



Mining Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Tea, Eggs
3    Milk, Diaper, Tea, Coke
4    Bread, Milk, Diaper, Tea
5    Bread, Milk, Diaper, Coke

Example of Rules:
  {Milk, Diaper} → {Tea}   (s=0.4, c=0.67)
  {Milk, Tea} → {Diaper}   (s=0.4, c=1.0)
  {Diaper, Tea} → {Milk}   (s=0.4, c=0.67)
  {Tea} → {Milk, Diaper}   (s=0.4, c=0.67)
  {Diaper} → {Milk, Tea}   (s=0.4, c=0.5)
  {Milk} → {Diaper, Tea}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Tea}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements



Mining Association Rules

 Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset

 Frequent itemset generation is still computationally expensive



Frequent Itemset Generation
[Figure: the itemset lattice over five items A–E, from the null set at the top,
through all 1-, 2-, 3-, and 4-itemsets, down to ABCDE at the bottom]

Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
 Brute-force approach:
  – Each itemset in the lattice is a candidate frequent itemset
  – Count the support of each candidate by scanning the database

    TID  Items
    1    Bread, Milk
    2    Bread, Diaper, Tea, Eggs
    3    Milk, Diaper, Tea, Coke
    4    Bread, Milk, Diaper, Tea
    5    Bread, Milk, Diaper, Coke

    (N transactions of average width w are matched against a list of M candidates)

  – Match each transaction against every candidate
  – Complexity ~ O(NMw) ⇒ expensive since M = 2^d !!!
Frequent Itemset Generation Strategies
 Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M
 Reduce the number of transactions (N)
– Reduce size of N as the size of itemset increases
– Used by DHP and vertical-based mining algorithms
 Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or
transactions
– No need to match every candidate against every
transaction



Reducing Number of Candidates

 Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent

 The Apriori principle holds due to the following property of the
  support measure:

      ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – Support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support
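A small illustration of the anti-monotone property on the transactions from the earlier slides (a sketch added for clarity, not from the slides): each time an item is added to an itemset, its support can only stay the same or drop.

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Tea", "Eggs"},
    {"Milk", "Diaper", "Tea", "Coke"},
    {"Bread", "Milk", "Diaper", "Tea"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"Milk"}))                    # 0.8
print(support({"Milk", "Diaper"}))          # 0.6
print(support({"Milk", "Diaper", "Tea"}))   # 0.4 -- never increases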



Illustrating Apriori Principle

[Figure: the same itemset lattice over A–E. One itemset is found to be
infrequent, so all of its supersets are pruned from the search space]
Illustrating Apriori Principle

TID  Items
1    Bread, Milk
2    Tea, Bread, Diaper, Eggs
3    Tea, Coke, Diaper, Milk
4    Tea, Bread, Diaper, Milk
5    Bread, Coke, Diaper, Milk

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Tea     3
Diaper  4
Eggs    1

Minimum Support = 3





Illustrating Apriori Principle

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Tea     3
Diaper  4
Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)
Itemset           Count
{Bread, Milk}     3
{Bread, Tea}      2
{Bread, Diaper}   3
{Milk, Tea}       2
{Milk, Diaper}    3
{Tea, Diaper}     3

Triplets (3-itemsets):
Itemset                 Count
{Bread, Diaper, Milk}   2

Minimum Support = 3

If every subset is considered:
  6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates
With support-based pruning:
  6 + 6 + 1 = 13 candidates



Support Counting of Candidate Itemsets

 Scan the database of transactions to determine the support of each
  candidate itemset
  – Must match every candidate itemset against every transaction,
    which is an expensive operation

TID  Items
1    Bread, Milk
2    Tea, Bread, Diaper, Eggs
3    Tea, Coke, Diaper, Milk
4    Tea, Bread, Diaper, Milk
5    Bread, Coke, Diaper, Milk

Candidate itemsets:
  {Tea, Diaper, Milk}
  {Tea, Bread, Diaper}
  {Bread, Diaper, Milk}
  {Tea, Bread, Milk}



Apriori Algorithm

– Fk: frequent k-itemsets
– Lk: candidate k-itemsets
 Algorithm
– Let k=1
– Generate F1 = {frequent 1-itemsets}
– Repeat until Fk is empty
 Candidate Generation: Generate Lk+1 from Fk
 Candidate Pruning: Prune candidate itemsets in Lk+1
containing subsets of length k that are infrequent
 Support Counting: Count the support of each candidate in
Lk+1 by scanning the DB
 Candidate Elimination: Eliminate candidates in Lk+1 that are
infrequent, leaving only those that are frequent ⇒ Fk+1
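The loop above can be sketched in a few lines of Python. This is an illustrative implementation rather than the textbook's code: itemsets are represented as sorted tuples, and the four-transaction database from the next slide is used as the example.

from itertools import combinations

def apriori(transactions, minsup_count):
    # F1: frequent 1-itemsets (support counted over one DB scan)
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    Fk = {i for i, c in counts.items() if c >= minsup_count}
    frequent = {i: counts[i] for i in Fk}

    k = 1
    while Fk:
        # Candidate generation: F(k-1) x F(k-1) merge on a shared (k-1)-item prefix
        candidates = set()
        for a in Fk:
            for b in Fk:
                if a < b and a[:-1] == b[:-1]:
                    candidates.add(a + (b[-1],))
        # Candidate pruning: drop candidates with an infrequent k-item subset
        candidates = {c for c in candidates
                      if all(s in Fk for s in combinations(c, k))}
        # Support counting: one pass over the database per level
        cand_counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            ts = set(t)
            for c in candidates:
                if set(c) <= ts:
                    cand_counts[c] += 1
        # Candidate elimination: keep only the frequent candidates => F(k+1)
        Fk = {c for c, n in cand_counts.items() if n >= minsup_count}
        frequent.update((c, cand_counts[c]) for c in Fk)
        k += 1
    return frequent

# The four-transaction database from the next slide, minimum support count = 2
db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
for itemset, n in sorted(apriori(db, 2).items()):
    print(itemset, n)          # includes ('B', 'C', 'E') 2 among the frequent itemsets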
The Apriori Algorithm—An Example

Supmin (minimum support count) = 2

Database:
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan – candidate 1-itemsets L1 and frequent 1-itemsets F1:
  L1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
  F1: {A}: 2, {B}: 3, {C}: 3, {E}: 3

2nd scan – candidate 2-itemsets L2 and frequent 2-itemsets F2:
  L2: {A,B}: 1, {A,C}: 2, {A,E}: 1, {B,C}: 2, {B,E}: 3, {C,E}: 2
  F2: {A,C}: 2, {B,C}: 2, {B,E}: 3, {C,E}: 2

3rd scan – candidate 3-itemset L3 and frequent 3-itemset F3:
  L3: {B,C,E}
  F3: {B,C,E}: 2


Candidate Generation: Fk-1 x Fk-1 Method

 Merge two frequent (k-1)-itemsets if their first (k-2) items are identical

 F3 = {ABC,ABD,ABE,ACD,BCD,BDE,CDE}
– Merge(ABC, ABD) = ABCD
– Merge(ABC, ABE) = ABCE
– Merge(ABD, ABE) = ABDE

– Do not merge (ABD, ACD) because they share only a prefix of
  length 1 instead of length 2
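The merge condition can be checked mechanically on the F3 above; the short sketch below (an added illustration, assuming the items within each itemset are kept in lexicographic order) reproduces exactly the three merges listed.

F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]

candidates = []
for i, a in enumerate(F3):
    for b in F3[i + 1:]:
        if a[:-1] == b[:-1]:               # identical (k-2)-item prefix, e.g. "AB"
            candidates.append(a + b[-1])   # merge on the two differing last items

print(candidates)   # ['ABCD', 'ABCE', 'ABDE']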



Candidate Pruning

 Let F3 = {ABC,ABD,ABE,ACD,BCD,BDE,CDE} be
the set of frequent 3-itemsets

 L4 = {ABCD,ABCE,ABDE} is the set of candidate 4-itemsets generated
  (from the previous slide)

 Candidate pruning
– Prune ABCE because ACE and BCE are infrequent
– Prune ABDE because ADE is infrequent

 After candidate pruning: L4 = {ABCD}
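The pruning step is just as mechanical: a candidate 4-itemset survives only if every one of its 3-item subsets is in F3. The sketch below (an added illustration, not slide code) reproduces the result L4 = {ABCD}.

from itertools import combinations

F3 = {"ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"}
L4 = ["ABCD", "ABCE", "ABDE"]

survivors = [c for c in L4
             if all("".join(s) in F3 for s in combinations(c, 3))]
print(survivors)   # ['ABCD'] -- ABCE fails on ACE and BCE, ABDE fails on ADE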


Rule Generation

 Given a frequent itemset L, find all non-empty subsets f ⊂ L such
  that f → L – f satisfies the minimum confidence requirement
  – If {A,B,C,D} is a frequent itemset, candidate rules:
      ABC → D,  ABD → C,  ACD → B,  BCD → A,
      A → BCD,  B → ACD,  C → ABD,  D → ABC,
      AB → CD,  AC → BD,  AD → BC,  BC → AD,
      BD → AC,  CD → AB

 If |L| = k, then there are 2^k – 2 candidate association rules
  (ignoring L → ∅ and ∅ → L)
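A small sketch of this enumeration (added for illustration, reusing the transactions from the earlier slides): it lists every rule f → L – f from the frequent itemset {Milk, Diaper, Tea} whose confidence meets a minconf of 0.6. The helper name sigma is illustrative.

from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Tea", "Eggs"},
    {"Milk", "Diaper", "Tea", "Coke"},
    {"Bread", "Milk", "Diaper", "Tea"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    return sum(1 for t in transactions if itemset <= t)

L = {"Milk", "Diaper", "Tea"}
minconf = 0.6
for r in range(1, len(L)):                   # all non-empty, proper subsets f of L
    for f in map(set, combinations(L, r)):
        conf = sigma(L) / sigma(f)           # c(f -> L - f)
        if conf >= minconf:
            print(sorted(f), "->", sorted(L - f), round(conf, 2))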



Rule Generation

 In general, confidence does not have an anti-monotone property
  – c(ABC → D) can be larger or smaller than c(AB → D)

 But the confidence of rules generated from the same itemset does have
  an anti-monotone property
  – E.g., suppose {A,B,C,D} is a frequent 4-itemset:

      c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

  – Confidence is anti-monotone w.r.t. the number of items on the RHS
    of the rule
Rule Generation for Apriori Algorithm

Lattice of rules

[Figure: the lattice of rules generated from the frequent itemset {A,B,C,D},
from ABCD ⇒ { } at the top down to A ⇒ BCD, B ⇒ ACD, C ⇒ ABD, D ⇒ ABC at the
bottom. A rule found to have low confidence has all of the rules below it in
the lattice – those that move further items from its antecedent to its
consequent – pruned]


Factors Affecting Complexity of Apriori
 Choice of minimum support threshold
– lowering support threshold results in more frequent itemsets
– this may increase number of candidates and max length of frequent itemsets
 Dimensionality (number of items) of the data set
– more space is needed to store support count of each item
– if number of frequent items also increases, both computation and I/O costs may
also increase
 Size of database
– since Apriori makes multiple passes, run time of algorithm may increase with
number of transactions
 Average transaction width
– transaction width increases with denser data sets
– This may increase max length of frequent itemset



Construct FP-tree from a Transaction Database

min_support = 3

TID   Items bought                (ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o, w}          {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

1. Scan the DB once and find the frequent 1-itemsets (single-item patterns):
   f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
2. Sort the frequent items in frequency-descending order to obtain the f-list:
   F-list = f-c-a-b-m-p
3. Scan the DB again and construct the FP-tree by inserting each ordered
   transaction as a path from the root

[Figure: header table (item, frequency, head of node-links) next to the FP-tree
rooted at {}, with paths f:4–c:3–a:3–m:2–p:2, f:4–c:3–a:3–b:1–m:1, f:4–b:1,
and c:1–b:1–p:1]
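The three construction steps can be sketched compactly in Python. This is an illustration of the procedure described above rather than the original FP-growth code; the Node class and the build_fp_tree name are assumptions for the example, and the tie order among equally frequent items may differ from the slide's f-list.

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent     # parent kept for prefix-path walks
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_support):
    # Pass 1: frequent single items, sorted by descending frequency (the f-list)
    freq = Counter(item for t in transactions for item in t)
    flist = [i for i, c in freq.most_common() if c >= min_support]
    order = {item: rank for rank, item in enumerate(flist)}

    # Pass 2: keep only frequent items, re-order each transaction by the f-list,
    # and insert it into the trie, incrementing counts along its path
    root = Node(None, None)
    for t in transactions:
        path = sorted((i for i in t if i in order), key=order.get)
        node = root
        for item in path:
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1
    return root, flist

db = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
root, flist = build_fp_tree(db, 3)
print(flist)                       # frequent items by count; ties may differ from f-c-a-b-m-p
print(root.children["f"].count)    # 4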
Partition Patterns and Databases

 Frequent patterns can be partitioned into subsets according to the f-list
 F-list = f-c-a-b-m-p

 Patterns containing p

 Patterns having m but no p

 …

 Patterns having c but none of a, b, m, or p

 Pattern f

Find Patterns Having P From P-conditional Database

 Starting at the frequent-item header table of the FP-tree
 Traverse the FP-tree by following the node-links of each frequent item p
 Accumulate all of the transformed prefix paths of item p to form p's
  conditional pattern base

Conditional pattern bases:
item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

[Figure: the FP-tree and header table from the previous slide, annotated with
the node-links used to collect each item's prefix paths]
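The conditional pattern bases in the table can also be derived directly from the ordered transactions, without walking the tree: for each occurrence of an item, record the prefix of frequent items that precedes it. The sketch below (an added illustration, not slide code) reproduces the table above.

from collections import defaultdict

flist = ["f", "c", "a", "b", "m", "p"]
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]

cond_base = defaultdict(list)
for t in ordered:
    for i, item in enumerate(t):
        if t[:i]:                               # skip empty prefixes
            cond_base[item].append("".join(t[:i]))

for item in flist:
    print(item, cond_base.get(item, []))
# e.g. p ['fcam', 'cb', 'fcam']  -> summarized on the slide as fcam:2, cb:1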
