Frequent Item Mining

The document discusses frequent itemset mining. It begins by defining key concepts like itemsets, support count, and frequent itemsets. It then explains the Apriori algorithm for generating frequent itemsets and how it uses the Apriori property to reduce the number of candidates. The document also discusses challenges with frequent itemset mining and methods to more efficiently enumerate maximal and closed frequent itemsets, including alternative representations of the transaction database like ECLAT and FP-growth.

Uploaded by

Anh Kiet Duong


Frequent Item Mining 

What is data mining? 
•  = Pattern Mining? 
•  What patterns? 
•  Why are they useful?  
Definition: Frequent Itemset 
•  Itemset 
–  A collection of one or more items 
•  Example: {Milk, Bread, Diaper} 
–  k-itemset 
•  An itemset that contains k items 
•  Support count (σ) 
–  Frequency of occurrence of an itemset 
–  E.g.   σ({Milk, Bread, Diaper}) = 2 
•  Support 
–  Fraction of transactions that contain an itemset 
–  E.g.   s({Milk, Bread, Diaper}) = 2/5 
•  Frequent Itemset 
–  An itemset whose support is greater than or equal to a minsup threshold 
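The definitions above can be sketched in a few lines of Python. The five-transaction table the slide refers to is not part of the extracted text, so the `TRANSACTIONS` list below is an assumption, chosen only so that it agrees with σ({Milk, Bread, Diaper}) = 2 and s = 2/5; the function names are likewise illustrative.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical market-basket table (an assumption: the slide's own table
# is not in the extracted text; this one matches sigma = 2 and s = 2/5).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """sigma(itemset): number of transactions containing every item of the itemset."""
    target = frozenset(itemset)
    return sum(1 for t in transactions if target <= t)


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset: Iterable[str], transactions: List[FrozenSet[str]], minsup: float) -> bool:
    """An itemset is frequent when its support is >= the minsup threshold."""
    return support(itemset, transactions) >= minsup


print(support_count({"Milk", "Bread", "Diaper"}, TRANSACTIONS))
print(support({"Milk", "Bread", "Diaper"}, TRANSACTIONS))
```

With a threshold of, say, minsup = 0.4 the itemset {Milk, Bread, Diaper} would count as frequent; with minsup = 0.5 it would not.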
Frequent Itemsets Mining 
•  Minimum support level 50% 
–  {A},{B},{C},{A,B}, {A,C} 

TID    Transactions 
100    { A, B, E } 
200    { B, D } 
300    { A, B, E } 
400    { A, C } 
500    { B, C } 
600    { A, C } 
700    { A, B } 
800    { A, B, C, E } 
900    { A, B, C } 
1000   { A, C, E } 
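As a check on the result above, here is a minimal brute-force sketch that enumerates every possible itemset over the ten transactions and keeps those meeting the 50% threshold. The function name `frequent_itemsets_bruteforce` and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
from itertools import combinations

# The ten transactions from the table, keyed by TID.
DB = {
    100: {"A", "B", "E"}, 200: {"B", "D"}, 300: {"A", "B", "E"},
    400: {"A", "C"}, 500: {"B", "C"}, 600: {"A", "C"}, 700: {"A", "B"},
    800: {"A", "B", "C", "E"}, 900: {"A", "B", "C"}, 1000: {"A", "C", "E"},
}


def frequent_itemsets_bruteforce(db, minsup):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    items = sorted(set().union(*db.values()))
    n = len(db)
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            # One pass over the database per candidate.
            count = sum(1 for t in db.values() if set(candidate) <= t)
            if count / n >= minsup:
                result[candidate] = count
    return result


frequent = frequent_itemsets_bruteforce(DB, minsup=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

The sketch reports {A}, {B}, {C}, {A,B} and {A,C}, with support counts 8, 7, 6, 5 and 5 respectively. {E} appears in only four of the ten transactions and so falls just short.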
Frequent Pattern Mining 
[Figure: several small graphs with nodes labeled A to F; layout not recoverable from the extracted text] 
Beyond Itemsets 
•  Sequence Mining 
–  Finding frequent subsequences from a collection of sequences 
•  Graph Mining 
–  Finding frequent (connected) subgraphs from a collection of graphs 
•  Tree Mining 
–  Finding frequent (embedded) subtrees from a set of trees/graphs 
•  Geometric Structure Mining 
–  Finding frequent substructures from 3-D or 2-D geometric graphs 
•  Among others… 
Why Is Frequent Pattern Mining So Important? 
•  Application Domains 
–  Business, biology, chemistry, WWW, computer/networking security, … 
•  Summarizing the underlying datasets, providing key insights 
•  Basic tools for other data mining tasks 
–  Association rule mining 
–  Classification 
–  Clustering 
–  Change Detection 
–  etc… 
Network motifs: recurring patterns that occur significantly more often than in randomized nets 

•  Do motifs have specific roles in the network? 

•  Many possible distinct subgraphs 

The 13 three-node connected subgraphs 
199 4-node directed connected subgraphs 

And it grows fast for larger subgraphs: 9364 5-node subgraphs, 1,530,843 6-node… 
Finding network motifs: an overview 
•  Generation of a suitable random ensemble (reference networks) 
•  Network motifs detection process: 
–  Count how many times each subgraph appears 
–  Compute statistical significance for each subgraph: the probability of it appearing in random networks as often as in the real network (P-val or Z-score) 

Example (ensemble of networks): 
Real = 5               Rand = 0.5 ± 0.6 
Z-score (# standard deviations) = (5 - 0.5) / 0.6 = 7.5 
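The Z-score in the example is the distance between the real count and the random-ensemble mean, measured in ensemble standard deviations. A minimal sketch follows; the function names are hypothetical, and using the population standard deviation (`pstdev`) for the ensemble is an assumption, since the slides do not say which estimator is meant.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(real_count: float, rand_mean: float, rand_std: float) -> float:
    """Number of standard deviations separating the real count from the random mean."""
    if rand_std <= 0:
        raise ValueError("rand_std must be positive")
    return (real_count - rand_mean) / rand_std


def motif_z_score(real_count: float, random_counts: Sequence[float]) -> float:
    """Z-score from raw subgraph counts measured on each randomized network.

    Assumption: the ensemble spread is the population standard deviation.
    """
    return z_score(real_count, mean(random_counts), pstdev(random_counts))


# Slide example: Real = 5, Rand = 0.5 +/- 0.6.
print(round(z_score(5, 0.5, 0.6), 6))
```

A subgraph would typically be reported as a motif only when this value exceeds some chosen cutoff; the cutoff itself is not given in the slides.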
Three Different Views of FIM 
•  Transactional Database 
–  How do we store a transactional database? 
•  Horizontal, Vertical, Transaction-Item Pair 
•  Binary Matrix 
•  Bipartite Graph 

•  How is FIM formulated in these different settings? 
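To make the views concrete, the sketch below converts one small horizontal database into the vertical layout, the transaction-item pair layout, and a binary matrix. The three-transaction database and all function names are hypothetical. The pair list doubles as the edge list of the bipartite graph (transactions on one side, items on the other).

```python
def to_vertical(horizontal):
    """Vertical layout: item -> set of transaction ids containing it."""
    vertical = {}
    for tid, items in horizontal.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def to_pairs(horizontal):
    """Transaction-item pairs; also the edges of the bipartite graph."""
    return sorted((tid, item) for tid, items in horizontal.items() for item in items)


def to_binary_matrix(horizontal):
    """Binary matrix: rows are transactions, columns are items."""
    tids = sorted(horizontal)
    items = sorted(set().union(*horizontal.values()))
    matrix = [[1 if item in horizontal[tid] else 0 for item in items] for tid in tids]
    return tids, items, matrix


# Hypothetical horizontal database: tid -> set of items.
HORIZONTAL = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"A", "C"}}

print(to_vertical(HORIZONTAL))
print(to_pairs(HORIZONTAL))
print(to_binary_matrix(HORIZONTAL))
```

In the matrix view a frequent itemset corresponds to a set of columns that are simultaneously 1 in enough rows; in the bipartite view it corresponds to a set of item nodes sharing enough common transaction neighbors.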
Frequent Itemset Generation 

Given d items, there are 2^d possible candidate itemsets 
Frequent Itemset Generation 
•  Brute-force approach: 
–  Each itemset in the lattice is a candidate frequent itemset 
–  Count the support of each candidate by scanning the database 
–  Match each transaction against every candidate 
–  Complexity ~ O(NMw) for N transactions, M candidates and maximum transaction width w => expensive since M = 2^d 
Reducing Number of Candidates 
•  Apriori principle: 
–  If an itemset is frequent, then all of its subsets must also be frequent 

•  Apriori principle holds due to the following property of the support measure: 
–  Support of an itemset never exceeds the support of its subsets 
–  This is known as the anti-monotone property of support 
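Stated formally, as a sketch in LaTeX notation: the symbols $I$ (the set of all items) and $T$ (the set of transactions) are introduced here for illustration, and $\sigma$ is the support count.

```latex
\forall X, Y \subseteq I:\quad
X \subseteq Y \;\Longrightarrow\; \sigma(X) \ge \sigma(Y),
\qquad \text{because}\quad
\{\, t \in T : Y \subseteq t \,\} \subseteq \{\, t \in T : X \subseteq t \,\}.
```

Every transaction that contains $Y$ also contains its subset $X$, so $X$ is counted at least as often. Reading the implication in the other direction gives the pruning rule: once $X$ is found to be infrequent, no superset $Y$ can be frequent.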
Illustrating Apriori Principle 
[Figure: itemset lattice in which one itemset is found to be infrequent and all of its supersets are pruned] 
Illustrating Apriori Principle 
Minimum Support = 3 
•  Items (1-itemsets) 
•  Pairs (2-itemsets) 
–  (No need to generate candidates involving Coke or Eggs) 
•  Triplets (3-itemsets) 

If every subset is considered, C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 
With support-based pruning, 6 + 6 + 1 = 13 
Apriori 

R. Agrawal and R. Srikant.

Fast algorithms for mining association rules.
VLDB, 487-499, 1994
How to Generate Candidates? 
•  Suppose the items in L_{k-1} are listed in an order 
•  Step 1: self-joining L_{k-1} 
    insert into C_k 
    select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1} 
    from L_{k-1} p, L_{k-1} q 
    where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1} 

•  Step 2: pruning 
    forall itemsets c in C_k do 
        forall (k-1)-subsets s of c do 
            if (s is not in L_{k-1}) then delete c from C_k 
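The join and prune steps can be sketched in Python as follows. This is one possible rendering, not the authors' code: itemsets are represented as sorted tuples, and the name `apriori_gen` and the example inputs are illustrative assumptions.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    prev_frequent: iterable of sorted tuples, all of the same length k-1.
    """
    prev = sorted(set(prev_frequent))
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Step 1 (join): first k-2 items agree, last item of p < last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Step 2 (prune): every (k-1)-subset of c must itself be frequent.
                if all(s in prev_set for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
    return candidates


# Hypothetical frequent pairs; only one triple survives join + prune.
L2 = [("Beer", "Diaper"), ("Bread", "Diaper"), ("Bread", "Milk"), ("Diaper", "Milk")]
print(apriori_gen(L2))

# Here the join produces (A, B, C), but (B, C) is not frequent, so it is pruned.
print(apriori_gen([("A", "B"), ("A", "C"), ("B", "D")]))
```

Because the input is sorted, each candidate is produced exactly once, and the prune step needs only set lookups rather than a database scan.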
Challenges of Frequent Itemset Mining 
•  Challenges 
–  Multiple scans of transaction database 
–  Huge number of candidates 
–  Tedious workload of support counting for candidates 

•  Improving Apriori: general ideas 
–  Reduce passes of transaction database scans 
–  Shrink number of candidates 
–  Facilitate support counting of candidates 
Compact Representation of Frequent Itemsets 

•  Some itemsets are redundant because they have identical support as their supersets 

•  Number of frequent itemsets 

•  Need a compact representation 
Maximal Frequent Itemset 
An itemset is maximal frequent if it is frequent and none of its immediate supersets is frequent 
[Figure: itemset lattice showing the maximal itemsets along the border between frequent and infrequent itemsets] 
Closed Itemset 
•  An itemset is closed if none of its immediate supersets has the same support as the itemset 
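Maximal vs Closed Itemsets 
[Figure: itemset lattice annotated with transaction ids; some itemsets are not supported by any transactions] 

Maximal vs Closed Frequent Itemsets 
Minimum support = 2 
[Figure: itemset lattice with itemsets marked "closed but not maximal" or "closed and maximal"] 
# Closed = 9 
# Maximal = 4 

Maximal vs Closed Itemsets 
[Figure not recoverable from the extracted text] 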
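The two definitions can be applied mechanically once all frequent itemsets and their supports are known. The sketch below does so on a hypothetical five-transaction database. That database is an assumption, since the slide's own table is not in the extracted text; it was picked because, at minimum support 2, it yields the same totals as the slide (# Closed = 9, # Maximal = 4). Function names are illustrative.

```python
from itertools import combinations

# Hypothetical database (assumption; not recoverable from the slides).
DB = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"B", "C", "E"},
    {"A", "C", "D", "E"},
    {"D", "E"},
]


def frequent_itemsets(db, min_count):
    """Brute-force enumeration: {frozenset: support count} for frequent itemsets."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            count = sum(1 for t in db if set(c) <= t)
            if count >= min_count:
                freq[frozenset(c)] = count
    return freq


def closed_and_maximal(freq, items):
    """Split frequent itemsets into closed ones and maximal ones."""
    closed, maximal = set(), set()
    for itemset, count in freq.items():
        supersets = [itemset | {i} for i in items if i not in itemset]
        # Closed: no immediate superset has the same support.
        # (An infrequent superset has a strictly smaller count, so it never ties.)
        if all(freq.get(s) != count for s in supersets):
            closed.add(itemset)
        # Maximal: no immediate superset is frequent at all.
        if all(s not in freq for s in supersets):
            maximal.add(itemset)
    return closed, maximal


freq = frequent_itemsets(DB, min_count=2)
closed, maximal = closed_and_maximal(freq, sorted(set().union(*DB)))
print(len(freq), len(closed), len(maximal))
```

On this data every maximal itemset is also closed, which is true in general: a maximal frequent itemset has no frequent superset, so no superset can match its support.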
Research Questions 
•  How to efficiently enumerate Maximal Frequent Itemsets? 

•  How about Closed Frequent Itemsets? 
Alternative Methods for Frequent Itemset Generation 
•  Representation of Database 
–  horizontal vs vertical data layout 
ECLAT 
•  For each item, store a list of transaction ids (tids) 
[Figure: TID-list for each item] 

ECLAT 
•  Determine support of any k-itemset by intersecting tid-lists of two of its (k-1) subsets. 
[Figure: tid-list ∧ tid-list → intersected tid-list] 
•  3 traversal approaches: 
–  top-down, bottom-up and hybrid 
•  Advantage: very fast support counting 
•  Disadvantage: intermediate tid-lists may become too large for memory 
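A minimal sketch of the tid-list idea follows, assuming a depth-first traversal (only one of several possible traversal orders) and Python sets as tid-lists. The names `eclat` and `mine_vertical` are hypothetical, and the database is the ten-transaction table used earlier in the deck.

```python
def eclat(prefix, items, min_count, out):
    """Depth-first search over itemsets sharing `prefix`.

    items: list of (item, tidset) pairs, each with len(tidset) >= min_count.
    out:   dict collecting {itemset tuple: support count}.
    """
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support is just the tid-list length
        suffix = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids  # intersect two tid-lists
            if len(common) >= min_count:
                suffix.append((other, common))
        if suffix:
            eclat(itemset, suffix, min_count, out)


def mine_vertical(db, min_count):
    """Build the vertical layout from a {tid: items} database, then run eclat."""
    vertical = {}
    for tid, its in db.items():
        for it in its:
            vertical.setdefault(it, set()).add(tid)
    start = sorted(
        ((it, tids) for it, tids in vertical.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    out = {}
    eclat((), start, min_count, out)
    return out


DB = {
    100: {"A", "B", "E"}, 200: {"B", "D"}, 300: {"A", "B", "E"},
    400: {"A", "C"}, 500: {"B", "C"}, 600: {"A", "C"}, 700: {"A", "B"},
    800: {"A", "B", "C", "E"}, 900: {"A", "B", "C"}, 1000: {"A", "C", "E"},
}
print(mine_vertical(DB, min_count=5))
```

No database rescans are needed after the vertical layout is built, which is the source of the fast support counting. The cost is that every recursion level holds its own intersected tid-lists in memory.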
FP-growth Algorithm 
•  Use a compressed representation of the database using an FP-tree 

•  Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets 
FP-tree construction 
After reading TID=1: 
null 
  A:1 
    B:1 

After reading TID=2: 
null 
  A:1 
    B:1 
  B:1 
    C:1 
      D:1 
FP-Tree Construction 
[Figure: transaction database, the complete FP-tree built from it (root null with children A:7 and B:3), and a header table] 
Pointers are used to assist frequent itemset generation 
FP-growth 
[Figure: the paths of the FP-tree that end in D] 
Conditional Pattern base for D: 
     P = {(A:1,B:1,C:1), 
          (A:1,B:1), 
          (A:1,C:1), 
          (A:1), 
          (B:1,C:1)} 
Recursively apply FP-growth on P 
Frequent Itemsets found (with sup > 1): 
     AD, BD, CD, ABD, ACD, BCD 
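FP-tree construction and the extraction of a conditional pattern base can be sketched as below. The transaction database is an assumption: the slide's table did not survive extraction, so the ten transactions here were chosen to reproduce the node counts shown (A:7, B:3, B:5 under A) and the pattern base for D. Items are inserted in the fixed order A, B, C, D, E, which is also assumed. The class and function names are hypothetical.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, order):
    """Insert each transaction as a path from the root, sharing common prefixes."""
    root = Node(None, None)
    header = defaultdict(list)  # header table: item -> nodes carrying that item
    for t in transactions:
        node = root
        for item in sorted(t, key=order.index):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(header, item):
    """Prefix paths (root side first) of every node labeled `item`, with counts."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        base.append((tuple(reversed(path)), node.count))
    return base


# Assumed database, consistent with the counts visible on the slides.
TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

root, header = build_fp_tree(TRANSACTIONS, order=["A", "B", "C", "D", "E"])
for path, count in sorted(conditional_pattern_base(header, "D")):
    print(path, count)
```

A full FP-growth implementation would then build a smaller conditional FP-tree from these prefix paths and recurse on it; that recursion is omitted here to keep the sketch short.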
