
Frequent Pattern (FP) Growth
(FP-Tree)
Challenges of Frequent Pattern Mining

- Challenges
  - Multiple scans of the transaction database
  - Huge number of candidates
  - Tedious workload of support counting for candidates
- Improving Apriori: general ideas
  - Reduce the number of passes over the transaction database
  - Shrink the number of candidates
  - Facilitate support counting of candidates
Bottleneck of Frequent-Pattern Mining

- Multiple database scans are costly
- Mining long patterns needs many passes of scanning and generates lots of candidates
  - To find the frequent itemset i1i2…i100, the number of scans is 100
- Bottleneck: candidate generation and test
- Can we avoid candidate generation?
Methods to Improve Apriori’s Efficiency

- Transaction reduction
  - A transaction that does not contain any frequent k-itemset is useless in subsequent scans
- Partitioning
  - Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB

Methods to Improve Apriori’s Efficiency

- Sampling
  - Mine on a subset of the given data with a lower support threshold, plus a method to determine completeness
Mining Frequent Patterns Without Candidate Generation

- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - Highly condensed, but complete for frequent pattern mining
  - Avoids costly repeated database scans
- Develop an efficient FP-tree-based frequent pattern mining method
  - A divide-and-conquer methodology: decompose mining tasks into smaller ones
  - Avoid candidate generation: sub-database test only
Mining Frequent Patterns Without Candidate Generation

- Grow long patterns from short ones using locally frequent items
  - Suppose “abc” is a frequent pattern
  - Get all transactions containing “abc”: DB|abc
  - If “d” is a locally frequent item in DB|abc, then “abcd” is a frequent pattern
Steps

1) Find the support count of each item
2) Order the frequent items in descending order of support (keep only items whose support >= minimum support)
3) Build the FP-tree
4) Find frequent patterns from the FP-tree
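Steps 1 and 2 amount to one counting scan plus a reordering pass. A minimal Python sketch, our own illustration using the example database from the next slide (ties among equal-support items may break differently from the slide’s f-list):

```python
from collections import Counter

# Transaction database from the example below; minimum support = 3.
transactions = [
    ['f', 'a', 'c', 'd', 'g', 'i', 'm', 'p'],
    ['a', 'b', 'c', 'f', 'l', 'm', 'o'],
    ['b', 'f', 'h', 'j', 'o', 'w'],
    ['b', 'c', 'k', 's', 'p'],
    ['a', 'f', 'c', 'e', 'l', 'p', 'm', 'n'],
]
MIN_SUPPORT = 3

# Step 1: one scan to count each item's support.
counts = Counter(item for t in transactions for item in t)

# Step 2: keep items meeting minimum support, in descending frequency.
f_list = [item for item, c in counts.most_common() if c >= MIN_SUPPORT]
rank = {item: i for i, item in enumerate(f_list)}

# Rewrite each transaction: drop infrequent items, sort by f-list order.
ordered = [sorted((i for i in t if i in rank), key=rank.get)
           for t in transactions]

print(f_list)    # ['f', 'c', 'a', 'm', 'p', 'b'] here; the slide breaks ties as f-c-a-b-m-p
print(ordered)   # first row: ['f', 'c', 'a', 'm', 'p']
```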
Example: FP-Growth

TID  Items bought
100  {f, a, c, d, g, i, m, p}
200  {a, b, c, f, l, m, o}
300  {b, f, h, j, o, w}
400  {b, c, k, s, p}
500  {a, f, c, e, l, p, m, n}

Minimum support = 3
Construct FP-tree from a Transaction Database

TID  Items bought               (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
200  {a, b, c, f, l, m, o}      {f, c, a, b, m}
300  {b, f, h, j, o, w}         {f, b}
400  {b, c, k, s, p}            {c, b, p}
500  {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

min_support = 3

1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in frequency descending order: the f-list
3. Scan DB again, construct the FP-tree

F-list = f-c-a-b-m-p

Header table: f:4, c:4, a:3, b:3, m:3, p:3 (each entry heads a node-link chain into the tree)

Resulting FP-tree (counts after all five insertions):

{}
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1
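A sketch of step 3 under the same assumptions: a small FPNode class plus a builder that inserts each ordered transaction along a shared prefix path (illustrative, not the original authors’ code):

```python
# FPNode: item, running count, parent link, children keyed by item.
class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction along a shared prefix path."""
    root = FPNode(None, None)
    header = {}                     # item -> list of that item's nodes
    for trans in ordered_transactions:
        node = root
        for item in trans:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1        # shared prefixes accumulate counts
            node = child
    return root, header

# Ordered transactions following the slide's f-list f-c-a-b-m-p:
ordered = [['f', 'c', 'a', 'm', 'p'], ['f', 'c', 'a', 'b', 'm'],
           ['f', 'b'], ['c', 'b', 'p'], ['f', 'c', 'a', 'm', 'p']]
root, header = build_fp_tree(ordered)

# Per-item supports recovered from the node links:
print({i: sum(n.count for n in nodes) for i, nodes in header.items()})
# -> {'f': 4, 'c': 4, 'a': 3, 'm': 3, 'p': 3, 'b': 3}
```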
Benefits of the FP-tree Structure

- Reduces irrelevant information: infrequent items are gone
- Items are in frequency descending order: the more frequently an item occurs, the more likely its node is shared
Partition Patterns and Databases

- Frequent patterns can be partitioned into subsets according to the f-list
  - F-list = f-c-a-b-m-p
  - Patterns containing p
  - Patterns containing m but not p
  - …
  - Patterns containing c but none of a, b, m, p
  - Pattern f
Find Patterns Having p from the p-conditional Database

- Start at the frequent-item header table of the FP-tree
- Traverse the FP-tree by following the node links of each frequent item p
- Accumulate all transformed prefix paths of item p to form p’s conditional pattern base

Conditional pattern bases (from the FP-tree above):

Item  Conditional pattern base
c     f:3
a     fc:3
b     fca:1, f:1, c:1
m     fca:2, fcab:1
p     fcam:2, cb:1
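Continuing the sketch above (it reuses FPNode, build_fp_tree, root, and header), conditional pattern bases fall out of the header table’s node links: for each occurrence of an item, walk the parent pointers up to the root and record the prefix path with that occurrence’s count:

```python
def prefix_path(node):
    """Items on the path from node's parent up to (excluding) the root."""
    path = []
    node = node.parent
    while node is not None and node.item is not None:
        path.append(node.item)
        node = node.parent
    path.reverse()
    return path

def conditional_pattern_base(item, header):
    # Each occurrence of `item` contributes its prefix path,
    # weighted by that occurrence's count.
    return [(prefix_path(n), n.count) for n in header[item]]

print(conditional_pattern_base('m', header))
# -> [(['f', 'c', 'a'], 2), (['f', 'c', 'a', 'b'], 1)], i.e. fca:2, fcab:1
```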


From Conditional Pattern Bases to Conditional FP-trees

- For each pattern base:
  - Accumulate the count for each item in the base
  - Construct the FP-tree for the frequent items of the pattern base

m’s conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (b is dropped, since its local count 1 < min_support):

{}
└── f:3
    └── c:3
        └── a:3

All frequent patterns involving m:
m, fm, cm, am, fcm, fam, cam, fcam
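Because the m-conditional FP-tree is a single path, mining it reduces to enumerating subsets of that path; a small sketch of the single-path case (variable names are ours):

```python
from itertools import combinations

# The single path of the m-conditional FP-tree, with node counts:
path = [('f', 3), ('c', 3), ('a', 3)]

patterns = {('m',): 3}              # m itself has support 3
for r in range(1, len(path) + 1):
    for combo in combinations(path, r):
        items = tuple(i for i, _ in combo) + ('m',)
        # Support of a sub-path combination is the minimum count on it.
        patterns[items] = min(c for _, c in combo)

print(patterns)
# 8 patterns: m, fm, cm, am, fcm, fam, cam, fcam, each with support 3
```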
Recursion: Mining Each Conditional FP-tree

Mining the m-conditional FP-tree ({} → f:3 → c:3 → a:3) recurses on each of its items:

- Conditional pattern base of “am”: (fc:3) → am-conditional FP-tree: {} → f:3 → c:3
- Conditional pattern base of “cm”: (f:3) → cm-conditional FP-tree: {} → f:3
- Conditional pattern base of “cam”: (f:3) → cam-conditional FP-tree: {} → f:3
Mining Frequent Patterns With FP-trees

- Idea: frequent pattern growth
  - Recursively grow frequent patterns by pattern and database partitioning
- Method (sketched below)
  - For each frequent item, construct its conditional pattern base, and then its conditional FP-tree
  - Repeat the process on each newly created conditional FP-tree
  - Until the resulting FP-tree is empty, or it contains only one path; a single path generates all the combinations of its sub-paths, each of which is a frequent pattern
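Putting the pieces together, a compact recursive sketch of the whole method, reusing build_fp_tree and prefix_path from the earlier snippets (a teaching aid under our assumptions, not the paper’s optimized implementation):

```python
from collections import Counter

def fp_growth(db, min_support, suffix=()):
    """Mine all frequent patterns from a list of ordered transactions."""
    root, header = build_fp_tree(db)
    patterns = {}
    for item, nodes in header.items():
        support = sum(n.count for n in nodes)
        if support < min_support:
            continue
        pattern = (item,) + suffix
        patterns[pattern] = support
        # Conditional pattern base -> keep locally frequent items only.
        base = [(prefix_path(n), n.count) for n in nodes]
        local = Counter()
        for path, count in base:
            for i in path:
                local[i] += count
        keep = {i for i, c in local.items() if c >= min_support}
        # Expand each weighted path into `count` copies to reuse the builder.
        cond_db = []
        for path, count in base:
            cond_db.extend([[i for i in path if i in keep]] * count)
        if cond_db:
            patterns.update(fp_growth(cond_db, min_support, pattern))
    return patterns

result = fp_growth(ordered, 3)
print(len(result), result[('c', 'p')])   # 18 patterns; support of cp is 3
```

Each recursive call mines a strictly smaller conditional database, which is exactly the divide-and-conquer behavior the next slide credits for FP-Growth’s speed.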
Why Is FP-Growth the Winner?

- Divide-and-conquer:
  - Decomposes both the mining task and the DB according to the frequent patterns obtained so far
  - Leads to focused search of smaller databases
- Other factors
  - No candidate generation, no candidate test
  - Compressed database: the FP-tree structure
  - No repeated scan of the entire database
  - Basic operations are counting local frequent items and building sub-FP-trees; no pattern search and matching
From Association Mining to Correlation Analysis

Interestingness Measurements

- Objective measures: two popular measurements
  - Support
  - Confidence
- Subjective measures: a rule (pattern) is interesting if
  - it is unexpected (surprising to the user), and/or
  - it is actionable (the user can do something with it)
Criticism of Support and Confidence

Example:

- Among 5000 students:
  - 3000 play basketball
  - 3750 eat cereal
  - 2000 both play basketball and eat cereal
- play basketball → eat cereal [support 40%, confidence 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%
- play basketball → not eat cereal [support 20%, confidence 33.3%] is far more accurate, although it has lower support and confidence

            basketball  not basketball  sum(row)
cereal      2000        1750            3750
not cereal  1000        250             1250
sum(col.)   3000        2000            5000
Other Interestingness Measures: Interest

- Interest (correlation, lift):

    interest(A, B) = P(A ∪ B) / (P(A) P(B))

- Takes both P(A) and P(B) into consideration
- A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Itemset  Support  Interest
X,Y      25%      2
X,Z      37.50%   0.9
Y,Z      12.50%   0.57
Criticism of Support and Confidence

Example:

- X and Y: positively correlated
- X and Z: negatively correlated
- We need a measure of dependent or correlated events:

    corr(A, B) = P(A ∪ B) / (P(A) P(B))

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Itemset  Support  Interest
X,Y      25%      2
X,Z      37.50%   0.9
Y,Z      12.50%   0.57

Rule   Support  Confidence
X=>Y   25%      50%
X=>Z   37.50%   75%
Interestingness Measure: Correlations (Lift)

- play basketball → eat cereal [40%, 66.7%] is misleading
  - The overall % of students eating cereal is 75% > 66.7%
- play basketball → not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence
- Measure of dependent/correlated events: lift

    lift(A, B) = P(A ∪ B) / (P(A) P(B))

            Basketball  Not basketball  Sum (row)
Cereal      2000        1750            3750
Not cereal  1000        250             1250
Sum (col.)  3000        2000            5000

lift(Basketball, Cereal) = (2000/5000) / ((3000/5000) × (3750/5000)) = 0.89
lift(Basketball, Not cereal) = (1000/5000) / ((3000/5000) × (1250/5000)) = 1.33
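A quick arithmetic check of the two lift values above (plain Python; the variable names are ours):

```python
N = 5000
basketball, cereal, both = 3000, 3750, 2000

lift_cereal = (both / N) / ((basketball / N) * (cereal / N))
lift_not_cereal = ((basketball - both) / N) / ((basketball / N) * ((N - cereal) / N))

print(round(lift_cereal, 2), round(lift_not_cereal, 2))   # 0.89 1.33
```

Lift below 1 (0.89) confirms basketball and cereal are negatively correlated, while lift above 1 (1.33) confirms basketball and "not cereal" are positively correlated.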
