
Fundamentals of Data Science

UNIT-3

Mining Frequent Patterns

Basic concepts:

Frequent pattern mining searches for recurring relationships in a given data set. This section
introduces the basic concepts of frequent pattern mining for the discovery of interesting associations
and correlations between item sets in transactional and relational databases.

Frequent pattern mining in data mining is the process of identifying patterns or associations within a
dataset that occur frequently. This is typically done by analysing large datasets to find items or sets of
items that appear together frequently.

A classic example is market basket analysis, the earliest form of frequent pattern mining for association rules.

Definition of Frequent Patterns: Frequent patterns refer to combinations of items, sequences, or substructures that occur frequently in a dataset. For example, in a retail dataset, a frequent pattern could be the association between certain products that are often purchased together, like bread and butter.

Mining frequent patterns in data science involves identifying recurring associations or relationships
within a dataset.

Market Basket Analysis: A Motivating Example

Frequent itemset mining leads to the discovery of associations and correlations among items in large
transactional or relational data sets. With massive amounts of data continuously being collected and
stored, many industries are becoming interested in mining such patterns from their databases.

The discovery of interesting correlation relationships among huge amounts of business transaction records can help in many business decision-making processes such as catalog design, cross-marketing, and customer shopping behaviour analysis.

A typical example of frequent itemset mining is market basket analysis. This process analyzes
customer buying habits by finding associations between the different items that customers place in
their “shopping baskets” (Figure). The discovery of these associations can help retailers develop
marketing strategies by gaining insight into which items are frequently purchased together by
customers.

For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of
bread) on the same trip to the supermarket? This information can lead to increased sales by helping
retailers do selective marketing and plan their shelf space.


Figure: Market basket analysis.

Frequent Itemset Mining Methods:

In this section, you will learn methods for mining the simplest form of frequent patterns, such as those discussed for market basket analysis.

Apriori Algorithm:

Finding Frequent Itemsets Using Candidate Generation: The Apriori Algorithm

• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules.

• The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.

• Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.

• First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.

• The finding of each Lk requires one full scan of the database.

• A two-step process is followed in Apriori, consisting of a join action and a prune action.
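
The level-wise search can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not an optimized implementation: the name apriori, the frozenset representation of itemsets, and the absolute (count-based) min_sup parameter are all illustrative choices.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori sketch. transactions: iterable of item sets;
    min_sup: minimum support as an absolute count."""
    # First scan: count 1-itemsets and keep those meeting minimum support (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(Lk)

    k = 2
    while Lk:
        # Join step: combine (k-1)-itemsets that differ in exactly one item.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset
        # (the Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full scan of the database per level to count candidate supports.
        Lk = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
        Lk = {s: c for s, c in Lk.items() if c >= min_sup}
        frequent.update(Lk)
        k += 1
    return frequent
```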


Example:

TID     List of item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

There are nine transactions in this database; that is, |D| = 9.


Steps:

1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions in order to count the number of occurrences of each item.

2. Suppose that the minimum support count required is 2, that is, min_sup = 2. The set of frequent 1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets satisfying minimum support. In our example, all of the candidates in C1 satisfy minimum support.

3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a candidate set of 2-itemsets, C2. No candidates are removed from C2 during the prune step because each subset of the candidates is also frequent.

4. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is accumulated.

5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.

6. The generation of the set of candidate 3-itemsets, C3, begins with the join step, from which we first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent.

7. The transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.

8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4.
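
Assuming the apriori sketch given earlier, the nine-transaction database above can be mined as follows; the comments summarize what the run produces.

```python
# The nine transactions of the table above, as Python sets.
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
     {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
     {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"}]

freq = apriori(D, min_sup=2)

# L3 comes out as {I1, I2, I3}: 2 and {I1, I2, I5}: 2. Joining L3 with L3
# yields the single candidate {I1, I2, I3, I5}, which the prune step rejects
# because its subset {I2, I3, I5} is not frequent, so the search terminates.
print(sorted((sorted(s), c) for s, c in freq.items()))
```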


FP-growth (finding frequent itemsets without candidate generation)

• We re-examine the mining of the transaction database, D, shown in the table above using the frequent pattern growth approach.

• The first scan of the database is the same as in Apriori, which derives the set of frequent items (1-itemsets) and their support counts (frequencies). Let the minimum support count be 2. The set of frequent items is sorted in the order of descending support count. This resulting set or list is denoted L.

• An FP-tree is then constructed as follows. First, create the root of the tree, labeled with “null.” Scan database D a second time. The items in each transaction are processed in L order (i.e., sorted according to descending support count), and a branch is created for each transaction.


• For example, the scan of the first transaction, “T100: I1, I2, I5,” which contains three items (I2, I1, I5 in L order), leads to the construction of the first branch of the tree with three nodes, ⟨I2: 1⟩, ⟨I1: 1⟩, and ⟨I5: 1⟩, where I2 is linked as a child of the root, I1 is linked to I2, and I5 is linked to I1.

• The second transaction, T200, contains the items I2 and I4 in L order, which would result in a branch where I2 is linked to the root and I4 is linked to I2. However, this branch would share a common prefix, I2, with the existing path for T100.

• Therefore, we instead increment the count of the I2 node by 1, and create a new node, ⟨I4: 1⟩, which is linked as a child of ⟨I2: 2⟩. In general, when considering the branch to be added for a transaction, the count of each node along a common prefix is incremented by 1, and nodes for the items following the prefix are created and linked accordingly.

• To facilitate tree traversal, an item header table is built so that each item points to its occurrences in the tree via a chain of node-links. The tree obtained after scanning all of the transactions is shown in Figure 5.7 with the associated node-links. In this way, the problem of mining frequent patterns in databases is transformed to that of mining the FP-tree.
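
Below is a compact Python sketch of the two-scan FP-tree construction just described. The Node class, the header-table layout (item mapped to a list of its nodes, standing in for the node-link chains), and the tie-breaking within L order are our own illustrative choices.

```python
from collections import defaultdict

class Node:
    """One FP-tree node: an item, its count, a parent link, and children."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fptree(transactions, min_sup):
    # Scan 1: derive the frequent items and their support counts.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    L = [i for i in sorted(counts, key=counts.get, reverse=True)
         if counts[i] >= min_sup]
    rank = {item: r for r, item in enumerate(L)}  # position in L order

    root = Node(None, None)                 # the "null" root
    header = defaultdict(list)              # item -> its node-link chain
    # Scan 2: insert each transaction with items in descending support order.
    for t in transactions:
        items = sorted((i for i in t if i in rank), key=rank.get)
        node = root
        for item in items:
            if item in node.children:       # shared prefix: bump the count
                node.children[item].count += 1
            else:                           # new branch after the prefix
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
    return root, header

root, header = build_fptree(D, min_sup=2)   # D as defined earlier
```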


The FP-tree is mined as follows.

• Start from each frequent length-1 pattern (as an initial suffix pattern), construct its conditional pattern base (a “sub-database,” which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then construct its (conditional) FP-tree, and perform mining recursively on such a tree. The pattern growth is achieved by the concatenation of the suffix pattern with the frequent patterns generated from a conditional FP-tree.

• Mining of the FP-tree is summarized in Table 5.2 and detailed as follows. We first consider I5, which is the last item in L, rather than the first. The reason for starting at the end of the list will become apparent as we explain the FP-tree mining process. I5 occurs in two branches of the FP-tree of Figure 5.7. (The occurrences of I5 can easily be found by following its chain of node-links.) The paths formed by these branches are ⟨I2, I1, I5: 1⟩ and ⟨I2, I1, I3, I5: 1⟩.

• Therefore, considering I5 as a suffix, its corresponding two prefix paths are ⟨I2, I1: 1⟩ and ⟨I2, I1, I3: 1⟩, which form its conditional pattern base. Its conditional FP-tree contains only a single path, ⟨I2: 2, I1: 2⟩; I3 is not included because its support count of 1 is less than the minimum support count.

• The single path generates all the combinations of frequent patterns: {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}.
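
To make the recursive step concrete, here is a sketch of the mining phase on top of the build_fptree function above. Collecting prefix paths via the node-links and expanding each conditional pattern base into a plain transaction list (simple, though not space-efficient) are our own simplifications.

```python
def prefix_paths(item, header):
    """Conditional pattern base: the prefix paths co-occurring with `item`."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

def fpgrowth(transactions, min_sup, suffix=frozenset()):
    """Pattern growth: grow `suffix` with patterns from conditional FP-trees."""
    _, header = build_fptree(transactions, min_sup)
    patterns = {}
    for item in header:                     # each frequent length-1 suffix
        support = sum(n.count for n in header[item])
        grown = suffix | {item}
        patterns[grown] = support
        # Expand the conditional pattern base into plain transactions and
        # recurse on the conditional FP-tree it induces.
        cond_db = [set(path) for path, cnt in prefix_paths(item, header)
                   for _ in range(cnt)]
        if cond_db:
            patterns.update(fpgrowth(cond_db, min_sup, grown))
    return patterns

# For the suffix I5, the recursion sees the conditional pattern base
# {(I2, I1): 1, (I2, I1, I3): 1} and yields {I2, I5}: 2, {I1, I5}: 2,
# and {I2, I1, I5}: 2, matching the walkthrough above.
print(fpgrowth(D, min_sup=2))
```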

Generating Association Rules from Frequent Itemsets

Once the frequent itemsets from the transactions in a database D have been found, it is straightforward to generate strong association rules from them, where strong rules are those that satisfy both minimum support and minimum confidence. The confidence of a rule A ⇒ B can be computed directly from the support counts: confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A).
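
As a sketch, rule generation from the output of the earlier apriori function might look like the following; the name generate_rules and the fractional min_conf parameter are illustrative choices of our own.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """Emit rules A => B whose confidence, support_count(A u B) /
    support_count(A), meets min_conf. `frequent` maps frozensets to counts."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for A in map(frozenset, combinations(itemset, r)):
                # A is frequent by the Apriori property, so its count is known.
                conf = sup / frequent[A]
                if conf >= min_conf:
                    rules.append((set(A), set(itemset - A), conf))
    return rules

for A, B, conf in generate_rules(freq, min_conf=0.7):
    print(f"{A} => {B} (confidence {conf:.0%})")
```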


Mining Multilevel Association Rules

• For many applications, it is difficult to find strong associations among data items at low or primitive levels of abstraction due to the sparsity of data at those levels.
• Strong associations discovered at high levels of abstraction may represent commonsense knowledge.
• Therefore, data mining systems should provide capabilities for mining association rules at multiple levels of abstraction, with sufficient flexibility for easy traversal among different abstraction spaces.
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
• In general, a top-down strategy is employed, where counts are accumulated for the calculation of frequent itemsets at each concept level, starting at concept level 1 and working downward in the hierarchy toward the more specific concept levels, until no more frequent itemsets can be found.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Data can be generalized by replacing low-level concepts within the data by their higher-level concepts, or ancestors, from a concept hierarchy.


The concept hierarchy has five levels, referred to as levels 0 to 4, starting with level 0 at the root node for “all.”
• Here, level 1 includes computer, software, printer & camera, and computer accessory.
• Level 2 includes laptop computer, desktop computer, office software, and antivirus software.
• Level 3 includes IBM desktop computer, Microsoft office software, and so on.
• Level 4 is the most specific abstraction level of this hierarchy.
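
As an illustration only, a hierarchy like the one described above can be encoded as a parent-to-children mapping; the dictionary layout and the generalize helper below are hypothetical, not part of any standard library.

```python
# The electronics concept hierarchy sketched above (levels 0-4, abbreviated).
hierarchy = {
    "all": ["computer", "software", "printer & camera", "computer accessory"],
    "computer": ["laptop computer", "desktop computer"],
    "software": ["office software", "antivirus software"],
    "desktop computer": ["IBM desktop computer"],
    "office software": ["Microsoft office software"],
}

# Invert to child -> parent so items can be generalized one level at a time.
parent_of = {child: parent
             for parent, children in hierarchy.items()
             for child in children}

def generalize(item):
    """Replace an item by its ancestor one level up, if it has one."""
    return parent_of.get(item, item)

print(generalize("IBM desktop computer"))   # -> desktop computer
print(generalize("desktop computer"))       # -> computer
```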

Approaches For Mining Multilevel Association Rules

1. Uniform Minimum Support:
• The same minimum support threshold is used when mining at each level of abstraction. When a uniform minimum support threshold is used, the search procedure is simplified. The method is also simple in that users are required to specify only one minimum support threshold.

• The uniform support approach, however, has some difficulties. It is unlikely that items at lower levels of abstraction will occur as frequently as those at higher levels of abstraction.

• If the minimum support threshold is set too high, it could miss some meaningful associations occurring at low abstraction levels. If the threshold is set too low, it may generate many uninteresting associations occurring at high abstraction levels.

2. Reduced Minimum Support:

• Each level of abstraction has its own minimum support threshold.
• The deeper the level of abstraction, the smaller the corresponding threshold is. For example, suppose the minimum support thresholds for levels 1 and 2 are 5% and 3%, respectively. In this way, “computer,” “laptop computer,” and “desktop computer” are all considered frequent.


3. Group-Based Minimum Support:

• Because users or experts often have insight as to which groups are more important than others, it is sometimes more desirable to set up user-specific, item-based, or group-based minimum support thresholds when mining multilevel rules.

• For example, a user could set up the minimum support thresholds based on product price, or on items of interest, such as by setting particularly low support thresholds for laptop computers and flash drives in order to pay particular attention to the association patterns containing items in these categories.
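
The three strategies differ only in how the threshold for an itemset is chosen; the toy comparison below makes that explicit (all threshold values are invented for illustration).

```python
def min_sup_uniform(level, item=None):
    return 0.05                                   # the same 5% at every level

def min_sup_reduced(level, item=None):
    # The deeper the level of abstraction, the smaller the threshold.
    return {1: 0.05, 2: 0.03}.get(level, 0.01)

def min_sup_group_based(level, item=None):
    # Particularly low thresholds for groups the user cares about.
    special = {"laptop computer": 0.005, "flash drive": 0.005}
    return special.get(item, min_sup_reduced(level))
```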
