Frequent Pattern Growth Algorithm

The FP-Growth (Frequent Pattern Growth) algorithm efficiently mines frequent itemsets from large transactional datasets. Unlike the Apriori algorithm, which suffers from high computational cost due to candidate generation and multiple database scans, FP-Growth compresses the data into an FP-Tree (Frequent Pattern Tree) and extracts patterns directly from it.

How FP-Growth Works

Data Compression: First, FP-Growth compresses the dataset into a smaller structure called the Frequent Pattern Tree (FP-Tree). The tree stores itemsets (collections of items) and their frequencies, so there is no need to generate candidate sets as Apriori does.
Mining the Tree: The algorithm then examines the tree to identify patterns whose frequency meets a minimum support threshold. It does this by breaking the tree down into smaller "conditional" trees, one per item, so each subproblem only involves the transactions that contain that item.
Generating Patterns: Mining the conditional trees yields the frequent patterns (itemsets). Association rules that describe relationships between items can then be derived from these itemsets.

Imagine you’re organizing a party and want to know popular food combinations without asking every guest repeatedly.

List the food items each guest brought (the transactions).
Count the items and remove the infrequent ones (filter by support).
Order the items by popularity and build a tree in which paths represent common combinations.
Instead of repeatedly asking guests, you explore this tree to discover patterns. For example, you might find that pizza and pasta often come together, or that cake and pasta are also a common pair.

This is exactly how FP-Growth finds frequent patterns efficiently.

Working of the FP-Growth Algorithm

Problem Statement: Consider a small grocery store transaction dataset. Each entry shows the set of items purchased together by a customer:

Transaction	Items
T1	Bread, Milk, Butter
T2	Bread, Butter
T3	Bread, Milk
T4	Milk, Butter
T5	Bread

We apply the FP-Growth algorithm to identify frequent itemsets (groups of items frequently bought together), using a minimum support count of 2.

Step 1: Compute Item Frequencies

Scan the entire dataset once to determine how often each item appears.

Item	Frequency
Bread	4
Milk	3
Butter	3

All items meet the minimum support threshold (≥ 2), so none are removed.
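The counting pass can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `transactions` list is reconstructed from the FP-tree insertion list in this walkthrough, and the names `MIN_SUPPORT`, `item_counts`, and `frequent_items` are chosen here for illustration.

```python
from collections import Counter

# Transactions reconstructed from the walkthrough's insertion list (T1..T5).
transactions = [
    {"Bread", "Milk", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Bread", "Milk"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread"},                    # T5
]
MIN_SUPPORT = 2  # minimum support count used in this example

# First database scan: count how many transactions contain each item.
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that meet the minimum support count.
frequent_items = {item: n for item, n in item_counts.items() if n >= MIN_SUPPORT}
print(frequent_items)
```

Because each transaction is a set, every item is counted at most once per transaction, which is what a support count requires.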

Step 2: Order Items in Each Transaction by Frequency

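Next, sort the items in every transaction in descending order of their overall frequency. Milk and Butter both appear 3 times; this walkthrough places Milk before Butter, and the same tie-break must be applied to every transaction.

Sorting by frequency puts the most common items first, so more transactions share a prefix and the FP-tree stays compact.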
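A small Python sketch of this ordering step follows. The item order inside each raw transaction is an assumption (the source data only fixes which items each transaction contains), and ties are broken by first appearance in the data, a choice made here so that Milk lands before Butter as in the walkthrough.

```python
from collections import Counter

# Raw transactions; the within-transaction order shown here is an assumption.
transactions = [
    ["Bread", "Milk", "Butter"],  # T1
    ["Butter", "Bread"],          # T2
    ["Milk", "Bread"],            # T3
    ["Butter", "Milk"],           # T4
    ["Bread"],                    # T5
]

item_counts = Counter(item for t in transactions for item in t)

# most_common() sorts by count, keeping first-seen order among equal counts,
# so the global order here is Bread, Milk, Butter.
rank = {item: r for r, (item, _) in enumerate(item_counts.most_common())}

# Reorder every transaction by the same global ranking.
sorted_transactions = [sorted(t, key=rank.__getitem__) for t in transactions]

for tid, t in enumerate(sorted_transactions, start=1):
    print(f"T{tid}: {' -> '.join(t)}")
```

Any tie-break works as long as it is applied consistently; a different one would only change the tree's shape, not the frequent itemsets found.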

Step 3: Construct the FP-Tree

Insert the sorted transactions one by one into the FP-tree, merging any shared prefixes.

FP-Tree Insertions:

From T1: Bread → Milk → Butter
From T2: Bread → Butter
From T3: Bread → Milk
From T4: Milk → Butter (new branch since it doesn’t begin with Bread)
From T5: Bread

Visual FP-Tree:

Each node shows: Item (Count)

Root
├── Bread (4)
│   ├── Milk (2)
│   │   └── Butter (1)
│   └── Butter (1)
└── Milk (1)
    └── Butter (1)
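The insertion procedure can be sketched in Python as follows. This is a simplified illustration: `FPNode`, `build_fp_tree`, and `render` are names invented here, and the header table with node links that a full FP-tree implementation normally keeps is left out.

```python
class FPNode:
    """One FP-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(sorted_transactions):
    """Insert frequency-sorted transactions, merging shared prefixes."""
    root = FPNode()
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No existing path for this prefix: start a new branch.
                child = FPNode(item)
                node.children[item] = child
            child.count += 1  # one more transaction passes through this node
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented 'Item (Count)' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item} ({child.count})")
        lines.extend(render(child, depth + 1))
    return lines


sorted_transactions = [
    ["Bread", "Milk", "Butter"],  # T1
    ["Bread", "Butter"],          # T2
    ["Bread", "Milk"],            # T3
    ["Milk", "Butter"],           # T4
    ["Bread"],                    # T5
]
root = build_fp_tree(sorted_transactions)
print("\n".join(render(root)))
```

Four of the five transactions start with Bread, so they all share a single Bread node; only T4 opens a second branch under the root.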

Step 4: Determine Conditional Pattern Bases

An item’s conditional pattern base is the collection of prefix paths that lead to that item in the FP-tree, each paired with the count of the item’s node at the end of that path. Let’s examine the paths ending with Butter.

Paths that end with Butter:

Bread → Milk → Butter (1 occurrence)
Bread → Butter (1 occurrence)
Milk → Butter (1 occurrence)

Thus, the conditional pattern base for Butter is:

[ (Bread, Milk): 1, (Bread): 1, (Milk): 1 ]
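As a rough sketch, the same pattern base can be computed in Python. Note the shortcut: instead of following node links through an FP-tree, this version reads each prefix straight from the frequency-sorted transactions and merges identical prefixes. That yields the same base on this dataset but skips the tree's compression. The function name `conditional_pattern_base` is illustrative.

```python
from collections import Counter

sorted_transactions = [
    ["Bread", "Milk", "Butter"],  # T1
    ["Bread", "Butter"],          # T2
    ["Bread", "Milk"],            # T3
    ["Milk", "Butter"],           # T4
    ["Bread"],                    # T5
]


def conditional_pattern_base(item, sorted_transactions):
    """Map each prefix path that precedes `item` to how often it occurs."""
    base = Counter()
    for transaction in sorted_transactions:
        if item in transaction:
            prefix = tuple(transaction[: transaction.index(item)])
            if prefix:  # an empty prefix contributes no co-occurring items
                base[prefix] += 1
    return dict(base)


butter_base = conditional_pattern_base("Butter", sorted_transactions)
milk_base = conditional_pattern_base("Milk", sorted_transactions)
print(butter_base)
print(milk_base)
```

For Milk the base collapses to a single path, (Bread) with count 2, because T4 begins with Milk and therefore has no prefix.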

Step 5: Build Conditional FP-Trees

Using the conditional pattern base, construct a smaller FP-tree for each item to identify frequent patterns involving that item. Butter’s conditional FP-tree input:

(Bread, Milk): 1
(Bread): 1
(Milk): 1

Count all items:

Bread: 2
Milk: 2

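Both items meet the support threshold (≥ 2) on their own, which gives two frequent patterns:

{Butter, Bread}
{Butter, Milk}

Bread and Milk occur together in only one prefix path, (Bread, Milk): 1, so {Butter, Bread, Milk} has a support count of 1 and is not frequent.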

Repeat the process for Milk and Bread as needed.
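The counting for Butter can be checked with a short Python sketch. It is a simplification: rather than building a conditional FP-tree, it enumerates every item combination inside each prefix path and adds up the path counts, which is only practical for toy-sized pattern bases. The helper name `patterns_from_base` is made up for this example.

```python
from collections import Counter
from itertools import combinations

MIN_SUPPORT = 2

# Butter's conditional pattern base from Step 4.
butter_base = {("Bread", "Milk"): 1, ("Bread",): 1, ("Milk",): 1}


def patterns_from_base(suffix, base):
    """Support count of suffix + every item combination found in the prefix paths."""
    counts = Counter()
    for path, n in base.items():
        for size in range(1, len(path) + 1):
            for combo in combinations(path, size):
                counts[frozenset(combo) | {suffix}] += n
    return counts


counts = patterns_from_base("Butter", butter_base)
frequent = {p: n for p, n in counts.items() if n >= MIN_SUPPORT}

for pattern, n in counts.items():
    status = "frequent" if n >= MIN_SUPPORT else "not frequent"
    print(sorted(pattern), n, status)
```

Bread and Milk each reach a count of 2 alongside Butter, but they share only one prefix path, so the three-item pattern stays at 1 and falls below the threshold.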

Step 6: Extract All Frequent Itemsets

From the FP-tree and conditional trees, we get these frequent itemsets:

{Bread}
{Milk}
{Butter}
{Bread, Milk}
{Bread, Butter}
{Milk, Butter}

Each of these appears in at least 2 transactions. {Bread, Milk, Butter} appears only in T1, so it does not meet the minimum support count.
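To tie the steps together, here is a compact Python sketch of the recursive pattern-growth idea. It is an illustration under stated assumptions, not a production FP-Growth: it recurses over conditional databases stored as weighted transaction lists instead of compressed conditional FP-trees, it breaks frequency ties alphabetically, and the function name `fp_growth` and its parameters are chosen here.

```python
from collections import Counter


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {itemset: support count} for all frequent itemsets.

    weighted_transactions is a list of (items, count) pairs; the top-level
    call passes each original transaction with a count of 1.
    """
    # Count item support inside this (conditional) database.
    counts = Counter()
    for items, n in weighted_transactions:
        for item in items:
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Fixed processing order: most frequent first, ties alphabetical (assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    results = {}
    for item in order:
        itemset = suffix | {item}
        results[itemset] = frequent[item]
        # Conditional database: frequent items ranked ahead of `item`
        # in every transaction that contains `item`.
        conditional = []
        for items, n in weighted_transactions:
            if item in items:
                prefix = tuple(i for i in items if i in rank and rank[i] < rank[item])
                if prefix:
                    conditional.append((prefix, n))
        results.update(fp_growth(conditional, min_support, itemset))
    return results


transactions = [
    ("Bread", "Milk", "Butter"),  # T1
    ("Bread", "Butter"),          # T2
    ("Bread", "Milk"),            # T3
    ("Milk", "Butter"),           # T4
    ("Bread",),                   # T5
]
result = fp_growth([(t, 1) for t in transactions], min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

On the example data this returns three single items and three pairs; the three-item set is never reported because its support count is 1.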

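Summary Table

Itemset	Support count	Frequent (min support 2)
{Bread}	4	Yes
{Milk}	3	Yes
{Butter}	3	Yes
{Bread, Milk}	2	Yes
{Bread, Butter}	2	Yes
{Milk, Butter}	2	Yes
{Bread, Milk, Butter}	1	No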

Why FP-Growth is Efficient

It scans the database only twice: once to count item frequencies and once to build the FP-tree.
It never generates and tests candidate itemsets.
It stores data in a compact tree, reducing redundancy.
It uses conditional trees to mine deeper patterns efficiently.

FP-Growth vs. Apriori Algorithm

Feature	FP-Growth	Apriori
Candidate generation	Not required	Needed
Data scans	2	Several (one per itemset length)
Speed	Efficient for large datasets	Slower because of repeated scans
Memory use	Higher (requires a tree structure)	Lower
Complexity	More difficult to implement	Simpler to grasp and build
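For contrast, the level-wise behaviour of Apriori can be sketched in Python to show where the extra scans come from. This is a bare-bones illustration written for this comparison (the `apriori` function and its pass counter are not from any library), run on the five grocery transactions from the worked example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch; returns (frequent itemsets, data passes)."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, passes = {}, 0
    while candidates:
        passes += 1  # each candidate length costs one full pass over the data
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k = len(candidates[0]) + 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent, passes


transactions = [
    {"Bread", "Milk", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Bread", "Milk"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread"},                    # T5
]
frequent, passes = apriori(transactions, min_support=2)
print(len(frequent), "frequent itemsets found in", passes, "passes")
```

Even on this tiny dataset Apriori needs a third pass just to count the candidate {Bread, Milk, Butter} and discard it, whereas FP-Growth reads the data twice regardless of how long the patterns are.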

Applications

Market Basket Analysis: Identifying items that are frequently purchased together.
Recommendation Systems: Suggesting products based on frequent item patterns.
Customer Behavior Analysis: Understanding buying habits in retail and e-commerce.
Web Usage Mining: Finding common browsing or click paths among users.
Text Mining: Detecting frequently occurring words or phrases in documents.

Advantages

No Candidate Generation: FP-Growth avoids generating candidate sets, improving efficiency.
Minimal Data Scans: Only two database scans are required.
High Performance on Large Data: Works very well for big and dense datasets.
Data Compression: Uses an FP-tree to store transactions compactly.
Scalable and Fast: Capable of mining complex and large frequent patterns quickly.

Limitations

High Memory Usage: FP-tree can become large, consuming significant memory.
Complex Implementation: Harder to understand and implement than Apriori.
Heavy Conditional Mining: Building conditional FP-trees can be computationally costly.
Poor for Diverse Transactions: When transactions contain many unique items, few prefixes overlap and the tree compresses poorly.
Not Incremental: The tree cannot easily be updated, so it must be rebuilt when the data changes.