ML | ECLAT Algorithm

Last Updated : 10 Mar, 2025

The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a widely used technique for mining frequent itemsets, the first stage of association rule mining. It is an alternative to the Apriori algorithm that is often faster in practice. Apriori works on a horizontal database layout (one row per transaction) and explores itemsets breadth-first (BFS), whereas ECLAT works on a vertical layout (one set of transaction IDs per item) and explores itemsets depth-first (DFS).

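With the vertical layout, the support of any itemset can be computed by intersecting sets of transaction IDs, so ECLAT does not need to rescan the database for each itemset size. This usually makes it faster than Apriori on large datasets, provided the transaction ID sets fit in memory.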

Key Differences between ECLAT and Apriori

Apriori Algorithm: Uses a horizontal database layout and follows BFS, requiring multiple database scans.
ECLAT Algorithm: Uses a vertical database layout and follows DFS, reducing the number of database scans.

For example, Apriori first finds the frequent single-item sets and then extends them level by level, scanning the database once per level to count the support of the candidates. ECLAT avoids these repeated scans by storing the data in a vertical format (TID sets), so the support of a candidate is obtained by intersecting the TID sets of its parts.
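The difference between the two layouts can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the four-transaction dataset and the helper names `support_by_scan`, `to_vertical`, and `support_by_intersection` are assumptions made for this example.

```python
# Toy horizontal layout (assumed for illustration): one entry per transaction.
transactions = {
    "T1": {"Bread", "Butter", "Jam"},
    "T2": {"Butter", "Coke"},
    "T3": {"Butter", "Milk"},
    "T4": {"Bread", "Butter", "Coke"},
}

def support_by_scan(itemset, transactions):
    """Horizontal layout: count support by scanning every transaction."""
    return sum(1 for items in transactions.values() if itemset <= items)

def to_vertical(transactions):
    """Invert the layout: map each item to the set of TIDs that contain it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def support_by_intersection(itemset, vertical):
    """Vertical layout: count support by intersecting the items' tidsets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0

vertical = to_vertical(transactions)
pair = {"Bread", "Butter"}
print(support_by_scan(pair, transactions))
print(support_by_intersection(pair, vertical))
```

Both functions return the same support. The scan touches every transaction on each call, while the intersection only touches the tidsets of the items involved.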

How the ECLAT Algorithm Works

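Let’s walk through an example to better understand how the ECLAT algorithm works. Consider the following dataset of nine transactions over five items, represented as a Boolean matrix (1 means the item is in the transaction, 0 means it is not):

TID	Bread	Butter	Milk	Coke	Jam
T1	1	1	0	0	1
T2	0	1	0	1	0
T3	0	1	1	0	0
T4	1	1	0	1	0
T5	1	0	1	0	0
T6	0	1	1	0	0
T7	1	0	1	0	0
T8	1	1	1	0	1
T9	1	1	1	0	0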

The core idea of the ECLAT algorithm is to compute the support of an itemset by intersecting the tidsets of its items, and to extend only itemsets that are already frequent, so that candidates which cannot be frequent are never generated. Here’s a breakdown of the steps:

Step 1: Create the Tidset

The first step is to generate the tidset for each individual item. A tidset is the set of IDs of the transactions in which the item appears, and its size is the item's support. For example:

k = 1, minimum support = 2

Item	Tidset
Bread	{T1, T4, T5, T7, T8, T9}
Butter	{T1, T2, T3, T4, T6, T8, T9}
Milk	{T3, T5, T6, T7, T8, T9}
Coke	{T2, T4}
Jam	{T1, T8}
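Step 1 can be sketched in Python as follows. The Boolean matrix rows below are reconstructed from the tidset table above, and the function name `build_tidsets` is an assumption made for this sketch.

```python
items = ["Bread", "Butter", "Milk", "Coke", "Jam"]

# Boolean matrix: one row per transaction, columns in the order of `items`.
# Assumption: rows are reconstructed from the tidset table above.
matrix = {
    "T1": [1, 1, 0, 0, 1],
    "T2": [0, 1, 0, 1, 0],
    "T3": [0, 1, 1, 0, 0],
    "T4": [1, 1, 0, 1, 0],
    "T5": [1, 0, 1, 0, 0],
    "T6": [0, 1, 1, 0, 0],
    "T7": [1, 0, 1, 0, 0],
    "T8": [1, 1, 1, 0, 1],
    "T9": [1, 1, 1, 0, 0],
}

def build_tidsets(matrix, items):
    """Collect, for each item, the IDs of the transactions that contain it."""
    tidsets = {item: set() for item in items}
    for tid, row in matrix.items():
        for item, present in zip(items, row):
            if present:
                tidsets[item].add(tid)
    return tidsets

tidsets = build_tidsets(matrix, items)

# Keep only items whose support (tidset size) meets the threshold.
min_support = 2
frequent_1 = {item: tids for item, tids in tidsets.items()
              if len(tids) >= min_support}

for item in items:
    print(item, sorted(tidsets[item]), "support =", len(tidsets[item]))
```

With minimum support = 2, every item in this dataset passes the filter, so all five tidsets go on to Step 2.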

Step 2: Calculate the Support of Itemsets by Intersecting Tidsets

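ECLAT then combines the items pairwise. The tidset of a 2-itemset is the intersection of the tidsets of its two items, and its support is the size of that intersection. Pairs with an empty intersection, {Milk, Coke} and {Coke, Jam}, are omitted from the table. {Bread, Coke} and {Milk, Jam} each occur in only one transaction, so they fall below the minimum support of 2 and are not extended further. For example: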

k = 2

Itemset	Tidset
{Bread, Butter}	{T1, T4, T8, T9}
{Bread, Milk}	{T5, T7, T8, T9}
{Bread, Coke}	{T4}
{Bread, Jam}	{T1, T8}
{Butter, Milk}	{T3, T6, T8, T9}
{Butter, Coke}	{T2, T4}
{Butter, Jam}	{T1, T8}
{Milk, Jam}	{T8}
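The pairwise intersections in this table can be reproduced with a short Python sketch. The variable names (`pairs`, `frequent_2`) are assumptions made for the example; the tidsets are copied from the Step 1 table.

```python
from itertools import combinations

# Tidsets of the single items, copied from the Step 1 table.
tidsets = {
    "Bread":  {"T1", "T4", "T5", "T7", "T8", "T9"},
    "Butter": {"T1", "T2", "T3", "T4", "T6", "T8", "T9"},
    "Milk":   {"T3", "T5", "T6", "T7", "T8", "T9"},
    "Coke":   {"T2", "T4"},
    "Jam":    {"T1", "T8"},
}
min_support = 2

# Intersect every pair of tidsets; skip pairs that never occur together.
pairs = {}
for a, b in combinations(tidsets, 2):
    common = tidsets[a] & tidsets[b]
    if common:
        pairs[(a, b)] = common

# A pair is frequent when its tidset has at least min_support entries.
frequent_2 = {pair: tids for pair, tids in pairs.items()
              if len(tids) >= min_support}

for pair, tids in pairs.items():
    status = "frequent" if pair in frequent_2 else "infrequent"
    print(pair, sorted(tids), status)
```

Only the pairs in `frequent_2` are carried forward to the next level; the infrequent pairs are dropped because none of their supersets can be frequent.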

Step 3: Recursive Call and Generation of Larger Itemsets

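The algorithm continues recursively: frequent k-itemsets that share a common prefix are combined into (k+1)-itemsets, and the support of each new itemset is checked by intersecting the tidsets. The recursion continues until no further frequent itemsets can be generated. Only the 3-itemsets that meet the minimum support are listed below.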

k = 3

Itemset	Tidset
{Bread, Butter, Milk}	{T8, T9}
{Bread, Butter, Jam}	{T1, T8}
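The recursion can be sketched as a depth-first search in Python. This is a minimal sketch under stated assumptions, not a tuned implementation: the function name `eclat` and its parameters are chosen for this example, and items are combined in the order in which they appear in the Step 1 table.

```python
def eclat(prefix, candidates, min_support, out):
    """Depth-first search over the itemsets that extend `prefix`.

    prefix      -- tuple of items already fixed on this branch
    candidates  -- list of (item, tidset) pairs; each tidset is the tidset
                   of prefix + (item,) and is already frequent
    out         -- dict collecting every frequent itemset and its tidset
    """
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = tids
        # Try to extend with each later candidate; keep frequent extensions.
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, min_support, out)
    return out

# Tidsets of the single items, copied from the Step 1 table.
tidsets = {
    "Bread":  {"T1", "T4", "T5", "T7", "T8", "T9"},
    "Butter": {"T1", "T2", "T3", "T4", "T6", "T8", "T9"},
    "Milk":   {"T3", "T5", "T6", "T7", "T8", "T9"},
    "Coke":   {"T2", "T4"},
    "Jam":    {"T1", "T8"},
}
min_support = 2

start = [(item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support]
frequent = eclat((), start, min_support, {})

for itemset, tids in frequent.items():
    print(itemset, sorted(tids))
```

The recursion ends on its own: a branch is abandoned as soon as none of its extensions reaches the minimum support, so no explicit stopping condition on k is needed.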

Step 4: Stop When No More Frequent Itemsets Can Be Found

The algorithm stops once no more itemset combinations meet the minimum support threshold.

k = 4

Itemset	Tidset
{Bread, Butter, Milk, Jam}	{T8}

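We stop at k = 4: the only candidate, {Bread, Butter, Milk, Jam}, appears in a single transaction, which is below the minimum support of 2, so it is not frequent and there is nothing left to combine. Since minimum support = 2, the frequent itemsets with two or more items give the following recommendations for the given dataset: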

Items Bought	Recommended Products
Bread	Butter
Bread	Milk
Bread	Jam
Butter	Milk
Butter	Coke
Butter	Jam
Bread and Butter	Milk
Bread and Butter	Jam
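The recommendation table can be derived from the frequent itemsets with a few lines of Python. The sketch below is an illustration under stated assumptions: the supports are the tidset sizes from the tables above, the last item of each itemset is treated as the recommended product, and the confidence value (support of the itemset divided by support of the items bought) is a standard association rule measure added here for illustration; it is not part of the walkthrough above.

```python
# Support (tidset size) of every frequent itemset from the tables above.
support = {
    ("Bread",): 6, ("Butter",): 7, ("Milk",): 6, ("Coke",): 2, ("Jam",): 2,
    ("Bread", "Butter"): 4, ("Bread", "Milk"): 4, ("Bread", "Jam"): 2,
    ("Butter", "Milk"): 4, ("Butter", "Coke"): 2, ("Butter", "Jam"): 2,
    ("Bread", "Butter", "Milk"): 2, ("Bread", "Butter", "Jam"): 2,
}

rules = []
for itemset, count in support.items():
    if len(itemset) < 2:
        continue  # a rule needs at least one item on each side
    bought, recommended = itemset[:-1], itemset[-1]
    # Confidence (assumption, not in the walkthrough): share of transactions
    # containing `bought` that also contain `recommended`.
    confidence = count / support[bought]
    rules.append((bought, recommended, count, confidence))

for bought, recommended, count, confidence in rules:
    print(f"{' and '.join(bought)} -> {recommended}  "
          f"support={count}  confidence={confidence:.2f}")
```

Sorting `rules` by confidence is one possible way to rank the recommendations when several products qualify for the same basket.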

Advantages of the ECLAT Algorithm

Efficient in Dense Datasets: Performs better than Apriori in datasets with frequent co-occurrences.
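Few Database Scans: The vertical representation is built once; after that, supports are computed from TID sets rather than from repeated passes over the transactions.
Fast Itemset Intersection: Computing itemset support via TID-set intersections is faster than scanning transactions repeatedly.
Better Scalability: Depth-first search only needs the TID sets along the current branch, rather than every candidate of a level at once, which helps on larger datasets.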

Disadvantages of the ECLAT Algorithm

High Memory Requirement: Large TID sets can consume significant memory.
Less Advantage on Sparse Data: Works better in dense datasets; in sparse datasets most intersections are small or empty, so much of the intersection work yields infrequent itemsets and the benefit over Apriori shrinks.
Sensitive to Large Transactions: A transaction with many items appears in many TID sets and gives rise to many candidate itemsets, so the number of intersections can grow quickly.

Applications of the ECLAT Algorithm

Market Basket Analysis: Identifying items that are frequently purchased together.
Recommendation Systems: Suggesting products based on past purchase patterns.
Medical Diagnosis: Finding co-occurring symptoms in medical records.
Web Usage Mining: Analyzing web logs to understand user behavior.
Fraud Detection: Discovering frequent patterns in fraudulent activities.

AlindGupta
