The Eclat Algorithm
Mining Ideas for Today and Tomorrow
Presented by
Islam Nader Desokey
Sherif Yehia Abd ELghany
Presented to
Prof. Dr. Hanafy Ismail
ECLAT Algorithm
Eclat was the first algorithm to mine frequent itemsets using a depth-first search.
The Eclat algorithm is used to perform itemset mining. Itemset mining lets us find frequent patterns in data, for example that a customer who buys milk also buys bread. Patterns of this kind are expressed as association rules, which are used in many application domains.
The basic idea of Eclat is to use tid-set intersections to compute the support of a candidate itemset, avoiding the generation of subsets that do not exist in the prefix tree.
It takes advantage of the Apriori property when generating candidate (k+1)-itemsets from k-itemsets.
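As a minimal sketch of the tid-set idea (our own Python illustration, not code from the slides), the support of an itemset can be computed by intersecting the tid-sets of its items, with no pass over the transactions. Storing tid-sets as Python sets and the helper name `support` are assumptions made for illustration; the two tid-sets are copied from Example 1.

```python
# Tid-sets of two items, copied from the vertical table in Example 1.
tidsets = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
}


def support(itemset, tidsets):
    """Hypothetical helper: support count = size of the intersection of the items' tid-sets."""
    it = iter(itemset)
    common = set(tidsets[next(it)])  # copy so the stored tid-set is not modified
    for item in it:
        common &= tidsets[item]      # each intersection can only shrink the set
    return len(common)


print(support(["I1", "I2"], tidsets))  # 4, i.e. {T100, T400, T800, T900}
```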
Algorithm definition
The Eclat algorithm is defined recursively.
The initial call uses all the frequent single items with their tid-sets. In each recursive call, the function Intersect Tid-sets combines each (itemset, tid-set) pair {X, t(X)} with each pair {Y, t(Y)} that comes after it to generate a new candidate N_XY, whose tid-set is t(X) ∩ t(Y). If the new candidate is frequent, it is added to the set P_X.
Then, recursively, the algorithm finds all the frequent itemsets in the X branch. The search proceeds in depth-first (DFS) order until all frequent itemsets have been found.
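The recursion above can be sketched in Python as follows. This is a minimal sketch under our own assumptions: tid-sets are `frozenset`s, items are processed in sorted order, and the names `eclat` and `mine` are hypothetical. Each call records the support of `prefix + X`, builds P_X by intersecting t(X) with the tid-sets of the pairs that follow it, and recurses into the X branch only if P_X is non-empty.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = Tuple[str, ...]
Pair = Tuple[str, FrozenSet[str]]  # (item, tid-set)


def eclat(prefix: Itemset, pairs: List[Pair], min_sup: int, out: Dict[Itemset, int]) -> None:
    """Depth-first Eclat over (item, tid-set) pairs that all extend `prefix`."""
    for i, (x, t_x) in enumerate(pairs):
        px = prefix + (x,)
        out[px] = len(t_x)                # every pair passed in is already frequent
        p_x: List[Pair] = []              # P_X: frequent extensions of px
        for y, t_y in pairs[i + 1:]:
            t_xy = t_x & t_y              # tid-set of the candidate N_XY
            if len(t_xy) >= min_sup:
                p_x.append((y, t_xy))
        if p_x:
            eclat(px, p_x, min_sup, out)  # recurse into the X branch


def mine(vertical: Dict[str, set], min_sup: int) -> Dict[Itemset, int]:
    """Initial call: all frequent single items with their tid-sets."""
    start = [(item, frozenset(tids)) for item, tids in sorted(vertical.items())
             if len(tids) >= min_sup]
    out: Dict[Itemset, int] = {}
    eclat((), start, min_sup, out)
    return out


# Vertical data of Example 1, with min_sup = 2.
vertical = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}
result = mine(vertical, min_sup=2)
print(len(result))                 # 13
print(result[("I1", "I2", "I3")])  # 2
```

Iterating only over the pairs that come after X means each itemset is generated exactly once, which is why the recursion never revisits {Y, X} after producing {X, Y}.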
ECLAT: FP Mining with Vertical Data Format
Both Apriori and FP-growth use the horizontal data format:
TID     List of item IDs
T100    I1,I2,I5
T200    I2,I4
T300    I2,I3
T400    I1,I2,I4
T500    I1,I3
T600    I2,I3
T700    I1,I3
T800    I1,I2,I3,I5
T900    I1,I2,I3
Alternatively, the data can be represented in the vertical format:
itemset    TID_set
I1         {T100,T400,T500,T700,T800,T900}
I2         {T100,T200,T300,T400,T600,T800,T900}
I3         {T300,T500,T600,T700,T800,T900}
I4         {T200,T400}
I5         {T100,T800}
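The conversion from horizontal to vertical format needs only one scan of the database: each TID is appended to the TID_set of every item in its transaction. A minimal Python sketch, assuming the transactions fit in a dict; `to_vertical` is a hypothetical helper name, and the data is copied from the horizontal table of Example 1.

```python
from collections import defaultdict

# Horizontal format (TID -> items) from Example 1.
horizontal = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}


def to_vertical(db):
    """One pass over the transactions: add each TID to the TID_set of every item it contains."""
    vertical = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


vertical = to_vertical(horizontal)
for item in sorted(vertical):
    print(item, sorted(vertical[item]))
```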
ECLAT Algorithm by Example
Transform the horizontally formatted data to the vertical
format by scanning the database once
The support count of an itemset is simply the length of the
TID_set of the itemset
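Because the support count is just the length of the TID_set, finding the frequent 1-itemsets reduces to measuring set sizes. A short Python sketch using the vertical table of Example 1 and the example's min_sup = 2; the variable names are our own.

```python
# Vertical format from Example 1 (item -> TID_set).
vertical = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}
MIN_SUP = 2  # min_sup used in the example

# The support count of an itemset is the length of its TID_set.
support = {item: len(tids) for item, tids in vertical.items()}
frequent_1 = {item: tids for item, tids in vertical.items() if support[item] >= MIN_SUP}

print(support)             # {'I1': 6, 'I2': 7, 'I3': 6, 'I4': 2, 'I5': 2}
print(sorted(frequent_1))  # ['I1', 'I2', 'I3', 'I4', 'I5']
```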
ECLAT Algorithm by Example
Frequent 1-itemsets in vertical format
itemset    TID_set                                  support
I1         {T100,T400,T500,T700,T800,T900}          6
I2         {T100,T200,T300,T400,T600,T800,T900}     7
I3         {T300,T500,T600,T700,T800,T900}          6
I4         {T200,T400}                              2
I5         {T100,T800}                              2
min_sup = 2, so all five 1-itemsets are frequent
The frequent k-itemsets can be used to construct the candidate
(k+1)-itemsets based on the Apriori property
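One way to build candidate (k+1)-itemsets from frequent k-itemsets in vertical format is sketched below in Python. Assumptions (ours, not the slides'): itemsets are sorted tuples, two k-itemsets are joined only when they share their first k-1 items, and a candidate is dropped if any of its k-subsets is not frequent (the Apriori property). The TID_set of each surviving candidate is the intersection of the two joined TID_sets. `candidates` is a hypothetical helper; it returns candidates before the min_sup filter.

```python
from itertools import combinations

# Frequent 1-itemsets of Example 1 (min_sup = 2), keyed by sorted item tuples.
frequent_1 = {
    ("I1",): {"T100", "T400", "T500", "T700", "T800", "T900"},
    ("I2",): {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    ("I3",): {"T300", "T500", "T600", "T700", "T800", "T900"},
    ("I4",): {"T200", "T400"},
    ("I5",): {"T100", "T800"},
}


def candidates(frequent_k):
    """Candidate (k+1)-itemsets with their TID_sets, not yet filtered by min_sup."""
    if not frequent_k:
        return {}
    k = len(next(iter(frequent_k)))
    result = {}
    for a, b in combinations(sorted(frequent_k), 2):
        if a[:-1] != b[:-1]:
            continue  # join only k-itemsets that share the same (k-1)-item prefix
        cand = a + (b[-1],)
        # Apriori property: every k-subset of a frequent (k+1)-itemset must be frequent
        if any(sub not in frequent_k for sub in combinations(cand, k)):
            continue
        result[cand] = frequent_k[a] & frequent_k[b]  # TID_set by intersection
    return result


c2 = candidates(frequent_1)
for itemset, tids in sorted(c2.items()):
    print(itemset, sorted(tids), len(tids))
```

For k = 1 every pair shares the empty prefix, so all ten pairs are produced; pairs such as {I3,I4} get an empty TID_set and are removed by the later min_sup check.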
ECLAT Algorithm by Example
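2-itemsets in vertical format
itemset    TID_set
{I1,I2}    {T100,T400,T800,T900}
{I1,I3}    {T500,T700,T800,T900}
{I1,I4}    {T400}
{I1,I5}    {T100,T800}
{I2,I3}    {T300,T600,T800,T900}
{I2,I4}    {T200,T400}
{I2,I5}    {T100,T800}
{I3,I5}    {T800}
Pairs with an empty TID_set ({I3,I4} and {I4,I5}) are not listed. With min_sup = 2, {I1,I4} and {I3,I5} (support 1) are not frequent; the other six 2-itemsets are frequent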
ECLAT Algorithm by Example
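Frequent 3-itemsets in vertical format
itemset       TID_set
{I1,I2,I3}    {T800,T900}
{I1,I2,I5}    {T100,T800}
Each TID_set is the intersection of the TID_sets of two frequent 2-itemsets with a common prefix, e.g. t({I1,I2}) ∩ t({I1,I3}) = {T800,T900}
min_sup = 2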
This process repeats, with k incremented by 1 each time, until no frequent itemsets or no candidate itemsets can be found
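The level-by-level process described in this example can be sketched as a loop in Python. This is a minimal sketch under our own assumptions (sorted item tuples, prefix join, Apriori pruning); `mine_levelwise` is a hypothetical name. Note that it mirrors the breadth-wise walkthrough of this example; the recursive depth-first formulation of Eclat is a different traversal of the same search space.

```python
from itertools import combinations

# Vertical data of Example 1.
vertical = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}


def mine_levelwise(vertical, min_sup):
    """Return {k: {frequent k-itemset: TID_set}} for every non-empty level."""
    level = {(item,): set(tids) for item, tids in vertical.items() if len(tids) >= min_sup}
    levels, k = {}, 1
    while level:  # stop once no frequent k-itemsets remain
        levels[k] = level
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join only itemsets sharing the same (k-1)-prefix
            cand = a + (b[-1],)
            if any(sub not in level for sub in combinations(cand, k)):
                continue  # Apriori pruning: an infrequent k-subset rules the candidate out
            tids = level[a] & level[b]  # support comes from the TID_sets, not a database scan
            if len(tids) >= min_sup:
                nxt[cand] = tids
        level, k = nxt, k + 1
    return levels


levels = mine_levelwise(vertical, min_sup=2)
for k, itemsets in levels.items():
    print(k, {itemset: len(tids) for itemset, tids in sorted(itemsets.items())})
```

On this data the loop stops after k = 3: the only 4-itemset candidate, {I1,I2,I3,I5}, is pruned because its subset {I1,I3,I5} is not frequent.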
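Example (2): Eclat Algorithm
First algorithm for frequent itemsets with depth-first search
Step 1: Transform the database (DB) to vertical format
TID    Items
1      a, b, c, d
2      a, b, c
3      a, b, d, e
4      c, e
5      b, d, e
6      a, b, e
7      a, c, e
8      a, d, e
9      b, c, e
10     b, d, e
Item   TID-set
a      {1, 2, 3, 6, 7, 8}
b      {1, 2, 3, 5, 6, 9, 10}
c      {1, 2, 4, 7, 9}
d      {1, 3, 5, 8, 10}
e      {3, 4, 5, 6, 7, 8, 9, 10}
Step 2: Traverse depth-first, left to right, with Support = 2
Each conditional database D_P lists the extensions of prefix P whose intersected TID-sets still meet the support threshold:
Da:     b {1, 2, 3, 6}   c {1, 2, 7}   d {1, 3, 8}   e {3, 6, 7, 8}
Dab:    c {1, 2}   d {1, 3}   e {3, 6}
Dabc:   empty (abcd = {1}, abce = {})
Dabd:   empty (abde = {3})
Dac:    empty (acd = {1}, ace = {7})
Dad:    e {3, 8}
Db:     c {1, 2, 9}   d {1, 3, 5, 10}   e {3, 5, 6, 9, 10}
Dbc:    empty (bcd = {1}, bce = {9})
Dbd:    e {3, 5, 10}
Dc:     e {4, 7, 9}   (cd = {1} falls below support)
Dd:     e {3, 5, 8, 10}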
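The depth-first, left-to-right traversal of Example (2) can be reproduced with a short Python sketch that records each conditional database as it is built. Assumptions (ours): items are single characters visited in alphabetical order, `dfs` is a hypothetical helper, and a conditional database is recorded only when the prefix has at least one later sibling to extend with, so it lists the same D_P names as the example.

```python
# Example (2) in vertical format: item -> TID-set (TIDs 1..10).
vertical = {
    "a": {1, 2, 3, 6, 7, 8},
    "b": {1, 2, 3, 5, 6, 9, 10},
    "c": {1, 2, 4, 7, 9},
    "d": {1, 3, 5, 8, 10},
    "e": {3, 4, 5, 6, 7, 8, 9, 10},
}
MIN_SUP = 2


def dfs(prefix, pairs, frequent, trace):
    """Visit (item, TID-set) pairs left to right, descending depth-first into each branch."""
    for i, (x, t_x) in enumerate(pairs):
        px = prefix + x
        frequent[px] = len(t_x)
        later = pairs[i + 1:]
        if not later:
            continue  # no candidate extensions, so no conditional database for px
        # Conditional database D_px: intersect t(px) with each later sibling's TID-set
        d_px = [(y, t_x & t_y) for y, t_y in later]
        d_px = [(y, t) for y, t in d_px if len(t) >= MIN_SUP]
        trace.append(("D" + px, {y: sorted(t) for y, t in d_px}))
        if d_px:
            dfs(px, d_px, frequent, trace)


frequent, trace = {}, []
start = [(item, tids) for item, tids in sorted(vertical.items()) if len(tids) >= MIN_SUP]
dfs("", start, frequent, trace)

for name, db in trace:
    print(name, db)
print(len(frequent), "frequent itemsets")
```

With Support = 2 this finds 19 frequent itemsets: the 5 single items, 9 pairs (all except cd), and abc, abd, abe, ade, bde.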
ECLAT Algorithm Properties
Properties of mining with vertical data format
Takes advantage of the Apriori property when generating candidate (k+1)-itemsets from k-itemsets
No need to scan the database to find the support of (k+1)-itemsets, for k >= 1
The TID_set of each k-itemset carries the complete information required for counting such support
The TID_sets can be quite long, and hence expensive to manipulate
The diffset technique can be used to optimize the support count computation
Diffset: instead of the TID_set of a k-itemset, store the difference between the TID_set of its (k-1)-itemset prefix and its own TID_set; the support of the k-itemset is then the prefix's support minus the size of its diffset
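A small Python sketch of diffsets on Example 1 data, written under our own notation: for a prefix P and extensions X and Y, d(PX) = t(P) - t(PX) and support(PX) = support(P) - |d(PX)|. Extending PX by Y then needs only the two diffsets, since d(PXY) = t(PX) - t(PY) = d(PY) - d(PX) (this follows from the definitions because t(PX) and t(PY) are both subsets of t(P)). The helper `diffset` is hypothetical.

```python
def diffset(t_prefix, t_ext):
    """d(PX) = t(P) - t(PX): TIDs that contain the prefix P but not the extension PX."""
    return t_prefix - t_ext


# TID_sets from Example 1, with prefix P = {I1}, X = I2, Y = I3.
t_P = {"T100", "T400", "T500", "T700", "T800", "T900"}  # t({I1})
t_PX = {"T100", "T400", "T800", "T900"}                 # t({I1, I2})
t_PY = {"T500", "T700", "T800", "T900"}                 # t({I1, I3})

d_PX = diffset(t_P, t_PX)      # {T500, T700}
d_PY = diffset(t_P, t_PY)      # {T100, T400}
sup_PX = len(t_P) - len(d_PX)  # 6 - 2 = 4, the support of {I1, I2}

# Extending PX by Y uses only the two diffsets: d(PXY) = d(PY) - d(PX)
d_PXY = d_PY - d_PX            # {T100, T400}
sup_PXY = sup_PX - len(d_PXY)  # 4 - 2 = 2, the support of {I1, I2, I3}
print(sup_PX, sup_PXY)         # 4 2
```

Here the diffsets have 2 elements each while the corresponding TID_sets have 4, which illustrates why diffsets can be cheaper to store and intersect on dense data.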