


B. Yildiz and B. Ergenc, "Comparison of Two Association Rule Mining Algorithms without Candidate Generation". In the 10th IASTED International Conference on Artificial Intelligence and Applications (AIA 2010), Innsbruck, Austria, pp. 450-457, Feb. 15-17, 2010.

COMPARISON OF TWO ASSOCIATION RULE MINING ALGORITHMS WITHOUT CANDIDATE GENERATION

Barış Yıldız, Belgin Ergenç

Department of Computer Engineering
Izmir Institute of Technology
Gulbahce Koyu, 35430 Urla
Izmir, Turkey
[email protected], [email protected]

ABSTRACT
Association rule mining techniques play an important role in data mining research, where the aim is to find interesting correlations among sets of items in databases. Although the Apriori algorithm of association rule mining is the one that boosted data mining research, it has a bottleneck in its candidate generation phase that requires multiple passes over the source data. FP-Growth and Matrix Apriori are two algorithms that overcome that bottleneck by keeping the frequent itemsets in compact data structures, eliminating the need for candidate generation. To our knowledge, there is no work comparing these two similar algorithms with a focus on their performance in the different phases of execution. In this study, we compare the Matrix Apriori and FP-Growth algorithms. Two case studies analyzing the algorithms are carried out phase by phase using two synthetic datasets generated in order i) to see their performance on datasets having different characteristics, ii) to understand the causes of performance differences in the different phases. Our findings are i) the performances of the algorithms are related to the characteristics of the given dataset and the threshold value, ii) Matrix Apriori outperforms FP-Growth in total performance for threshold values below 10%, iii) although building the matrix data structure has a higher cost, finding itemsets with it is faster.

KEY WORDS
Data Mining, Association Rule Mining, FP-Growth, Matrix Apriori

1. Introduction

Data mining, defined as finding hidden information in large data sources, has become a popular way to discover strategic knowledge. Direct mail marketing, web site personalization, bioinformatics, credit card fraud detection and market basket analysis are some examples where data mining techniques are commonly used. Association rule mining is one of the most important and well researched techniques of data mining. It aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transactional databases or other data repositories [1]. If we consider market basket data, the purchase of one product (X) when another product (Y) is purchased represents an association rule [2] and is displayed as X→Y. The association rule mining process consists of two steps: finding frequent itemsets and generating rules; the rules are generated from the frequent itemsets. An itemset is a set of items in the database. A frequent itemset is an itemset whose support value (the percentage of transactions in the database that contain the itemset) is above a threshold defined as the minimum support. The main concern of most association rule mining algorithms is to find frequent itemsets in an efficient way, so as to reduce the overall cost of the process.
Association rule mining was first introduced in [3], and in [4] the popular Apriori algorithm was proposed. It computes the frequent itemsets in the database through several iterations. Each iteration has two steps: candidate generation and candidate selection [5], and the database is scanned at each iteration. The Apriori algorithm uses the large itemset property: any subset of a large itemset must be large. Candidate itemsets are generated as supersets of only the large itemsets found at the previous iteration, which reduces the number of candidates. Among the many successors of Apriori [6, 7], FP-Growth was proposed with the idea of finding frequent itemsets without candidate generation [8]. FP-Growth uses a tree data structure and scans the database only twice, with a notable impact on the efficiency of the itemset generation phase. Lately, an approach named Matrix Apriori was introduced with the claim of combining the positive properties of Apriori and FP-Growth [9]. In this approach, the database is scanned twice, as in the case of FP-Growth, and the matrix structure used is simpler to maintain. Although it is claimed to perform better than FP-Growth, a performance comparison of the two algorithms is not shown in that work.
In this study, we analyze the performances of the FP-Growth and Matrix Apriori algorithms, which are similar in the way they overcome the bottleneck of candidate generation with the help of compact data structures for finding the frequent itemsets. The algorithms are compared not only with their
total runtimes but also with their performances in the individual phases. Finding the frequent items and building the data structure is analyzed as the first phase, and finding the frequent itemsets as the second phase. Test runs are carried out on two datasets with different characteristics, representing the needs of different domains. The impact of the number of frequent items and the number of frequent itemsets on the performance of the algorithms is also observed.
The rest of this paper is organized as follows. Section 2 reviews association rule mining research. Descriptions of the FP-Growth and Matrix Apriori algorithms are given in Section 3, including our implementation steps. In Section 4, a phase by phase performance analysis of the evaluated algorithms is given, with a discussion of the results. Finally, we conclude the paper in Section 5.

2. Related Work

The progress in bar-code and computer technology has made it possible to collect data about sales and store it as transactions, which is called basket data. This stored data attracted researchers to apply data mining to basket data. As a result, association rule mining came into prominence, often mentioned as synonymous with market basket analysis. Association rule mining, first mentioned in [3], is one of the most popular data mining approaches. It is used efficiently not only in market business but also in a variety of other areas. In [10], the Apriori algorithm is used on a diabetic database, and the developed application is used to discover the social status of diabetics. In a report [11], association rules are listed in the success stories part, and in a survey [12] the Apriori algorithm is listed among the top 10 data mining algorithms.
The algorithm proposed in [3] makes multiple passes over the database. In each pass, beginning from one-element itemsets, the support values of itemsets are counted. These itemsets are called candidate itemsets, and they are extended from the frontier sets delivered by the previous pass. If a candidate itemset is measured as frequent, it is added to the frontier sets for the next pass.
The Apriori algorithm proposed in [4] boosted data mining research with its simple way of implementation. The algorithm generates the candidate itemsets to be counted in a pass by using only the itemsets found large in the previous pass, without considering all of the transactions in the database, so much unnecessary candidate generation and support counting is avoided. Apriori is characterized as a level-wise complete search algorithm using the anti-monotonicity of itemsets: if an itemset is not frequent, none of its supersets is ever frequent [12, 13].
There have been many improvements of Apriori, such as [6, 7], but the most significant successor is the FP-Growth algorithm proposed in [8]. Its main objective is to skip the candidate generation and test step, which is the bottleneck of Apriori-like methods. The algorithm uses a compact data structure called the FP-tree, and a pattern fragment growth mining method is developed based on this tree. The FP-Growth algorithm scans the database only twice. It uses a divide and conquer strategy, relying on depth-first search, whereas Apriori uses breadth-first search [14]. It is stated in [8] that FP-Growth is at least an order of magnitude faster than Apriori.
In several extensions of both Apriori and FP-Growth, accuracy of results is sacrificed for better speed. Matrix Apriori, proposed in [9], combines positive properties of these two algorithms. The algorithm employs two simple structures: a matrix of frequent items called MFI and a vector storing the support of candidates called STE. Matrix Apriori consists of three procedures. The first builds the matrix MFI and populates the vector STE. The second modifies the matrix MFI to speed up the frequent pattern search. The third identifies frequent patterns using the matrix MFI and the vector STE.
Detailed studies comparing the performances of Apriori and FP-Growth can be found in [8, 14, 15]. These studies reveal that FP-Growth performs better than Apriori as the minimum support value is decreased. The Matrix Apriori algorithm, combining the advantages of Apriori and FP-Growth, was proposed as a faster and simpler alternative to these algorithms, but there is no work showing its performance. A study concentrating on the weaknesses and strengths of Matrix Apriori and FP-Growth is therefore worth carrying out, since both eliminate the candidate generation disadvantage of Apriori-like algorithms.

3. Description and Implementation of Algorithms

In this part of the paper, the FP-Growth and Matrix Apriori algorithms are first described, to keep the paper self-contained, and then the implementation steps of the algorithms are given.

3.1 Description

In this section, the two association rule mining algorithms are explained: FP-Growth and Matrix Apriori. Both algorithms were proposed as a better alternative to the Apriori algorithm in terms of overcoming its bottleneck, namely the several database scans needed for candidate generation and testing. FP-Growth was introduced to overcome the candidate generation and testing problem; the database is scanned only twice. Matrix Apriori is another approach whose database scan strategy is similar to that of FP-Growth. Although the scan strategy is similar, the data structures that keep the itemsets are different. In the following two subsections, the approaches of the algorithms are explained with an example. After the description section, the steps we went through while implementing the algorithms are given in the implementation section.
3.1.1 FP-Growth

The FP-Growth method adopts a divide and conquer strategy as follows: compress the database representing frequent items into a frequent-pattern tree, but retain the itemset association information; then divide such a compressed database into a set of conditional databases, each associated with one frequent item, and mine each such database [8].
In Figure 1, the FP-Growth algorithm is visualized for an example database with minimum support value 2 (50%). First, a scan of the database derives a list of frequent items in descending order (see Figure 1a). Then the FP-tree is constructed as follows. Create the root of the tree and scan the database a second time. The items in each transaction are processed in the order of the frequent items list, and a branch is created for each transaction. When considering the branch to be added for a transaction, the count of each node along a common prefix is incremented by 1. In Figure 1b, we can see the transactions and the tree constructed.

Figure 1. FP-Growth example
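The two scans just described can be sketched in a few lines. This is a minimal illustration, not the authors' Pascal implementation; the transactions and item names below are hypothetical stand-ins for the example database of Figure 1, which is not preserved in this copy.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: an item label, a count, and child links."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: list the frequent items in descending frequency order
    # (the frequent items list of Figure 1a).
    counts = Counter(item for t in transactions for item in t)
    frequent = [i for i, c in counts.most_common() if c >= min_support]
    rank = {item: r for r, item in enumerate(frequent)}

    # Second scan: add one branch per transaction, incrementing the
    # count of every node along a shared prefix.
    root = FPNode()
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, frequent

# Hypothetical database, minimum support 2 (50%).
transactions = [["A", "B", "E"], ["B", "D"],
                ["B", "C", "E"], ["A", "B", "C", "E"]]
root, frequent = build_fp_tree(transactions, 2)
```

In this sketch, item D falls below the threshold and never enters the tree, and because every transaction contains the most frequent item B, the root has a single child with count 4, under which the remaining branches share prefixes.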

After constructing the tree, mining proceeds as follows. Start from each frequent length-1 pattern, construct its conditional pattern base, then construct its conditional FP-tree and perform mining recursively on such a tree. The support of a candidate (conditional) itemset is counted by traversing the tree: the sum of the count values at the nodes of the least frequent item gives the support value. The frequent pattern generation process is demonstrated in Figure 1c.

3.1.2 Matrix Apriori

Matrix Apriori [9] is similar to FP-Growth in the database scan step. However, the data structure built by Matrix Apriori is a matrix representing frequent items (MFI) and a vector holding the support of candidates (STE). The search for frequent patterns is executed on these two structures, which are easier to build and use than the FP-tree.
In Figure 2, the Matrix Apriori algorithm is demonstrated. The example database is the same database used in the previous section, and the minimum support value is again 2 (50%). Firstly, a database scan is executed to determine the frequent items, and a frequent items list in descending order is obtained (see Figure 2a). Following this, a second scan of the database is executed, during which the MFI and STE are built as follows. Each transaction is read; every item of the transaction that is in the frequent items list is represented as 1, and otherwise as 0. This pattern is added as a row to the MFI matrix, and its occurrence is set to 1 in the STE vector. While reading the remaining transactions, if the pattern of a transaction is already included in MFI, its occurrence in STE is incremented; otherwise it is added to MFI and its occurrence in STE is set to 1. After reading the transactions, the MFI matrix is modified to speed up the frequent pattern search. For each column of MFI, beginning from the first row, the value of a cell is set to the number of the next row in which the item is 1; if there is no 1 in the remaining rows, the value of the cell is set to 1, meaning that down to the bottom of the matrix no row contains this item (see Figure 2b).
After constructing the MFI matrix, finding patterns is simple. Beginning from the least frequent item, create candidate itemsets and count their support values. The support value of an itemset is the sum of the STE entries whose indexes are the rows of MFI that include all the items of the candidate itemset. The frequent itemsets found can be seen in Figure 2c.

Figure 2. Matrix Apriori example
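The two structures can be sketched as follows, under stated assumptions: the transactions, item names and frequent items list are hypothetical stand-ins for the lost example of Figure 2, and the pointer-rewriting speed-up of Figure 2b is omitted for brevity.

```python
def build_mfi_ste(transactions, frequent_items):
    """Second scan: build the MFI matrix (one 0/1 row per distinct
    pattern over the frequent items) and the STE occurrence vector."""
    mfi, ste = [], []
    for t in transactions:
        row = tuple(1 if item in t else 0 for item in frequent_items)
        if not any(row):            # no frequent item: nothing to record
            continue
        if row in mfi:              # pattern seen before: bump occurrence
            ste[mfi.index(row)] += 1
        else:                       # new pattern: new row, occurrence 1
            mfi.append(row)
            ste.append(1)
    return mfi, ste

def support(itemset, mfi, ste, frequent_items):
    """Support of a candidate itemset: the sum of the STE entries over
    the MFI rows that contain every item of the candidate."""
    cols = [frequent_items.index(item) for item in itemset]
    return sum(s for row, s in zip(mfi, ste) if all(row[c] for c in cols))

# Hypothetical database and its frequent items list (minimum support 2).
transactions = [["A", "B", "E"], ["B", "D"],
                ["B", "C", "E"], ["A", "B", "C", "E"]]
mfi, ste = build_mfi_ste(transactions, ["B", "E", "A", "C"])
```

Here `support(["B", "E"], mfi, ste, ["B", "E", "A", "C"])` sums the STE entries of the three rows whose B and E columns are both 1, giving a support of 3.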

It is worthwhile to compare the two algorithms briefly on an example to show their execution. The first scans of both algorithms are carried out in the same way: the frequent items are found and listed in order. During the second scan, FP-Growth adds transactions to the tree structure and Matrix Apriori to the matrix structure. The addition of a transaction to the tree structure needs less checking than for the matrix structure. For example, consider the 2nd and 3rd transactions. The second transaction is added as a branch to the tree and as a row to the matrix. The addition of the third transaction shows the difference. For the tree structure, we need to check only the branch that has the same prefix as our transaction, so adding a new branch to node E is enough. For the matrix structure, on the other hand, we need to check all the items of the rows: if we find the same pattern, we increase the related entry of STE; otherwise we scan the matrix until we find the same pattern, and if we cannot find it, a new row is added to the matrix. Building the matrix thus needs more checking and time; however, the matrix structure is easier to manage than the tree structure.
Finding patterns requires producing and checking candidate itemsets in both algorithms; the base for this is called the conditional pattern base in FP-Growth, and there is no specific name for it in Matrix Apriori. Counting support values is easy to handle in Matrix Apriori, whereas in FP-Growth traversing the tree is complex.

3.2 Implementation

In this section, we give brief information about the implementation of the algorithms. The algorithms explained in the previous section are coded as they are understood from the related papers [8, 9]. For both algorithms, the dataset file is read to obtain the number of transactions, the number of items and the names of the items, a temporary file is created, and the data mining process is carried out on this file. In this paper, the term database is used for this temporary file.
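Before the step-by-step implementations, the recursive pattern growth search of Section 3.1.1 can be sketched as follows. This is an illustration of the idea only, not the authors' code: projected transaction lists stand in for the physical conditional FP-trees, and the transactions are hypothetical, listed in descending global frequency order.

```python
from collections import Counter

def fp_mine(transactions, min_support, suffix=()):
    """Pattern growth sketch: for each frequent item, emit the grown
    pattern, build its conditional pattern base (the prefixes of the
    transactions containing the item) and mine that base recursively."""
    counts = Counter(item for t in transactions for item in t)
    patterns = {}
    for item, count in counts.items():
        if count < min_support:
            continue
        patterns[tuple(sorted(suffix + (item,)))] = count
        # Conditional pattern base: the part of each transaction that
        # precedes `item` in the global frequency order.
        projected = [t[:t.index(item)] for t in transactions if item in t]
        patterns.update(fp_mine([p for p in projected if p],
                                min_support, suffix + (item,)))
    return patterns

# Hypothetical database, items ordered by descending frequency (B, E, A, C).
transactions = [["B", "E", "A"], ["B"], ["B", "E", "C"], ["B", "E", "A", "C"]]
patterns = fp_mine(transactions, 2)
```

With minimum support 2, this yields eleven frequent itemsets, for example {B, E} with support 3 and {A, B, E} with support 2; because each item's conditional base contains only more frequent items, no pattern is generated twice.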
3.2.1 FP-Growth

The implementation of FP-Growth is divided into three steps.
Step 1: The database is read and the counts of the items are found. According to the minimum support threshold, the frequent items are selected and sorted.
Step 2: The FP-tree is initialized. From the frequent items, a node list is created which will be connected to the nodes of the tree. After initialization, the database is read again. This time, if an item in a transaction is selected as frequent, it is added to the tree structure.
Step 3: Beginning from the least frequent item, a frequent pattern finder procedure is called recursively. The support counts of the patterns are found, and the patterns are displayed if they are frequent.

3.2.2 Matrix Apriori

The implementation of Matrix Apriori is divided into four steps.
Step 1: This is carried out in the same way as step 1 of FP-Growth.
Step 2: The MFI is initialized. According to the frequent items list, the first row of MFI is created. After initialization, the database is read again. Each transaction is converted to an array whose length is that of one row of MFI. If MFI already contains a pattern identical to the array of the transaction, its occurrence is increased; otherwise a new row is added to MFI.
Step 3: MFI is modified. This modification speeds up the pattern search and support counting process.
Step 4: Similar to FP-Growth, beginning from the least frequent item, a procedure is called recursively and the support values of the patterns are counted.

The implementation steps of the algorithms are explained above. Step 1 of both implementations is identical. In step 2, the procedures used for reading the database are the same in both algorithm codes, but building the data structure differs between the algorithms, and the additional step for modifying MFI is needed only for Matrix Apriori. The candidate generation procedures of both algorithms are equivalent, but counting support is clearly different.

4. Performance Evaluation

In this section, we compare the Matrix Apriori and FP-Growth algorithms based on the publications discussed in the previous section. Both algorithms are coded using the Lazarus IDE (02.96.2), which uses the Pascal programming language. The ARtool (1.1.2) dataset generator is used for our synthetic datasets. Two case studies analyzing the algorithms are carried out step by step using two synthetic datasets generated in order i) to see their performance on datasets having different characteristics, ii) to understand the causes of performance differences in the different phases. In order to keep the system state similar for all test runs, we ensured that all background jobs consuming system resources were inactive. It is also ensured that test runs give close results when repeated.

4.1 Simulation Environment

The test runs are performed on a computer with a 2.4 GHz dual core processor and 3 GB memory. At each run, both programs report the following about the data mining process:
- time cost of the first scan of the database,
- number of frequent items found at the first scan of the database,
- time cost of the second scan of the database and of building the data structure,
- time cost of finding the frequent itemsets,
- number of frequent itemsets found after the mining process,
- total time cost of the whole data mining process.
The procedures for the first database scan are the same for both algorithms, so its time cost is identical. During our case studies, we call the first phase the first scan of the database together with the second scan performed for building the specific data structure. The second phase is traversing the data structures created by the first phase in order to find the frequent itemsets.
Although real life data has different characteristics from synthetically generated data, as mentioned in [15], we used synthetic data since its parameters are easily controllable. In [16], the drawbacks of using real world data and synthetic data, and a comparison of some dataset generators, are given. Our aim was to have datasets with different characteristics, representing the needs of different domains.
The synthetic databases are generated using the ARtool software [17]. ARtool generates a database according to parameters such as the number of items, number of transactions, average size of transactions, number of patterns and average size of patterns. Two datasets are generated by varying the parameters "number of items" and "average size of patterns", in order to obtain the different dataset characteristics of different domains. One dataset is characterized by long patterns and a low diversity of items, the other by short patterns and a high diversity of items. These differences affect the size of the specific data structures of the algorithms and therefore the run times.
In the following subsections, the performance analysis of the algorithms for the two case studies is given. For the generated datasets, we aimed to observe how the change of minimum support affects the performance of the algorithms. The algorithms are compared for six minimum support values in the range of 15% to 2,5%.
4.2 Case1: Database of Long Patterns with Low Diversity of Items

A database is generated having long patterns and a low diversity of items, where number of items=10000, number of transactions=30000, average size of transactions=20, average size of patterns=10. The number of frequent items is given in Figure 3a and the number of frequent itemsets in Figure 3b while the minimum support value is varied. It is clear that the decrease in minimum support increases the number of frequent items from 16 to 240 and the number of frequent itemsets from 1014 to 198048.

Figure 3. (a) Number of frequent items, (b) Number of frequent itemsets for Case1

The total performance of Matrix Apriori and FP-Growth is demonstrated in Figure 4. It is seen that their performance is identical for minimum support values above 7,5%. On the other hand, below 7,5% minimum support, Matrix Apriori performs clearly better, such that at the 2,5% threshold it is 230% faster.

Figure 4. Total performance for Case1

The reason for FP-Growth falling behind in total performance can be understood by looking at the performance of the individual phases. The first phase performances of the algorithms, demonstrated in Figure 5a, show that building the matrix data structure of Matrix Apriori needs 20% to 177% more time than building the tree data structure of FP-Growth. The first phase of Matrix Apriori shows a similar pattern to the number of frequent items demonstrated in Figure 3a.

Figure 5. (a) First phase performance for Case1, (b) Second phase performance for Case1

The second phase of the evaluation is finding the frequent itemsets. As displayed in Figure 5b, Matrix Apriori is faster at minimum support values below 10%. Although at the 10% threshold FP-Growth is 20% faster, Matrix Apriori is 240% faster at the 2,5% threshold. As expected, the performance of the second phase is related to the number of frequent itemsets (see Figure 3b).
Our first case study showed that Matrix Apriori performed better with decreasing threshold values for the given database.

4.3 Case2: Database of Short Patterns with High Diversity of Items

A database is generated for short patterns and a high diversity of items using the parameters number of items=30000, number of transactions=30000, average size of transactions=20, average size of patterns=5. The change of the frequent items and itemsets counts is given in Figure 6a and Figure 6b, respectively. The number of frequent items found changes from 58 to 127, and the number of frequent itemsets found from 254 to 71553, with decreasing minimum support values.

Figure 6. (a) Number of frequent items, (b) Number of frequent itemsets for Case2

The total performance of both algorithms is given in Figure 7. An increase in minimum support decreases the runtime of both algorithms. For the minimum support values 12,5% and 15%, FP-Growth performed faster by up to 56%. However, for lower minimum support values, Matrix Apriori performed better by up to 150%.
Figure 7. Total performance for Case2

The first phase performance of the algorithms is demonstrated in Figure 8a. FP-Growth is observed to have better first phase performance.

Figure 8. (a) First phase performance for Case2, (b) Second phase performance for Case2

The second phase evaluation of the algorithms, given in Figure 8b, shows that Matrix Apriori performed better at all threshold values and that the performance gap increases with decreasing threshold. The difference varies between 71% and 185%. The second phase performances of the algorithms are related to the number of frequent itemsets found, as in the first case study.

4.4 Discussion on Results

In this section, we analyze the performance of the FP-Growth and Matrix Apriori algorithms phase by phase as the minimum support threshold is changed. Two databases with different characteristics are used for our case studies. In both case studies, the performances of the algorithms are observed between minimum support values of 2,5% and 15%.
The first case study is carried out on a database of long patterns with a low diversity of items. It is seen that at 10%-15% minimum support values, the performances of both algorithms are close. However, below 10%, the performance gap between the algorithms becomes larger in favor of Matrix Apriori. Another point is that the first phase of Matrix Apriori is affected by the minimum support change more than that of FP-Growth. This is a result of the increase in the frequent items count, which affects the data structure building step of Matrix Apriori dramatically. On the other hand, the matrix data structure is faster at finding itemsets, leading to the better total performance of Matrix Apriori.
Our second case study is performed on a database of short patterns with a high diversity of items. It is seen that at 12,5%-15% minimum support values, the performances of both algorithms are close. However, below 12,5%, the performance gap between the algorithms becomes larger in favor of Matrix Apriori. It is also seen that having more items and a smaller average pattern length caused both algorithms to have higher runtimes than in the first case study: at 15% minimum support, 1014 itemsets are found in 1031-1078 ms in the first case study, whereas 254 itemsets are found in 12172-19030 ms in the second. In addition, for all threshold values, the first phase runtimes are higher in the second case study.
The common points of both case studies are: i) Matrix Apriori is faster in the itemset finding phase than FP-Growth and slower in the data structure building phase, ii) for threshold values below 10%, Matrix Apriori is more efficient by up to 230%, iii) the first phase performance of Matrix Apriori is correlated with the number of frequent items, iv) the second phase performance of FP-Growth is correlated with the number of frequent itemsets.

5. Conclusion

In this paper, we benchmark and explain the FP-Growth and Matrix Apriori association rule mining algorithms, which work without candidate generation. Since the characteristics of the data repositories of different domains vary, each algorithm is analyzed using two synthetic databases with different characteristics, i.e., one database has long patterns with a low diversity of items and the other has short patterns with a high diversity of items.
Our case studies indicate that the performances of the algorithms are related to the characteristics of the given dataset and the minimum support threshold applied. When the performances of the algorithms are considered, we notice that constructing the matrix data structure of Matrix Apriori takes more time than constructing the tree structure of FP-Growth. On the other hand, during the itemset finding phase, the matrix data structure is considerably faster than FP-Growth at finding frequent itemsets, thus retrieving and presenting the results in a more efficient manner.
We conclude, based on the case studies, that the Matrix Apriori algorithm performs better than FP-Growth as the threshold decreases, indicating that the former provides better overall performance. To enhance our current findings, as a next step we plan to conduct a study that will help us propose a new association rule mining algorithm combining the strengths of both Matrix Apriori and FP-Growth.
References

[1] S. Kotsiantis & D. Kanellopoulos, Association Rules Mining: A Recent Overview, International Transactions on Computer Science and Engineering, 32(1), 2006, 71-82.
[2] M.H. Dunham, Data Mining Introductory and Advanced Topics (New Jersey, Pearson Education, 2003).
[3] R. Agrawal, T. Imieliński & A. Swami, Mining Association Rules Between Sets of Items in Large Databases, ACM SIGMOD Record, 22(2), 1993, 207-216.
[4] R. Agrawal & R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases, San Francisco, CA, United States, 1994, 487-499.
[5] M. Kantardzic, Data Mining Concepts, Models, Methods, and Algorithms (New Jersey, IEEE Press, 2003).
[6] A. Savasere, E. Omiecinski & S.B. Navathe, An Efficient Algorithm for Mining Association Rules in Large Databases. In Proceedings of the 21st International Conference on Very Large Data Bases, San Francisco, CA, United States, 1995, 432-444.
[7] H. Toivonen, Sampling Large Databases for Association Rules. In Proceedings of the 22nd International Conference on Very Large Data Bases, San Francisco, CA, United States, 1996, 134-145.
[8] J. Han, J. Pei & Y. Yin, Mining Frequent Patterns without Candidate Generation, ACM SIGMOD Record, 29(2), 2000, 1-12.
[9] J. Pavón, S. Viana & S. Gómez, Matrix Apriori: Speeding up the Search for Frequent Patterns. In Proceedings of the 24th IASTED International Conference on Database and Applications, Innsbruck, Austria, 2006, 75-82.
[10] N. Duru, An Application of Apriori Algorithm on a Diabetic Database. In Proceedings of the 9th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, Melbourne, Australia, 2005, 398-404.
[11] R. Grossman, S. Kasif, R. Moore, D. Rocke & J. Ullman, Data Mining Research: Opportunities and Challenges. A Report of Three NSF Workshops on Mining Large, Massive, and Distributed Data, available at http://www.rgrossman.com/dl/misc-001.pdf, 1998.
[12] X. Wu, V. Kumar & J. Ross Quinlan, Top 10 Algorithms in Data Mining, Knowledge and Information Systems, 14(1), 2007, 1-37.
[13] J. Han & M. Kamber, Data Mining: Concepts and Techniques (San Diego, Academic Press, 2001).
[14] J. Hipp, U. Güntzer & G. Nakhaeizadeh, Algorithms for Association Rule Mining – A General Survey and Comparison, SIGKDD Explorations Newsletter, 2(1), 2000, 58-64.
[15] Z. Zheng, R. Kohavi & L. Mason, Real World Performance of Association Rule Algorithms. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, United States, 2001, 401-406.
[16] A. Omari, R. Langer & S. Conrad, TARtool: A Temporal Dataset Generator for Market Basket Analysis. In Proceedings of the 4th International Conference on Advanced Data Mining and Applications, Chengdu, China, 2008, 400-410.
[17] L. Cristofor, ARtool Project. University of Massachusetts, Boston, available at http://www.cs.umb.edu/~laur/ARtool/
