
APEX INSTITUTE OF TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Data Mining and Warehousing (22CSH-380)


Faculty: Dr. Preeti Khera (E16576)

Lecture – 2.3.1 & 2.3.2


Mining Single-Dimensional Boolean Association Rules
from Transactional Databases – Apriori Algorithm

DISCOVER . LEARN . EMPOWER

June 4, 2025 1
Data Mining and Warehousing: Course Objectives

COURSE OBJECTIVES
The Course aims to:

1. Develop an understanding of the key concepts of data mining and learn how to
extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to select and analyze relevant attributes, apply statistical
measures to look for meaningful variation in data, and mine association rules from
transactional datasets.
3. Teach the use and application of data mining techniques such as classification,
decision trees, neural networks, back-propagation and many more, in various applications.

COURSE OUTCOMES
On completion of this course, the students shall be able to:

CO1  Understand the concept of data mining and the usage of various tools for data warehousing and data mining.
CO2  Demonstrate the strengths and weaknesses of different methods of meaningful data mining.
CO3  Apply association rule, classification, and clustering algorithms to large data sets.
CO4  Evaluate and employ the correct data mining techniques depending on the characteristics of the dataset.
CO5  Verify and formulate the performance of various data mining techniques according to the dataset.

Unit-2 Syllabus

Unit-2
Concept Description: Definition, Data Generalization, Analytical Characterization,
Analysis of Attribute Relevance, Mining Class Comparisons, Statistical Measures in Large
Databases, Measuring Central Tendency, Measuring Dispersion of Data, Graph Displays
of Basic Statistical Class Description; Mining Association Rules in Large Databases,
Association Rule Mining, Mining Single-Dimensional Boolean Association Rules from
Transactional Databases – Apriori Algorithm, Mining Multilevel Association Rules from
Transaction Databases, and Mining Multi-Dimensional Association Rules from Relational
Databases.

Table of Content
• Mining Association Rules in Large Databases
• Association rule mining
• Mining Single-Dimensional Boolean Association rules from Transactional
Databases
• Apriori Algorithm

Association Mining
• Association rule mining:
• Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.
• Applications:
• Basket data analysis, cross-marketing, catalog design, loss-
leader analysis, clustering, classification, etc.
• Examples:
• Rule form: “Body → Head [support, confidence]”
• buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
• major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Association Rules: Basic Concepts
• Given: (1) a database of transactions, (2) each transaction
is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of
items with that of another set of items
• E.g., 98% of people who purchase tires and auto accessories
also get automotive services done
• Applications
• * → Maintenance Agreement (what the store should do to
boost Maintenance Agreement sales)
• Home Electronics → * (what other products should the store
stock up on?)
• Attached mailing in direct marketing
• Detecting “ping-pong”ing of patients, faulty “collisions”
Interestingness Measures: Support and Confidence

[Venn diagram: customers who buy diapers, customers who buy beer, and customers who buy both]

• Find all the rules X ∧ Y → Z with minimum confidence and support
• support, s: probability that a transaction contains {X ∪ Y ∪ Z}
• confidence, c: conditional probability that a transaction having
{X ∪ Y} also contains Z

Transaction ID | Items Bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

With minimum support 50% and minimum confidence 50%, we have:
– A → C (50%, 66.6%)
– C → A (50%, 100%)
Association Rule Mining: A Road Map
• Boolean vs. quantitative associations (based on the types of values
handled)
• buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
• age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
• Single-dimensional vs. multi-dimensional associations (each
distinct predicate of a rule is a dimension)
• Single-level vs. multiple-level analysis (consider multiple levels of
abstraction)
• What brands of beers are associated with what brands of diapers?
• Extensions
• Correlation, causality analysis
(association does not necessarily imply correlation or causality)
• Max-patterns (a frequent pattern such that no proper super-pattern is frequent) and
closed itemsets (an itemset c is closed if there exists no proper superset c′ of c such
that every transaction containing c also contains c′)
Mining Association Rules – An Example

Transaction ID | Items Bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

Min. support 50%, min. confidence 50%

Frequent Itemset | Support
{A}              | 75%
{B}              | 50%
{C}              | 50%
{A,C}            | 50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%

The Apriori principle:
Any subset of a frequent itemset must be frequent
Mining Frequent Itemsets

• Find the frequent itemsets: the sets of items that
have at least minimum support
• Any subset of a frequent itemset must also be a frequent
itemset
• i.e., if {A, B} is a frequent itemset, both {A} and {B} must be
frequent itemsets
• Iteratively find frequent itemsets with cardinality from 1 to
k (k-itemsets)
• Use the frequent itemsets to generate association
rules.
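The final step, turning frequent itemsets into rules, can be sketched as follows (a simplified enumeration over body/head splits; the function name and the `support_of` table are illustrative, with supports taken from the earlier four-transaction example):

```python
from itertools import combinations

def rules_from_itemset(itemset, support_of, min_conf):
    """Generate rules body -> head from one frequent itemset,
    keeping those whose confidence meets min_conf."""
    items = tuple(sorted(itemset))
    rules = []
    for r in range(1, len(items)):
        for body in combinations(items, r):
            head = tuple(sorted(set(items) - set(body)))
            # confidence = support(whole itemset) / support(body)
            conf = support_of[items] / support_of[body]
            if conf >= min_conf:
                rules.append((body, head, conf))
    return rules

# Supports from the earlier example, as fractions of the 4 transactions.
support_of = {("A",): 0.75, ("C",): 0.5, ("A", "C"): 0.5}
print(rules_from_itemset({"A", "C"}, support_of, min_conf=0.5))
```

For {A, C} this yields both A → C (confidence 66.6%) and C → A (confidence 100%), matching the example.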
The Apriori Algorithm: Basic Idea
• Join Step: Ck is generated by joining Lk−1 with itself
• Prune Step: Any (k−1)-itemset that is not frequent cannot
be a subset of a frequent k-itemset
• Pseudo-code:
Ck: candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that
        are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
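The pseudo-code above can be written as a runnable sketch (a plain level-wise implementation assuming set-valued transactions; the helper names are my own, not from the slides):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining following the pseudo-code above.
    min_support is a fraction of the number of transactions."""
    min_count = min_support * len(transactions)
    # L1: frequent 1-itemsets, counted in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_count}
    frequent = {s: counts[s] for s in Lk}
    while Lk:
        k = len(next(iter(Lk)))
        # Join step: unions of two k-itemsets that form a (k+1)-itemset.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: drop candidates with an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Count surviving candidates in one database scan.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        Lk = {c for c, cnt in counts.items() if cnt >= min_count}
        frequent.update((c, counts[c]) for c in Lk)
    return frequent

# The example database used on the next slide.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori(D, min_support=0.5))
```

On this database the sketch recovers L1 = {1}, {2}, {3}, {5}, L2 = {1,3}, {2,3}, {2,5}, {3,5}, and L3 = {2,3,5}, as traced in the example slide.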
The Apriori Algorithm — Example

Database D:
TID | Items
100 | 1, 3, 4
200 | 2, 3, 5
300 | 1, 2, 3, 5
400 | 2, 5

Scan D to count candidate 1-itemsets:
C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1: {1}: 2, {2}: 3, {3}: 3, {5}: 3

Generate C2 from L1 and scan D:
C2: {1,2}: 1, {1,3}: 2, {1,5}: 1, {2,3}: 2, {2,5}: 3, {3,5}: 2
L2: {1,3}: 2, {2,3}: 2, {2,5}: 3, {3,5}: 2

Generate C3 from L2 and scan D:
C3: {2,3,5}
L3: {2,3,5}: 2
Candidate Generation
• Suppose the items in Lk−1 are listed in an order
• Step 1: self-joining Lk−1
insert into Ck
select p.item1, p.item2, …, p.itemk−1, q.itemk−1
from Lk−1 p, Lk−1 q
where p.item1 = q.item1, …, p.itemk−2 = q.itemk−2, p.itemk−1 < q.itemk−1

• Step 2: pruning
forall itemsets c in Ck do
    forall (k−1)-subsets s of c do
        if (s is not in Lk−1) then delete c from Ck
To Count Supports of Candidates

• Why counting supports of candidates is a problem:
• The total number of candidates can be huge
• Each transaction may contain many candidates
• Method:
• Candidate itemsets are stored in a hash tree
• A leaf node of the hash tree contains a list of itemsets and
counts
• An interior node contains a hash table
• Subset function: finds all the candidates contained in a
transaction
Example of Generating Candidates

• L3={abc, abd, acd, ace, bcd}

• Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace

• Pruning:
• acde is removed because ade is not in L3

• C4={abcd}
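This join-then-prune example can be checked with a short sketch (itemsets represented as sorted tuples, joined on their shared prefix; the function name is illustrative):

```python
from itertools import combinations

def gen_candidates(L_prev):
    """Self-join L(k-1) on the first k-2 items, then prune any
    candidate with an infrequent (k-1)-subset (the prune step)."""
    prev = set(L_prev)
    k = len(next(iter(prev))) + 1
    joined = set()
    for p in prev:
        for q in prev:
            # Join condition: same prefix, last items in order.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                joined.add(p + (q[-1],))
    return {c for c in joined
            if all(s in prev for s in combinations(c, k - 1))}

L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")}
print(gen_candidates(L3))  # {('a', 'b', 'c', 'd')}: acde pruned since ade is not in L3
```

The join produces abcd and acde; pruning removes acde, leaving C4 = {abcd} as on the slide.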
Improving Apriori’s Efficiency
• Hash-based itemset counting: A k-itemset whose corresponding hashing
bucket count is below the threshold cannot be frequent
• Transaction reduction: A transaction that does not contain any frequent k-
itemset is useless in subsequent scans
• Partitioning: Any itemset that is potentially frequent in DB must be frequent
in at least one of the partitions of DB
• Sampling: mining on a subset of the given data; needs a lower support threshold
plus a method to determine completeness
• Dynamic itemset counting: add new candidate itemsets immediately
(unlike Apriori) when all of their subsets are estimated to be frequent
Is Apriori Fast Enough? - Performance
Bottlenecks
• The core of the Apriori algorithm:
• Use frequent (k − 1)-itemsets to generate candidate frequent k-itemsets
• Use database scans and pattern matching to collect counts for the
candidate itemsets
• The bottleneck of Apriori: candidate generation
• Huge candidate sets:
• 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets
• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one
needs to generate 2^100 ≈ 10^30 candidates.
• Multiple scans of the database:
• Needs (n + 1) scans, where n is the length of the longest pattern
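The candidate blow-up quoted above is plain combinatorics: 10^4 frequent 1-itemsets yield C(10^4, 2), roughly 5 × 10^7, candidate pairs, and a size-100 pattern has on the order of 2^100 candidate subpatterns (a quick check, assuming nothing beyond standard counting):

```python
import math

# 10^4 frequent 1-itemsets give C(10^4, 2) candidate 2-itemsets.
pair_candidates = math.comb(10**4, 2)
print(pair_candidates)  # 49995000, on the order of 10^7

# A size-100 pattern forces ~2^100 candidate subpatterns overall.
print(f"{2**100:.3e}")  # about 1.268e+30
```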
Summary
• Association mining
• Association rules
• Mining frequent itemsets
• Apriori Algorithm
• Improving Apriori’s Efficiency

Assignment
• Discuss the concept of frequent itemsets.
• Discuss the methods for improving Apriori’s efficiency.
• Illustrate the steps of the Apriori algorithm with an example.

References
TEXT BOOKS
T1: Tan, Steinbach and Vipin Kumar. Introduction to Data Mining, Pearson Education, 2016.
T2: Zaki MJ, Meira Jr W, Meira W. Data mining and machine learning: Fundamental concepts and algorithms.
Cambridge University Press; 2020 Jan 30.
T3: King RS. Cluster analysis and data mining: An introduction. Mercury Learning and Information; 2015 May
12.

REFERENCE BOOKS
R1: Pei, Han and Kamber. Data Mining: Concepts and Techniques, Elsevier, 2011.
R2: Halgamuge SK, Wang L, editors. Classification and clustering for knowledge discovery. Springer Science
& Business Media; 2005 Sep 2.
R3: Bhatia P. Data mining and data warehousing: principles and practical techniques. Cambridge University
Press; 2019 Jun 27.

JOURNALS
• https://www.igi-global.com/journal/international-journal-data-warehousing-mining/1085
• https://www.springer.com/journal/41060
• https://link.springer.com/journal/10618
References
RESEARCH PAPER
• Alasadi SA, Bhaya WS. Review of data preprocessing techniques in data mining. Journal of Engineering and Applied
Sciences. 2017 Sep;12(16):4102-7.
• Freitas AA. A survey of evolutionary algorithms for data mining and knowledge discovery. In: Advances in Evolutionary
Computing: Theory and Applications. 2003 Jan 1 (pp. 819-845). Berlin, Heidelberg: Springer Berlin Heidelberg.
• Kumbhare TA, Chobe SV. An overview of association rule mining algorithms. International Journal of Computer
Science and Information Technologies. 2014 Feb;5(1):927-30.
• Srivastava S. Weka: a tool for data preprocessing, classification, ensemble, clustering and association rule mining.
International Journal of Computer Applications. 2014 Jan 1;88(10).
• Dol SM, Jawandhiya PM. Classification technique and its combination with clustering and association rule mining in
educational data mining – A survey. Engineering Applications of Artificial Intelligence. 2023 Jun 1;122:106071.

• WEB LINK
http://www.dataminingzone.weebly.com/uploads/6/5/9/4/6594749/ch14_min_assoc_rules.pdf

• VIDEO LINK
https://youtu.be/m5c27rQtD2E
THANK YOU

For queries
Email: [email protected]
