Lecture 2.3.1 2.3.2
June 4, 2025 1
Data Mining and Warehousing: Course Objectives
COURSE OBJECTIVES
The Course aims to:
1. Develop an understanding of the key concepts of data mining and of how to
extract useful characteristics from data using data pre-processing techniques.
2. Demonstrate methods to identify and analyze relevant attributes, apply statistical
measures to look for meaningful variation in data, and mine association rules from
transactional datasets.
3. Teach the use and application of data mining techniques such as classification, decision
trees, neural networks, backpropagation, and others in various applications.
COURSE OUTCOMES
On completion of this course, students shall be able to:
CO1: Understand the concept of data mining and the use of various tools for
data warehousing and data mining.
Unit-2 Syllabus
Unit-2
Concept Description: Definition, Data Generalization, Analytical Characterization,
Analysis of attribute relevance, Mining Class comparisons, Statistical measures in large
Databases. Measuring Central Tendency, Measuring Dispersion of Data, Graph Displays
of Basic Statistical class Description, Mining Association Rules in Large Databases,
Association rule mining, mining Single-Dimensional Boolean Association rules from
Transactional Databases – Apriori Algorithm, Mining Multilevel Association rules from
Transaction Databases and Mining Multi-Dimensional Association rules from Relational
Databases.
Table of Contents
• Mining Association Rules in Large Databases
• Association rule mining
• Mining Single-Dimensional Boolean Association rules from Transactional
Databases
• Apriori Algorithm
Association Mining
• Association rule mining:
• Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.
• Applications:
• Basket data analysis, cross-marketing, catalog design, loss-
leader analysis, clustering, classification, etc.
• Examples.
• Rule form: “Body → Head [support, confidence]”.
• buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
• major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Association Rules: Basic Concepts
• Given: (1) database of transactions, (2) each transaction
is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of
items with that of another set of items
• E.g., 98% of people who purchase tires and auto accessories
also get automotive services done
• Applications
• * ⇒ Maintenance Agreement (what the store should do to
boost Maintenance Agreement sales)
• Home Electronics ⇒ * (what other products the store should
stock up on)
• Attached mailing in direct marketing
• Detecting “ping-pong”ing of patients, faulty “collisions”
Interestingness Measures: Support and Confidence
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
• Find all the rules X & Y ⇒ Z with minimum confidence and support
• support, s: probability that a transaction contains {X ∪ Y ∪ Z}
• confidence, c: conditional probability that a transaction having
{X ∪ Y} also contains Z
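The two measures can be estimated directly from transaction counts. The sketch below is a minimal illustration, not a reference implementation: the function names `support` and `confidence` and the toy `baskets` data are hypothetical and were invented for this example.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(body: Iterable[str], head: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate P(head | body) as support(body ∪ head) / support(body)."""
    body_support = support(body, transactions)
    if body_support == 0:
        raise ValueError("rule body never occurs; confidence is undefined")
    both = frozenset(body) | frozenset(head)
    return support(both, transactions) / body_support

# Hypothetical toy basket data, invented for illustration only.
baskets = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "bread"}),
    frozenset({"milk", "bread"}),
]

s = support({"diapers", "beer"}, baskets)       # 2 of the 4 baskets
c = confidence({"diapers"}, {"beer"}, baskets)  # 2 of the 3 diaper baskets
print(f"support={s:.2f}, confidence={c:.2f}")
```

On this toy data the rule diapers ⇒ beer has support 2/4 and confidence 2/3; a rule is reported only if both values reach the chosen minimum thresholds.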
The Apriori Algorithm
(C_k: candidate itemsets of size k; L_k: frequent itemsets of size k)
L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that
        are contained in t
    L_{k+1} = candidates in C_{k+1} with min_support
end
return ∪_k L_k;
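The level-wise loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `apriori`, the use of an absolute support count (`min_count`) instead of a percentage, and the toy database `db` are all hypothetical. Candidates are formed here by a plain union of two frequent k-itemsets, a simpler stand-in for the join-and-prune candidate generation; it can produce extra candidates, but the counting pass discards them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[int]

def apriori(transactions: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_count` transactions."""
    # L_1: count single items and keep the frequent ones.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(level)

    k = 1
    while level:  # stop as soon as L_k is empty
        # C_{k+1}: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        # One pass over the database counts the candidates contained in each t.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # L_{k+1}: candidates that reach the minimum support count.
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy database, invented for illustration only.
db = [frozenset(t) for t in ({1, 2}, {1, 2, 3}, {2, 3}, {1, 3})]
result = apriori(db, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_count=2`, all three single items and all three pairs are frequent, while {1, 2, 3} appears in only one transaction and is dropped, so the loop ends after the third level.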
The Apriori Algorithm — Example
Minimum support count: 2 (implied by the table: {4}, with sup. 1, is dropped from L1)

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D → C2 with counts
itemset   sup.
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup.
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2)
itemset
{2 3 5}

Scan D → L3
itemset   sup.
{2 3 5}   2
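The tables in this example can be cross-checked by brute force: enumerate every k-subset of the items and count it in D. The sketch below is deliberately not Apriori (it does no candidate pruning) and is meant only to verify the L1, L2, and L3 tables. `MIN_COUNT = 2` is an assumption inferred from the example, and `frequent_of_size` is a hypothetical helper name.

```python
from itertools import combinations

# Database D from the example: TID -> items.
D = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}
MIN_COUNT = 2  # assumed threshold, inferred from the example ({4} with sup. 1 is dropped)

def frequent_of_size(k: int) -> dict:
    """Count every k-subset of the item universe and keep those meeting MIN_COUNT."""
    items = sorted(set().union(*D.values()))
    result = {}
    for combo in combinations(items, k):
        count = sum(1 for t in D.values() if set(combo) <= t)
        if count >= MIN_COUNT:
            result[combo] = count
    return result

for k in (1, 2, 3):
    print(f"L{k}:", frequent_of_size(k))
```

Brute force has to count all 10 possible 2-itemsets and all 10 possible 3-itemsets over the five items, whereas Apriori counts only the 6 candidates in C2 and the single candidate in C3.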
Candidate Generation
• Suppose the items in L_{k-1} are listed in an order
• Step 1: self-joining L_{k-1}
    insert into C_k
    select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
• Step 2: pruning
    forall itemsets c in C_k do
        forall (k-1)-subsets s of c do
            if (s is not in L_{k-1}) then delete c from C_k
Example of Generating Candidates
• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3*L3
• abcd from abc and abd
• acde from acd and ace
• Pruning:
• acde is removed because ade is not in L3
• C4={abcd}
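The self-join and pruning steps can be sketched in Python as below. This is a minimal illustration: the function name `apriori_gen` and the representation of itemsets as sorted tuples are choices made for this example. The input L3 = {abc, abd, acd, ace, bcd} is an assumption, chosen because it is consistent with the joins and the pruning shown above.

```python
from itertools import combinations
from typing import List, Set, Tuple

Itemset = Tuple[str, ...]  # items are kept in sorted order inside each tuple

def apriori_gen(prev_frequent: Set[Itemset]) -> List[Itemset]:
    """Build C_k from L_{k-1} using the self-join followed by pruning."""
    prev = sorted(prev_frequent)
    candidates = []
    for p in prev:
        for q in prev:
            # Join: the first k-2 items agree and p's last item precedes q's.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune: every (k-1)-subset of c must itself be in L_{k-1}.
                if all(s in prev_frequent for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
    return candidates

# Assumed L3, consistent with the example above.
L3 = {tuple(s) for s in ("abc", "abd", "acd", "ace", "bcd")}
C4 = apriori_gen(L3)
print(C4)
```

The join produces abcd (from abc and abd) and acde (from acd and ace); acde is discarded because its subset ade is missing from L3, which leaves abcd as the only candidate.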
Improving Apriori’s Efficiency
• Hash-based itemset counting: A k-itemset whose corresponding hashing
bucket count is below the threshold cannot be frequent
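A rough sketch of the hash-based idea for 2-itemsets follows. While the database is scanned to count single items, every pair in each transaction is also hashed into a small table of bucket counters; a candidate pair is kept only if its bucket total reaches the threshold. The function name `hash_filtered_pairs`, the number of buckets, and the hash function `(x * 10 + y) mod n_buckets` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

def hash_filtered_pairs(transactions: List[Set[int]], min_count: int,
                        n_buckets: int = 7) -> List[Tuple[int, int]]:
    """Candidate 2-itemsets that pass both the L_1 check and the bucket check."""
    def bucket(pair: Tuple[int, int]) -> int:
        # Hypothetical hash function, chosen only for illustration.
        return (pair[0] * 10 + pair[1]) % n_buckets

    item_counts: Counter = Counter()
    bucket_counts = [0] * n_buckets
    # Single scan: count items and hash every 2-subset of each transaction.
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_counts[bucket(pair)] += 1

    frequent_items = sorted(i for i, n in item_counts.items() if n >= min_count)
    # A pair whose bucket total is below min_count cannot be frequent.
    return [p for p in combinations(frequent_items, 2)
            if bucket_counts[bucket(p)] >= min_count]

# Database D from the Apriori example, with an assumed minimum support count of 2.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
C2 = hash_filtered_pairs(D, min_count=2)
print(C2)
```

On this database the bucket check removes {1 2} and {1 5} before the second scan, shrinking C2 from six candidates to four. Because several pairs can share a bucket, the surviving candidates still have to be counted exactly in the next pass.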
Assignment
• Discuss the concept of frequent itemsets.
• Discuss the methods for improving Apriori’s efficiency.
• Illustrate the steps of the Apriori algorithm with an example.
THANK YOU
For queries
Email: [email protected]