3-Data Mining Task Primitives-19-12-2024

Data mining should be an interactive process where users direct what to mine, necessitating the provision of primitives and a query language for effective communication with the system. Key components defining a data mining task include task-relevant data, types of knowledge to be mined, background knowledge, and measurements of pattern interestingness. Visualization of discovered patterns is crucial for understanding and requires different representations based on the type of knowledge and user needs.

Uploaded by

naresh.r2021

Why Data Mining Primitives and Languages?

• Finding all the patterns in a database autonomously is unrealistic: the patterns could be too numerous and largely uninteresting
• Data mining should be an interactive process
  - The user directs what is to be mined
• Users must be provided with a set of primitives for communicating with the data mining system
• Incorporating these primitives in a data mining query language provides
  - More flexible user interaction
  - A foundation for the design of graphical user interfaces
  - Standardization of the data mining industry and practice
SWE2009 Data Mining 1
What Defines a Data Mining Task?

• Task-relevant data
• Type of knowledge to be mined
• Background knowledge
• Pattern interestingness measurements
• Visualization of discovered patterns
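
As a rough illustration, the five primitives can be pictured as the fields of a single task specification. The sketch below is not a real data mining query language; the class name `MiningTask`, its field names, and all sample values (such as `sales_db`) are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container holding the five primitives of a mining task."""
    task_relevant_data: dict                                   # what data to mine
    knowledge_type: str                                        # kind of knowledge to mine
    background_knowledge: dict = field(default_factory=dict)   # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)        # measures and thresholds
    presentation: list = field(default_factory=list)           # how to display patterns


# All values below are made-up examples, not taken from the slides.
task = MiningTask(
    task_relevant_data={
        "database": "sales_db",
        "tables": ["customer", "purchases"],
        "condition": "price >= 100",
        "attributes": ["age", "item_type", "city"],
        "group_by": ["city"],
    },
    knowledge_type="association",
    background_knowledge={
        "location": ["street", "city", "province_or_state", "country"],
    },
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation=["rules", "table"],
)

print(task.knowledge_type)
```

Keeping the five parts in one object makes it visible when a query leaves one of them unspecified, which is the kind of gap a query language or GUI would have to fill with defaults.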

Task-Relevant Data (Minable View)

• Database or data warehouse name
• Database tables or data warehouse cubes
• Condition for data selection
• Relevant attributes or dimensions
• Data grouping criteria
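
One way to picture a minable view is as an ordinary relational query in which each clause corresponds to one item above. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `purchases` table, its columns, and the rows are invented for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database standing in for a named database or data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, city TEXT, item_type TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("ann", "Vancouver", "tv", 900.0),
        ("bob", "Vancouver", "cable", 15.0),
        ("cy", "Toronto", "tv", 1100.0),
        ("dee", "Toronto", "laptop", 1500.0),
    ],
)

# Each clause maps to one part of the task-relevant data specification.
rows = conn.execute(
    """
    SELECT city, item_type, COUNT(*) AS n   -- relevant attributes
    FROM purchases                          -- database table
    WHERE price >= 100                      -- condition for data selection
    GROUP BY city, item_type                -- data grouping criteria
    ORDER BY city, item_type
    """
).fetchall()

print(rows)
```

The mining algorithm would then run on `rows` rather than on the whole database.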



Types of knowledge to be mined

• Characterization
• Discrimination
• Association
• Classification/prediction
• Clustering
• Outlier analysis
• Other data mining tasks

Background Knowledge: Concept Hierarchies

• Schema hierarchy
    street < city < province_or_state < country
• Set-grouping hierarchy
    {20-39} = young, {40-59} = middle_aged
• Operation-derived hierarchy
    email address: login-name < department < university < country
• Rule-based hierarchy
    low_profit_margin(X) <= price(X, P1) and cost(X, P2) and (P1 - P2) < $50
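
The four kinds of hierarchy above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the email parser assumes addresses of the exact form login@department.university.country, and the sample address is made up.

```python
# Schema hierarchy: attribute levels ordered from most specific to most general.
LOCATION_LEVELS = ["street", "city", "province_or_state", "country"]


def age_group(age):
    """Set-grouping hierarchy: map an age to the named range it falls in."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle_aged"
    return None  # no group is defined above for ages outside 20-59


def email_levels(address):
    """Operation-derived hierarchy: levels obtained by parsing the value.

    Assumes the form login@department.university.country.
    """
    login, _, domain = address.partition("@")
    department, university, country = domain.split(".")
    return {
        "login-name": login,
        "department": department,
        "university": university,
        "country": country,
    }


def low_profit_margin(price, cost):
    """Rule-based hierarchy: an item is low-margin if price - cost < 50."""
    return (price - cost) < 50


print(age_group(25), email_levels("jdoe@cs.example.ca"), low_profit_margin(120, 80))
```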



Measurements of Pattern Interestingness

• Simplicity
    association rule length, decision tree size
• Certainty
    confidence, P(A|B) = n(A and B) / n(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight
• Utility
    potential usefulness, support (association), noise threshold (description)
• Novelty
    not previously known, surprising (used to remove redundant rules, e.g., the Canada vs. Vancouver rule implication support ratio)
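
Support and confidence from the list above can be computed directly from transaction counts. The sketch below is a minimal illustration on made-up market-basket data. In the notation P(A|B) = n(A and B) / n(B), the antecedent of the rule plays the role of B and the consequent the role of A. The function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = n(both) / n(antecedent)."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_antecedent if n_antecedent else 0.0


# Made-up transactions for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: bread => milk
print(support(transactions, {"bread", "milk"}))       # 2 of 4 transactions
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread transactions
```

A rule would be reported only if both values clear user-chosen thresholds, which is how these measures act as filters on the set of discovered patterns.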

Visualization of Discovered Patterns

• Different backgrounds/usages may require different forms of representation
  - rules, tables, cross tabs, pie/bar charts
• Concept hierarchy is also important
  - Discovered knowledge might be more understandable when represented at a high level of abstraction
  - Interactive drill up/down, pivoting, slicing and dicing provide different perspectives on the data
• Different kinds of knowledge require different representations: association, classification, clustering
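
Drilling up along a concept hierarchy amounts to re-aggregating the same records at a more general level. The sketch below shows this for a city-to-country mapping; the mapping, the sales figures, and the `roll_up` helper are all made up for illustration.

```python
from collections import Counter

# Assumed one-level concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}

# Made-up (city, units sold) records.
sales = [("Vancouver", 3), ("Toronto", 5), ("Chicago", 4), ("Vancouver", 2)]


def roll_up(records, mapping):
    """Sum amounts after replacing each key with its value in mapping."""
    totals = Counter()
    for key, amount in records:
        totals[mapping[key]] += amount
    return dict(totals)


# Detailed view: each city maps to itself.
by_city = roll_up(sales, {city: city for city in CITY_TO_COUNTRY})
# Drilled-up view: cities are generalized to countries.
by_country = roll_up(sales, CITY_TO_COUNTRY)

print(by_city)
print(by_country)
```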