Chapter 1: Introduction
— Chapter 1 —
Overview
[Figure: data flows from databases through data cleaning and data integration into a data warehouse, then through data selection to task-relevant data]
June 6, 2023 Data Mining: Concepts and Techniques 6
Data Mining and Business Intelligence
Data Exploration
Statistical Summary, Querying, and Reporting
[Figure: data mining at the center, surrounded by database technology, statistics, machine learning, visualization, pattern recognition, algorithms, and other disciplines]
http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data
Multi-Dimensional View of Data Mining
Data to be mined
Relational, data warehouse, transactional, stream,
object-oriented/relational, active, spatial, time-series, text, multi-media,
heterogeneous, legacy, WWW
Knowledge to be mined
Characterization, association, classification, clustering, trend/deviation,
outlier analysis, etc.
Techniques utilized
Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data mining, stock
market analysis, text mining, Web mining, etc.
General functionality
Descriptive data mining (Democrat <-> Republican)
Predictive data mining
Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
Data Mining Functionalities
Multidimensional concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g., humans vs.
monkeys, or good income vs. poor income
Frequent patterns, association, correlation
Diaper -> Beer [support 0.5%, confidence 75%], Education -> Income
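The two numbers attached to an association rule are its support and its confidence. A minimal sketch of how they could be computed is given here; the function name support_confidence and the toy baskets are hypothetical and invented for illustration, not taken from the slides.

```python
from typing import FrozenSet, List, Tuple


def support_confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_total = len(transactions)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    # Transactions that contain both the antecedent and the consequent.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy baskets, invented for illustration only.
baskets = [
    frozenset({"diaper", "beer", "milk"}),
    frozenset({"diaper", "beer"}),
    frozenset({"diaper", "bread"}),
    frozenset({"milk", "bread"}),
]
support, confidence = support_confidence(
    baskets, frozenset({"diaper"}), frozenset({"beer"})
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Support is the fraction of all transactions that contain both sides of the rule; confidence is the fraction of antecedent transactions that also contain the consequent.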
Classification and prediction
Construct models (functions) that describe and distinguish classes or
concepts for future prediction
E.g., classify countries based on economy, cars based on gas
mileage, internet news (Google News), products (Amazon)
Predict future values or events, e.g., a stock price or a traffic jam
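As a rough illustration of constructing a model that distinguishes classes, the sketch below learns a single cut point on gas mileage from labeled cars and uses it to classify a new car. The single-threshold model, the function names fit_threshold and predict, the class labels, and the mileage figures are all assumptions made for this example; real classifiers are usually richer than this.

```python
from typing import List


def fit_threshold(values: List[float], labels: List[str], high_label: str) -> float:
    """Pick the cut point that misclassifies the fewest training examples.

    A value >= threshold is predicted as high_label, anything else as the
    other class.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to place a cut")
    # Candidate cuts are the midpoints between neighbouring distinct values.
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def errors(cut: float) -> int:
        return sum((v >= cut) != (y == high_label) for v, y in zip(values, labels))

    return min(cuts, key=errors)


def predict(value: float, threshold: float, high_label: str, low_label: str) -> str:
    """Classify one value with the learned threshold."""
    return high_label if value >= threshold else low_label


# Hypothetical training data: miles per gallon and a hand-assigned class.
mpg = [15.0, 18.0, 21.0, 30.0, 34.0, 40.0]
label = ["guzzler", "guzzler", "guzzler", "economy", "economy", "economy"]

threshold = fit_threshold(mpg, label, high_label="economy")
print(threshold, predict(28.0, threshold, "economy", "guzzler"))
```

The training step uses the class labels; the learned threshold is then applied to cars whose class is not yet known.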
Data Mining Functionalities (2)
Cluster analysis
Class label is unknown: Group data to form new classes, e.g., cluster
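One common way to group unlabeled data is k-means, sketched below under simplifying assumptions: two-dimensional points, the first k points used as initial centroids, and a fixed iteration cap. The slides do not name a specific clustering algorithm, so k-means, the function name k_means, and the sample points are illustrative choices only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def k_means(
    points: List[Point], k: int, max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Return (centroids, cluster index per point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])  # naive, deterministic initialisation
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment


# Hypothetical points forming two visually separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, assignment = k_means(points, k=2)
print(assignment)
```

No class labels are supplied; the cluster indices that come back are the "new classes" formed from the data itself.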
Outlier analysis
Outlier: Data object that does not comply with the general behavior
of the data
Noise or exception? Useful in fraud detection, rare events analysis
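A simple way to flag objects that deviate from the general behavior of the data is a z-score test, sketched below. The threshold of 2.0, the function name z_score_outliers, and the transaction amounts are assumptions for illustration; whether a flagged value is noise or a meaningful exception (for example, fraud) still has to be judged separately.

```python
import statistics
from typing import List


def z_score_outliers(values: List[float], threshold: float = 2.0) -> List[float]:
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts with one unusual charge.
amounts = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 22.0, 21.0, 500.0]
print(z_score_outliers(amounts))
```

Because the mean and standard deviation are themselves pulled by extreme values, this test can miss outliers in small or heavily contaminated samples.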
Major Issues in Data Mining
Mining methodology
Mining different kinds of knowledge from diverse data types, e.g., bio, stream,
Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Integration of the discovered knowledge with existing knowledge: knowledge fusion
User interaction
Data mining query languages and ad-hoc mining
Expression and visualization of data mining results
Applications and social impacts
Protection of data security, integrity, and privacy
[Figure: layered architecture, top to bottom: pattern evaluation, data mining engine, database or data warehouse server, with a knowledge base alongside]
Summary
Ex. 1: Market Analysis and Management
Where does the data come from? Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing
Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
Determine customer purchasing patterns over time
Cross-market analysis: find associations/correlations between product sales, and
predict based on such associations
Customer profiling: what types of customers buy what products (clustering or
classification)
Customer requirement analysis
Identify the best products for different groups of customers
Predict what factors will attract new customers
Provision of summary information
Multidimensional summary reports
Statistical summary information (data central tendency and variation)
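The statistical summary mentioned above (central tendency and variation) could be produced per customer group roughly as follows. The group names, the purchase amounts, and the function name summarize_by_group are hypothetical; this is a sketch of one possible report, not a prescribed format.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_by_group(
    records: List[Tuple[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Compute count, mean, median, and sample standard deviation per group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for group, amount in records:
        groups[group].append(amount)

    summary: Dict[str, Dict[str, float]] = {}
    for group, amounts in groups.items():
        summary[group] = {
            "count": len(amounts),
            "mean": statistics.mean(amounts),      # central tendency
            "median": statistics.median(amounts),  # central tendency
            # Variation; a single observation has no sample deviation.
            "stdev": statistics.stdev(amounts) if len(amounts) > 1 else 0.0,
        }
    return summary


# Hypothetical (customer group, purchase amount) records.
purchases = [
    ("students", 12.0),
    ("students", 18.0),
    ("students", 15.0),
    ("families", 80.0),
    ("families", 120.0),
]
summary = summarize_by_group(purchases)
for group, stats in summary.items():
    print(group, stats)
```

Grouping by one attribute gives a one-dimensional report; grouping by several attributes at once (for example, group and region) would give the multidimensional variant.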