15CSE401 Machine Learning and Data Mining 1

15CSE401 Machine Learning and Data Mining
Lecture 1,2,3
Course Information
An Introduction to Data Mining

Nalinadevi Kadiresan
CSE Dept.

Amrita School of Engg.

July 2020 Nalinadevi Kadiresan

Course Schedule
● Course code: 15CSE401
● Title: Machine Learning and Data Mining
● Semester: 7
● Batch: CSE-C
● Slots:
○ Tuesday: Slot-1 (9-10)
○ Tuesday: Slot-9 (18-19) (Discussion)
○ Wednesday: Slot-2 (10-11)
○ Saturday: Slot-4 (12-13)


Course Objective
• To provide in-depth knowledge about data mining.
• To implement machine learning models in data mining problems.
• To improve the understanding of ongoing research.


Course Outcome
Course Outcome (BTL)
CO1: Understand the fundamental concepts of data mining and the basic theory underlying machine learning. (BTL: L2)
CO2: Understand the types of data to be mined and apply pre-processing methods. (BTL: L3)
CO3: Apply appropriate classification and clustering techniques for real-world applications. (BTL: L3)
CO4: Analyze the performance of various classification and clustering techniques. (BTL: L4)
CO5: Apply and evaluate the interesting patterns discovered from association mining. (BTL: L4)


Course Syllabus
Unit 1: Introduction to machine learning: supervised learning, unsupervised learning, some basic concepts in machine learning, review of probability, computational learning theory. Bayesian concept learning, likelihood, posterior predictive distribution, Naive Bayes classifiers, the log-sum-exp trick, feature selection using mutual information, linear regression, logistic regression.
Unit 2: Introduction to data mining: challenges and tasks, measures of similarity and dissimilarity. Classification: rule-based classifiers, nearest-neighbour classifiers, Bayesian classifiers, decision trees, support vector machines, the class imbalance problem, performance evaluation of a classifier, comparison of different classifiers.
Unit 3: Association analysis: frequent itemset generation, rule generation, evaluation of association patterns. Cluster analysis: K-means algorithm, cluster evaluation. Application of data mining to web mining and bioinformatics. Classifying documents using bag of words, advertising on the Web, recommendation systems, and mining social network graphs.
The topics in red were covered in 15CSE432.

Text Books and References

1. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2012.
2. Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, 2012.
3. Tom Mitchell, “Machine Learning”, McGraw Hill, 1997.
4. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, First Edition, Pearson Education, 2006.


Modified Course Content

● Data mining: introduction, tasks and challenges
● Similarity and dissimilarity metrics
● Statistical concepts
o Distributions, P-value statistics
● Association Rule mining
o Apriori (frequent itemset generation & test)
o Projection-based (FP-growth)
o Vertical format approach (ECLAT)
o Evaluation of Association Rule Mining
● Classification
o Random Forest
o Bagging and Boosting
● Clustering
o DBSCAN
o Fuzzy clustering
o Hierarchical clustering
o Cluster evaluation
● Applications
o Recommender systems
o Web mining
o Bioinformatics
o Classifying documents
o Mining social network graphs

Evaluation Pattern
20-week semester

Internal: 70%
● Quizzes: minimum 1 quiz per week. Marks: 20. BTL: Knowledge, Comprehension
● Assignments: minimum 1 assignment per unit. Marks: 30. BTL: Application, Analysis
● Project: 1 project per semester. Marks: 20. BTL: Synthesis, Evaluation

End Semester: 30%
● Online End Sem exam. Marks: 15 (15%)
● Viva (mandatory). Marks: 15 (15%)

Course Delivery
● Online classes (MS Teams)
○ Live lectures 20 to 30 minutes
○ Discussion
○ Quiz
● Tutorial/Discussion hour (MS Teams)
● Weekly Quizzes (AMPLE / AUMS)
● Assignments (problems and programming)
● Case study (Group)
● Course Repository - AMPLE


General Comments

● Keep a separate course notebook
● Active participation expected in class, chats and discussion forums


Motivation: Why Data Mining??

• Data explosion problem
– Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: Data warehousing and data mining
– Data warehousing and on-line analytical processing
– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases


Why Mine Data? Commercial Viewpoint

• Lots of data is being collected and warehoused
– Web data, e-commerce
– Purchases at department/grocery stores
– Bank/credit card transactions
• Computers have become cheaper and more powerful
• Competitive pressure is strong
– Provide better, customized services for an edge (e.g. in Customer Relationship Management)


Why Mine Data? Scientific Viewpoint

• Data collected and stored at enormous speeds (GB/hour)
– Remote sensors on a satellite
– Telescopes scanning the skies
– Microarrays generating gene expression data
– Scientific simulations generating terabytes of data
• Traditional techniques are infeasible for raw data
• Data mining may help scientists
– in classifying and segmenting data
– in hypothesis formation

Examples
[Four image-only slides; figures not preserved]

What is common in all??

● We are living in ‘BIG DATA’ age

○ Data is wealth

Review of Lecture-1

• The goal is to derive knowledge from raw data for decision making in various business and scientific applications.
• Knowledge inference is challenging due to the nature of the data:
– Volume, Variety, Velocity, and Veracity (Big Data)


What Is Data Mining?

• Data mining (knowledge discovery from data):
– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.
• Alternative names and their “inside stories”:
– Data mining: a misnomer?
– Knowledge Discovery (mining) from Data (KDD)
– Knowledge Extraction
– Data/Pattern Analysis
– Data Archeology
– Business Intelligence


Data Mining Definition

• Finding hidden information in a database
• Fit data to a model
• Similar terms
– Exploratory data analysis
– Data driven discovery
– Deductive learning


Applications of Data Mining

• Data analysis and decision support
– Market analysis and management
   • Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
– Fraud detection and detection of unusual patterns (outliers)
– Risk analysis and management
   • Forecasting, customer retention, quality control, competitive analysis
• Other applications
– Text mining (news groups, email, documents) and Web mining
– Stream data mining
– Bioinformatics and bio-data analysis

Market Analysis and Management

• Where does the data come from?
– Credit card transactions, discount coupons, customer complaint calls
• Target marketing
– Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
– Determine customer purchasing patterns over time


Market Analysis and Management

• Cross-market analysis
– Associations/correlations between product sales, and prediction based on such associations
• Customer profiling
– What types of customers buy what products
• Customer requirement analysis
– Identify the best products for different customers
– Predict what factors will attract new customers


Examples: What is (not) Data Mining?

What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”

What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)


Database Processing vs. Data Mining Processing

Database processing
• Query
– Well defined
– SQL
• Data
– Operational data
• Output
– Precise
– Subset of database

Data mining processing
• Query
– Poorly defined
– No precise query language
• Data
– Not operational data (analytical data)
• Output
– Fuzzy
– Not a subset of database


Query Examples
• Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.

• Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
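
The contrast between the two milk queries can be sketched in a few lines of Python. This is a minimal illustration only: the transactions, the function names and the `min_confidence` threshold are assumptions made for the example and are not part of the lecture material. The first function is an exact filter that returns a subset of the data; the second estimates how often other items appear in baskets that contain milk.

```python
from collections import Counter

# Hypothetical toy transactions (illustrative data, not from the lecture).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]

def customers_who_bought(item, baskets):
    """Database-style query: an exact filter that returns a subset of the data."""
    return [i for i, basket in enumerate(baskets) if item in basket]

def frequently_bought_with(item, baskets, min_confidence=0.5):
    """Mining-style query: estimate confidence(item -> other) for co-occurring items."""
    with_item = [b for b in baskets if item in b]
    if not with_item:
        return {}
    counts = Counter(other for b in with_item for other in b if other != item)
    return {
        other: n / len(with_item)
        for other, n in counts.items()
        if n / len(with_item) >= min_confidence
    }

milk_rows = customers_who_bought("milk", transactions)
milk_rules = frequently_bought_with("milk", transactions)
```

The database query answers with row positions that already exist in the data, while the mining query returns derived knowledge (item-to-confidence pairs) that is not a subset of the stored records.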


Knowledge Discovery (KDD) Process

• This is a view from typical database systems and data warehousing communities
• Data mining plays an essential role in the knowledge discovery process

[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]

Knowledge Discovery (KDD) Process

1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present mined knowledge to users)
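
The seven steps can be sketched as a small Python pipeline. This is only a toy illustration under stated assumptions: the two record sources, the field names, the pair-counting "mining" step and the `min_support` threshold are all hypothetical choices made for the example, not a prescribed implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (illustrative only).
store_a = [
    {"customer": "c1", "items": "Milk, Bread"},
    {"customer": "c2", "items": "milk,eggs"},
    {"customer": "c3", "items": None},  # noisy record with no items
]
store_b = [
    {"customer": "c4", "items": "bread, milk, butter"},
    {"customer": "c5", "items": "Bread,Butter"},
]

def clean(records):
    """Step 1: drop records with missing item lists."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4: normalize each string into a set of lowercase item names."""
    return [{item.strip().lower() for item in s.split(",")} for s in item_strings]

def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    pairs = Counter()
    for basket in baskets:
        pairs.update(combinations(sorted(basket), 2))
    return pairs

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Step 6: keep only pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))

# Step 7: present the surviving patterns in a readable form.
for (a, b), support in sorted(patterns.items()):
    print(f"{a} & {b}: support = {support:.2f}")
```

Each function maps to one numbered step, so the nesting in the `baskets = ...` line reads as the process order from the inside out.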


Typical framework of a data warehouse
[Figure not preserved]

Architecture of a Data mining system
[Figure not preserved]

What Kinds of Patterns can be Mined?

• Data mining functionalities are used to specify the kinds of patterns to be found in data mining tasks.

• Data mining functionalities
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted

• Data mining tasks
– Descriptive data mining: characterize properties of the data in a target data set
– Predictive data mining: perform induction on the current data in order to make predictions


Data Mining Functionalities

• Databases to be mined
– Relational, transactional, object-oriented, object-relational, active,
spatial, time-series, text, multi-media, heterogeneous, legacy, WWW,
etc.
• Knowledge to be mined
– Characterization, discrimination, association, classification, clustering,
trend, deviation and outlier analysis, etc.
– Multiple/integrated functions and mining at multiple levels
• Techniques utilized
– Database-oriented, data warehouse (OLAP), machine learning,
statistics, visualization, neural network, etc.
• Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining, stock market
analysis, Web mining, Weblog analysis, etc.

Data Mining Tasks

• Description Tasks
– Find human-interpretable patterns that describe the data
• Prediction Tasks
– Use some variables to predict unknown or future values of other variables

Common data mining tasks
– Classification [Predictive]
– Clustering [Descriptive]
– Association Rule Discovery [Descriptive]
– Sequential Pattern Discovery [Descriptive]
– Regression [Predictive]
– Deviation Detection [Predictive]
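
To make the predictive/descriptive distinction concrete, the sketch below runs one task of each kind on the same toy data. Everything here is an assumption chosen for illustration: the points, the labels, the use of a 1-nearest-neighbour rule for classification and of plain k-means (Lloyd's algorithm) for clustering, and the choice of initial centroids.

```python
# Hypothetical toy data: (monthly_spend, visits_per_month) per customer.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["low", "low", "low", "high", "high", "high"]  # used only by the predictive task

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def predict_1nn(query, train_points, train_labels):
    """Predictive: label a new point with the label of its nearest training point."""
    nearest = min(range(len(train_points)), key=lambda i: sq_dist(query, train_points[i]))
    return train_labels[nearest]

def kmeans(data, centroids, iterations=10):
    """Descriptive: group unlabeled points around the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

predicted = predict_1nn((7.5, 8.0), points, labels)
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

The classifier needs the known labels and outputs a value for an unseen customer; the clustering step ignores the labels and only summarizes the structure already present in the data.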

Data Mining Models and Tasks
[Figure not preserved]

KDD Process: A View from ML and Statistics

[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]
• Data pre-processing: data integration, normalization, feature selection, dimension reduction
• Data mining: pattern discovery, classification, clustering, outlier analysis, …
• Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

• This is a view from typical machine learning and statistics communities

Major Issues in Data Mining

• Mining methodology includes:
– mining various kinds of knowledge,
– mining multidimensional data,
– exploring domain-specific mining,
– handling data with uncertainty, noise, and incompleteness,
– user-constraint-guided mining.
