CIS 517: Data Mining and Warehousing
Week 6 & 7: Classification (Chapter 8)
Instructor:
Email: maalnasser@[Link]
Lecture Outline
What is classification? What is prediction?
Issues regarding classification and prediction
Classification by decision tree induction
Rule-based classification
Classification accuracy
Summary
Classification vs. Prediction
Classification:
predicts categorical class labels
constructs a model from the training set and the class labels of a classifying attribute, and uses it to classify new data
Prediction:
models continuous-valued functions, i.e., predicts unknown or missing values
Classification: Definition
Given a collection of records (training set)
Each record contains a set of attributes, one of the
attributes is the class.
Find a model for class attribute as a function of the
values of other attributes.
Goal: previously unseen records should be assigned a
class as accurately as possible.
A test set is used to determine the accuracy of the
model. Usually, the given data set is divided into
training and test sets, with training set used to
build the model and test set used to validate it.
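A minimal sketch of this split in Python (scikit-learn assumed; the toy records, labels, and the 2/3 vs. 1/3 ratio are illustrative):

from sklearn.model_selection import train_test_split

records = [[1, 125], [0, 100], [0, 70], [1, 120], [0, 95], [0, 60]]  # toy attribute vectors
labels = ["No", "No", "No", "No", "Yes", "No"]                       # class labels

# 2/3 of the records build the model; the held-out 1/3 estimates its accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    records, labels, test_size=1/3, random_state=0)
print(len(X_train), "training records,", len(X_test), "test records")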
Illustrating Classification Task
A learning algorithm is applied to the training set to learn a model (induction); the model is then applied to the test set (deduction).

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Classification—A Two-Step Process
Model construction: describing a set of predetermined
classes
Each tuple/sample is assumed to belong to a predefined class,
as determined by the class label attribute
The set of tuples used for model construction: training set
The model is represented as classification rules, decision trees,
or mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of test sample is compared with the
classified result from the model
Accuracy rate is the percentage of test set samples that are
correctly classified by the model
Test set is independent of training set, otherwise over-fitting
will occur
Classification Process (1): Model Construction
A classification algorithm is run on the training data to produce a classifier (model).

Training Data:
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction
The classifier is first checked on the testing data, then applied to unseen data.

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  3      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) -> Tenured?
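As an illustrative sketch, the learned rule can be written directly as a function and applied to the unseen tuple; predict_tenured is a hypothetical name, and the rank strings follow the tables above:

# Hypothetical encoding of the learned rule:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict_tenured(rank: str, years: int) -> str:
    return "yes" if rank == "Professor" or years > 6 else "no"

print(predict_tenured("Professor", 4))       # yes -- Jeff, by rank
print(predict_tenured("Assistant Prof", 7))  # yes -- cf. Joseph, by years > 6
print(predict_tenured("Assistant Prof", 2))  # no  -- cf. Tom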
Supervised vs. Unsupervised
Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of training data are unknown
Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Issues (1): Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
Issues (2):
Evaluating Classification Methods
Predictive accuracy
Time to construct the model
Time to use the model
Handling noise and missing values
Decision tree size
Compactness of classification rules
Classification: Measure the quality
Usually the accuracy measure is used:
Accuracy = (number of correctly classified test tuples) / (total number of test tuples)
Classification Techniques
Decision Tree based Methods
Rule-based Methods
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines (SVM)
Classification by Decision Tree Induction
Decision tree
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distribution
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: Classifying an unknown sample
Test the attribute values of the sample against the decision tree
Example: Training Dataset
Class-labeled training tuples from the AllElectronics customer database:

RID  age          income  student  credit_rating  buys_computer
1    youth        high    no       fair           no
2    youth        high    no       excellent      no
3    middle_aged  high    no       fair           yes
4    senior       medium  no       fair           yes
5    senior       low     yes      fair           yes
6    senior       low     yes      excellent      no
7    middle_aged  low     yes      excellent      yes
8    youth        medium  no       fair           no
9    youth        low     yes      fair           yes
10   senior       medium  yes      fair           yes
11   youth        medium  yes      excellent      yes
12   middle_aged  medium  no       excellent      yes
13   middle_aged  high    yes      fair           yes
14   senior       medium  no       excellent      no

Output: A Decision Tree for "buys_computer"
age = middle_aged: yes
age = youth, student = no: no
age = youth, student = yes: yes
age = senior, credit_rating = excellent: no
age = senior, credit_rating = fair: yes
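A minimal sketch of inducing such a tree with scikit-learn (assumed available). Note that scikit-learn grows binary splits over ordinally encoded values rather than ID3-style multiway splits, so the resulting tree is analogous to, not identical with, the one above:

from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [  # (age, income, student, credit_rating, buys_computer)
    ("youth", "high", "no", "fair", "no"), ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"), ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"), ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"), ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"), ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"), ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"), ("senior", "medium", "no", "excellent", "no"),
]
X = [r[:4] for r in rows]               # attribute values
y = [r[4] for r in rows]                # class labels
enc = OrdinalEncoder()                  # map categorical values to integers
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(enc.fit_transform(X), y)
print(export_text(tree, feature_names=["age", "income", "student", "credit_rating"]))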
Algorithm for Decision
Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer
manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are
discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
There are no samples left
Example of a Decision Tree
Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model (decision tree; Refund is the first splitting attribute):
Refund = Yes: NO
Refund = No, MarSt = Married: NO
Refund = No, MarSt = Single or Divorced, TaxInc < 80K: NO
Refund = No, MarSt = Single or Divorced, TaxInc > 80K: YES
Another Example of a Decision Tree
A different tree over the same training data, splitting on MarSt first:

MarSt = Married: NO
MarSt = Single or Divorced, Refund = Yes: NO
MarSt = Single or Divorced, Refund = No, TaxInc < 80K: NO
MarSt = Single or Divorced, Refund = No, TaxInc > 80K: YES

There could be more than one tree that fits the same data!
Decision Tree Classification Task
The same workflow as before: a tree induction algorithm learns a decision tree from the training set (induction), and the tree is then applied to the test set (deduction). The training and test sets are those of the earlier illustration (Tids 1-10 and 11-15).
Apply Model to Test Data
Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree:
Refund = No -> take the "No" branch to MarSt
MarSt = Married -> reach the leaf NO

Assign Cheat = "No".
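A sketch of the same walk-through in code; predict_cheat is a hypothetical encoding of the tree above (the 80K boundary is placed on the YES side, an assumption about the "> 80K" branch):

def predict_cheat(refund: str, marital_status: str, taxable_income: int) -> str:
    if refund == "Yes":
        return "No"                     # Refund = Yes leaf
    if marital_status == "Married":
        return "No"                     # Married leaf
    return "No" if taxable_income < 80_000 else "Yes"  # Single/Divorced branch

# Test record: Refund = No, Marital Status = Married, Taxable Income = 80K
print(predict_cheat("No", "Married", 80_000))  # -> No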
Decision Tree Induction
Many Algorithms:
Hunt’s Algorithm (one of the earliest)
CART
ID3, C4.5 (J48 on WEKA)
SLIQ, SPRINT
Decision Tree Induction
Greedy strategy (heuristic method):
split the records based on an attribute test that optimizes a certain criterion.
Issues
Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Determine when to stop splitting
Greedy Algorithm: Generate a Decision Tree
Input:
Data partition, D, which is a set of training tuples and
their associated class labels;
attribute list, the set of candidate attributes;
Attribute selection method, a procedure to determine the
splitting criterion that “best” partitions the data tuples into
individual classes. This criterion consists of a splitting
attribute and, possibly, either a split-point or splitting
subset.
Output: A decision tree
Greedy Algorithm: Generate a Decision Tree
Method:
1. Create a node N.
2. If the tuples in D are all of the same class C, return N as a leaf labeled C.
3. If attribute_list is empty, return N as a leaf labeled with the majority class in D.
4. Apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion, and label N with it.
5. If the splitting attribute is discrete-valued and multiway splits are allowed, remove it from attribute_list.
6. For each outcome j of the splitting criterion: let Dj be the set of tuples in D satisfying outcome j; if Dj is empty, attach a leaf labeled with the majority class in D; otherwise attach the subtree obtained by recursing on Dj.
7. Return N.
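A minimal Python sketch of this recursive method, assuming each tuple is a dict from attribute name to value; best_split stands in for Attribute_selection_method (e.g., information gain):

from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, attrs, best_split):
    if len(set(labels)) == 1:            # all tuples in the same class -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority vote
        return majority(labels)
    a = best_split(rows, labels, attrs)  # choose the "best" splitting attribute
    tree = {a: {}}
    for v in {r[a] for r in rows}:       # one branch per value of a
        idx = [i for i, r in enumerate(rows) if r[a] == v]
        tree[a][v] = build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [x for x in attrs if x != a],
                                best_split)
    return tree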
How to determine the best split?
(Figure slides: candidate splits are compared by how well they separate the classes; the attribute selection measures below make this precise.)
Example 2:
New data: Age = 30, Student = Yes, Income = 4000, Buy_computer = ?
Attribute Selection Measures
Heuristic for selecting the splitting criterion that
“best” separates a given data partition, D, of
class-labeled training tuples into individual
classes.
Three popular attribute selection measures:
1. Information gain.
2. Gain ratio.
3. Gini index.
Attribute Selection Measure:
Information Gain (ID3/C4.5)
Select the attribute with the highest information gain.
Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|.
Expected information (entropy) needed to classify a tuple in D:
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
Information needed (after using A to split D into v partitions) to classify D:
Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)
Information gained by branching on attribute A:
Gain(A) = Info(D) - Info_A(D)
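A minimal sketch of these formulas in Python (the names info and gain are illustrative); the final calls reproduce Info(D) and Gain(age) for the buys_computer data used in the next example:

from collections import Counter
from math import log2

def info(labels):
    # Info(D) = -sum_i p_i log2(p_i), with p_i estimated from class counts
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    # Gain(A) = Info(D) - Info_A(D) for one categorical attribute column
    n = len(labels)
    info_a = 0.0
    for v in set(values):
        sub = [l for x, l in zip(values, labels) if x == v]
        info_a += len(sub) / n * info(sub)
    return info(labels) - info_a

age = ["youth", "youth", "middle_aged", "senior", "senior", "senior", "middle_aged",
       "youth", "youth", "senior", "youth", "middle_aged", "middle_aged", "senior"]
cls = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(round(info(cls), 3))       # 0.94  -> Info(D)
print(round(gain(age, cls), 3))  # 0.246 -> Gain(age)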
Attribute Selection Measure:
Information Gain (ID3)
Example: Induction of a decision tree using information gain.
The table shown earlier presents a training set, D, of class-labeled tuples randomly selected from the AllElectronics customer database.
Example: Induction of a decision tree using
information gain
1. Compute the expected information needed to classify a tuple in D. The class label attribute, buys_computer, has two distinct values (yes, no), so there are m = 2 classes; D contains 9 yes and 5 no tuples:
Info(D) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.940 bits
2. Compute the expected information requirement for each attribute. Starting with the attribute age (youth: 2 yes, 3 no; middle_aged: 4 yes, 0 no; senior: 3 yes, 2 no):
Info_{age}(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694 bits
Example: Induction of a decision tree using
information gain
3. The gain in information from such a partitioning would be:
Gain(age) = Info(D) - Info_{age}(D) = 0.940 - 0.694 = 0.246 bits
Similarly, Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and Gain(credit_rating) = 0.048 bits.
Because age has the highest information gain among the attributes, it is selected as the splitting attribute.
Node N is labeled with age, and branches are grown for
each of the attribute’s values
Example: Induction of a decision tree using information gain
(Figure: the tuples are partitioned on age; the middle_aged partition is a pure "yes" leaf, while the youth and senior partitions are split recursively, yielding the tree shown earlier for "buys_computer".)
Rule-Based Classification
Using IF-THEN Rules for Classification
Rules are expressed in the form IF condition THEN conclusion. A rule R is assessed by its coverage and accuracy: if n_covers is the number of tuples R covers, n_correct the number of those it classifies correctly, and |D| the size of the data set, then
coverage(R) = n_covers / |D| and accuracy(R) = n_correct / n_covers.

Example: Rule accuracy and coverage
Let's go back to our data: the class-labeled tuples from the AllElectronics customer database, where the task is to predict whether a customer will buy a computer. Consider rule
R1: IF age = youth AND student = yes THEN buys_computer = yes,
which covers 2 of the 14 tuples and correctly classifies both:
Coverage(R1) = 2/14 = 14.28%
Accuracy(R1) = 2/2 = 100%
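A sketch of computing both measures on the AllElectronics tuples, reduced to the attributes R1 tests (rule_stats is a hypothetical helper):

def rule_stats(D, condition, conclusion):
    covered = [t for t in D if condition(t)]
    correct = [t for t in covered if t["class"] == conclusion]
    coverage = len(covered) / len(D)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

D = [{"age": a, "student": s, "class": c} for a, s, c in [
    ("youth", "no", "no"), ("youth", "no", "no"), ("middle_aged", "no", "yes"),
    ("senior", "no", "yes"), ("senior", "yes", "yes"), ("senior", "yes", "no"),
    ("middle_aged", "yes", "yes"), ("youth", "no", "no"), ("youth", "yes", "yes"),
    ("senior", "yes", "yes"), ("youth", "yes", "yes"), ("middle_aged", "no", "yes"),
    ("middle_aged", "yes", "yes"), ("senior", "no", "no")]]

cov, acc = rule_stats(D, lambda t: t["age"] == "youth" and t["student"] == "yes", "yes")
print(f"coverage = {cov:.2%}, accuracy = {acc:.2%}")  # coverage = 14.29%, accuracy = 100.00%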
Rule Extraction from a Decision Tree
Rules are easy to extract from a tree: one rule is created for each path from the root to a leaf; each attribute-value pair along the path forms a conjunct in the rule antecedent, and the leaf holds the class prediction. For example, from the buys_computer tree: IF age = youth AND student = no THEN buys_computer = no.
Naïve Bayes Classification
Bayes Classification
A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities
Foundation: Based on Bayes’ Theorem.
Performance: A simple Bayesian classifier, naïve Bayesian classifier,
has comparable performance with decision tree and selected neural
network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct — prior
knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured
Bayes’ Theorem: Basics
Bayes’ Theorem:
P(H|X) = \frac{P(X|H) \, P(H)}{P(X)}
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
P(H) (prior probability): the initial probability
E.g., X will buy computer, regardless of age, income, …
Naïve Bayes Classifier
Let D be a training set of class-labeled tuples and let there be m classes C_1, ..., C_m. Given a tuple X, the classifier predicts the class C_i that maximizes the posterior P(C_i|X), i.e., that maximizes P(X|C_i) P(C_i), since P(X) is constant across classes.
The naïve assumption is class-conditional independence of the attributes:
P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)
This greatly reduces the computation: only the per-attribute class-conditional probabilities need to be estimated from the training data.
Example of Naïve Bayes Classifier: Training Dataset
Using the AllElectronics data (9 yes, 5 no), classify the unknown tuple
X = (age = youth, income = medium, student = yes, credit_rating = fair):
P(buys_computer = yes) = 9/14 = 0.643; P(buys_computer = no) = 5/14 = 0.357
P(age = youth | yes) = 2/9; P(income = medium | yes) = 4/9; P(student = yes | yes) = 6/9; P(credit_rating = fair | yes) = 6/9
P(age = youth | no) = 3/5; P(income = medium | no) = 2/5; P(student = yes | no) = 1/5; P(credit_rating = fair | no) = 2/5
P(X | yes) = (2/9)(4/9)(6/9)(6/9) = 0.044; P(X | no) = (3/5)(2/5)(1/5)(2/5) = 0.019
P(X | yes) P(yes) = 0.044 x 0.643 = 0.028; P(X | no) P(no) = 0.019 x 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys_computer = yes for tuple X.
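A sketch reproducing the arithmetic above from the per-class counts (naive_bayes_score is a hypothetical helper):

def naive_bayes_score(prior, likelihoods):
    # P(X|Ci) * P(Ci) under the class-conditional independence assumption
    p = prior
    for l in likelihoods:
        p *= l
    return p

# Counts from the 14-tuple AllElectronics data: 9 yes, 5 no.
p_yes = naive_bayes_score(9/14, [2/9, 4/9, 6/9, 6/9])  # ~0.028
p_no = naive_bayes_score(5/14, [3/5, 2/5, 1/5, 2/5])   # ~0.007
print("buys_computer =", "yes" if p_yes > p_no else "no")  # yes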
Model Evaluation and Selection
Evaluation metrics: How can we measure accuracy?
Other metrics to consider?
What if we have more than one classifier and want to
choose the “best” one? This is referred to as model
selection.
Use validation test set of class-labeled tuples instead
of training set when assessing accuracy.
Methods for estimating a classifier’s accuracy:
Holdout method, random subsampling
Cross-validation
Bootstrap
Metrics for Evaluating Classifier Performance:
Confusion Matrix
Confusion Matrix:
Actual class \ Predicted class   YES                    NO
YES                              True Positives (TP)    False Negatives (FN)
NO                               False Positives (FP)   True Negatives (TN)

Example of Confusion Matrix:
Actual class \ Predicted class   buy_computer = yes   buy_computer = no   Total
buy_computer = yes               6954                 46                  7000
buy_computer = no                412                  2588                3000
Total                            7366                 2634                10000

Given m classes, an entry CM_{i,j} in a confusion matrix indicates the number of tuples in class i that were labeled by the classifier as class j.
May have extra rows/columns to provide totals.
Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity

A \ P   yes   no
yes     TP    FN    P
no      FP    TN    N
        P'    N'    All

Classifier accuracy (recognition rate): percentage of test set tuples that are correctly classified:
Accuracy = (TP + TN) / All
Error rate: 1 - accuracy, or
Error rate = (FP + FN) / All

Class imbalance problem: one class may be rare, e.g., fraud or HIV-positive, with a significant majority of the negative class and a minority of the positive class. Then use:
Sensitivity (true positive recognition rate) = TP / P
Specificity (true negative recognition rate) = TN / N
Classifier Evaluation Metrics: Precision and Recall, and F-measures
Precision (exactness): what % of tuples that the classifier labeled as positive are actually positive:
Precision = TP / (TP + FP)
Recall (completeness): what % of positive tuples the classifier labeled as positive:
Recall = TP / (TP + FN)
A perfect score is 1.0; there is an inverse relationship between precision and recall.
F measure (F_1 or F-score): harmonic mean of precision and recall:
F = \frac{2 \times precision \times recall}{precision + recall}
Classifier Evaluation Metrics: Example
Actual class \ Predicted class   cancer = yes   cancer = no   Total   Recognition (%)
cancer = yes                     90             210           300     30.00 (sensitivity)
cancer = no                      140            9560          9700    98.56 (specificity)
Total                            230            9770          10000   96.50 (accuracy)

Precision = 90/230 = 39.13%
Recall = 90/300 = 30.00%
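A sketch deriving all of these metrics from the four cells of the cancer confusion matrix:

TP, FN, FP, TN = 90, 210, 140, 9560
P, N = TP + FN, FP + TN            # 300 actual positives, 9700 actual negatives
All = P + N

accuracy = (TP + TN) / All         # 0.9650
error_rate = (FP + FN) / All       # 0.0350
sensitivity = TP / P               # 0.3000 (= recall)
specificity = TN / N               # ~0.9856
precision = TP / (TP + FP)         # ~0.3913
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # ~0.3396
print(f"acc={accuracy:.4f} sens={sensitivity:.4f} spec={specificity:.4f} "
      f"prec={precision:.4f} F1={f1:.4f}")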
Evaluating Classifier Accuracy:
Holdout & Cross-Validation Methods
Holdout method
Given data is randomly partitioned into two independent sets
Training set (e.g., 2/3) for model construction
Test set (e.g., 1/3) for accuracy estimation
Random subsampling: a variation of holdout; repeat holdout k times and average the accuracies obtained
Cross-validation (k-fold, where k = 10 is most popular)
Randomly partition the data into k mutually exclusive subsets, each
approximately equal size
At i-th iteration, use Di as test set and others as training set
Leave-one-out: k folds where k = # of tuples, for small sized data
Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
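A minimal sketch of 10-fold stratified cross-validation with scikit-learn (assumed available); the bundled breast-cancer dataset and the decision tree are stand-ins for any labeled data and classifier:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # each tuple tested exactly once
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")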
Summary
Classification is an extensively studied problem (mainly in
statistics, machine learning & neural networks)
Classification is probably one of the most widely used
data mining techniques with a lot of extensions
Scalability is still an important issue for database
applications: thus combining classification with database
techniques should be a promising topic
Research directions: classification of non-relational data, e.g., text, spatial, multimedia, etc.