
Data Mining

Classification: Alternative Techniques

Imbalanced Class Problem

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar
Class Imbalance Problem

• Lots of classification problems have skewed classes
(many more records from one class than another)
– Credit card fraud
– Intrusion detection
– Defective products on a manufacturing assembly line
– COVID-19 test results on a random sample

• Key challenge:
– Evaluation measures such as accuracy are not well-suited
for imbalanced classes

2/15/2021 Introduction to Data Mining, 2nd Edition


Confusion Matrix

• Confusion Matrix:

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   a           b
 CLASS    Class=No    c           d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)


Accuracy

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   a (TP)      b (FN)
 CLASS    Class=No    c (FP)      d (TN)

• Most widely-used metric:

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Problem with Accuracy

• Consider a 2-class problem
– Number of Class NO examples = 990
– Number of Class YES examples = 10

• If a model predicts everything to be class NO, accuracy is
990/1000 = 99%
– This is misleading because this trivial model does not detect any
class YES example
– Detecting the rare class is usually more interesting (e.g., frauds,
intrusions, defects, etc.)

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   0           10
 CLASS    Class=No    0           990
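The arithmetic above can be checked with a short, stdlib-only Python sketch (the data is the slide's 990/10 split; nothing else is assumed): a trivial majority-class predictor scores 99% accuracy while detecting no Class=Yes example.

```python
# A trivial "always predict NO" model on a 990/10 imbalanced set.
# Accuracy looks excellent, yet no Class=Yes example is detected.

y_true = ["NO"] * 990 + ["YES"] * 10
y_pred = ["NO"] * 1000          # majority-class predictor

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
detected_yes = sum(t == "YES" and p == "YES"
                   for t, p in zip(y_true, y_pred))

print(accuracy)      # 0.99
print(detected_yes)  # 0
```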
Which model is better?

              PREDICTED
              Class=Yes   Class=No
A  ACTUAL  Class=Yes   0     10
           Class=No    0     990

Accuracy: 99%

              PREDICTED
              Class=Yes   Class=No
B  ACTUAL  Class=Yes   10    0
           Class=No    500   490

Accuracy: 50%
Which model is better?

              PREDICTED
              Class=Yes   Class=No
A  ACTUAL  Class=Yes   5     5
           Class=No    0     990

              PREDICTED
              Class=Yes   Class=No
B  ACTUAL  Class=Yes   10    0
           Class=No    500   490


Alternative Measures

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   a           b
 CLASS    Class=No    c           d

Precision (p) = a / (a + c)
Recall (r)    = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
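The three definitions above translate directly into code. A minimal sketch, using the same a/b/c/d cell labels as the matrix (the function names are illustrative, not from the slides):

```python
# Precision, recall, and F-measure from confusion-matrix counts.
# a = TP, b = FN, c = FP, d = TN, matching the matrix above.

def precision(a, b, c, d):
    return a / (a + c)

def recall(a, b, c, d):
    return a / (a + b)

def f_measure(a, b, c, d):
    # 2rp/(r+p) simplifies to 2a/(2a+b+c)
    return 2 * a / (2 * a + b + c)

# Worked example from the next slide: TP=10, FN=0, FP=10, TN=980
print(precision(10, 0, 10, 980))            # 0.5
print(recall(10, 0, 10, 980))               # 1.0
print(round(f_measure(10, 0, 10, 980), 2))  # 0.67
```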
Alternative Measures

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   10          0
 CLASS    Class=No    10          980

Precision (p) = 10 / (10 + 10) = 0.5
Recall (r)    = 10 / (10 + 0)  = 1
F-measure (F) = (2 × 1 × 0.5) / (1 + 0.5) ≈ 0.67
Accuracy      = 990 / 1000 = 0.99


Alternative Measures

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   10          0
 CLASS    Class=No    10          980

Precision (p) = 10 / (10 + 10) = 0.5
Recall (r)    = 10 / (10 + 0)  = 1
F-measure (F) = (2 × 1 × 0.5) / (1 + 0.5) ≈ 0.67
Accuracy      = 990 / 1000 = 0.99

                      PREDICTED CLASS
                      Class=Yes   Class=No
 ACTUAL   Class=Yes   1           9
 CLASS    Class=No    0           990

Precision (p) = 1 / (1 + 0) = 1
Recall (r)    = 1 / (1 + 9) = 0.1
F-measure (F) = (2 × 0.1 × 1) / (1 + 0.1) ≈ 0.18
Accuracy      = 991 / 1000 = 0.991
Which of these classifiers is better?

              PREDICTED CLASS
              Class=Yes   Class=No
A  ACTUAL  Class=Yes   40    10        Precision (p) = 0.8
   CLASS   Class=No    10    40        Recall (r) = 0.8
                                       F-measure (F) = 0.8
                                       Accuracy = 0.8

              PREDICTED CLASS
              Class=Yes   Class=No
B  ACTUAL  Class=Yes   40    10        Precision (p) ≈ 0.04
   CLASS   Class=No    1000  4000      Recall (r) = 0.8
                                       F-measure (F) ≈ 0.08
                                       Accuracy ≈ 0.8
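A quick way to see why A and B feel so different despite equal recall is to compute all four measures from each matrix. A small stdlib-only sketch (the `metrics` helper is illustrative; counts are taken straight from the two matrices above):

```python
# Models A and B: identical recall (0.8) and similar accuracy,
# but precision collapses once false positives explode.

def metrics(tp, fn, fp, tn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return p, r, f, acc

a = metrics(40, 10, 10, 40)        # model A
b = metrics(40, 10, 1000, 4000)    # model B
print([round(x, 3) for x in a])    # [0.8, 0.8, 0.8, 0.8]
print([round(x, 3) for x in b])    # [0.038, 0.8, 0.073, 0.8]
```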


Measures of Classification Performance

              PREDICTED CLASS
              Yes    No
 ACTUAL  Yes  TP     FN
 CLASS   No   FP     TN

• α is the probability that we reject
the null hypothesis when it is
true. This is a Type I error or a
false positive (FP).

• β is the probability that we
accept the null hypothesis when
it is false. This is a Type II error
or a false negative (FN).


Alternative Measures

A             PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   40    10
   CLASS   Class=No    10    40

TPR = 0.8, FPR = 0.2
TPR/FPR = 4

B             PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   40    10
   CLASS   Class=No    1000  4000

TPR = 0.8, FPR = 0.2
TPR/FPR = 4
Precision (p) ≈ 0.038


Which of these classifiers is better?

A             PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   10    40        Precision (p) = 0.5
   CLASS   Class=No    10    40        TPR = Recall (r) = 0.2
                                       FPR = 0.2
                                       F-measure ≈ 0.28

B             PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   25    25        Precision (p) = 0.5
   CLASS   Class=No    25    25        TPR = Recall (r) = 0.5
                                       FPR = 0.5
                                       F-measure = 0.5

C             PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   40    10        Precision (p) = 0.5
   CLASS   Class=No    40    10        TPR = Recall (r) = 0.8
                                       FPR = 0.8
                                       F-measure ≈ 0.61
ROC (Receiver Operating Characteristic)

• A graphical approach for displaying the trade-off
between detection rate and false alarm rate
• Developed in the 1950s for signal detection theory to
analyze noisy signals
• An ROC curve plots TPR against FPR
– The performance of a model is represented as a point on an
ROC curve


ROC Curve

(TPR,FPR):
 (0,0): declare everything
to be negative class
 (1,1): declare everything
to be positive class
 (1,0): ideal

 Diagonal line:
– Random guessing
– Below diagonal line:
 prediction is opposite
of the true class



ROC (Receiver Operating Characteristic)

• To draw an ROC curve, the classifier must produce a
continuous-valued output
– Outputs are used to rank test records, from the most likely
positive class record to the least likely positive class record
– By using different thresholds on this value, we can create
different variations of the classifier with TPR/FPR tradeoffs
• Many classifiers produce only discrete outputs (i.e., a
predicted class)
– How to get continuous-valued outputs?
 Decision trees, rule-based classifiers, neural networks,
Bayesian classifiers, k-nearest neighbors, SVM
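One common trick for decision trees (used on the next slide) is to score each leaf by the fraction of positive training records that fall into it. A minimal stdlib-only sketch with a hypothetical one-split "stump" (the split point and data are made up for illustration):

```python
# Turning a discrete classifier into a scorer: use the fraction of
# positive training records in each leaf as a continuous score.
# Hypothetical one-feature decision stump, split at x < 10.

def leaf_scores(data, split=10.0):
    """data: list of (x, label) pairs; returns the positive fraction per leaf."""
    left = [lbl for x, lbl in data if x < split]
    right = [lbl for x, lbl in data if x >= split]
    return {
        "left": left.count("+") / len(left),
        "right": right.count("+") / len(right),
    }

train = [(2, "+"), (4, "-"), (6, "+"), (8, "+"),      # left leaf: 3/4 positive
         (12, "-"), (14, "-"), (16, "+"), (18, "-")]  # right leaf: 1/4 positive
print(leaf_scores(train))  # {'left': 0.75, 'right': 0.25}
```

Thresholding these scores at different values then yields the classifier variations mentioned above.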


Example: Decision Trees

[Figure: a decision tree with splits on x1 and x2 (e.g., x2 < 12.63,
x1 < 13.29, x2 < 17.35, x1 < 6.56, ...). Each leaf is labeled with a
continuous-valued output — the fraction of positive training records
at that leaf (e.g., 0.059, 0.220, 0.727, 0.654, 0).]


ROC Curve Example

[Figure: the same decision tree, with its leaf scores (0.059, 0.071,
0.107, 0.143, 0.164, 0.220, 0.271, 0.654, 0.669, 0.727, 0) used as
the continuous-valued outputs for generating an ROC curve.]


ROC Curve Example
- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive

At threshold t:
TPR = 0.5, FNR = 0.5, FPR = 0.12, TNR = 0.88
How to Construct an ROC curve

Instance   Score   True Class
1          0.95    +
2          0.93    +
3          0.87    -
4          0.85    -
5          0.85    -
6          0.85    +
7          0.76    -
8          0.53    +
9          0.43    -
10         0.25    +

• Use a classifier that produces a continuous-valued score for
each instance
• The more likely it is for the instance to be in the + class, the
higher the score
• Sort the instances in decreasing order according to the score
• Apply a threshold at each unique value of the score
• Count the number of TP, FP, TN, FN at each threshold
• TPR = TP/(TP+FN)
• FPR = FP/(FP+TN)


How to construct an ROC curve

Class          +     -     +     -     -     -     +     -     +     +
Threshold >=   0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP             5     4     4     3     3     3     3     2     2     1     0
FP             5     5     4     4     3     2     1     1     0     0     0
TN             0     0     1     1     2     3     4     4     5     5     5
FN             0     1     1     2     2     2     2     3     3     4     5
TPR            1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR            1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

ROC Curve: [figure]
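The threshold sweep above can be sketched in a few lines of stdlib-only Python (one simplifying assumption: scores lie in [0, 1], so 1.0 is used as the "predict nothing positive" threshold; tied scores such as 0.85 collapse into a single ROC point rather than the table's three columns):

```python
# ROC construction by threshold sweep over the 10-instance example:
# at each unique score value t, predict "+" when score >= t,
# then count TP and FP to get one (FPR, TPR) point.

def roc_points(scores, labels):
    pos = labels.count("+")
    neg = labels.count("-")
    pts = []
    # thresholds: every unique score, plus 1.0 (nothing predicted positive)
    for t in sorted(set(scores) | {1.0}, reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == "+")
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == "-")
        pts.append((fp / neg, tp / pos))  # (FPR, TPR)
    return pts

scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ["+", "+", "-", "-", "-", "+", "-", "+", "-", "+"]
pts = roc_points(scores, labels)
print(pts)  # runs from (0.0, 0.0) up to (1.0, 1.0), matching the table
```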


Using ROC for Model Comparison

• No model consistently
outperforms the other
– M1 is better for small FPR
– M2 is better for large FPR

• Area Under the ROC Curve (AUC)
– Ideal: Area = 1
– Random guess: Area = 0.5
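AUC is just the area under the piecewise-linear ROC curve, so it can be computed with the trapezoidal rule. A minimal sketch over (FPR, TPR) points (the third curve below is a made-up example, not from the slides):

```python
# Area under an ROC curve via the trapezoidal rule, given a list
# of (FPR, TPR) points. The diagonal gives 0.5 (random guessing)
# and the ideal curve through (0, 1) gives 1.0.

def auc(points):
    pts = sorted(points)  # order by FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid per segment
    return area

print(auc([(0, 0), (1, 1)]))              # 0.5 (random guess)
print(auc([(0, 0), (0, 1), (1, 1)]))      # 1.0 (ideal)
print(auc([(0, 0), (0.2, 0.6), (1, 1)]))  # ≈ 0.7 (a hypothetical model)
```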


Dealing with Imbalanced Classes - Summary

 Many measures exists, but none of them may be ideal in


all situations
– Random classifiers can have high value for many of these measures
– TPR/FPR provides important information but may not be sufficient by
itself in many practical scenarios
– Given two classifiers, sometimes you can tell that one of them is
strictly better than the other
 C1 is strictly better than C2 if C1 has strictly better TPR and FPR relative to C2 (or same
TPR and better FPR, and vice versa)
– Even if C1 is strictly better than C2, C1’s F-value can be worse than
C2’s if they are evaluated on data sets with different imbalances
– Classifier C1 can be better or worse than C2 depending on the scenario
at hand (class imbalance, importance of TP vs FP, cost/time tradeoffs)

2/15/2021 Introduction to Data Mining, 2 nd Edition 24


Which Classifier is better?

T1            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   50    50        Precision (p) ≈ 0.98
   CLASS   Class=No    1     99        TPR = Recall (r) = 0.5
                                       FPR = 0.01
                                       TPR/FPR = 50
                                       F-measure ≈ 0.66

T2            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) ≈ 0.9
   CLASS   Class=No    10    90        TPR = Recall (r) = 0.99
                                       FPR = 0.1
                                       TPR/FPR = 9.9
                                       F-measure ≈ 0.94

T3            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) = 0.99
   CLASS   Class=No    1     99        TPR = Recall (r) = 0.99
                                       FPR = 0.01
                                       TPR/FPR = 99
                                       F-measure = 0.99
Which Classifier is better? Medium Skew case

T1            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   50    50        Precision (p) ≈ 0.83
   CLASS   Class=No    10    990       TPR = Recall (r) = 0.5
                                       FPR = 0.01
                                       TPR/FPR = 50
                                       F-measure ≈ 0.62

T2            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) ≈ 0.5
   CLASS   Class=No    100   900       TPR = Recall (r) = 0.99
                                       FPR = 0.1
                                       TPR/FPR = 9.9
                                       F-measure ≈ 0.66

T3            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) ≈ 0.9
   CLASS   Class=No    10    990       TPR = Recall (r) = 0.99
                                       FPR = 0.01
                                       TPR/FPR = 99
                                       F-measure ≈ 0.94
Which Classifier is better? High Skew case

T1            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   50    50        Precision (p) ≈ 0.33
   CLASS   Class=No    100   9900      TPR = Recall (r) = 0.5
                                       FPR = 0.01
                                       TPR/FPR = 50
                                       F-measure = 0.4

T2            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) ≈ 0.09
   CLASS   Class=No    1000  9000      TPR = Recall (r) = 0.99
                                       FPR = 0.1
                                       TPR/FPR = 9.9
                                       F-measure ≈ 0.165

T3            PREDICTED CLASS
              Class=Yes   Class=No
   ACTUAL  Class=Yes   99    1         Precision (p) ≈ 0.5
   CLASS   Class=No    100   9900      TPR = Recall (r) = 0.99
                                       FPR = 0.01
                                       TPR/FPR = 99
                                       F-measure ≈ 0.66
Building Classifiers with an Imbalanced Training Set

• Modify the distribution of the training data so that the rare
class is well-represented in the training set
– Undersample the majority class
– Oversample the rare class
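The two strategies above can be sketched with the standard library alone (random undersampling and random oversampling with replacement; the helper names and the 990/10 data are illustrative, and real pipelines often use dedicated tools such as imbalanced-learn instead):

```python
# Random resampling to balance a 990/10 training set:
# undersample the majority class, or oversample the rare class.

import random

def undersample_majority(majority, minority, seed=0):
    rng = random.Random(seed)
    # keep only as many majority records as there are minority records
    return rng.sample(majority, len(minority)) + list(minority)

def oversample_minority(majority, minority, seed=0):
    rng = random.Random(seed)
    # duplicate rare-class records (sampling with replacement) until balanced
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority) + list(minority) + extra

maj = [("no", i) for i in range(990)]
minr = [("yes", i) for i in range(10)]

print(len(undersample_majority(maj, minr)))  # 20   (10 no + 10 yes)
print(len(oversample_minority(maj, minr)))   # 1980 (990 no + 990 yes)
```

Undersampling discards information from the majority class, while oversampling duplicates rare records and can encourage overfitting; which trade-off is acceptable depends on the application.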