Evaluation of Machine Learning Classifiers
Machine Learning
Dr. Dinesh K. Vishwakarma
Outline: Evaluation Parameters
Precision
Recall
Accuracy
F-Measure
True Positive Rate
False Positive Rate
Sensitivity
ROC
Experiment: Training and Testing
Objective: Unbiased estimate of accuracy
Experiment: Training and Testing…
How can we get an unbiased estimate of the accuracy of a learned model?
When learning a model, you should pretend that you don’t have the test data yet (it is “in the mail”).*
If the test-set labels influence the learned model in any way, accuracy estimates will be biased.
* In some applications it is reasonable to assume that you have access to the feature vector (i.e. x) but not the y part of each test instance.
Learning Curve
How does the accuracy of a learning method change as a function of the training-set size?
This can be assessed by plotting learning curves.
Given a training/test set partition:
• for each sample size s on the learning curve
• (optionally) repeat n times
• randomly select s instances from the training set
• learn the model
• evaluate the model on the test set to determine accuracy a
• plot (s, a) or (s, average accuracy with error bars)
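Below is a minimal Python sketch of the procedure above, assuming NumPy and scikit-learn are available; the function name `learning_curve_points`, the sample sizes, and the number of repeats are illustrative choices rather than anything prescribed by the slide.

```python
# Sketch of the learning-curve procedure (assumes NumPy and scikit-learn).
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def learning_curve_points(model, X_train, y_train, X_test, y_test,
                          sizes=(50, 100, 200, 400), n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    points = []
    for s in sizes:                                   # each sample size s on the curve
        accs = []
        for _ in range(n_repeats):                    # (optionally) repeat n times
            idx = rng.choice(len(X_train), size=s, replace=False)    # random s instances
            m = clone(model).fit(X_train[idx], y_train[idx])         # learn the model
            accs.append(accuracy_score(y_test, m.predict(X_test)))   # evaluate on test set
        points.append((s, np.mean(accs), np.std(accs)))  # (s, avg accuracy, error bar)
    return points
```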
Validation (Tuning) Set
Suppose we want unbiased estimates of accuracy during the learning process (e.g. to choose the best level of decision-tree pruning).
Partition the training data into separate training and validation sets.
Limitation of Single Training/Test Partition
We may not have enough data to make sufficiently large training and test sets:
a larger test set gives a more reliable estimate of accuracy (i.e. a lower-variance estimate),
but a larger training set will be more representative of how much data we actually have for the learning process.
A single training set doesn’t tell us how sensitive accuracy is to a particular training sample.
Random Sampling
The second issue can be addressed by repeatedly and randomly partitioning the available data into training and test sets.
Random Sampling…
When randomly selecting training or validation sets, we may want to ensure that class proportions are maintained in each selected set.
This can be done via stratified sampling: first stratify the instances by class, then randomly select instances from each class proportionally.
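A minimal sketch of stratified sampling with scikit-learn's `train_test_split` (assuming scikit-learn is installed); the toy imbalanced dataset is only there to make the example self-contained.

```python
# Stratified sampling sketch: train_test_split(..., stratify=y) preserves class proportions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset (roughly 90% / 10%) just to make the example runnable.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Class proportions are (approximately) the same in both selected sets.
print(np.bincount(y_train) / len(y_train))
print(np.bincount(y_val) / len(y_val))
```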
Cross Validation
Partition the data into n subsamples.
Iteratively leave one subsample out for the test set and train on the rest.
Cross Validation Example
Suppose we have 100 instances, and we want to estimate accuracy with cross validation.
With 10-fold cross validation, each iteration tests on 10 held-out instances and trains on the remaining 90.
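A minimal cross-validation sketch in Python, assuming scikit-learn; the logistic-regression model and the synthetic 100-instance dataset are stand-ins chosen only to make the example self-contained.

```python
# n-fold cross validation: each subsample is left out once as the test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)   # e.g. 100 instances

accs = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    accs.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(f"10-fold CV accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```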
Cross Validation…
10-fold cross validation is common, but smaller values of n are often used when learning takes a lot of time.
In leave-one-out cross validation, n = the number of instances.
In stratified cross validation, stratified sampling is used when partitioning the data.
CV makes efficient use of the available data for testing.
Note that whenever we use multiple training sets, as in CV and random resampling, we are evaluating a learning method as opposed to an individual learned model.
Internal Cross Validation
Instead of a single validation set, we can use cross-validation within a training set to select a model (e.g. to choose the best level of decision-tree pruning).
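A sketch of internal cross-validation for model selection, assuming scikit-learn: `GridSearchCV` runs 5-fold CV inside the training set to pick a decision-tree depth (used here as a stand-in for the "level of pruning"), while the outer test set stays untouched for the final estimate.

```python
# Internal cross-validation: model selection happens inside the training set only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, None]},  # stand-in for the level of pruning
    cv=5,                                       # internal 5-fold CV on the training data
)
search.fit(X_train, y_train)

print(search.best_params_)            # model chosen by internal CV
print(search.score(X_test, y_test))   # unbiased estimate from the untouched test set
```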
Confusion Matrix
It is also called a prediction table.
It is an N × N matrix used for evaluating the performance of a classification model, where N is the number of target classes.
It compares the actual target values with those predicted by the model.
The columns represent the actual values of the target variable.
The rows represent the predicted values of the target variable.
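A minimal confusion-matrix sketch with scikit-learn (assuming it is installed). Note that `sklearn.metrics.confusion_matrix` lays the matrix out with rows = actual and columns = predicted, i.e. the transpose of the layout described on this slide; the toy labels below match the threshold-0.5 example a few slides later.

```python
# Confusion matrix for a small binary example.
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1, 0, 1]   # actual labels
y_pred = [0, 0, 1, 1, 0, 1]   # predicted labels (scores thresholded at 0.5)

cm = confusion_matrix(y_true, y_pred)   # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()             # for binary labels {0, 1}
print(cm)
print("TP =", tp, "FP =", fp, "FN =", fn, "TN =", tn)   # TP=2 FP=1 FN=1 TN=2
```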
Confusion Matrix…
Type-I and Type-II Error
Precision
Precision measures the correctness achieved in positive prediction: it tells us how many of all the instances predicted positive are actually positive. Precision should be high (ideally 1).
“Precision is a useful metric in cases where false positives are a higher concern than false negatives.”
Precision / positive predictive value: $P = \frac{t_p}{t_p + f_p}$
Recall: $R = \frac{t_p}{t_p + f_n}$
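A quick plain-Python check of the two definitions above; the counts are taken from the worked example later in the deck.

```python
# Precision and recall from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

print(precision(tp=2, fp=1))   # 0.666...
print(recall(tp=2, fn=1))      # 0.666...
```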
Issues with “Precision & Recall”
(Confusion matrices for the two classifiers, built on different data sets, omitted.)
Both classifiers give the same precision and recall values of 66.7% and 40% (note: the data sets are different).
They exhibit very different behaviours:
same positive recognition rate,
extremely different negative recognition rate: strong on the left, nil on the right.
Note: accuracy has no problem catching this!
A combined measure: F
The combined measure that assesses the precision/recall tradeoff is the F measure (a weighted harmonic mean):
$F = \frac{1}{\alpha\frac{1}{P} + (1-\alpha)\frac{1}{R}} = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}$, where $\beta^2 = \frac{1-\alpha}{\alpha}$
People usually use the balanced F1 measure, i.e. with $\beta = 1$ (equivalently $\alpha = 1/2$).
The harmonic mean is a conservative average.
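A small sketch of the weighted F measure in Python; the β = 2 line is just an extra illustration of weighting recall more heavily.

```python
# Weighted F measure; beta = 1 gives the balanced F1 (harmonic mean of P and R).
def f_measure(p, r, beta=1.0):
    return (beta**2 + 1) * p * r / (beta**2 * p + r)

p, r = 0.667, 0.667
print(f_measure(p, r))           # F1
print(f_measure(p, r, beta=2))   # F2 weights recall more heavily than precision
```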
Accuracy
Accuracy measures the fraction of correct predictions.
The accuracy metric is not suited for imbalanced classes:
for imbalanced data, a model that predicts every point as the majority class will have high accuracy, even though it is not a useful model.
Accuracy is a valid evaluation choice for classification problems that are well balanced, i.e. not skewed by class imbalance.
Accuracy Measure
The accuracy of a classifier (engine) is the fraction of its classifications that are correct:
$\text{Accuracy}(\%) = \frac{t_p + t_n}{t_p + t_n + f_n + f_p} \times 100$
Accuracy Measure
y labelled value (0 = Negative, 1 = Positive)   ŷ predicted value   Output at threshold 0.5
0                                               0.3                 0
1                                               0.4                 0
0                                               0.7                 1
1                                               0.8                 1
0                                               0.4                 0
1                                               0.7                 1

Confusion matrix: TP = 2, FP = 1, FN = 1, TN = 2

$\text{Accuracy} = \frac{4}{6} = 0.666$, $\text{Recall} = \frac{TP}{TP + FN} = \frac{2}{3} = 0.666$, $\text{Precision} = \frac{TP}{TP + FP} = \frac{2}{3} = 0.666$
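The same worked example checked with scikit-learn (assuming it is installed); the scores are thresholded at 0.5 to obtain the predicted labels.

```python
# Reproducing the threshold-0.5 example above.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0, 1, 0, 1, 0, 1])
scores = np.array([0.3, 0.4, 0.7, 0.8, 0.4, 0.7])
y_pred = (scores >= 0.5).astype(int)       # threshold the predicted scores at 0.5

print(accuracy_score(y_true, y_pred))      # 4/6 ≈ 0.667
print(precision_score(y_true, y_pred))     # 2/3 ≈ 0.667
print(recall_score(y_true, y_pred))        # 2/3 ≈ 0.667
```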
Issues with Accuracy
Consider a 2-class problem
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%.
Accuracy is misleading because the model does not detect any class 1 example.
Issues with Accuracy…
Both classifiers give 60% accuracy.
They exhibit very different behaviors:
on the left: weak positive recognition rate / strong negative recognition rate,
on the right: strong positive recognition rate / weak negative recognition rate.
Is Accuracy an Adequate Measure?
Accuracy may not be a useful measure in cases where:
there is a large class skew
(Is 98% accuracy good if 97% of the instances are negative?)
there are differential misclassification costs, say, getting a positive wrong costs more than getting a negative wrong
(Consider a medical domain in which a false positive results in an extraneous test but a false negative results in a failure to treat a disease.)
we are most interested in a subset of high-confidence predictions
Misclassification Error
Recognition rate = accuracy = success rate.
Misclassification rate = failure rate.
$\text{Misclassification error} = \frac{FN + FP}{TP + FP + TN + FN}$
For example, with FN + FP = 5 + 10 misclassified instances out of 50 + 10 + 5 + 100 = 165 total:
$\text{Misclassification error} = \frac{5 + 10}{50 + 10 + 5 + 100} \approx 0.09$
Error in percentage $= \frac{FN + FP}{TP + FP + TN + FN} \times 100$
Sensitivity & Specificity
Sensitivity is the metric that evaluates a model’s ability to predict the true positives of each available category: $\text{Sensitivity} = \frac{TP}{TP + FN}$.
Specificity is the metric that evaluates a model’s ability to predict the true negatives of each available category: $\text{Specificity} = \frac{TN}{TN + FP}$.
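A plain-Python sketch of the two formulas, reusing the TP/FP/FN/TN counts from the earlier threshold-0.5 example.

```python
# Sensitivity (true positive rate) and specificity (true negative rate).
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

print(sensitivity(tp=2, fn=1))   # 0.666...
print(specificity(tn=2, fp=1))   # 0.666...
```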
Find Sensitivity and Specificity
Other Forms of Accuracy Metrics
ROC/AUC
A Receiver Operating Characteristic (ROC) curve plots the TP rate vs. the FP rate as a threshold on the confidence of an instance being positive is varied; the Area Under the Curve (AUC) summarizes the curve in a single number.
Different methods can work better in different parts of ROC space; which part matters depends on the cost of false positives vs. false negatives.
The diagonal is the expected curve for random guessing.
Area Under the Receiver Operating Characteristic (AUC-ROC)
The AUC-ROC curve measures performance at various threshold settings.
ROC is a probability curve and AUC represents the degree or measure of separability.
AUC tells us how capable the model is of distinguishing between classes.
The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1.
The ROC curve is plotted as TPR against FPR, with TPR on the y-axis and FPR on the x-axis.
ROC curves & Misclassification
costs
Best operating point
when FN costs 10× FP
Best operating point when
cost of misclassifying
positives
and negatives is equal
Best operating point when
FP costs 10× FN
32
Create ROC of a model
Consider a prediction table at different threshold settings:

y (0 = Negative, 1 = Positive)   ŷ predicted value   @0.5   @0.6   @0.72   @0.8
0                                0.3                 0      0      0       0
1                                0.55                1      0      0       0
0                                0.75                1      1      1       0
1                                0.8                 1      1      1       1
0                                0.4                 0      0      0       0
1                                0.7                 1      1      0       0

Threshold setting 0.5: TP = 3, FP = 1, TN = 2, FN = 0, so TPR = 3/(3+0) = 1 and FPR = 1/(1+2) = 0.33.
Create ROC of a model…
Using the same prediction table at threshold setting 0.6: TP = 2, FP = 1, TN = 2, FN = 1, so TPR = 2/(2+1) = 0.66 and FPR = 1/(1+2) = 0.33.
Create ROC of a model…
Using the same prediction table at threshold setting 0.72: TP = 1, FP = 1, TN = 2, FN = 2, so TPR = 1/(1+2) = 0.33 and FPR = 1/(1+2) = 0.33.
Create ROC of a model…
Using the same prediction table at threshold setting 0.80: TP = 1, FP = 0, TN = 3, FN = 2, so TPR = 1/(1+2) = 0.33 and FPR = 0.
Plot of ROC
Threshold 0.5:  TP = 3, FP = 1, TN = 2, FN = 0 → TPR = 1,    FPR = 0.33
Threshold 0.6:  TP = 2, FP = 1, TN = 2, FN = 1 → TPR = 0.66, FPR = 0.33
Threshold 0.72: TP = 1, FP = 1, TN = 2, FN = 2 → TPR = 0.33, FPR = 0.33
Threshold 0.8:  TP = 1, FP = 0, TN = 3, FN = 2 → TPR = 0.33, FPR = 0

(ROC plot: the (FPR, TPR) points above, with TPR on the y-axis and FPR on the x-axis.)
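The table can be cross-checked with scikit-learn (assuming it is installed): `roc_curve` sweeps thresholds over the same scores and returns the corresponding FPR/TPR pairs, and the four points above appear among its output.

```python
# Cross-check of the ROC points for the 6-example prediction table.
from sklearn.metrics import roc_curve

y_true = [0, 1, 0, 1, 0, 1]
scores = [0.3, 0.55, 0.75, 0.8, 0.4, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, scores, drop_intermediate=False)
for t, f, s in zip(thresholds, fpr, tpr):
    print(f"threshold {t:.2f}: FPR = {f:.2f}, TPR = {s:.2f}")
```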
Steps to Create an ROC Curve
Sort the test-set predictions according to the confidence that each instance is positive.
Step through the sorted list from high to low confidence:
locate a threshold between instances with opposite classes (keeping instances with the same confidence value on the same side of the threshold),
compute TPR and FPR for the instances above the threshold,
output an (FPR, TPR) coordinate.
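A plain-Python sketch of this procedure, applied to the 10-example data on the following slides. For simplicity it emits a point after every distinct confidence value (a superset of the thresholds between opposite classes), which traces the same curve; the points for thresholds 0.72 and 0.65 below, (0.2, 0.4) and (0.2, 0.8), match the slides.

```python
# Threshold sweep producing (FPR, TPR) points for an ROC curve.
def roc_points(labels, scores):
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)   # high to low confidence
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (s, y) in enumerate(ranked):
        tp += (y == 1)
        fp += (y == 0)
        # emit a point only after the last instance sharing this confidence value
        if i == len(ranked) - 1 or ranked[i + 1][0] != s:
            points.append((fp / neg, tp / pos))          # (FPR, TPR) above this cut
    return points

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # Ex 9, 7, 2, 6, 5 (+) and Ex 1, 10, 3, 4, 8 (-)
scores = [0.99, 0.98, 0.70, 0.65, 0.24, 0.72, 0.51, 0.39, 0.11, 0.01]
print(roc_points(labels, scores))
```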
Example of ROC Plot
Example of ROC Plot …
Rearrange the samples according to class:

Correct class   Instance   Confidence positive
+               Ex 9       0.99
+               Ex 7       0.98
+               Ex 2       0.70
+               Ex 6       0.65
+               Ex 5       0.24
-               Ex 1       0.72
-               Ex 10      0.51
-               Ex 3       0.39
-               Ex 4       0.11
-               Ex 8       0.01

The first five instances form the positive class; the last five form the negative class.
Example of ROC Plot …
For threshold 0.72, instances with confidence ≥ 0.72 (Ex 9, Ex 7, Ex 1) are predicted positive and the rest negative:
TP = 2, FP = 1, TN = 4, FN = 3
TPR = TP/(TP + FN) = 2/5
FPR = FP/(FP + TN) = 1/5
Example of ROC Plot …
For threshold 0.65, instances with confidence ≥ 0.65 (Ex 9, Ex 7, Ex 2, Ex 6, Ex 1) are predicted positive and the rest negative:
TP = 4, FP = 1, TN = 4, FN = 1
TPR = TP/(TP + FN) = 4/5
FPR = FP/(FP + TN) = 1/5
Significance of ROC
This is the ideal situation: when the two class distributions don’t overlap at all, the model has an ideal measure of separability. It is perfectly able to distinguish between the positive class and the negative class.
Significance of ROC…
When the two distributions overlap, Type-I and Type-II errors are introduced.
Depending upon the threshold, these errors can be minimized or maximized.
When AUC is 0.7, it means there is a 70% chance that the model will be able to distinguish between the positive class and the negative class.
Significance of ROC…
This is the worst situation: when AUC is approximately 0.5, the model has no capacity to discriminate between the positive class and the negative class.
Significance of ROC…
When AUC is approximately 0, the model is actually reciprocating the classes: it is predicting the negative class as the positive class and vice versa.
Issues with ROC/AUC
ROC/AUC has been adopted as a replacement for accuracy, but it has also attracted some criticism:
The ROC curves on which the AUCs of different classifiers are based may cross, thus not giving an accurate picture of what is really happening.
The misclassification cost distributions used by the AUC are different for different classifiers. Therefore, we may be comparing “apples and oranges”, as the AUC may give more weight to misclassifying a point by classifier A than it does by classifier B.
A proposed remedy is the H-measure.
Other Accuracy Metrics
Precision/recall curves
A precision/recall curve plots precision vs. recall (TP rate) as a threshold on the confidence of an instance being positive is varied.
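A minimal precision/recall-curve sketch with scikit-learn (assuming it is installed), reusing the six scores from the ROC walkthrough above.

```python
# Precision/recall pairs at each candidate threshold.
from sklearn.metrics import precision_recall_curve

y_true = [0, 1, 0, 1, 0, 1]
scores = [0.3, 0.55, 0.75, 0.8, 0.4, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for t, p, r in zip(thresholds, precision, recall):
    print(f"threshold {t:.2f}: precision = {p:.2f}, recall = {r:.2f}")
```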
Comment on ROC/PR Curve
Both:
allow predictive performance to be assessed at various levels of confidence,
assume binary classification tasks,
are sometimes summarized by calculating the area under the curve.
ROC curves:
are insensitive to changes in class distribution (the ROC curve does not change if the proportion of positive and negative instances in the test set is varied),
can identify optimal classification thresholds for tasks with differential misclassification costs.
Precision/recall curves:
show the fraction of predictions that are false positives,
are well suited for tasks with lots of negative instances.
Loss Function
Mean Square Error Loss Function
It is used for regression problems.
The mean square error loss for m data points is defined as
$L_{SE} = \frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$
For a single point, $L_{SE} = (y - \hat{y})^2$.
Binary Cross Entropy Loss Function
It is used for classification problems.
The BCE loss function is defined as
$L_{CE} = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \ln \hat{y}_i + (1 - y_i)\ln(1 - \hat{y}_i)\right]$
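A small NumPy sketch of both loss functions; the epsilon clip is an added safeguard against log(0), not something from the slide.

```python
# Mean square error and binary cross-entropy losses for m data points.
import numpy as np

def mse_loss(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def bce_loss(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)   # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([0.0, 1.0, 1.0])
y_hat = np.array([0.9, 0.8, 0.4])
print(mse_loss(y, y_hat), bce_loss(y, y_hat))
```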
Example
Consider a 2-class problem.
If the ground truth is $y = 0$, then $L_{SE} = \hat{y}^2$ and $L_{CE} = -\ln(1 - \hat{y})$.
If $y = 1$, then $L_{SE} = (1 - \hat{y})^2$ and $L_{CE} = -\ln(\hat{y})$.
For example, with $y = 0$ and $\hat{y} = 0.9$:
$L_{SE} = 0.81$ and $L_{CE} \approx 2.3$
The gradients are $\frac{\partial L_{SE}}{\partial \hat{y}} = 2\hat{y} = 1.8$ and $\frac{\partial L_{CE}}{\partial \hat{y}} = \frac{1}{1 - \hat{y}} = 10.0$.
The cross-entropy loss penalizes the model more.