Evaluating Machine Learning Models
Violeta Menéndez González
[email protected]What is Machine Learning?
Image by Manfred Steger from Pixabay
It’s a data analytics technique that provides systems with the
ability to automatically learn and improve from experience
without being explicitly programmed.
They learn from sample data to make predictions or decisions on
future data.
What is a good ML model?
What task is the model designed to perform?
How can we assess whether a model is doing it well? (How would
you evaluate a human’s performance?)
Image by Ahmed Gad from Pixabay
There is no one type of model that definitively solves any problem;
nor is there any one definitive set of data that produces the best
predictions.
Model evaluation
Carefully analyse the model’s outputs to evaluate whether they
are meeting the goals that we set up for it.
This will allow us to compare models.
Chabacano / CC BY-SA
(https://creativecommons.org/licenses/by-sa/4.0)
Model evaluation
To prevent overfitting we split the data we have:
Some to train the model.
Some to test the model (“fake” future data).
Split data methods:
Holdout: train-validation-test data sets.
Cross-validation: k-fold cross validation.
Holdout testing
Data is divided into train/validation/test sets.
The split is usually 60/20/20, but if we have a lot of data it's enough to
test on smaller proportions (e.g. ~1,000,000 examples with a 98/1/1 split),
as long as the held-out sets are large enough to give high confidence in
the overall performance.
Validation and test sets should come from the same distribution,
something that reflects future data.
The model learns from the training set and is tested on the validation
data. If performance is not good enough, we adjust and test again until
it is, and only then do we evaluate on the test data.
It’s like studying for an exam: Study → Practice test → Refocus
studying → (repeat) → Final exam.
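As a minimal sketch (not from the slides), a 60/20/20 holdout split could be built with scikit-learn by splitting twice; the toy data and random_state are just illustrative:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.arange(20).reshape(10, 2)   # toy features: 10 samples, 2 features
    y = np.arange(10)                  # toy targets

    # First hold out 20% as the test set...
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
    # ...then split the remaining 80% into 60/20 train/validation
    # (0.25 of the remaining 80% is 20% of the full data set).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 6 2 2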
Cross-validation testing
Divide data into k number of sets (k-folds).
Leave one set out for testing and train on the other k-1 sets. Repeat
so that each fold is held out once, then average the scores across all
the tests.
Useful when we have limited amounts of data.
Gufosowa / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)
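A minimal k-fold sketch with scikit-learn; the classifier, the toy data and k = 5 are placeholder choices, not something prescribed by the slides:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=100, random_state=0)  # toy data
    model = LogisticRegression(max_iter=1000)

    # Train on k-1 folds, test on the held-out fold, repeat for every fold,
    # then average the k scores.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())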
Model metrics
A way of quantifying how well a model is performing at a certain
task.
Different types of models need different types of metrics.
It's like the scores of an exam. They score each prediction as right
or wrong and produce an overall score for the model.
We can give more importance to one type of performance, just as we
can give more weight to certain questions on an exam.
Image by Manfred Steger from Pixabay
Classification models
Predict a class for each input.
Binary classification: is the picture a cat or not?
Multi-class: is the picture a cat, a dog, or an owl?
Multi-label: the object in the picture is a cat, an animal, and black.
Output is commonly the probability of an input belonging to a
class. We can change the decision threshold.
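A small illustration (hypothetical numbers) of turning predicted probabilities into class labels with an adjustable decision threshold:

    import numpy as np

    probs = np.array([0.10, 0.45, 0.80, 0.95])   # hypothetical P(picture is a cat)
    labels = (probs >= 0.5).astype(int)           # default threshold: [0 0 1 1]
    labels_low = (probs >= 0.3).astype(int)       # lower threshold flags more positives: [0 1 1 1]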
Classification model evaluation
Intuitively we tend to use accuracy: how many right predictions
we got over all predictions made.
This doesn’t work well for imbalanced problems!
For example, consider a fraud detection algorithm that predicts all
transactions to be valid. If 5% of transactions are fraudulent, its
accuracy is 95%! But the model is useless.
Image by mohamed Hassan from Pixabay
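A toy sketch of that fraud example, with the numbers chosen to match the slide:

    import numpy as np
    from sklearn.metrics import accuracy_score

    y_true = np.array([1] * 5 + [0] * 95)     # 5% fraud (1), 95% valid (0)
    y_pred = np.zeros(100, dtype=int)          # "model" that predicts everything as valid

    print(accuracy_score(y_true, y_pred))      # 0.95, yet it never catches a single fraud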
Confusion matrix
Model prediction vs reality.
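A minimal sketch with made-up predictions; scikit-learn's confusion_matrix returns true classes as rows and predicted classes as columns:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0]   # reality
    y_pred = [1, 0, 0, 1, 0, 1]   # model prediction

    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))
    # [[2 1]
    #  [1 2]]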
Precision and recall
Precision: of all the positive predictions, how many were actually
positive?
Recall: of all the actual positive results, how many did the model
predict were positive?
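In terms of the confusion-matrix counts:

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)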
Precision and recall
Example: out of 100 transactions, 5% are fraud.
Model 1: All transactions are detected as fraud.
Model 2: Only 1 transaction correctly detected as fraud.
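Working the example through in code (assuming Model 2 flags nothing except that one true fraud, which the slide doesn't state explicitly):

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    y_true = np.array([1] * 5 + [0] * 95)    # 5 fraudulent, 95 valid transactions

    m1_pred = np.ones(100, dtype=int)        # Model 1: flags every transaction as fraud
    m2_pred = np.array([1] + [0] * 99)       # Model 2: catches one true fraud, flags nothing else

    print(precision_score(y_true, m1_pred), recall_score(y_true, m1_pred))  # 0.05 1.0
    print(precision_score(y_true, m2_pred), recall_score(y_true, m2_pred))  # 1.0 0.2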
F1 score
We want a compromise on precision and recall.
We use the harmonic mean because it punishes extreme
values.
We can adjust the importance we give each of them.
In critical systems we may want to favour recall, so as not to miss
any positive results.
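A small sketch: F1 is the harmonic mean of precision and recall, and the more general F-beta score with beta > 1 weights recall more heavily (beta = 2 below is just an illustrative choice):

    from sklearn.metrics import f1_score, fbeta_score

    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]   # precision = 2/3, recall = 1/2

    print(f1_score(y_true, y_pred))              # ≈ 0.571, harmonic mean
    print(fbeta_score(y_true, y_pred, beta=2))   # ≈ 0.526, recall weighted more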
Precision-recall curve
Shows the precision vs recall trade-off as we vary the threshold
for identifying a positive in our model.
Appropriate for imbalanced data.
Jason Brownlee, ROC Curves and Precision-Recall Curves for Imbalanced Classification, Machine Learning Mastery, Available from
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/, accessed May 3rd, 2020.
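A minimal sketch with made-up scores; scikit-learn's precision_recall_curve sweeps the threshold and returns one precision/recall pair per threshold:

    from sklearn.metrics import precision_recall_curve

    y_true   = [0, 0, 1, 1]
    y_scores = [0.1, 0.4, 0.35, 0.8]   # predicted probability of the positive class

    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    print(precision, recall, thresholds)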
Receiver Operating Characteristic curve
Shows how the true positive rate vs the false positive rate
changes as we vary the model’s threshold.
TPR – recall.
FPR – probability of a false alarm.
Better for balanced data.
Jason Brownlee, ROC Curves and Precision-Recall Curves for Imbalanced Classification, Machine Learning Mastery, Available from
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-imbalanced-classification/, accessed May 3rd, 2020.
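The equivalent sketch for the ROC curve, again with made-up scores:

    from sklearn.metrics import roc_curve

    y_true   = [0, 0, 1, 1]
    y_scores = [0.1, 0.4, 0.35, 0.8]

    fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # one (FPR, TPR) point per threshold
    print(fpr, tpr)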
Comparing models
We can quantify a model’s curve by calculating the total Area
Under the Curve (AUC).
The AUC summarises the skill of a model across thresholds.
The F-score summarises model skill for a specific threshold.
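Continuing the same toy scores, the AUC reduces the whole curve to one number:

    from sklearn.metrics import roc_auc_score

    y_true   = [0, 0, 1, 1]
    y_scores = [0.1, 0.4, 0.35, 0.8]

    print(roc_auc_score(y_true, y_scores))   # 0.75: one number across all thresholds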
Regression models
Predicts a continuous target value from input data.
House price prediction.
Stock price prediction.
Training and testing methods are the same; what differs is how we
assess the models.
Regression metrics
Most common metrics are:
Mean Absolute Error (MAE).
Mean Squared Error (MSE).
MSE penalises large errors more than MAE.
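A quick sketch with made-up true and predicted values:

    from sklearn.metrics import mean_absolute_error, mean_squared_error

    y_true = [3.0, -0.5, 2.0, 7.0]   # actual values
    y_pred = [2.5,  0.0, 2.0, 8.0]   # model predictions

    print(mean_absolute_error(y_true, y_pred))   # 0.5
    print(mean_squared_error(y_true, y_pred))    # 0.375, squaring punishes large errors more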
Clustering models
Grouping of data points:
Unsupervised learning: we don't have ground-truth labels.
We use metrics that measure how similar observations within the
same cluster are to each other.
These don't really measure the validity of the predictions but can
help us compare models.
Jason Brownlee, 10 Clustering Algorithms With Python, Machine Learning Mastery, Available
from https://machinelearningmastery.com/clustering-algorithms-with-python/, accessed May 3rd,
2020.
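One common choice is the silhouette score; a minimal sketch with toy blob data (KMeans and k = 3 are just illustrative choices):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=100, centers=3, random_state=0)        # toy data, labels unused
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Silhouette score in [-1, 1]: higher means points are closer to their own
    # cluster than to the neighbouring one. Useful for comparing clusterings.
    print(silhouette_score(X, labels))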
Conclusions?
It’s very important to understand what a model does and what we
want to measure.
One metric does NOT fit all!
Metrics are only useful if you can compare them to something.
Be careful when testing your models, you may be overfitting.
The end
Thanks!
Photo by Franck V. on Unsplash