Data Science

The document outlines key evaluation metrics used in data science, including confusion matrix, precision, recall, F1-score, accuracy, true positive rate, and false positive rate. It provides definitions and formulas for each metric, along with examples to illustrate their application. Additionally, it discusses advanced metrics like the area under the ROC curve, Dice score, and Intersection over Union (IoU) for assessing model performance.


ED5340 - Data Science: Theory and Practise

L24 - Evaluation Metrics

Ramanathan Muthuganapathy (https://ed.iitm.ac.in/~raman)


Course web page: https://ed.iitm.ac.in/~raman/datascience.html
Moodle page: Available at https://courses.iitm.ac.in/
Classification

• Confusion Matrix

• Precision

• Recall

• F1-Score

• True positive rate

• False positive rate

• Accuracy

• AUC
Confusion Matrix

                        Actual Class
                          1      0
Predicted Class    1     TP     FP
                   0     FN     TN

TP - True Positive
FP - False Positive
FN - False Negative
TN - True Negative


Details

• TP (True Positive) - Actual and prediction are both positive.

• FP (False Positive) - Actual is negative but the prediction is positive (predicting cancer when there is no such case).

• FN (False Negative) - Actual is positive but the prediction is negative (predicting no cancer when there is one).

• TN (True Negative) - Actual and prediction are both negative.
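As a minimal sketch (not from the original slides), the four counts can be tallied directly from paired actual/predicted labels; the function name confusion_counts and the toy labels are illustrative assumptions.

# Minimal sketch: counting TP, FP, FN, TN for binary labels (1 = positive, 0 = negative).
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

# Example with made-up labels:
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 3)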


Precision and recall

• Precision - Of all the positive predicted cases, what is the fraction that is actually positive?

• P = TP / (TP + FP)

                        Actual Class
                          1      0
Predicted Class    1     TP     FP
                   0     FN     TN


Precision and recall

• Recall - Of all the actual positive cases, what is the fraction that has been correctly predicted?

• R = TP / (TP + FN)

                        Actual Class
                          1      0
Predicted Class    1     TP     FP
                   0     FN     TN
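A short illustrative sketch (helper names assumed, not from the slides) computing precision and recall from the four counts, with simple guards for empty denominators:

# Precision and recall from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0  # guard: no positive predictions

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0  # guard: no actual positives

# Using the toy counts from the earlier confusion_counts example (TP=3, FP=1, FN=1):
print(precision(tp=3, fp=1))  # 0.75
print(recall(tp=3, fn=1))     # 0.75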


Example

• Dataset - 50 cases, 40 true and 10 false

• P = TP / (TP + FP)

• R = TP / (TP + FN)

                        Actual Class
                          1      0
Predicted Class    1     30     FP
                   0     FN      3

(Reading the 40 true cases as actual positives and the 10 false cases as actual negatives: TP = 30 gives FN = 40 - 30 = 10, and TN = 3 gives FP = 10 - 3 = 7.)
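Working the visible numbers through the formulas (a sketch assuming the reconstructed counts TP = 30, FP = 7, FN = 10, TN = 3):

# Worked precision and recall for the example above.
tp, fp, fn, tn = 30, 7, 10, 3   # assumed counts reconstructed from the slide
P = tp / (tp + fp)              # 30 / 37 ≈ 0.811
R = tp / (tp + fn)              # 30 / 40 = 0.75
print(P, R)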


Precision or Recall

• E.g. - Email spam filter

• High precision or high recall?

• FP - Genuine email getting classified as spam

• FN - Spam coming to your inbox

                        Actual Spam
                          1      0
Predicted Spam     1     TP     FP
                   0     FN     TN


F1 - Score (Harmonic mean)

• Dataset - 50 cases, 40 true and 10 false

• F1 = 2 * (P * R) / (P + R)

                        Actual Class
                          1      0
Predicted Class    1     30     FP
                   0     FN      3
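Continuing with the same assumed counts (TP = 30, FP = 7, FN = 10), a quick check of the harmonic mean:

# Worked F1 for the running example.
P = 30 / 37
R = 30 / 40
F1 = 2 * P * R / (P + R)
print(round(F1, 3))  # ≈ 0.779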


Accuracy

• 50 cases, 40 true and 10 false

• Acc = (TP + TN) / (TP + FP + FN + TN)

                        Actual Class
                          1      0
Predicted Class    1     30     FP
                   0     FN      3
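Again with the assumed counts (TP = 30, FP = 7, FN = 10, TN = 3), accuracy works out as:

# Worked accuracy for the running example.
acc = (30 + 3) / (30 + 7 + 10 + 3)
print(acc)  # 0.66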


TPR and FPR

• Dataset - 50 cases, 40 true and 10 false

• TPR = TP / (TP + FN)

• FPR = FP / (FP + TN)  (fraction of negative cases being predicted incorrectly)

                        Actual Class
                          1      0
Predicted Class    1     TP     FP
                   0     FN     TN
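With the same assumed counts (TP = 30, FP = 7, FN = 10, TN = 3):

# Worked TPR and FPR for the running example.
tpr = 30 / (30 + 10)   # 0.75  (same as recall)
fpr = 7 / (7 + 3)      # 0.70
print(tpr, fpr)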


Area under ROC curve (AUC)

• The ROC curve plots TPR against FPR.

• Higher the area, the better.

• Qn: How to get this curve?

[Figure: ROC curve - TPR (y-axis) vs FPR (x-axis)]
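One standard way to obtain the curve (a sketch, not the method shown on the slide): sweep a decision threshold over the model's predicted scores and record (FPR, TPR) at each threshold. The function roc_points and the toy scores below are illustrative assumptions.

# Sketch: ROC points by sweeping a threshold over predicted scores.
def roc_points(actual, scores):
    points = []
    for t in sorted(set(scores), reverse=True):
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 1)
        fn = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 0)
        tn = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR) at this threshold
    return points

print(roc_points([1, 0, 1, 1, 0], [0.9, 0.8, 0.6, 0.4, 0.2]))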


Dice score / coefficient (pixel data)

• DC = 2 * |A ∩ B| / (|A| + |B|)

• DC = 2 * (Area of intersection) / (Sum of the two areas)
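A minimal sketch (illustrative names; masks assumed to be flattened lists of 0/1 pixel values) of the Dice coefficient for two binary masks:

# Dice coefficient for two equal-length binary pixel masks.
def dice(mask_a, mask_b):
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 2 * intersection / total if total > 0 else 1.0  # both empty -> perfect match by convention

print(dice([1, 1, 0, 1], [1, 0, 0, 1]))  # 2*2 / (3+2) = 0.8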


IoU (Intersection over union)

• IoU = |A ∩ B| / |A ∪ B|

• IoU = (Area of intersection) / (Area of the union)
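A matching sketch for IoU on the same assumed masks; the standard identity DC = 2 * IoU / (1 + IoU) relates the two overlap scores:

# IoU for two equal-length binary pixel masks.
def iou(mask_a, mask_b):
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    union = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 or b == 1)
    return intersection / union if union > 0 else 1.0  # both empty -> perfect match by convention

print(iou([1, 1, 0, 1], [1, 0, 0, 1]))  # 2 / 3 ≈ 0.667; and 2*(2/3)/(1+2/3) = 0.8, matching the Dice value above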
