
Last night, after about six months, the children of Gaza slept peacefully.

In the past six months, last night was the first night that no one in Gaza was martyred.
Session 13 - Sunday, 26 Farvardin 1403 (14 April 2024)
A confusion matrix is a square matrix that reports the counts of the true
positive (TP), true negative (TN), false positive (FP), and false negative (FN)
predictions of a classifier

Figure 6.10: A confusion matrix for our data


>>> from sklearn.metrics import confusion_matrix
>>> # pipe_svc is the scaling + SVC pipeline fitted on the training data earlier in the chapter
>>> pipe_svc.fit(X_train, y_train)
>>> y_pred = pipe_svc.predict(X_test)
>>> confmat = confusion_matrix(y_true=y_test, y_pred=y_pred)
>>> print(confmat)
[[71  1]
 [ 2 40]]

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(figsize=(2.5, 2.5))
>>> ax.matshow(confmat, cmap=plt.cm.Blues, alpha=0.3)
>>> # annotate each cell of the matrix with its count
>>> for i in range(confmat.shape[0]):
...     for j in range(confmat.shape[1]):
...         ax.text(x=j, y=i, s=confmat[i, j],
...                 va='center', ha='center')
>>> ax.xaxis.set_ticks_position('bottom')
>>> plt.xlabel('Predicted label')
>>> plt.ylabel('True label')
>>> plt.show()
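The snippet above assumes that pipe_svc and the train/test split were created earlier in the chapter. A minimal, self-contained setup along those lines might look like the following sketch; the exact pipeline steps, split parameters, and label relabeling are assumptions, so the resulting counts may differ slightly from Figure 6.10.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Breast Cancer Wisconsin data; in this built-in loader class 0 = malignant and
# class 1 = benign, so the labels are flipped to match the text (class 1 = malignant)
X, y = load_breast_cancer(return_X_y=True)
y = 1 - y

# Stratified 80/20 split so both classes keep their proportions (split is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=1)

# Standardize the features, then fit a support vector classifier
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))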
Assuming that class 1 (malignant) is the positive class
in this example, our model correctly classified 71 of the
examples that belong to class 0 (TN) and 40 examples
that belong to class 1 (TP), respectively. However, our
model also misclassified two examples from class 1 as
class 0 (FN), and it predicted that one example is
malignant although it is a benign tumor (FP). (p. 195)
Both the prediction error (ERR) and accuracy (ACC) provide general information about how
many examples are misclassified. The error can be understood as the sum of all false
predictions divided by the number of total predictions, and the accuracy is calculated as the
sum of correct predictions divided by the total number of predictions, respectively:
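Written out in terms of the confusion-matrix counts, these definitions read:

\[
ERR = \frac{FP + FN}{FP + FN + TP + TN},
\qquad
ACC = \frac{TP + TN}{FP + FN + TP + TN} = 1 - ERR
\]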
The true positive rate (TPR) and false positive rate (FPR) are performance
metrics that are especially useful for imbalanced class problems:
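With P and N denoting the total numbers of actual positives and actual negatives, the two rates are defined as:

\[
FPR = \frac{FP}{N} = \frac{FP}{FP + TN},
\qquad
TPR = \frac{TP}{P} = \frac{TP}{FN + TP}
\]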

In tumor diagnosis, for example, we are more concerned about the detection of malignant
tumors in order to help a patient with the appropriate treatment. However, it is also important
to decrease the number of benign tumors incorrectly classified as malignant (FP) to not
unnecessarily concern patients. In contrast to the FPR, the TPR provides useful information
about the fraction of positive (or relevant) examples that were correctly identified out of the
total pool of positives (P).
The performance metrics precision (PRE) and recall (REC) are related to
those TP and TN rates, and in fact, REC is synonymous with TPR:
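In the notation used above:

\[
REC = TPR = \frac{TP}{P} = \frac{TP}{FN + TP}
\]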

In other words, recall quantifies how many of the relevant records (the
positives) are captured as such (the true positives). Precision quantifies how
many of the records predicted as relevant (the sum of true and false positives)
are actually relevant (true positives):
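In formula form:

\[
PRE = \frac{TP}{TP + FP}
\]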
The F1 score is a machine learning evaluation metric that summarizes a model's predictive performance by combining its precision and recall scores.

The accuracy metric computes how many times a model made a correct
prediction across the entire dataset. This can be a reliable metric only if the
dataset is class-balanced; that is, each class of the dataset has the same number
of samples.

Nevertheless, real-world datasets are often heavily class-imbalanced, which can make this
metric unreliable. For example, if a binary dataset has 90 samples in class-1 and 10 samples
in class-2, a model that always predicts "class-1", regardless of the input, is still 90% accurate.
Can such a model be called a good predictor? This is where the F1 score comes into play.
The F1 score is calculated as the harmonic
mean of the precision and recall scores. It
ranges from 0-100%, and a higher F1 score
denotes a better quality classifier.
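To make this concrete, here is a small sketch that mirrors the 90/10 example above; the data and the always-predict-the-majority-class model are purely illustrative:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# 90 samples of class 1 (majority) and 10 samples of class 0 (minority)
y_true = np.array([1] * 90 + [0] * 10)

# A degenerate model that predicts the majority class for every sample
y_pred = np.ones_like(y_true)

# Accuracy looks impressive, but the minority class is never detected
print(accuracy_score(y_true, y_pred))                          # 0.9
print(f1_score(y_true, y_pred, pos_label=0, zero_division=0))  # 0.0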
Which counts play the role of FP and FN depends on which class is treated as
positive: in the example shown, when the positive class is considered, FP is 12
and FN is 8; if the negative class is treated as the positive class instead, the
two switch places, so FP becomes 8 and FN becomes 12.
• Receiver operating characteristic (ROC) curves are useful tools to select models for
classification based on their performance with respect to the FPR and TPR, which are
computed by shifting the decision threshold of the classifier.
• The diagonal of a ROC graph can be interpreted as random guessing, and classification
models that fall below the diagonal are considered as worse than random guessing.
• A perfect classifier would fall into the top-left corner of the graph with a TPR of 1 and an
FPR of 0.
• Based on the ROC curve, we can then compute the so-called ROC area under the curve
(ROC AUC) to characterize the performance of a classification model.
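A sketch of how these quantities might be computed with scikit-learn, assuming the fitted pipe_svc pipeline and the test split from the earlier code; the decision_function scores supply the continuous values over which the threshold is shifted:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Continuous scores are needed so the decision threshold can be varied
y_scores = pipe_svc.decision_function(X_test)

# FPR/TPR pairs for every threshold, plus the area under the curve
fpr, tpr, thresholds = roc_curve(y_true=y_test, y_score=y_scores)
roc_auc = roc_auc_score(y_test, y_scores)

plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', label='Random guessing')  # diagonal baseline
plt.xlabel('False positive rate (FPR)')
plt.ylabel('True positive rate (TPR)')
plt.legend(loc='lower right')
plt.show()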
Accuracy:
Percentage of predictions that are correct, over the entire dataset.

Precision:
Percentage of positive instances out of the total predicted positive instances.
Take it as finding out 'how much the model is right when it says it is right'.

Recall/Sensitivity/True Positive Rate:

Percentage of positive instances out of the total actual positive instances.
The denominator (TP + FN) is therefore the actual number of positive instances
present in the dataset. Take it as finding out 'how many of the actual positives
the model captured, and how many it missed'.
Specificity:
Percentage of negative instances out of the total actual negative instances.
The denominator (TN + FP) is therefore the actual number of negative instances
present in the dataset. It is similar to recall, but the focus shifts to the
negative instances, for example, finding out how many healthy patients were
correctly told they don't have cancer.
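In formula form, the two measures just described are:

\[
\text{Recall} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]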

F1 score:

It is the harmonic mean of precision and recall. It takes the contribution of
both into account, so the higher the F1 score, the better.
https://www.datacamp.com/tutorial/what-is-a-confusion-matrix-in-machine-learning
Actual values = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat',
'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

Predicted values = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'cat',
'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
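These two lists can be tabulated directly with scikit-learn; passing labels= makes the ordering explicit (rows are actual labels, columns are predicted labels):

from sklearn.metrics import confusion_matrix

actual = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog',
          'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat',
             'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

print(confusion_matrix(actual, predicted, labels=['dog', 'cat']))
# [[11  2]    11 dogs correctly identified, 2 dogs predicted as cats
#  [ 1  6]]    1 cat predicted as a dog,    6 cats correctly identified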
The F1 score is a number between 0 and 1 and is the harmonic mean of precision
and recall. Compared to the arithmetic mean, the harmonic mean penalizes extreme
values more. The F-score should be high (ideally 1).
https://neptune.ai/blog/performance-metrics-in-machine-learning-complete-guide

How to Evaluate the Performance of a Machine Learning Model
https://www.kdnuggets.com/2020/09/performance-machine-learning-model.html
Count plot showing how many patients have heart disease and how many do not
https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
Sensitivity and Specificity
Recall, Precision and F-Measure
Recall and precision can be combined into a single summary measure. One
example is the F-measure, which is the harmonic mean of recall and precision:
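Written out, with PRE and REC as defined earlier:

\[
F_1 = 2 \cdot \frac{PRE \times REC}{PRE + REC}
\]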
Receiver operating characteristic (ROC) graphs are useful tools to select models
for classification based on their performance with respect to the FPR and TPR,
which are computed by shifting the decision threshold of the classifier.

The diagonal of a ROC graph can be interpreted as random guessing, and
classification models that fall below the diagonal are considered as worse than
random guessing.

A perfect classifier would fall into the top-left corner of the graph with a TPR of 1
and an FPR of 0. Based on the ROC curve, we can then compute the so-called ROC
area under the curve (ROC AUC) to characterize the performance of a classification model.

Figure 6.11: The ROC plot
• Class imbalance is a quite common problem when working with real-world
data—examples from one class or multiple classes are over-represented in a
dataset

• We can think of several domains where this may occur, such as spam filtering,
fraud detection, or screening for diseases

• One way to deal with imbalanced class proportions during model fitting is to
assign a larger penalty to wrong predictions on the minority class
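In scikit-learn, one common way to apply such a penalty is the class_weight parameter that most classifiers accept; a minimal sketch, where the imbalanced toy data is purely illustrative:

import numpy as np
from sklearn.svm import SVC

# Toy imbalanced data: 90 majority-class and 10 minority-class examples
rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(2.0, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# class_weight='balanced' scales the error penalty inversely to class frequency,
# so mistakes on the rare class count more during fitting
svm = SVC(kernel='linear', class_weight='balanced', random_state=1)
svm.fit(X, y)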
• At the beginning of this chapter, we discussed how to chain different transformation
techniques and classifiers in convenient model pipelines that help us to train and
evaluate machine learning models more efficiently
• We then used those pipelines to perform k-fold cross-validation, one of the essential
techniques for model selection and evaluation
• Using k-fold cross-validation, we plotted learning and validation curves to diagnose
common problems of learning algorithms, such as overfitting and underfitting
• Using grid search, randomized search, and successive halving, we further fine-tuned
our model
• We then used confusion matrices and various performance metrics to evaluate and
optimize a model’s performance for specific problem tasks
• Finally, we concluded this chapter by discussing different methods for dealing with
imbalanced data, which is a common problem in many real-world applications
Now, you should be well equipped with the essential techniques to build
supervised machine learning models for classification successfully
o True
o False
