Evaluation Question Answers
Artificial Intelligence
Unit 7: Evaluation
Questions and Answers
In the Modelling stage we can build different types of models, but how do we check whether one is
better than another? That is where Evaluation comes into play.
1. What is evaluation?
Evaluation is the process of understanding the reliability of an AI model. It is done by feeding a test
dataset into the model and comparing the model's outputs with the actual answers.
2. Define Evaluation.
Before deploying the model in the real world, we test it in as many ways as possible. This stage of
testing the model is known as EVALUATION.
OR
Evaluation is a process that critically examines a program. It involves collecting and analyzing
information about a program’s activities, characteristics, and outcomes. Its purpose is to make
judgments about a program, to improve its effectiveness, and/or to inform programming decisions.
3. Why are we not using the same training data for testing (evaluation) purpose?
This is because our model would simply remember the whole training set, and would therefore always
predict the correct label for any point in the training set. The resulting score would be misleadingly
high and would tell us nothing about how the model performs on new, unseen data.
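A minimal sketch of this idea, assuming scikit-learn is available and using its built-in Iris dataset purely for illustration: the data is split so that evaluation happens on points the model has never seen.

```python
# Sketch: keep evaluation data separate from training data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold back 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on the training data is inflated: the model has "seen" these points.
print("Train accuracy:", accuracy_score(y_train, model.predict(X_train)))
# Accuracy on the held-out test data is the honest estimate.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```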
(Precision, that is, takes into account the True Positives and the False Positives.)
14. What is Recall? Mention its formula.
Recall is defined as the fraction of positive cases that are correctly identified.
Recall = True Positives / (True Positives + False Negatives) = TP / (TP + FN)
When we have a value of 1 (that is, 100%) for both Precision and Recall, the F1 Score is also an ideal
1 (100%), which is known as the perfect value for the F1 Score. As the values of both Precision and
Recall range from 0 to 1, the F1 Score also ranges from 0 to 1.
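A small illustrative sketch (the counts below are made-up numbers, not from any real model) showing how Precision, Recall and F1 Score are computed from the four confusion-matrix counts:

```python
# Illustrative sketch: Precision, Recall and F1 Score from hypothetical counts.
TP, FP, FN, TN = 40, 10, 20, 130   # made-up values for demonstration only

precision = TP / (TP + FP)         # fraction of predicted positives that are correct
recall = TP / (TP + FN)            # fraction of actual positives that are identified
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"Precision = {precision:.2f}")   # 0.80
print(f"Recall    = {recall:.2f}")      # 0.67
print(f"F1 Score  = {f1:.2f}")          # 0.73
```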
17. Which evaluation metric would be crucial in the following cases? Justify your answer.
a. Mail Spamming
b. Gold Mining
c. Viral Outbreak
Here, Mail Spamming and Gold Mining are cases where the FALSE POSITIVE is the costly error.
A Viral Outbreak, however, is a case where the FALSE NEGATIVE is the costly error: a missed outbreak
infects a lot of people, harming their health and also leading to heavy expenditure on check-ups and
treatment.
So, the False Negative case (VIRAL OUTBREAK) is more crucial and dangerous when compared to the
FALSE POSITIVE cases.
(OR)
a. If the model always predicts that the mail is spam, people would not look at it and eventually
might lose important information. False Positive condition would have a high cost. (predicting
the mail as spam while the mail is not spam)
b. A model saying that there exists treasure at a point and you keep on digging there but it turns
out that it is a false alarm. False Positive case is very costly. (predicting there is a treasure but
there is no treasure)
c. A deadly virus has started spreading and the model which is supposed to predict a viral
outbreak does not detect it. The virus might spread widely and infect a lot of people. Hence,
False Negatives can be dangerous here.
18. What are the possible reasons for an AI model not being efficient? Explain.
Reasons for an AI model not being efficient:
a. Lack of Training Data: If the data is not sufficient for developing an AI model, or if some data is
missing while training the model, the model will not be efficient.
b. Unauthenticated Data / Wrong Data: If the data is not authenticated and correct, the model will
not give good results.
c. Inefficient Coding / Wrong Algorithms: If the written algorithms are not correct and relevant, the
model will not give the desired output.
d. Not Tested: If the model is not tested properly, it will not be efficient.
e. Not Easy: If the model is not easy to implement in production, or is not scalable, it is not efficient.
f. Less Accuracy: A model is not efficient if it gives low accuracy scores in production or on test
data, or if it is not able to generalize well on unseen data.
19. Answer the following:
Give an example where High Accuracy is not usable.
SCENARIO: An expensive robotic chicken crosses a very busy road a thousand times per day.
An ML model evaluates traffic patterns and predicts when this chicken can safely cross the
street with an accuracy of 99.99%.
Explanation: A 99.99% accuracy value on a very busy road strongly suggests that the ML
model is far better than chance. In some settings, however, the cost of making even a small
number of mistakes is still too high. 99.99% accuracy means that the expensive chicken will
need to be replaced, on average, every 10 days. (The chicken might also cause extensive
damage to cars that it hits.)
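The "every 10 days" figure follows from simple arithmetic, sketched below using the numbers given in the scenario (1,000 crossings per day, 99.99% accuracy):

```python
# Sketch of the arithmetic behind the chicken example.
crossings_per_day = 1000
accuracy = 0.9999

error_rate = 1 - accuracy                           # 1 mistake per 10,000 crossings
mistakes_per_day = crossings_per_day * error_rate   # 0.1 mistakes per day
days_between_mistakes = 1 / mistakes_per_day        # ~10 days per fatal mistake

print(round(days_between_mistakes))                 # 10
```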
Give an example where High Precision is not usable.
Example: “Predicting a mail as Spam or Not Spam”
False Positive: Mail is predicted as “spam” but it is “not spam”.
False Negative: Mail is predicted as “not spam” but it is “spam”.
Of course, too many False Negatives will make the spam filter ineffective: spam keeps reaching the
inbox even though every mail the filter does flag really is spam. Hence a high Precision alone is not
usable here; Recall must also be considered. (False Positives, by contrast, would cause important
mails to be missed, which is what Precision guards against.)
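A small sketch with made-up counts shows how a filter can have perfect Precision and still be almost useless because of its low Recall:

```python
# Hypothetical spam filter results (made-up counts for illustration only).
TP = 5    # spam correctly flagged as spam
FP = 0    # no legitimate mail flagged as spam
FN = 95   # spam that slipped into the inbox

precision = TP / (TP + FP)   # 1.00 -> every flagged mail really is spam
recall = TP / (TP + FN)      # 0.05 -> but 95% of the spam gets through

print(precision, recall)     # 1.0 0.05
```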
Four (04) Mark Questions
1. Deduce the formula of F1 Score. What is the need for its formulation?
The F1 Score, also called the F score or F measure, is a measure of a test’s accuracy. It
is calculated from the precision and recall of the test, where the precision is the
number of correctly identified positive results divided by the number of all positive
results, including those not identified correctly, and the recall is the number of
correctly identified positive results divided by the number of all samples that should
have been identified as positive. The F1 score is defined as the weighted harmonic
mean of the test’s precision and recall. This score is calculated according to the
formula.
Formula:
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Need for its formulation:
F-Measure provides a single score that balances both the concerns of precision and recall in one
number.
A good F1 score means that you have low false positives and low false negatives, so you’re
correctly identifying real threats, and you are not disturbed by false alarms.
An F1 score is considered perfect when it’s 1, while the model is a total failure when it’s 0.
F1 Score is a better metric to evaluate our model on real-life classification problems and when
imbalanced class distribution exists.
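The following sketch (with made-up values) shows why the harmonic mean is used: unlike a simple average, it collapses towards the smaller of the two values, so a model cannot hide a very poor Recall behind a very good Precision.

```python
# Why F1 uses the harmonic mean: made-up, very unbalanced precision/recall.
precision, recall = 1.0, 0.02

arithmetic_mean = (precision + recall) / 2              # 0.51, looks deceptively decent
f1 = 2 * precision * recall / (precision + recall)      # ~0.04, exposes the weak recall

print(f"Arithmetic mean: {arithmetic_mean:.2f}, F1: {f1:.2f}")
```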
Confusion Matrix:
A Confusion Matrix is a table that is often used to describe the performance of a classification
model (or "classifier") on a set of test data for which the true values are known.
A 2x2 matrix denoting the right and wrong predictions might help us analyse the rate of success.
This matrix is termed the Confusion Matrix.
Evaluation of the performance of a classification model is based on the counts of test records
correctly and incorrectly predicted by the model.
Therefore, the Confusion Matrix provides a more insightful picture, showing not only the overall
performance of a predictive model but also which classes are being predicted correctly or incorrectly,
and what types of errors are being made.
The confusion matrix is useful for measuring Recall (also known as Sensitivity), Precision,
Accuracy and F1 Score.
The following confusion matrix table illustrates how the four classification outcomes (TP, FP, FN, TN)
are arranged, comparing the predicted value with the actual value:

                        Actual: Positive         Actual: Negative
Predicted: Positive     True Positive (TP)       False Positive (FP)
Predicted: Negative     False Negative (FN)      True Negative (TN)

Let's decipher the matrix:
True Positive (TP): the actual value was positive and the model also predicted a positive value.
True Negative (TN): the actual value was negative and the model also predicted a negative value.
False Positive (FP): the actual value was negative but the model predicted a positive value. Also
known as the Type 1 error.
False Negative (FN): the actual value was positive but the model predicted a negative value. Also
known as the Type 2 error.
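As an illustrative sketch (the label lists below are invented), the four counts can be tallied directly from the actual and predicted labels; with scikit-learn, sklearn.metrics.confusion_matrix does the same job.

```python
# Tally TP, TN, FP, FN from actual vs predicted labels (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # made-up ground truth
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]   # made-up model output

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)   # Type 1 error
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)   # Type 2 error

print(f"TP={TP}, TN={TN}, FP={FP}, FN={FN}")   # TP=3, TN=4, FP=1, FN=2
```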
Example:
Case: Loan (Good loan & Bad loan)
The result of TP will be that bad loans are correctly predicted as bad loans.
The result of TN will be that good loans are correctly predicted as good loans.
The result of FP will be that (actual) good loans are incorrectly predicted as bad loans.
The result of FN will be that (actual) bad loans are incorrectly predicted as good loans.
The banks would lose a bunch of money if the actual bad loans are predicted as good loans due to
loans not being repaid. On the other hand, banks won't be able to make more revenue if the
actual good loans are predicted as bad loans. Therefore, the cost of False Negatives is much
higher than the cost of False Positives.
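A small sketch with purely hypothetical costs makes the bank's trade-off concrete: each False Negative (a bad loan approved) loses the whole loan amount, while each False Positive (a good loan rejected) only forgoes the interest income.

```python
# Hypothetical costs for the loan example (all figures invented for illustration).
cost_per_FN = 100_000   # principal lost when a bad loan is approved and not repaid
cost_per_FP = 10_000    # interest revenue missed when a good loan is rejected

FN, FP = 5, 20          # made-up error counts from a batch of loan decisions

print("Cost of False Negatives:", FN * cost_per_FN)   # 500000 -> dominates
print("Cost of False Positives:", FP * cost_per_FP)   # 200000
```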