Important questions on Evaluation

Q. What is an Evaluation?
A.
Evaluation is the process of checking the performance of an AI model by testing it on data and comparing the model's predictions with reality. Metrics such as Accuracy, Precision, Recall and F1 Score are used to judge how well the model performs.


Q. Why is an Evaluation important?
A.
Evaluation tells us how reliable a model is before it is deployed. It shows whether the model generalizes to data it has not seen during training, allows different models to be compared, and helps detect problems such as overfitting.


Q. Which two parameters are considered for Evaluation of a model?
A.
Prediction and Reality are the two parameters considered for Evaluation of a model. The “Prediction” is the output given by the machine, and the “Reality” is the real scenario on which the prediction has been made.

Q. What is meant by Overfitting of Data?
A.
Models that are tested on the same dataset they were trained on will always give correct outputs, because they have already seen the answers. This is known as overfitting: the model has memorized the training data instead of learning to generalize, so it performs poorly on new, unseen data.
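The idea above can be sketched in plain Python (a toy illustration, not any real library): a "model" that simply memorizes its training pairs scores perfectly when tested on the training data, but fails on anything unseen.

```python
# Toy "model" that memorizes its training data (overfitting taken to the extreme).
train_data = {1: "cat", 2: "dog", 3: "cat"}  # input -> correct label

def memorizing_model(x):
    # Returns the memorized label, or a blind guess for unseen inputs.
    return train_data.get(x, "cat")

# Testing on the training set: every answer is "correct" by construction.
train_accuracy = sum(memorizing_model(x) == y for x, y in train_data.items()) / len(train_data)
print(train_accuracy)  # 1.0 -- looks perfect, but nothing was learned

# Testing on unseen data exposes the problem.
unseen_data = {4: "dog", 5: "dog"}
test_accuracy = sum(memorizing_model(x) == y for x, y in unseen_data.items()) / len(unseen_data)
print(test_accuracy)  # 0.0 -- the model cannot generalize
```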


Q. What is True Positive?
A.
A True Positive is a case where the model predicts the positive class and the reality is also positive, i.e. Prediction = Yes and Reality = Yes.


Q. What is True Negative?
A.
A True Negative is a case where the model predicts the negative class and the reality is also negative, i.e. Prediction = No and Reality = No.


Q. What is False Positive?
A.
A False Positive is a case where the model predicts the positive class but the reality is negative, i.e. Prediction = Yes but Reality = No.


Q. What is False Negative?
A.
A False Negative is a case where the model predicts the negative class but the reality is positive, i.e. Prediction = No but Reality = Yes.

Q. What is a confusion matrix?
A.
A confusion matrix is a table that compares the model's predictions against reality. For a two-class problem it records the counts of True Positives, True Negatives, False Positives and False Negatives. It is not an evaluation metric itself, but the evaluation metrics (Accuracy, Precision, Recall, F1 Score) are calculated from it.
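As a sketch (plain Python, with made-up prediction and reality lists), the four cells of a two-class confusion matrix can be counted directly:

```python
# Hypothetical prediction/reality lists ("Yes" = positive class).
predictions = ["Yes", "Yes", "No", "No", "Yes", "No"]
reality     = ["Yes", "No",  "No", "Yes", "Yes", "No"]

tp = sum(p == "Yes" and r == "Yes" for p, r in zip(predictions, reality))
tn = sum(p == "No"  and r == "No"  for p, r in zip(predictions, reality))
fp = sum(p == "Yes" and r == "No"  for p, r in zip(predictions, reality))
fn = sum(p == "No"  and r == "Yes" for p, r in zip(predictions, reality))

print(tp, tn, fp, fn)  # 2 2 1 1 -- the four cells of the confusion matrix
```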


Q. What is Accuracy? Mention its formula.
A.
Accuracy is the percentage of predictions that are correct out of all predictions made.
Accuracy = (TP + TN) / (TP + TN + FP + FN)


Q. What is Precision? Mention its formula.
A.
Precision is the fraction of positive predictions that are actually positive.
Precision = TP / (TP + FP)

Q. What is Recall? Mention its formula.
A.
Recall is the fraction of actual positive cases that the model correctly identifies.
Recall = TP / (TP + FN)


Q. What is F1 Score? What is the need of its formulation?
A.
F1 Score is the harmonic mean of Precision and Recall:
F1 Score = 2 × Precision × Recall / (Precision + Recall)
It is needed because Precision and Recall each capture only one kind of error (False Positives and False Negatives respectively), and Accuracy can be misleading when the classes are imbalanced. The F1 Score balances both measures in a single number; it is high only when both Precision and Recall are high.
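The four metric formulas can be collected into one small Python helper (the counts below are hypothetical, chosen only to illustrate the calculation):

```python
def evaluate(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example with made-up counts: TP=30, TN=50, FP=10, FN=10 (100 cases total).
accuracy, precision, recall, f1 = evaluate(tp=30, tn=50, fp=10, fn=10)
print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75
```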

Q. Let us assume that you developed an AI model that tests pooled specimens (blood/urine etc.) to diagnose some ailment (say COVID). After training it with a collection of specimens whose accurate results were known to you, you are now ready to evaluate your AI model. You conduct about 630 tests, and the confusion matrix for these 630 test results looked like the confusion matrix below. Find the Accuracy, Precision, Recall and F1 Score.

SOLUTION

As per the problem, to find the F1 Score we first need to calculate Precision and Recall.

To find Precision,

Precision = TP / (TP + FP) ≈ 0.64706 (using the TP and FP counts from the confusion matrix)

Thus, the Precision for our sample AI model is 0.64706.

To compute Recall,

Recall = TP / (TP + FN) = 0.6875 (using the TP and FN counts from the confusion matrix)

Thus, the Recall for our sample AI model is 68.75 %, or 0.6875.

Here, we have the values of Precision and Recall for the model. Now we calculate the F1 Score as:

F1 Score = 2 × Precision × Recall / (Precision + Recall) = 2 × 0.64706 × 0.6875 / (0.64706 + 0.6875) ≈ 0.6667

Thus, the F1 Score for our sample AI model is approximately 0.6667.
In an ideal situation, both Precision and Recall would be 100 %, i.e. a value of 1.
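The last step can be checked with a couple of lines of Python, plugging in the Precision and Recall values computed above:

```python
precision = 0.64706
recall = 0.6875

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6667
```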


Q. Give an example where High Accuracy is not usable.
A.
SCENARIO: An expensive robotic chicken crosses a very busy road a thousand times per day. An ML model evaluates traffic patterns and predicts when this chicken can safely cross the street with an accuracy of 99.99%.
Explanation: A 99.99% accuracy value on a very busy road strongly suggests that the ML model is far better than chance. In some settings, however, the cost of making even a small number of mistakes is still too high. At 1,000 crossings per day, 99.99% accuracy means about 0.1 failed crossings per day, so the expensive chicken will need to be replaced, on average, every 10 days. (The chicken might also cause extensive damage to cars that it hits.)
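The replacement interval follows from simple arithmetic, which a short Python snippet makes explicit:

```python
crossings_per_day = 1000
accuracy = 0.9999

# Expected number of failed crossings per day, and days between failures.
failures_per_day = crossings_per_day * (1 - accuracy)
days_per_failure = 1 / failures_per_day

print(round(failures_per_day, 6), round(days_per_failure, 6))  # 0.1 10.0
```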


Q. Give an example where High Precision is not usable.
A.
Example: “Predicting a mail as Spam or Not Spam”
False Positive: a mail is predicted as “spam” but it is “not spam”.
False Negative: a mail is predicted as “not spam” but it is “spam”.
A filter can achieve high Precision simply by flagging only the most obvious spam, but then many spam mails slip through as False Negatives and the filter becomes ineffective. Since Precision says nothing about False Negatives, high Precision alone is not usable here; Recall must be considered as well.
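A tiny numeric sketch (with made-up counts) shows how a spam filter can look good on Precision while being useless in practice:

```python
# Hypothetical day of mail: 100 spam mails arrive.
# An over-cautious filter flags only the 2 most obvious spam mails -- both correctly.
tp = 2    # spam correctly flagged as spam
fp = 0    # genuine mail wrongly flagged as spam
fn = 98   # spam that slipped through to the inbox

precision = tp / (tp + fp)  # 1.0  -> every flagged mail really was spam
recall = tp / (tp + fn)     # 0.02 -> but 98% of the spam got through
print(precision, recall)
```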

Q. What are the possible reasons for an AI model not being efficient? Explain.
A.
a. Lack of Training Data: if the data is insufficient for developing the AI model, or if data is missing while training it, the model will not be efficient.
b. Unauthenticated / Wrong Data: if the data is not authentic and correct, the model will not give good results.
c. Inefficient Coding / Wrong Algorithms: if the algorithms used are not correct and relevant, the model will not give the desired output.
d. Not Tested: if the model is not tested properly, it will not be efficient.
e. Not Easy: if the model is not easy to implement in production or to scale.
f. Less Accuracy: a model is not efficient if it gives low accuracy scores on production or test data, or if it is not able to generalize well on unseen data.