Classification Summary

This document covers a module on classification techniques in machine learning, including an introduction to classification, logistic regression, and decision trees. It explains key concepts such as binary and multiclass classification, confusion matrices, and performance metrics like accuracy, precision, recall, and F1-score. Additionally, it discusses ROC curves and decision tree algorithms for evaluating feature importance.


This module has the following sessions:

● Introduction to Classification
● Logistic Regression
● Decision Trees

Introduction to Classification
In this session, you learnt about classification, which is another form of a supervised learning
algorithm. Classification is the task of predicting or detecting which category (or categories) an
observation (or data point) belongs to. In classification problems, the output variables are always
discrete values (such as 'yes' or 'no'). The main difference between classification and regression is
that classification predicts discrete categories or groups, whereas regression predicts real-valued
and continuous quantities.

There are two types of classification problems:

● Binary classification: The target variable has two classes. This is the most common type of
classification problem.

● Multiclass classification: The target variable has more than two classes.

A confusion matrix is used to assess a model's performance by tabulating the correct and
incorrect predictions made on a given data sample. The following table shows an example.

Actual \ Predicted    True (1)          False (0)

True (1)              True positive     False negative

False (0)             False positive    True negative

© upGrad Campus Private Limited. All rights reserved.


A confusion matrix can also be used to estimate the following:

● Accuracy: It is the ratio of correct predictions made by the model to the total number of
predictions. It is mathematically represented as follows:
Accuracy = Correct predictions / Total no. of predictions = (TN + TP) / (TN + FP + FN + TP)

You learnt that accuracy might not be enough to evaluate most classification models, especially
those with imbalanced classes. This is where you were introduced to the following new metrics:

● Precision: It is the probability that a predicted ‘True’ case is actually a ‘True’ case. It can be
represented as follows:
Precision = Correct positive predictions / Total positive predictions = TP / (TP + FP)

● Recall: It is the probability that an actual ‘True’ case is predicted correctly. It can be
represented as follows:
Recall = Correct positive predictions / Actual positive observations = TP / (TP + FN)

● F1-score: It is the harmonic mean of precision and recall, used to check the model's overall
hygiene and ensure that neither its precision nor its recall is too far off. It can be represented as
follows:
F1-score = (2 × Precision × Recall) / (Precision + Recall)
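The four metrics above can be sketched in a few lines of Python. The labels and predictions below are invented purely for illustration:

```python
# A minimal sketch of the confusion-matrix metrics, computed from
# hypothetical true labels and predictions (the data is made up).

def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN, treating 1 as 'True' and 0 as 'False'."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)      # correct positives / predicted positives
    recall = tp / (tp + fn)         # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1 = classification_metrics(actual, predicted)
```

With this sample there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics happen to come out to 0.75.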



Logistic Regression
A sigmoid curve is used in logistic regression to assign probabilities to each observation. The
sigmoid curve equation (for one independent variable) is as follows:
y = P(True) = 1 / (1 + e^(−(β0 + β1x)))

Maximum Likelihood Estimation (MLE) is then used to find the values of β0 and β1 that maximise
the likelihood function. You performed this operation using the 'Real Statistics' add-in in Excel.

You also learnt that the relationship between x (the input value or feature) and probability is not
linear, so we transformed the equation such that the relationship between x and log odds is linear.
Hence, we got the following:
Log(odds) = ln(P / (1 − P)) = β0 + β1x

For multiple independent variables, the equation for logistic regression is as follows:
y = P(True) = 1 / (1 + e^(−(β0 + β1x1 + β2x2 + β3x3 + … + βnxn)))

The ‘Real Statistics’ add-in also calculates the probability of each observation using the optimal
beta values and given values of independent variables. The final predictions are then made based
on a particular cut-off.
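This probability-then-cut-off pipeline can be sketched in Python. The beta values and cut-off below are hypothetical stand-ins for whatever MLE (for example, via the 'Real Statistics' add-in) would produce on real data:

```python
import math

# A sketch of how a fitted sigmoid turns feature values into a probability
# and then into a class prediction. The betas here are hypothetical.

def predict_probability(x_values, betas):
    """P(True) = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], x_values))
    return 1 / (1 + math.exp(-z))

def predict_class(x_values, betas, cutoff=0.5):
    """Final prediction: 1 if the probability clears the cut-off, else 0."""
    return 1 if predict_probability(x_values, betas) >= cutoff else 0

betas = [-1.0, 0.8, 0.3]                     # hypothetical b0, b1, b2 from MLE
p = predict_probability([2.0, 1.0], betas)   # z = -1.0 + 1.6 + 0.3 = 0.9
```

For this observation, z = 0.9 gives a probability of about 0.71, which clears a 0.5 cut-off, so the predicted class is 1.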

You were also introduced to some new metrics to help you decide the optimal cut-off for
prediction. They are as follows:

● Sensitivity = Number of actual yeses correctly predicted / Total number of actual yeses
= TP / (TP + FN) = True positive rate

● Specificity = Number of actual nos correctly predicted / Total number of actual nos
= TN / (TN + FP) = 1 − FP / (TN + FP) = 1 − False positive rate



The Receiver Operating Characteristic (ROC) curve is used to assess the diagnostic capability of
any binary classifier system as the cut-off is varied. [Figure: a typical ROC curve, plotting the
true positive rate against the false positive rate.]

The following are the steps for plotting an ROC curve:

1. Choose a cut-off probability value for a given model.

2. Calculate the true positive rate, also known as recall or sensitivity.

3. Calculate the false positive rate, which is defined as (1-Specificity).

4. Plot the point on the graph.

5. Repeat steps 1 to 4 for different cut-off probability values to arrive at
different points on the graph, which you can then connect to plot the curve.
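The steps above can be sketched in Python. The labels, predicted probabilities and cut-offs below are invented for illustration:

```python
# A sketch of the ROC procedure: for each cut-off, threshold the model's
# predicted probabilities and compute one (FPR, TPR) point.

def roc_points(actual, probabilities, cutoffs):
    points = []
    for c in cutoffs:
        predicted = [1 if p >= c else 0 for p in probabilities]
        tp = sum(1 for a, q in zip(actual, predicted) if a == 1 and q == 1)
        fn = sum(1 for a, q in zip(actual, predicted) if a == 1 and q == 0)
        fp = sum(1 for a, q in zip(actual, predicted) if a == 0 and q == 1)
        tn = sum(1 for a, q in zip(actual, predicted) if a == 0 and q == 0)
        tpr = tp / (tp + fn)   # sensitivity / recall
        fpr = fp / (fp + tn)   # 1 - specificity
        points.append((fpr, tpr))
    return points

actual = [1, 1, 0, 1, 0, 0]                  # hypothetical true labels
probs  = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]      # hypothetical model outputs
points = roc_points(actual, probs, cutoffs=[0.2, 0.5, 0.8])
```

Connecting the resulting (FPR, TPR) points traces the curve; note how raising the cut-off lowers both rates together, which is the positive TPR–FPR relationship described below.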

Similarly, different ROC curves can be plotted for different models (different beta values), and the
best model will have the highest area under the ROC curve. You also observed from the ROC curve
that TPR and FPR have a positive relationship; this implies that sensitivity and specificity have a
negative relationship.

You learnt the following about determining the optimal cut-off:

● The optimal cut-off depends on the business context, and one may need to trade off
between sensitivity and specificity to determine it.

● One method to select a cut-off is to ensure that the values of all metrics, i.e., accuracy,
specificity and sensitivity, are almost equal.



● The optimal cut-off probability of an ROC curve is the one which maximises the TPR and
minimises the FPR.

Decision Trees
A decision tree is a kind of classification algorithm. A classification problem can be approached in
two ways: descriptive way and discriminative way. Decision trees use an algorithm that follows the
discriminative way of classification. Unlike logistic regression, they do not need pre-defined
classification rules, so you do not need to define a cut-off score to divide the data set into
two classes. Instead, decision trees use an algorithm that selects the most important feature
for classification, then the second most important, and so on.

To determine which features are more important than others, you need to evaluate the purity of a
feature. Purity must be calculated at a block level before calculating the feature’s purity. The more
skewed the ratio of positive and negative outputs for any block is, the purer a block is and the
easier it is to predict (0 or 1) for that block.

The following are three metrics that can be used to calculate purity:

● Accuracy: It is defined as max(P1, P2), where P1 and P2 are the probabilities of the two
classes that occur in any particular region. For example, in the case discussed in the
module’s fourth segment, The Decision Tree Algorithm, P1 and P2 were the probabilities of
liking or not liking a particular block.

● Gini score: It is generally considered a better metric for purity than accuracy and can be
calculated as P1 × accuracy(P1) + P2 × accuracy(P2).

● Information gain: It is given by (1 − entropy), where entropy is defined as Σ pi × log2(1/pi). In the
case discussed in the module's fourth segment, The Decision Tree Algorithm, since there
were two classes, i.e., a block can either be liked or disliked, the entropy formula becomes
P1 × log2(1/P1) + P2 × log2(1/P2).
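The three purity metrics can be sketched for a two-class block as below. Note one assumption: the module's Gini formula is read here as P1 × P1 + P2 × P2, interpreting "accuracy(Pi)" as Pi; that reading is an inference, not something the source states:

```python
import math

# Block-level purity metrics for a two-class block, where p1 and p2 are
# the class proportions in the block (p1 + p2 = 1).

def accuracy_purity(p1, p2):
    return max(p1, p2)

def gini_score(p1, p2):
    # Assumption: "accuracy(Pi)" is read as Pi, giving p1^2 + p2^2.
    return p1 * p1 + p2 * p2

def entropy(p1, p2):
    # Sum of pi * log2(1/pi); a class with proportion 0 contributes nothing.
    return sum(p * math.log2(1 / p) for p in (p1, p2) if p > 0)

def information_gain(p1, p2):
    # The module's definition: 1 - entropy.
    return 1 - entropy(p1, p2)

# A perfectly pure block vs. a maximally mixed one:
pure  = (accuracy_purity(1.0, 0.0), gini_score(1.0, 0.0), entropy(1.0, 0.0))
mixed = (accuracy_purity(0.5, 0.5), gini_score(0.5, 0.5), entropy(0.5, 0.5))
```

A pure block scores 1.0 on accuracy and Gini with entropy 0, while a 50/50 block scores 0.5 on both with entropy 1 — the most skewed block is the purest, as described above.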

After calculating the purity of each block, take the sum-product of the block purities (within a
feature's subcategory) and their corresponding sizes; this gives the purity of that subcategory.
Repeat this for every subcategory of the feature, then multiply each subcategory's purity by its
respective size to obtain the purity of the feature. Repeat the process for each feature. Once the
purity of each feature has been calculated (with respect to one of the three metrics listed above),
the most important feature is simply the one with the highest score.
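The size-weighted aggregation described above can be sketched as follows; the block purities and counts are invented for illustration:

```python
# A sketch of the aggregation: block purities roll up into subcategory
# purity, and subcategory purities roll up into feature purity, each time
# weighted by size (number of observations).

def weighted_purity(purities_and_sizes):
    """Sum-product of purities and sizes, normalised by total size."""
    total = sum(size for _, size in purities_and_sizes)
    return sum(purity * size for purity, size in purities_and_sizes) / total

# Purity (here: max class proportion) and size of each block in two
# hypothetical subcategories of one feature:
subcat_a = weighted_purity([(0.9, 40), (0.7, 10)])   # purity of subcategory A
subcat_b = weighted_purity([(0.6, 50)])              # purity of subcategory B

# Feature purity: subcategory purities weighted by subcategory sizes.
feature_purity = weighted_purity([(subcat_a, 50), (subcat_b, 50)])
```

Computing this for every feature and comparing the resulting scores identifies the most important feature, which the algorithm splits on first.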



Disclaimer: All content and material on the upGrad Campus website is copyrighted material, either
belonging to upGrad Campus or its bona fide contributors and is purely for the dissemination of
education. You are permitted to access, print, and download extracts from this site purely for your own
education only and on the following basis:

● You can download this document from the website for self-use only.

● Any copies of this document, in part or full, saved to disk or to any other storage medium, may
only be used for subsequent, self-viewing purposes or to print an individual extract or copy for
non-commercial personal use only.

● Any further dissemination, distribution, reproduction or copying of the content of the document
herein or the uploading thereof on other websites, or use of the content for any other
commercial/unauthorized purposes in any way which could infringe the intellectual property
rights of upGrad Campus or its contributors, is strictly prohibited.

● No graphics, images, or photographs from any accompanying text in this document will be used
separately for unauthorized purposes.

● No material in this document will be modified, adapted, or altered in any way.

● No part of this document or upGrad Campus content may be reproduced or stored on any other
website or included in any public or private electronic retrieval system or service without prior
written permission from upGrad Campus.

● Any rights not expressly granted in these terms are reserved.

