Multiclass Prediction

SoftMax Regression, One-vs-All & One-vs-One for Multi-class Classification


In multi-class classification, we classify data into more than two class labels. Unlike
classification trees and nearest neighbors, extending linear classifiers to the multi-class
setting is not as straightforward. Logistic regression can be converted to a multi-class
classifier using multinomial logistic regression, or SoftMax regression, which is a
generalization of logistic regression. SoftMax regression does not work for
Support Vector Machines (SVM); One-vs-All (One-vs-Rest) and One-vs-One are two
other multi-class classification techniques that can convert most two-class
classifiers into a multi-class classifier.
SoftMax Regression

SoftMax regression is similar to logistic regression: the SoftMax function converts
the actual distances, i.e. the dot products of x with each of the parameters θ_i for the K
classes indexed 0 to K-1, into probabilities using the following formula.
softmax(x, i) = exp(θ_i^T x) / Σ_{j=1}^{K} exp(θ_j^T x)    (1)
The training procedure is almost identical to logistic regression using cross-entropy,
but the prediction is different. Consider the three-class example
where y ∈ {0, 1, 2}, i.e. y can equal 0, 1, or 2. We would like to classify x. We
can use the SoftMax function to generate a probability of how likely the sample
belongs to each class. We then make a prediction using the argmax
function:
ŷ = argmax_i softmax(x, i)    (2)
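
As a concrete illustration, here is a minimal NumPy sketch of equations (1) and (2); the parameter matrix Theta and the sample x1 are made-up values for illustration, not taken from the figures.

```python
import numpy as np

def softmax_predict(Theta, x):
    # Theta: (K, d) array, row i holds the parameters theta_i for class i
    # x: (d,) feature vector
    scores = Theta @ x                               # theta_i^T x for each of the K classes
    scores = scores - scores.max()                   # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()    # equation (1)
    return probs, int(np.argmax(probs))              # equation (2)

# Hypothetical parameters for K = 3 classes and d = 2 features
Theta = np.array([[ 2.0,  0.5],
                  [-1.0,  1.0],
                  [ 0.0, -2.0]])
x1 = np.array([1.0, 0.5])

probs, y_hat = softmax_predict(Theta, x1)
print(probs, y_hat)
```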
Let’s do an example. Consider sample x1; we will start by creating a table where
each column holds the i-th value of the SoftMax function. The index of each
column is the same as the class.

| probability of ŷ = 0 | probability of ŷ = 1 | probability of ŷ = 2 |
| softmax(x1, 0)       | softmax(x1, 1)       | softmax(x1, 2)       |
| i = 0                | i = 1                | i = 2                |

Table 1. Each column holds the i-th value of the SoftMax function. The
index of each column is the same as the class.
Let’s add some real probabilities; these are the model's estimates of how likely the sample
belongs to each class.

| 0.97  | 0.02  | 0.01  |
| i = 0 | i = 1 | i = 2 |

Table 2. Table of real probabilities. Each column holds the i-th value of
the SoftMax function. The index of each column is the same as the class.
We can represent the probabilities as the vector [0.97, 0.02, 0.01]. To get the class, we
simply apply the argmax function, which returns the index of the largest value.
ŷ = argmax_i([0.97, 0.02, 0.01]) = 0
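
The same prediction expressed in code, using the probability vector from Table 2:

```python
import numpy as np

probs = np.array([0.97, 0.02, 0.01])   # the model's estimates from Table 2
y_hat = int(np.argmax(probs))          # index of the largest probability
print(y_hat)                           # 0
```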
Geometric Interpretation
Each θ_i^T x is the equation of a hyperplane. In Fig 1 we plot the intersection of the three
hyperplanes with 0 as colored lines; in addition, we overlay several
training samples. We also shade the regions where the value of θ_i^T x is largest,
which also corresponds to the largest probability. The color corresponds to where a
sample x would be classified. For example, if the input is in the blue region, the
sample would be classified as ŷ = 0; if the input is in the red region, it would be
classified as ŷ = 1; and in the yellow region, ŷ = 2. We will use this
convention going forward.

Fig 1. Equation of a hyperplane. We plot the intersection of the three
hyperplanes with 0; in addition, we overlay several samples. We also
shade the regions where the value of θ_i^T x is largest.
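
Because the exponential is monotone, the class with the largest score θ_i^T x is also the class with the largest SoftMax probability, so the shaded regions can be computed directly from the linear scores. A short sketch, reusing the hypothetical Theta from the earlier example:

```python
import numpy as np

# Hypothetical parameters, one row per class
Theta = np.array([[ 2.0,  0.5],
                  [-1.0,  1.0],
                  [ 0.0, -2.0]])
x = np.array([1.0, 0.5])

scores = Theta @ x                     # theta_i^T x for each class
probs = np.exp(scores - scores.max())
probs = probs / probs.sum()

# The argmax of the linear scores agrees with the argmax of the probabilities
assert np.argmax(scores) == np.argmax(probs)
```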
One problem with SoftMax regression with cross-entropy is that it cannot be used for
SVM and other types of two-class classifiers.
One vs. All (One-vs-Rest)
For One-vs-All classification, if we have K classes, we use K two-class classifier
models; the number of class labels present in the dataset is equal to the number of
generated classifiers. First, we create an artificial class that we will call the "dummy"
class. For each classifier, we split the data into two classes: we take the samples of the
class we would like to classify, and the rest of the samples are labelled as the
dummy class. We repeat the process for each class. To make a classification, we
can use majority vote or use the classifier with the highest probability, disregarding
the probabilities generated for the dummy class.
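
In scikit-learn, the One-vs-All strategy is available through the OneVsRestClassifier wrapper; here is a rough sketch with a linear SVM on made-up toy data (not the data shown in the figures):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy three-class dataset (made-up values for illustration)
X = np.array([[0.0, 1.0], [0.2, 0.8], [1.0, 0.1],
              [1.2, 0.3], [2.0, 2.1], [2.2, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])

# One-vs-Rest trains one binary LinearSVC per class (3 classifiers here)
clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(clf.predict([[0.1, 0.9], [2.1, 2.0]]))   # predicted class labels
```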
Although classifiers such as logistic regression and SVM use class values {0, 1}
and {−1, 1} respectively, we will use arbitrary class values. Consider
the following samples colored according to class: y = 0 for blue, y = 1 for red,
and y = 2 for yellow:

Fig 2. Samples colored according to class.


For each class, we take the samples of the class we would like to classify, and the rest
are labeled as the “dummy” class. For example, to build a classifier for the blue class,
we simply assign all labels that are not in the blue class to the dummy class,
and then train the classifier accordingly. The result is shown in Fig 3, where the
classifier predicts blue (ŷ = 0) and, in the purple region, our
“dummy” class (ŷ = dummy).

Fig 3. The classifier predicts blue (ŷ = 0) in the blue region and the dummy
class (ŷ = dummy) in the purple region.
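
A minimal sketch of the relabeling step for a single One-vs-All classifier, assuming the same toy X and y as above; the value -1 stands in for the dummy class, and logistic regression stands in for any two-class classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0, 1.0], [0.2, 0.8], [1.0, 0.1],
              [1.2, 0.3], [2.0, 2.1], [2.2, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])

target = 0                                    # the "blue" class
y_binary = np.where(y == target, target, -1)  # every other label becomes the dummy class (-1)

clf_blue = LogisticRegression().fit(X, y_binary)
print(clf_blue.predict([[0.1, 0.9]]))         # 0 (blue) or -1 (dummy)
```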
We repeat the process for each class, as shown in Fig 4; the actual class is shown
in the same color and the corresponding dummy class is shown in purple. The
color of the space is the actual classifier prediction, shown in the same manner as
above.
Fig 4. The classifiers predict ŷ = 0, 1, 2 in the blue, red, and yellow
regions and the dummy class (ŷ = dummy) in the purple region.
For a sample in the blue region, we would get the output shown in Table 3.
We disregard the dummy classes and select the output ŷ_0 = 0,
where the subscript is the classifier number.

| Classifier 0 | Classifier 1 | Classifier 2 |
| ŷ = 0        | ŷ = dummy    | ŷ = dummy    |

Table 3. Example classification output. Two of the three outputs are dummy;
these classifiers are ignored and the class is zero.
One issue with One-vs-All is the ambiguous regions, shown in purple in Fig 5. In
these regions you may get multiple classes, for example ŷ_0 = 0 and
ŷ_1 = 1, or all the outputs will equal "dummy."
Fig 5. The classifiers predict that all outputs ŷ_0, ŷ_1, ŷ_2 will equal
"dummy."
There are several ways to reduce this ambiguous region. You can choose the output
based on the value of the linear function; this is called the fusion rule. We can also
use the probability of a sample belonging to the actual class, as shown in Fig 6,
where we select the class with the largest probability, in this case ŷ_0 = 0; we
disregard the dummy values. These probabilities are really scores, as each probability is
between the dummy class and the actual class, not between the classes themselves. Note that
packages like Scikit-learn can output probabilities for SVM.
Fig 6. Probability of a sample belonging to the actual class compared to the
dummy class; we select the class with the highest probability.
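
A rough sketch of this probability-based selection in scikit-learn: SVC with probability=True exposes predict_proba, and OneVsRestClassifier combines the per-class scores so we can take the largest one (the toy blobs below are made up):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Made-up blobs: 10 samples per class around three centers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 2))
               for c in [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]])
y = np.repeat([0, 1, 2], 10)

# probability=True makes the SVM output probability estimates (Platt scaling)
ovr = OneVsRestClassifier(SVC(kernel="linear", probability=True)).fit(X, y)

probs = ovr.predict_proba([[0.1, 0.1]])[0]   # one score per class
print(probs, int(np.argmax(probs)))          # select the class with the largest score
```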
One-vs-One classification
In One-vs-One classification, we split up the data by class and then train a
two-class classifier on each pair of classes. For example, if we have classes 0, 1, and 2,
we would train one classifier on the samples of class 0 and class 1, a second
classifier on samples of class 0 and class 2, and a final classifier on samples
of class 1 and class 2. Fig 7 is an example of class 0 vs class 1, where we drop the
training samples of class 2, using the same convention as above where the color
of a training sample is based on its class. The separating plane of the
classifier is in black. The color of a region represents the output of the classifier for that
particular point in space.
Fig 7. Two-class classifier for class 0 vs class 1; training samples of class 2
are dropped, and the separating plane is shown in black.
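
The same idea with scikit-learn's built-in One-vs-One wrapper, again on made-up toy data rather than the data in the figures:

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

# Made-up blobs: 10 samples per class around three centers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 2))
               for c in [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]])
y = np.repeat([0, 1, 2], 10)

# One-vs-One trains K(K-1)/2 = 3 pairwise classifiers for K = 3 classes
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
print(len(ovo.estimators_))          # 3 pairwise classifiers
print(ovo.predict([[0.1, 0.1]]))     # prediction by majority vote over the pairs
```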
We repeat the process for each pair of classes, as shown in Fig 8. For K classes, we have to
train K(K−1)/2 classifiers, so if K = 3, we have
(3 × 2)/2 = 3 classifiers.
Fig 8. Pairwise two-class classifiers trained on each pair of classes.
To classify a sample, we perform a majority vote: we select the class with the most
predictions. This is shown in Fig 9, where the black point
represents a new sample and the output of each classifier is shown in the table. In
this case, the final output is 1, as selected by two of the three classifiers. There is
also an ambiguous region, but it is smaller; we can also use schemes similar to
One-vs-All, like the fusion rule or using the probabilities. Check out the labs for more.
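
A minimal sketch of the majority vote itself, with three hypothetical pairwise predictions for one sample (the values mirror the Fig 9 example, where two of the three classifiers pick class 1):

```python
from collections import Counter

# Hypothetical outputs of the three pairwise classifiers for one new sample
pairwise_predictions = {"0 vs 1": 1, "0 vs 2": 0, "1 vs 2": 1}

votes = Counter(pairwise_predictions.values())
y_hat, n_votes = votes.most_common(1)[0]
print(y_hat, n_votes)   # class 1 wins with 2 of the 3 votes
```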
