INTRODUCTION TO MACHINE LEARNING - 20EC6404C

Topic 2.8: Topics that come under Discriminant Functions


Unit - II
Faculty: Turimerla Pratap

Least squares for classification


We considered models that were linear functions of the parameters, and we saw that the mini-
mization of a sum-of-squares error function led to a simple closed-form solution for the parameter
values. One justification for using least squares in such a context is that it approximates the condi-
tional expectation E[t|x] of the target values given the input vector.

Each class Ck is described by its own linear model, so that

y_k(x) = w_k^T x + w_{k0}

where k = 1, . . . , K. We can conveniently group these together using vector notation so that

y(x) = W^T x,

where W is a matrix whose kth column comprises the (D + 1)-dimensional vector w_k, and x is the corresponding augmented input vector with a dummy input x_0 = 1. A new input x is then assigned to the class for which the output y_k = w_k^T x is largest.
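
As a minimal sketch of this decision rule (assuming NumPy and an already-fitted parameter matrix W; the helper classify is purely illustrative):

import numpy as np

def classify(W, x):
    """Assign x to the class with the largest linear output y_k = w_k^T x.

    W : (D + 1, K) parameter matrix whose kth column is w_k (bias in row 0).
    x : (D,) input vector; a dummy input x0 = 1 is prepended here.
    """
    x_tilde = np.concatenate(([1.0], x))   # augmented input vector
    y = W.T @ x_tilde                      # y_k(x) for k = 1, ..., K
    return int(np.argmax(y))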

We now determine the parameter matrix W by minimizing a sum-of-squares error function.


Consider a training dataset {x_n, t_n} where n = 1, . . . , N, and define a matrix T whose nth row is the vector t_n^T, together with a matrix X whose nth row is x_n^T. The sum-of-squares error function can then be written as

E_D(W) = (1/2) Tr{ (XW − T)^T (XW − T) }.
Setting the derivative with respect to W to zero and rearranging, we then obtain the solution for
W in the form

W = (X^T X)^{-1} X^T T = X^† T,

where X^† is the pseudo-inverse of the matrix X. We then obtain the discriminant function in the form

y(x) = W^T x = T^T (X^†)^T x.

An interesting property of least-squares solutions with multiple target variables is that if every
target vector in the training set satisfies some linear constraint aT tn + b = 0 for some constants a
and b, then the model prediction for any value of x will satisfy the same constraint, so that

aT y(x) + b = 0

Thus, if we use a 1-of-K coding scheme for K classes, then the predictions made by the model
will have the property that the elements of y(x) will sum to 1 for any value of x. However, this
summation constraint alone is not sufficient to allow the model outputs to be interpreted as proba-
bilities because they are not constrained to lie within the interval (0, 1).

The least-squares approach gives an exact closed-form solution for the discriminant function parameters. However, even as a discriminant function it may perform poorly in practice for classification problems: the sum-of-squares error corresponds to maximum likelihood under a Gaussian noise model, which is a poor match for binary target vectors, and the resulting solutions lack robustness to outliers.
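
A minimal sketch of this closed-form fit (assuming NumPy; the helper fit_least_squares and the toy data are purely illustrative):

import numpy as np

def fit_least_squares(X, T):
    """Least-squares classification: W = pinv(X_tilde) @ T.

    X : (N, D) inputs; T : (N, K) 1-of-K target matrix.
    Returns W of shape (D + 1, K), bias row first.
    """
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])  # dummy input x0 = 1
    return np.linalg.pinv(X_tilde) @ T                   # W = X^dagger T

# Toy usage: two Gaussian blobs, K = 2 classes with 1-of-K coded targets.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
T = np.zeros((100, 2))
T[:50, 0] = 1.0   # class 1 targets (1, 0)
T[50:, 1] = 1.0   # class 2 targets (0, 1)

W = fit_least_squares(X, T)
y = np.hstack([1.0, X[0]]) @ W   # model outputs for one input
print(y.sum())                   # approx 1, but entries need not lie in (0, 1)
print(np.argmax(y))              # predicted class index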

Fisher’s linear discriminant


One way to view a linear classification model is in terms of dimensionality reduction. Consider
first the case of two classes, and suppose we take the D-dimensional input vector x and project it
down to one dimension using y = wT x. If we place a threshold on y and classify y ≥ −w0 as
class C1 , and otherwise class C2 , then we obtain our standard linear classifier.

In general, the projection onto one dimension leads to a considerable loss of information, and
classes that are well separated in the original D-dimensional space may become strongly overlap-
ping in one dimension. However, by adjusting the components of the weight vector w, we can
select a projection that maximizes the class separation. To begin with, consider a two-class prob-
lem in which there are N1 points of class C1 and N2 points of class C2 , so that the mean vectors of
the two classes are given by
m_1 = (1/N_1) ∑_{n∈C_1} x_n,        m_2 = (1/N_2) ∑_{n∈C_2} x_n.

The simplest measure of the separation of the classes when projected onto w is the separation
of the projected class means. This suggests that we might choose w so as to maximize

m_2 − m_1 = w^T (m_2 − m_1),

where m_k = w^T m_k is the mean of the projected data from class C_k. However, this expression can be made arbitrarily large simply by increasing the magnitude of w. To solve this problem, we could constrain w to have unit length, so that ||w||^2 = 1. Using a Lagrange multiplier to perform the constrained maximization, we then find that w ∝ (m_2 − m_1). This choice maximizes the separation of the projected means but ignores the spread of the projected data within each class, so classes that are well separated in the original space can still overlap after projection. Fisher’s approach takes this within-class spread into account through the total within-class covariance matrix

S_W = ∑_{n∈C_1} (x_n − m_1)(x_n − m_1)^T + ∑_{n∈C_2} (x_n − m_2)(x_n − m_2)^T,

and leads (as derived below) to the direction

w ∝ S_W^{-1} (m_2 − m_1),

which is known as Fisher’s linear discriminant and gives the direction for the projection of the data.

The projection formula that transforms the set of labeled data points in x into a labeled set in
the one-dimensional space y is given by:

y = w^T x.

The within-class variance of the transformed data from class Ck is defined as:

s_k^2 = ∑_{n∈C_k} (y_n − m_k)^2,

where y_n = w^T x_n. We can define the total within-class variance for the whole dataset as:

s_1^2 + s_2^2.

The Fisher criterion is defined as the ratio of the between-class variance to the within-class
variance and is given by:
J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2).
By rewriting the Fisher criterion using the projection formula and the within-class variances,
we have:
J(w) = (w^T S_B w) / (w^T S_W w),

where S_B is the between-class covariance matrix defined as:

S_B = (m_2 − m_1)(m_2 − m_1)^T.

To find the weight vector w that maximizes J(w), we differentiate J(w) with respect to w and set the result to zero. This leads to the equation

(w^T S_B w) S_W w = (w^T S_W w) S_B w.

From the definition of S_B we see that S_B w is always in the direction of (m_2 − m_1). Since only the direction of w matters, we can drop the scalar factors and multiply both sides by S_W^{-1} to conclude that the optimal weight vector is proportional to S_W^{-1} (m_2 − m_1).
Finally, we can use the projected data and a threshold y0 to construct a discriminant. A new
vector x can be classified as belonging to class C1 if y(x) ≥ y0 , and as belonging to class C2
otherwise. The threshold y0 can be determined by modeling the class-conditional densities p(y|Ck )
using Gaussian distributions and using maximum likelihood estimation to find the parameters of
the Gaussian distributions.
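
A minimal sketch of the two-class procedure (assuming NumPy; the helper fisher_direction is hypothetical, and for simplicity the threshold below is the midpoint of the projected class means rather than the result of a full Gaussian maximum-likelihood fit):

import numpy as np

def fisher_direction(X1, X2):
    """Fisher's linear discriminant direction w ∝ S_W^{-1} (m2 - m1).

    X1, X2 : arrays of shape (N1, D) and (N2, D) holding the two classes.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class covariance
    w = np.linalg.solve(S_W, m2 - m1)                          # only the direction matters
    return w / np.linalg.norm(w)

# Hypothetical usage: project and classify with a simple midpoint threshold.
rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, (60, 2))
X2 = rng.normal([4, 2], 1.0, (40, 2))
w = fisher_direction(X1, X2)
y0 = 0.5 * (w @ X1.mean(axis=0) + w @ X2.mean(axis=0))  # threshold between projected means
x_new = np.array([3.5, 1.5])
label = 'C2' if w @ x_new >= y0 else 'C1'

With w oriented from m_1 toward m_2 as above, projections of class C2 tend to lie above the threshold; flipping the sign of w recovers the convention in the text, where y(x) ≥ y0 is assigned to class C1.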

Relation to least squares
The least squares approach and the Fisher criterion are related for the two-class problem. By
adopting a different target coding scheme, the least squares solution can be shown to be equivalent
to the Fisher solution.
In the least squares approach, we minimize the sum-of-squares error function given by:

E = (1/2) ∑_{n=1}^{N} (w^T x_n + w_0 − t_n)^2,

where w is the weight vector, w0 is the bias, xn are the input vectors, and tn are the target values.
Setting the derivatives of E with respect to w0 and w to zero, we obtain the following equations:


∑_{n=1}^{N} (w^T x_n + w_0 − t_n) = 0,        ∑_{n=1}^{N} (w^T x_n + w_0 − t_n) x_n = 0.

Using a specific target coding scheme where the targets for class C1 are N/N1 and the targets
for class C2 are −N/N2 , we can simplify the above equations. The bias can be expressed as:

w_0 = −w^T m,

where m is the mean of the total dataset.


By substituting the bias expression into the second equation and performing some algebraic manipulations, we arrive at:

(S_W + (N_1 N_2 / N) S_B) w = N (m_1 − m_2),

where S_W is the total within-class covariance matrix and S_B is the between-class covariance matrix defined earlier. Since S_B w is always in the direction of (m_2 − m_1), it follows that w is proportional to S_W^{-1} (m_2 − m_1). Therefore, up to an irrelevant scale factor, the weight vector obtained from the least squares approach coincides with the Fisher solution.
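
A small numerical check of this equivalence (assuming NumPy; the data below are synthetic and purely illustrative):

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0, 0], 1.0, (30, 2))          # class C1 samples
X2 = rng.normal([3, 1], 1.0, (70, 2))          # class C2 samples
N1, N2 = len(X1), len(X2)
N = N1 + N2

# Least squares with targets N/N1 for class C1 and -N/N2 for class C2.
X = np.vstack([X1, X2])
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])
X_aug = np.hstack([np.ones((N, 1)), X])        # prepend the bias column
w0, *w_ls = np.linalg.lstsq(X_aug, t, rcond=None)[0]
w_ls = np.array(w_ls)

# Fisher direction S_W^{-1} (m2 - m1).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m2 - m1)

# The two directions agree up to sign and scale, so |cos| should be close to 1.
cos = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))
print(abs(cos))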

Fisher’s discriminant for multiple classes


Fisher’s linear discriminant can also be extended to handle multiple classes. The objective is to find
a projection that maximizes the class separability while minimizing the within-class scatter. The
resulting discriminant is known as Fisher’s discriminant or Fisher’s linear discriminant analysis.

Let us consider a problem with K classes. We want to find one or more projection vectors (which can be grouped into the columns of a matrix W) that map the original D-dimensional input space to a lower-dimensional space in which the projected data can be effectively separated into their respective classes.

The between-class scatter matrix S_B is defined as a weighted sum of scatter matrices, one per class, each measuring the deviation of that class mean from the overall mean. Mathematically, S_B is given by:

S_B = ∑_{k=1}^{K} N_k (m_k − m)(m_k − m)^T,

where N_k is the number of samples in class k, m_k is the mean of class k, and m is the overall mean of the data.

The within-class scatter matrix SW measures the scatter within each class. It is defined as the
sum of the scatter matrices for each class, which quantify the deviations of samples from their
respective class means. Mathematically, SW is given by:


S_W = ∑_{k=1}^{K} ∑_{n∈C_k} (x_n − m_k)(x_n − m_k)^T,

where xn is a sample from class k and mk is the mean of class k.


The Fisher criterion seeks to maximize the ratio of the between-class scatter to the within-class
scatter. This can be expressed as the following optimization problem:

max_w  (w^T S_B w) / (w^T S_W w).
Solving this optimization problem yields the optimal projection directions, which can be obtained by computing the eigenvectors of S_W^{-1} S_B and selecting those corresponding to the largest eigenvalues. Since S_B has rank at most K − 1, at most K − 1 useful projection directions can be found in this way.

Once the projection vector w is obtained, new samples can be projected onto this vector, and a
classification rule can be applied to assign them to the appropriate class.
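
A minimal sketch of the multi-class computation (assuming NumPy; the helper fisher_multiclass and its n_components parameter are hypothetical):

import numpy as np

def fisher_multiclass(X, labels, n_components):
    """Return projection directions from the leading eigenvectors of S_W^{-1} S_B.

    X : (N, D) inputs; labels : (N,) integer class labels.
    """
    classes = np.unique(labels)
    m = X.mean(axis=0)                                  # overall mean
    D = X.shape[1]
    S_B = np.zeros((D, D))
    S_W = np.zeros((D, D))
    for k in classes:
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        S_B += len(Xk) * np.outer(mk - m, mk - m)       # between-class scatter
        S_W += (Xk - mk).T @ (Xk - mk)                  # within-class scatter
    # Eigen-decomposition of S_W^{-1} S_B; keep eigenvectors with largest eigenvalues.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]        # (D, n_components)

# Hypothetical usage: project data onto the leading K - 1 directions.
# Z = X @ fisher_multiclass(X, labels, n_components=K - 1)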

Fisher’s discriminant provides a linear decision boundary in the projected space that separates the classes well when each class is approximately Gaussian with a similar covariance matrix; these assumptions may not always hold in practice.

The Perceptron
The perceptron is a fundamental binary classification algorithm in machine learning. It is a type
of linear classifier that separates two classes by learning a decision boundary based on the input
features. The perceptron algorithm was introduced by Frank Rosenblatt in 1957 and forms the
basis for many subsequent developments in neural networks.

The perceptron model takes an input vector x of D features and assigns weights w1 , w2 , . . . , wD
to each feature. Additionally, there is a bias term b that represents the threshold for the decision
boundary. The output of the perceptron is a binary prediction ŷ, indicating which class the input belongs to.

The prediction ŷ is obtained by applying the following steps:

• Calculate the weighted sum of the input features:

  z = ∑_{i=1}^{D} w_i x_i + b.

• Apply an activation function to the weighted sum. In the perceptron algorithm, the activation
function is a simple step function (also known as the Heaviside step function), which returns
1 if z is positive or zero, and 0 otherwise:
ŷ = 1 if z ≥ 0, and ŷ = 0 otherwise.

The predicted output ŷ represents the class label assigned by the perceptron. The learning
process of the perceptron involves adjusting the weights and bias based on the training data. The
algorithm starts with random or zero weights and iterates through the training samples until con-
vergence or a predefined number of iterations. During each iteration, the perceptron updates its
weights based on the following rule:

wi ← wi + η(y − ŷ)xi ,
b ← b + η(y − ŷ),
where y is the true class label of the training sample, ŷ is the predicted class label, xi is the i-th
feature of the input, and η is the learning rate, which controls the step size of the weight updates.
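
A minimal sketch of this learning rule (assuming NumPy; the helper train_perceptron and its lr and epochs parameters are hypothetical, with labels y in {0, 1}):

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """Train a perceptron with the step activation and the update rule above.

    X : (N, D) inputs; y : (N,) labels in {0, 1}.
    Returns the learned weights w of shape (D,) and the bias b.
    """
    N, D = X.shape
    w = np.zeros(D)
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if w @ xi + b >= 0 else 0        # step activation
            if y_hat != yi:
                w += lr * (yi - y_hat) * xi            # weight update
                b += lr * (yi - y_hat)                 # bias update
                errors += 1
        if errors == 0:                                # all training samples correct
            break
    return w, b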

The learning process continues until the algorithm correctly classifies all training samples or
reaches the maximum number of iterations. The perceptron algorithm guarantees convergence if
the classes are linearly separable; otherwise, it may not converge. The perceptron is a foundational
algorithm that paved the way for more sophisticated neural network architectures. While it is a
linear classifier, its simplicity and interpretability make it an essential concept in understanding the
basics of machine learning and neural networks.
