Fisher Discriminant Analysis

Fisher's linear discriminant is a supervised learning classifier that finds a decision boundary by maximizing the separation between classes in projected data. It achieves this by maximizing the distance between projected means while minimizing the within-class variance. The method ultimately allows for the creation of an optimal decision boundary for multivariate Gaussian distributions with identical covariance matrices.


Fisher’s linear discriminant can be used as a supervised learning classifier. Given labeled data, the classifier finds a set of weights that defines a decision boundary for classifying the data. Fisher’s linear discriminant looks for the vector that maximizes the separation between the classes of the projected data. "Separation" on its own is ambiguous, so the criterion Fisher’s linear discriminant uses is to maximize the distance between the projected means while minimizing the projected within-class variance.
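To make the criterion concrete, here is a minimal sketch (my own, not from the article) of how it could be computed with NumPy: given two labeled point clouds and a candidate direction w, the function projects the data and returns the ratio of the squared distance between the projected means to the sum of the projected within-class variances. The name fisher_criterion is illustrative.

```python
import numpy as np

def fisher_criterion(X1, X2, w):
    """Fisher's criterion J(w) for a candidate direction w.

    X1, X2 : (n, d) arrays of points from class 1 and class 2.
    w      : (d,) direction vector (need not be unit length).
    """
    p1 = X1 @ w  # projections of class 1 onto w
    p2 = X2 @ w  # projections of class 2 onto w
    between = (p1.mean() - p2.mean()) ** 2  # squared distance between projected means
    within = p1.var() + p2.var()            # summed within-class variances
    return between / within
```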

For example:

Here are two bivariate Gaussians with identical covariance matrices and distinct means. We want to find the vector that best separates the projections of the data. Let's draw a random vector and plot the projections. Remember that we are looking at projections of the data onto the vector (the dot product of the weights vector and the data matrix), not at a decision boundary. The projections of the data onto this random weights vector can be plotted as a histogram (image on the right). As you can see, when the data are projected onto this vector and plotted as a histogram, the two classes are not well separated. The goal is to find the line that best separates the two distributions in the image on the right.
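The setup described above can be reproduced with a short sketch; the means, covariance, and sample sizes here are assumptions of my own, chosen only for illustration. Two bivariate Gaussians with identical covariance are sampled, projected onto a random unit vector, and plotted as overlapping histograms.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Two bivariate Gaussians with identical covariance and distinct means
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X1 = rng.multivariate_normal([0, 0], cov, size=500)
X2 = rng.multivariate_normal([3, 3], cov, size=500)

# A random direction to project onto
w = rng.normal(size=2)
w /= np.linalg.norm(w)

# Projections (dot product of each point with w), plotted as histograms
plt.hist(X1 @ w, bins=30, alpha=0.5, label="class 1")
plt.hist(X2 @ w, bins=30, alpha=0.5, label="class 2")
plt.legend()
plt.title("Projections onto a random direction")
plt.show()
```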

To separate the two distributions, we could first try to maximize the distance between the projected means, meaning that the distributions are, on average, as far from each other as possible. Let's draw a line between the two means and plot the histogram of the projections onto that line.
[Image by Author: histogram of the projections onto the line through the two means]
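Continuing the same sketch (and reusing X1 and X2 from the previous snippet), projecting onto the direction through the two sample means might look like this:

```python
# Direction through the two sample means (reusing X1, X2 from above)
w_means = X2.mean(axis=0) - X1.mean(axis=0)
w_means /= np.linalg.norm(w_means)

plt.hist(X1 @ w_means, bins=30, alpha=0.5, label="class 1")
plt.hist(X2 @ w_means, bins=30, alpha=0.5, label="class 2")
plt.legend()
plt.title("Projections onto the mean-difference direction")
plt.show()
```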

That’s quite a bit better, but the projections of the data are not fully separated yet. To fully separate them, Fisher’s linear discriminant minimizes the within-class variance of the projections at the same time as it maximizes the distance between the projected means. It still pushes the projected means apart, as we discussed before, but it also tries to make each projected distribution as tight as possible. This allows for better separation, as you’ll see below.
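The direction that optimizes this trade-off has a well-known closed form, w proportional to S_W^{-1} (u_2 - u_1), where S_W is the pooled within-class scatter matrix. The article does not derive it, but a sketch using it (again reusing X1 and X2 from above) could look like this:

```python
# Fisher direction: w ∝ S_W^{-1} (mu2 - mu1), where S_W is the
# pooled within-class scatter matrix (reusing X1, X2 from above)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)  # scatter matrix of class 1
S2 = np.cov(X2, rowvar=False) * (len(X2) - 1)  # scatter matrix of class 2
S_W = S1 + S2

w_fisher = np.linalg.solve(S_W, mu2 - mu1)
w_fisher /= np.linalg.norm(w_fisher)

plt.hist(X1 @ w_fisher, bins=30, alpha=0.5, label="class 1")
plt.hist(X2 @ w_fisher, bins=30, alpha=0.5, label="class 2")
plt.legend()
plt.title("Projections onto the Fisher direction")
plt.show()
```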

[Image by Author: histogram of the projections onto Fisher’s discriminant direction]
As you can see, the projections of the data are now well separated. We can take the vector orthogonal to the weights vector to create a decision boundary: on either side of the boundary, points are predicted to belong to one class or the other. For multivariate Gaussian distributions with identical covariance matrices, this yields an optimal classifier.
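As a rough illustration of the resulting classifier (a sketch assuming equal class priors, reusing w_fisher, X1, and X2 from the snippets above), one can threshold the projections at the midpoint of the projected class means; the decision boundary is then the hyperplane orthogonal to w_fisher through that point.

```python
# Threshold at the midpoint of the projected class means (equal priors assumed)
threshold = 0.5 * (X1.mean(axis=0) @ w_fisher + X2.mean(axis=0) @ w_fisher)

def predict(X):
    """Return 1 where a point projects past the threshold (class 2), else 0."""
    return (X @ w_fisher > threshold).astype(int)

print("class 1 accuracy:", (predict(X1) == 0).mean())
print("class 2 accuracy:", (predict(X2) == 1).mean())
```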

Notation:

Π_w = the projection hyperplane associated with w, defined by w^T x = 0

w = the weight (projection) vector

x = the data points (the black and red classes)

x_i' = w^T x_i, the projection of the point x_i onto w

D = {(x_i, y_i)} = the labeled data set

u_1' and u_2' = the means of the projected classes

s_1'^2 and s_2'^2 = the within-class variances of the projected classes 1 and 2 (the scatter of the projected points around their class mean)

J(w) = the Fisher discriminant criterion:

J(w) = (u_1' - u_2')^2 / (s_1'^2 + s_2'^2)
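Tying this notation back to the earlier sketches, J(w) can be evaluated for the three candidate directions. Assuming the snippets above have been run, the Fisher direction should give the largest value:

```python
# Evaluating J(w) = (u1' - u2')^2 / (s1'^2 + s2'^2) for the three directions,
# reusing the data and the fisher_criterion helper defined above
print("random direction:     J =", fisher_criterion(X1, X2, w))
print("mean-difference dir.: J =", fisher_criterion(X1, X2, w_means))
print("Fisher direction:     J =", fisher_criterion(X1, X2, w_fisher))
```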
