Statistical Methods in Artificial Intelligence
CSE471 - Monsoon 2015 : Lecture 03
Avinash Sharma
CVIT, IIIT Hyderabad
Lecture 03: Plan
Linear Algebra Recap
Linear Discriminant Functions (LDFs)
The Perceptron
Generalized LDFs
The Two-Category Linearly Separable Case
Learning LDF: Basic Gradient Descent
Perceptron Criterion Function
Basic Linear Algebra Operations
Vector
Vector Operations
Scaling
Transpose
Addition
Subtraction
Dot Product
Equation of a Plane
Vector Operations
Transpose
Scaling: Only Magnitude Changes
Dot Product (Inner Product) of two vectors is a scalar.
The dot product of two perpendicular vectors is 0.
Equation of a Plane
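A minimal NumPy sketch of these vector operations and of the plane equation; the specific vectors and the plane are illustrative values, not from the slides:

```python
import numpy as np

# Two vectors in R^3 (illustrative values).
u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, -1.0])

print(2.0 * u)       # scaling: only the magnitude changes
print(u.T)           # transpose (for a 1-D array this is the same vector)
print(u + v, u - v)  # addition and subtraction, element-wise
print(u @ v)         # dot (inner) product is a scalar: 1*2 + 2*0 + 2*(-1) = 0
# u @ v == 0, so u and v are perpendicular.

# A plane through point p with normal n: all x with n.(x - p) = 0,
# i.e. n.x + d = 0 where d = -n.p.
n = np.array([0.0, 0.0, 1.0])    # normal of the plane z = 3
p = np.array([0.0, 0.0, 3.0])
d = -n @ p
x = np.array([5.0, -2.0, 3.0])   # a point on the plane
print(n @ x + d)                 # 0.0 for points on the plane
```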
Linear Discriminant Functions
Assumes a 2-class classification setup.
The decision boundary is represented explicitly in terms of the components of x.
The aim is to seek parameters of a linear discriminant function which minimize the training error.
Why Linear? It is the simplest possible form of discriminant function.
Generalized Linear Discriminant Functions
(Figure: sample points from Class A and Class B.)
The Perceptron
Perceptron Decision Boundary
Perceptron Summary
The decision boundary surface (hyperplane) g(x) = wᵀx + w₀ = 0 divides the feature space into two regions.
Orientation of the boundary surface is decided by the normal vector w.
Location of the boundary surface is determined by the bias term w₀.
g(x) is proportional to the distance of x from the boundary surface.
g(x) > 0 on the positive side and g(x) < 0 on the negative side.
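The summary above can be sketched in NumPy; the weights w, w₀ and the test points below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def perceptron_predict(w, w0, x):
    """Classify x by the sign of g(x) = w.x + w0.

    The hyperplane g(x) = 0 has normal vector w (orientation) and bias
    w0 (location); g(x)/||w|| is the signed distance of x from it.
    """
    g = w @ x + w0
    return 1 if g > 0 else -1  # positive side -> class 1, negative side -> class 2

# Illustrative boundary x1 + x2 - 1 = 0 (weights chosen for the example).
w = np.array([1.0, 1.0])
w0 = -1.0
print(perceptron_predict(w, w0, np.array([2.0, 2.0])))  # g = 3 > 0
print(perceptron_predict(w, w0, np.array([0.0, 0.0])))  # g = -1 < 0
```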
Generalized LDFs
Linear: g(x) = w₀ + Σᵢ wᵢxᵢ
Non-linear (quadratic): g(x) = w₀ + Σᵢ wᵢxᵢ + Σᵢ Σⱼ wᵢⱼxᵢxⱼ
With the augmented vector y = (1, x₁, …, x_d)ᵀ (so y₀ = 1), the linear case becomes g(x) = aᵀy; more generally y = φ(x) for a non-linear mapping φ.
Generalized LDFs Summary
φ can be any arbitrary mapping function that projects original d-dimensional data points x to higher-dimensional points y = φ(x).
The hyperplane decision surface aᵀy = 0 passes through the origin in y-space.
Advantage: in the mapped higher-dimensional space the data might be linearly separable.
Disadvantage: the mapping is computationally intensive, and learning the classification parameters can be non-trivial.
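As a concrete sketch of such a mapping, the hypothetical 1-D example below uses the quadratic map y = (1, x, x²); a decision rule that is non-linear in x becomes a hyperplane aᵀy = 0 through the origin in y-space:

```python
import numpy as np

def quadratic_map(x):
    """Map a 1-D point x to y = (1, x, x^2).

    g(x) = a0 + a1*x + a2*x^2 is linear in y, so a quadratic decision
    rule in x is a hyperplane a.y = 0 through the origin in y-space.
    """
    return np.array([1.0, x, x * x])

# "Class +1 iff x in (-1, 1)" cannot be realized by a single threshold
# on x, but a = (1, 0, -1) gives a.y = 1 - x^2, positive exactly there.
a = np.array([1.0, 0.0, -1.0])
for x in [-2.0, 0.0, 0.5, 2.0]:
    print(x, np.sign(a @ quadratic_map(x)))
```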
Two-Category Linearly Separable Case
Given data vectors y₁, …, y_n from two classes, seek a weight vector a such that aᵀyᵢ > 0 for every class-1 sample and aᵀyᵢ < 0 for every class-2 sample.
Normalized Case: replace each class-2 data vector yᵢ by its negation −yᵢ; then a single condition, aᵀyᵢ > 0 for all i, is sought.
Learning LDF: Basic Gradient Descent
Define a scalar criterion function J(a) which captures the classification error for the specific boundary plane described by the parameter vector a.
Minimize J(a) using gradient descent:
Start with an arbitrary value a(1) for k = 1.
Iteratively refine the estimate of a: a(k+1) = a(k) − η(k) ∇J(a(k))
η(k) is the positive scale factor, also known as the learning rate.
A too small η makes the convergence very slow.
A too large η can cause divergence due to overshooting of the minimum.
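The update rule above, applied to the perceptron criterion Jₚ(a) = Σ_{misclassified} (−aᵀy) from the lecture plan, can be sketched as follows; the sample data and learning rate are illustrative assumptions:

```python
import numpy as np

def perceptron_gd(ys, eta=0.1, max_iter=1000):
    """Gradient descent on the perceptron criterion Jp(a) = sum_{mis} -a.y.

    ys holds normalized augmented samples (class-2 rows already negated),
    so the goal is a.y > 0 for every row. Since grad Jp = sum_{mis} -y,
    the update a(k+1) = a(k) - eta * grad Jp adds eta * sum of the
    misclassified samples at each step.
    """
    a = np.zeros(ys.shape[1])          # arbitrary starting value a(1)
    for _ in range(max_iter):
        mis = ys[ys @ a <= 0]          # currently misclassified samples
        if len(mis) == 0:              # all on the positive side: converged
            break
        a = a + eta * mis.sum(axis=0)  # one gradient-descent step
    return a

# Tiny linearly separable example: augmented vectors y = (1, x1, x2),
# with class-2 samples negated (the normalized case above).
ys = np.array([[ 1.0,  2.0,  2.0],   # class 1
               [ 1.0,  3.0,  1.0],   # class 1
               [-1.0,  0.0, -0.5],   # class 2, negated
               [-1.0, -0.5,  0.0]])  # class 2, negated
a = perceptron_gd(ys)
print(np.all(ys @ a > 0))  # every sample ends up on the positive side
```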