
24ASD641 Pattern Recognition

Dr. Jayakrishnan Anandakrishnan


Assistant Professor
Amrita School of Computing
Amrita Vishwa Vidyapeetham, Coimbatore

Google Scholar ORCID Web of Science

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 1 / 104


Course Objectives and Outcomes

Objectives
Identify patterns, regularities, or structures in data to make informed decisions.
Focus on computational properties of patterns and algorithms used to process
them.

Outcomes
CO01: To get an idea about pattern recognition with suitable examples
CO02: To gain knowledge about parametric classification methods using
Bayesian decision making approach
CO03: To apply nonparametric techniques such as nearest neighbor, adaptive
discriminant functions, and decision regions based on minimum squared error
CO04: To gain knowledge about nonmetric methods, classification trees, and
some resampling methods
CO05: To study and apply various clustering methods

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 2 / 104


Course Units and References
Unit I: Introduction – pattern recognition systems, design cycle, learning/adaptation, applications, statistical decision theory, examples in pattern recognition and image processing.
Unit II: Parametric methods – Bayes theorem, Bayesian decision making, Gaussian case, discriminant functions, decision boundaries/regions, dimensionality problems, ROC curves, ML classification.
Unit III: Nonparametric methods – histograms, density estimation, mixture densities, kernel/window estimators, nearest neighbor techniques, adaptive discriminant functions, minimum squared error methods.
Unit IV: Nonmetric methods – decision trees, CART, algorithm-independent ML, bias-variance, jackknife and bootstrap resampling.
Unit V: Clustering – unsupervised learning, criterion functions, hierarchical (single, complete, average, Ward's), partitional (Forgy's, k-means).
References:
1. Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, 2nd Ed., Wiley, 2003.
2. Earl Gose, Richard Johnsonbaugh, Steve Jost, Pattern Recognition and Image Analysis, PHI, 2002.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 3 / 104
Outline

1 Linear Discriminant Functions

2 Minimum Squared Error Discriminant Functions

3 Non-metric Methods

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 4 / 104


Linear Discriminant Functions

In parametric estimation, we assumed that the forms of the underlying probability densities were known and used the training samples to estimate the values of their parameters.
Instead, assume that the proper forms for the discriminant functions are known, and use the samples to estimate the values of the parameters of the classifier.
None of the various procedures for determining discriminant functions requires knowledge of the forms of the underlying probability distributions; this is the so-called nonparametric approach.
Linear discriminant functions are relatively easy to compute, and their parameters can be estimated from training samples.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 5 / 104


Linear Discriminant Functions

Parametric vs. Nonparametric


Parametric: Assume the data follows a probability distribution
(Gaussian/Normal), and just estimate parameters (mean, variance).
Nonparametric: No assumption of any distribution. Instead, directly
learn from the training data itself.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 6 / 104


Parametric Case

Training Data:

Fruit Weight (grams)


Apple 160
Apple 170
Banana 120
Banana 130

Estimated Parameters:
Apple: Mean µ = 165, Variance σ² = 25
Banana: Mean µ = 125, Variance σ² = 25

New fruit weight: x = 150 grams

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 7 / 104


Classification Using Gaussian Likelihood

P(x | class) = 1/√(2πσ²) · exp( −(x − µ)² / (2σ²) )

For Apple:

P(150 | Apple) = 1/√(2π·25) · exp( −(150 − 165)² / (2·25) ) ≈ 8.9 × 10⁻⁴

For Banana:

P(150 | Banana) = 1/√(2π·25) · exp( −(150 − 125)² / (2·25) ) ≈ 3.0 × 10⁻⁷

Conclusion: The fruit is more likely to be an Apple.


Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 8 / 104
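A short Python sketch of this parametric rule, using the fruit parameters estimated above (the function and variable names are illustrative):

import math

def gaussian_likelihood(x, mu, var):
    # Evaluate the univariate Gaussian density N(mu, var) at x
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Parameters estimated from the training weights above
classes = {"Apple": (165.0, 25.0), "Banana": (125.0, 25.0)}

x = 150.0  # new fruit weight in grams
likelihoods = {c: gaussian_likelihood(x, mu, var) for c, (mu, var) in classes.items()}
print(likelihoods)                            # Apple ~ 8.9e-4, Banana ~ 3.0e-7
print(max(likelihoods, key=likelihoods.get))  # -> Apple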
Non-parametric Case

No assumption of a Gaussian or any other probability distribution.

Instead, we can use a function g(x), i.e., a linear discriminant function.

Simple Rule:
g(x) = ax + b

If g(x) > 0, then classify as Apple
If g(x) < 0, then classify as Banana

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 9 / 104


Discriminant Function for Classification

A discriminant function helps in decision-making.


For example: Given a point X , assign it to class ω1 or class ω2
depending on the value of a function.
If function > 0, assign to class ω1
If function < 0, assign to class ω2

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 10 / 104


Linear Classifier

w is the weight vector
w0 is the bias or threshold weight
g(x) = 0 defines the decision surface
The decision surface separates the classes
Two-category case
Multi-category case

g(x) = wᵀx + w0

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 11 / 104


Linear Classifier

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 12 / 104


Two-Category Classification
A two-category case can be defined as a decision between class ω1 and
class ω2.

g(x) > 0 ⇒ Class ω1
g(x) < 0 ⇒ Class ω2

Given:
g(x) = wᵀx + w0 > 0 ⇒ Class ω1
This implies:
wᵀx > −w0
So the decision rule becomes:

wᵀx > −w0 ⇒ Class ω1
wᵀx < −w0 ⇒ Class ω2

If g (x) = 0, the point lies on the decision boundary.


Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 13 / 104
Linear Classifier

y = f(g(x))
Where:

y = +1 if g(x) > 0
y = −1 if g(x) < 0

The equation g(x) = 0 defines the decision surface that separates points assigned to the categories ω1 and ω2.

If x1 and x2 are both on the decision surface, then...

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 14 / 104


Geometry of the Decision Surface

If g(x1) = g(x2) = 0, then both x1 and x2 lie on the decision surface.

wᵀx1 + w0 = wᵀx2 + w0 = 0

Subtracting the two equations:

wᵀ(x1 − x2) = 0

This indicates that w is orthogonal (normal) to any vector lying in the hyperplane defined by the decision surface.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 15 / 104


How g (x) is Related to Distance Measure
xp is the projection of x on the hyperplane H
r is positive if x is on the positive side, negative if on the negative side

x = xp + r · w/∥w∥

g(x) = wᵀx + w0 = wᵀ( xp + r · w/∥w∥ ) + w0
     = wᵀxp + r · (wᵀw)/∥w∥ + w0
     = wᵀxp + r ∥w∥ + w0

If xp lies on the decision surface, then wᵀxp + w0 = 0, so:

g(x) = r ∥w∥,  i.e.,  r = g(x)/∥w∥

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 16 / 104
How g (x) is Related to Distance Measure

The distance from the origin to the hyperplane H is:

w0 / ∥w∥

If w0 > 0, the origin lies on the positive side of H
If w0 < 0, the origin lies on the negative side of H
If w0 = 0, then g(x) = wᵀx is homogeneous and the hyperplane passes through the origin

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 17 / 104
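A small numeric sketch of these relations in Python; the weight vector, bias, and test point below are made-up values for illustration:

import numpy as np

w = np.array([3.0, 4.0])   # weight vector (illustrative)
w0 = -5.0                  # bias / threshold weight (illustrative)
x = np.array([2.0, 1.0])   # test point

g = w @ x + w0             # g(x) = w^T x + w0
r = g / np.linalg.norm(w)  # signed distance from x to the hyperplane g(x) = 0

print(g, r)                         # 5.0 1.0 -> x lies one unit on the positive side of H
print(abs(w0) / np.linalg.norm(w))  # distance from the origin to the hyperplane: 1.0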


Multi Class Case

Two common reductions:
c two-class problems (each class vs. the rest)
c(c − 1)/2 linear discriminants, one for every pair of classes

Both lead to ambiguous regions: if a sample falls in such a region, it is difficult to decide its class.
How can this be solved?

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 18 / 104


Multi Class Case: Solution

Can be solved by considering a linear discriminant function for each individual class.
Let us consider a linear discriminant function for the i-th class:

gi(x) = wiᵀx + wi0,  i = 1, . . . , C

We assign x to ωi if gi(x) > gj(x) for all j ≠ i.

In this case, the resulting classifier is known as a Linear Machine.
The linear machine divides the feature space into C decision regions:

R1, R2, . . . , RC

with gi(x) being the largest discriminant if x lies in the region Ri.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 19 / 104
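A minimal Python sketch of a linear machine; the three weight vectors and biases are made-up values for illustration:

import numpy as np

# One weight vector w_i and bias w_i0 per class (illustrative values for C = 3 classes)
W = np.array([[ 1.0,  0.5],
              [-0.3,  1.2],
              [ 0.8, -1.0]])
w0 = np.array([0.1, -0.2, 0.4])

def linear_machine(x):
    # Assign x to argmax_i g_i(x), where g_i(x) = w_i^T x + w_i0
    g = W @ x + w0
    return int(np.argmax(g)), g

label, scores = linear_machine(np.array([1.0, 2.0]))
print(label, scores)   # class 0 has the largest discriminant for this point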


Multi Class Case: Point on Decision Boundary

For two contiguous regions Ri and Rj, if a point lies on the decision boundary between them, then:

gi(x) = gj(x)
⇒ wiᵀx + wi0 = wjᵀx + wj0
⇒ (wi − wj)ᵀx + (wi0 − wj0) = 0

This shows that (wi − wj) is normal to Hij (the separating hyperplane between classes i and j).
So the algebraic distance from x to Hij is

r = ( gi(x) − gj(x) ) / ∥wi − wj∥

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 20 / 104


Multi Class Case: Regions

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 21 / 104


Minimum Squared Error Discriminant Functions

Possibilities exist to extract the weight vector all at once, without any iterative process.
A neural network learns the weights a through an iterative process.
Can you learn a in a single step?

[ x11 x12 · · · x1d ] [ a1 ]   [ b1 ]
[ x21 x22 · · · x2d ] [ a2 ] = [ b2 ]
[  ⋮    ⋮        ⋮  ] [  ⋮ ]   [  ⋮ ]
[ xn1 xn2 · · · xnd ] [ ad ]   [ bn ]

Let X be an n × d matrix.
Now apply augmentation. Why augmentation?

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 22 / 104


Minimum Squared Error Discriminant Functions
Because a bias (intercept) term needs to be added to the linear discriminant function.
The linear discriminant function for a single data point xi:

g(xi) = a0 + a1 xi1 + a2 xi2 + · · · + ad xid

What is a0? It is the augmentation of the weight vector a.
Augmented matrix Y, in which each data point yi has an extra 1 at the beginning:

    [ 1 x11 x12 · · · x1d ]
    [ 1 x21 x22 · · · x2d ]
Y = [ ⋮  ⋮   ⋮        ⋮  ]
    [ 1 xn1 xn2 · · · xnd ]

Now it is n × (d + 1).
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 23 / 104
Minimum Squared Error Discriminant Functions

[ 1 x11 x12 · · · x1d ] [ a0 ]   [ b1 ]
[ 1 x21 x22 · · · x2d ] [ a1 ] = [ b2 ]
[ ⋮  ⋮   ⋮        ⋮  ] [  ⋮ ]   [  ⋮ ]
[ 1 xn1 xn2 · · · xnd ] [ ad ]   [ bn ]

Now you can write this as a linear equation Ya = b, and a = Y⁻¹b.

a = Y⁻¹b is only possible if Y is a square, invertible matrix.
Is that true in the real world?
In the real world, you usually have more data points n than features d.
How do we get the first linear equation?
Multiply the first row of Y by the weight column a; the result should equal the first element of b.
Since we generally cannot find a weight vector a that satisfies all equations exactly, there will be an error.
Error: e = Ya − b
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 24 / 104
Minimum Squared Error Discriminant Functions

If there is an error e, we can write the criterion as the sum of squared errors:

Js(a) = ∥Ya − b∥² = Σᵢ (aᵀyᵢ − bᵢ)²

Note: a is the weight vector (including the bias), yᵢ is an input sample with a bias term, and bᵢ is the target.
To minimize this error, take the gradient and equate it to zero; this gives a simple closed-form solution.
Differentiate w.r.t. a:

∇Js(a) = Σ_{i=1}^{n} 2(aᵀyᵢ − bᵢ) yᵢ = 2Yᵀ(Ya − b)

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 25 / 104


Minimum Squared Error Discriminant Functions

When you equate the gradient of the error function to zero, you are
finding the point where the function’s slope is flat. For a convex func-
tion like the squared error, this flat point is the global minimum.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 26 / 104


Minimum Squared Error Discriminant Functions
Equate it to zero to get a simple solution:

2Yᵀ(Ya − b) = 0
YᵀYa = Yᵀb
a = (YᵀY)⁻¹Yᵀb

Now you can write a as

a = Y†b, where Y† = (YᵀY)⁻¹Yᵀ

Y† is called the pseudoinverse. (Is YᵀY always a square matrix? Yes: it is (d + 1) × (d + 1).)
This indicates that if we know b, we can compute the solution weights a from the pseudoinverse.
The regularized pseudoinverse, where ϵ is a small positive constant, is given as

Y† ≡ lim(ϵ→0) (YᵀY + ϵI)⁻¹Yᵀ

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 27 / 104


Minimum Squared Error Discriminant Functions
Suppose we have the following two-dimensional points for two categories:
ω1: (1, 2)ᵗ, (2, 0)ᵗ and ω2: (3, 1)ᵗ, (2, 3)ᵗ. Construct a classifier with a pseudoinverse.

ω1: (1, 2)ᵗ, (2, 0)ᵗ        ω2: (3, 1)ᵗ, (2, 3)ᵗ

Augmented (and, for ω2, sign-normalized) vectors are:
y1 = [1, 1, 2], y2 = [1, 2, 0], y3 = [−1, −3, −1], y4 = [−1, −2, −3]

Matrix Y is:

    [  1   1   2 ]
    [  1   2   0 ]
Y = [ −1  −3  −1 ]
    [ −1  −2  −3 ]

The pseudoinverse of Y is given by:

Y⁺ = (YᵀY)⁻¹Yᵀ
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 28 / 104
Minimum Squared Error Discriminant Functions

     [  5/4   13/12   3/4   7/12 ]
Y⁺ = [ −1/2   −1/6   −1/2   −1/6 ]
     [   0    −1/3     0    −1/3 ]

We arbitrarily let all the margins/labels be equal, i.e.,

b = (1, 1, 1, 1)ᵗ

Our solution is:

a = Y⁺b = ( 11/3, −4/3, −2/3 )ᵗ

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 29 / 104
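The worked example above can be checked with a few lines of NumPy; np.linalg.pinv computes the pseudoinverse directly:

import numpy as np

# Augmented, sign-normalized samples: rows are y_i (the omega_2 rows are negated)
Y = np.array([[ 1,  1,  2],
              [ 1,  2,  0],
              [-1, -3, -1],
              [-1, -2, -3]], dtype=float)
b = np.ones(4)

a = np.linalg.pinv(Y) @ b    # or: np.linalg.solve(Y.T @ Y, Y.T @ b)
print(a)                     # [ 3.6667 -1.3333 -0.6667 ], i.e. (11/3, -4/3, -2/3)
print(np.sign(Y @ a))        # all positive -> every training sample is classified correctly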


Introduction to Non-Metric Methods

Previous pattern recognition methods involved real-valued feature vectors with clear metrics.
Are there instances of data without clear metrics?
Nominal data: data that are discrete and without any natural notion of similarity or even ordering.
For example, eye color. It can be brown, blue, or green. You can count how many people have brown eyes, but you can't perform a mathematical operation like finding the average eye color.
Solution: decision trees and string grammars.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 30 / 104


Introduction to Non-Metric Methods: Decision Trees

I am thinking of a fruit. Ask me up to 20 yes/no questions to determine which fruit I am thinking of.
How did you ask the questions?
What underlying measure led you to the questions, if any?
Most importantly, iterative yes/no questions of this sort require no metric and are well suited for nominal data.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 31 / 104


Decision Tree

A decision tree starts with a root node at the very top. This is where the classification of a particular pattern begins by checking a specific property that was chosen during the tree's learning phase.
The tree then branches out from the root node. Each branch corresponds to a different possible value of the property being checked.
You follow the branch that matches the value of your pattern, which leads you to a new node where the process is repeated.
This step-by-step process continues, with the tree checking one property after another, until you reach a leaf node.
A leaf node signifies that a final decision has been reached and the pattern has been classified.
This logical, easy-to-follow structure is what makes decision trees highly interpretable.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 40 / 104


Decision Tree-CART

Consider a labeled dataset D with a set of features chosen for classification.
The goal is to arrange these features into a decision tree that achieves high classification accuracy.
A decision tree works by recursively splitting the data into smaller subsets.
When all samples in a subset belong to the same class, the node is said to be pure, and further splitting is unnecessary.
In practice, complete purity is rare, so we must decide whether to stop with an imperfect split or continue growing the tree with additional features.
CART adopts a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 41 / 104


CART Strategy

The basic CART strategy for recursively defining a tree is:


Given the data at a node, either declare it a leaf or choose a property
to split the data into subsets.
In this process, six key questions arise:
1 How many branches should be created from a node?
2 Which property should be tested at a node?
3 When should a node be declared a leaf?
4 How can we prune a tree that has grown too large?
5 If a leaf node remains impure, how should its category be assigned?
6 How should missing data be handled?

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 42 / 104


Number of Splits

The number of splits at a node, called the branching factor B, is


usually determined by the designer based on the test selection
method.
The branching factor can vary across different parts of the tree.
Any split with more than two branches can be represented as a
sequence of binary splits.
For this reason, tree-learning methods primarily focus on binary trees.
However, in some cases, using 3- or 4-way splits may be preferred, as
binary tests or inferences can be computationally expensive.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 43 / 104


Principle of Tree Creation

The core principle in building decision trees is simplicity: prefer small,


compact trees with fewer nodes.
At each node N, we select a property test T that makes the
descendant nodes as pure as possible.
Let i(N) represent the impurity of node N:
i(N) = 0 if all samples in the node belong to one category (pure).
i(N) is large when categories are equally represented (impure).
A widely used impurity measure is Entropy:

i(N) = − Σⱼ P(ωⱼ) log P(ωⱼ)

Entropy reaches its minimum when the node contains only one class.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 44 / 104


Principle of Tree Creation
For the two-class case, Variance Impurity is defined as:

i(N) = P(ω1 )P(ω2 )

For multi-class problems, the Gini Impurity is commonly used:

i(N) = Σ_{i≠j} P(ωᵢ)P(ωⱼ) = 1 − Σⱼ P²(ωⱼ)

This represents the expected error rate if the class label is chosen randomly according to the class distribution at node N.
Another measure is the Misclassification Impurity:

i(N) = 1 − maxⱼ P(ωⱼ)

This gives the minimum probability that a training sample at node N


will be misclassified.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 45 / 104
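A short Python sketch of these impurity measures, given the class probabilities at a node (a base-2 logarithm is assumed for the entropy; the helper functions are illustrative, not part of CART itself):

import math

def entropy_impurity(p):
    # i(N) = -sum_j P(w_j) log2 P(w_j); zero-probability classes contribute nothing
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

def variance_impurity(p):
    # two-class case only: i(N) = P(w1) * P(w2)
    return p[0] * p[1]

def gini_impurity(p):
    # i(N) = sum_{i != j} P(w_i) P(w_j) = 1 - sum_j P(w_j)^2
    return 1.0 - sum(pj ** 2 for pj in p)

def misclassification_impurity(p):
    # i(N) = 1 - max_j P(w_j)
    return 1.0 - max(p)

p = [0.5, 0.5]   # maximally impure two-class node
print(entropy_impurity(p), variance_impurity(p), gini_impurity(p), misclassification_impurity(p))
# -> 1.0 0.25 0.5 0.5
p = [1.0, 0.0]   # pure node: every impurity measure is zero
print(entropy_impurity(p), variance_impurity(p), gini_impurity(p), misclassification_impurity(p))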
Principle of Tree Creation

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 46 / 104


Feature Selection at a Node

Key Question: Given a partial tree down to node N, which feature


should be selected for the property test T ?
Heuristic: choose the feature that produces the largest decrease in
impurity.
The impurity gradient is defined as:

∆i(N) = i(N) − PL i(NL ) − (1 − PL )i(NR )

where:
NL , NR : left and right child nodes.
PL : fraction of samples directed to the left subtree by test T .
Strategy: select the feature that maximizes ∆i(N).
If entropy impurity is used, this is equivalent to selecting the feature
with the highest Information Gain.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 47 / 104
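A minimal Python sketch of choosing the split that maximizes ∆i(N), using entropy impurity; the one-dimensional dataset and candidate thresholds are made up for illustration:

import math
from collections import Counter

def entropy(labels):
    # entropy impurity of a node holding these class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_drop(x, y, threshold):
    # Delta_i(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R) for the test "x < threshold"
    left = [yi for xi, yi in zip(x, y) if xi < threshold]
    right = [yi for xi, yi in zip(x, y) if xi >= threshold]
    if not left or not right:
        return 0.0
    p_left = len(left) / len(y)
    return entropy(y) - p_left * entropy(left) - (1 - p_left) * entropy(right)

# Toy one-feature node: small values belong to class 0, large values to class 1
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best = max((impurity_drop(x, y, t), t) for t in (0.25, 0.5, 0.75))
print(best)   # (1.0, 0.5): the split at 0.5 separates the classes perfectly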


Binary and Multi-Class Splits

In the binary case, feature selection reduces to a one-dimensional


optimization problem (which may have multiple optima).
For higher branching factors, the problem becomes
higher-dimensional.
In multi-class binary tree construction, the twoing criterion is often
used:
Goal: find a split that best separates the c categories into two groups.
Define a candidate “supercategory” C1 (subset of categories) and C2
(the remainder).
Search must consider both features and category groupings.
This approach follows a local, greedy optimization strategy:
No guarantee of achieving the global optimum in accuracy.
No guarantee of obtaining the smallest possible tree.
In practice, the specific choice of impurity function has little effect on
the final classifier’s accuracy.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 48 / 104


When to Stop Splitting?

If the tree grows until each leaf has only one sample (minimum
impurity), it will be overfitted and fail to generalize.
If the tree is stopped too early, training error remains high and
performance suffers.
Common strategies to decide when to stop splitting:
1 Use cross-validation to determine optimal stopping.
2 Set a threshold on the impurity gradient.
3 Add a tree-complexity penalty term and minimize.
4 Apply a statistical test on the impurity gradient.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 49 / 104


Stopping Criterion Using Threshold β

Splitting is stopped if the best candidate split at a node reduces


impurity by less than a preset threshold β:

max_s ∆i(s) ≤ β

Benefits:
Unlike cross-validation, the tree is trained on the full dataset.
Leaf nodes can occur at different depths, which adapts to varying data
complexity.
Drawback:
Choosing an appropriate value for β is non-trivial.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 50 / 104


Stopping with a Complexity Term

Define a global criterion function that balances complexity and accuracy:

α · size + Σ_{leaf nodes} i(N)

where:
size: number of nodes or links in the tree.
α: positive constant controlling the trade-off.
Splitting continues until this global criterion is minimized.
With entropy impurity, this measure is related to the Minimum
Description Length (MDL) principle.
The sum of leaf node impurities represents the uncertainty of the
training data given the tree model.
Drawback: The challenge is how to appropriately set the constant α.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 51 / 104


Stopping with a Complexity Term

Imagine trying to divide apples (class 1) and oranges (class 2).


A good split: All apples on one side, oranges on the other (very
informative).
Candidate split x1 < 0.25
A random split: Apples and oranges scattered left and right by chance
(not useful).

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 52 / 104


Statistical Testing for Stopping Splits

During tree construction, estimate the distribution of impurity


gradients ∆i across the current nodes.
For any candidate split, test if ∆i is significantly different from zero.
Possible approaches:
Use a Chi-squared test.
More generally, apply a hypothesis testing framework to check
whether a split is better than a random split.
Example: Suppose there are n samples at node N.
A candidate split s sends Pn samples to the left and (1 − P)n samples
to the right.
Under a random split:
Pn1 of ω1 samples go left.
Pn2 of ω2 samples go left.
(1 − P)n1 of ω1 samples go right.
(1 − P)n2 of ω2 samples go right.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 53 / 104


Chi-Squared Test for Splitting

The Chi-squared statistic measures how much a candidate split s deviates from a random split:

χ² = Σ_{i=1}^{2} (n_iL − n_ie)² / n_ie

where:
n_iL = number of ωᵢ patterns sent to the left under split s.
n_ie = P nᵢ = expected number sent left under a random split.
Interpretation:
Larger χ² ⇒ greater deviation from random splitting.
If χ² exceeds a critical value (based on the significance level), reject the null hypothesis of randomness.
In this case, accept split s as meaningful.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 54 / 104
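A small Python sketch of this χ² statistic for a two-class candidate split; the node counts below are hypothetical:

def chi_squared_split(n1, n2, n1_left, n2_left, p_left):
    # chi^2 = sum_i (n_iL - n_ie)^2 / n_ie, with n_ie = P * n_i the expected
    # number of class-i samples sent left by a random split
    chi2 = 0.0
    for n_i, n_i_left in ((n1, n1_left), (n2, n2_left)):
        n_ie = p_left * n_i
        chi2 += (n_i_left - n_ie) ** 2 / n_ie
    return chi2

# Hypothetical node: 10 samples of omega_1 and 10 of omega_2; the split sends
# 10 samples left, 9 of them from omega_1 and 1 from omega_2.
print(chi_squared_split(n1=10, n2=10, n1_left=9, n2_left=1, p_left=0.5))
# -> 6.4, well above 3.84 (the 5% critical value for 1 degree of freedom),
#    so the split is accepted as non-random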


Pruning in Decision Trees

Stopping-split criteria bias trees toward large impurity reductions near


the root.
These methods do not account for possible future splits deeper in the
tree.
Pruning is an alternative strategy:
First, grow the tree fully (exhaustive construction).
Then, consider all pairs of neighboring leaf nodes for elimination.
If eliminating the pair only slightly increases impurity, replace them
with their common ancestor node as a leaf.
Characteristics of pruning:
Often produces unbalanced trees.
Avoids the local nature of early stopping.
Uses the full training dataset.
Involves higher computational cost.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 55 / 104


Pruning in Decision Trees

Is Warm-blooded?
  Yes → Is it a Pet?
    Yes → Dog (Leaf)
    No → Wolf (Leaf)
  No → Snake (Leaf)

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 56 / 104


Pruning in Decision Trees

Is Warm-blooded?
  Yes → Warm-blooded Animal (Leaf)
  No → Snake (Leaf)

Will we be able to distinguish Dog and Wolf then?

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 57 / 104


Decision Tree Effect of Noise

Consider 16 data points. Suppose the x2 value of the last point in the red
class is affected by some noise. As a result, two possible decision trees are
generated, with x2 values of 0.36 and 0.32.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 58 / 104


Decision Tree Effect of Noise

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 59 / 104


Decision Tree Effect of Noise

Note how the decision tree changed drastically due to the change in a single point.
Hence, decision trees are highly susceptible to noise.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 60 / 104
Decision Tree Variable Selection

As we know from the Ugly Duckling theorem and various empirical evidence, the selection of features will ultimately play a major role in accuracy, generalization, and complexity.
Furthermore, the use of multiple variables in selecting a decision rule may greatly improve accuracy and generalization.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 61 / 104


Decision Tree Variable Selection

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 62 / 104


Decision Tree Variable Selection

Note how the decision tree changed drastically and the separability improved when multivariate decision criteria were used.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 63 / 104
Decision Tree Complexity

Training Time Complexity:


Root node: O(d · n log n) for sorting and evaluating splits
Balanced binary tree: O(d · n(log n)2 )
Prediction Time Complexity: O(log n) for balanced trees
Space Complexity:
Number of nodes ≈ 2n − 1 ⇒ O(n)
Includes memory for features, thresholds, and tree structure
Factors Affecting Complexity:
Number of features d → more candidate splits
Number of samples n → longer training, deeper tree
Tree depth → more nodes, higher memory, risk of overfitting
Split type → binary splits are simpler; multi-way splits increase branching
Key Points:
Decision trees are fast for prediction but can be expensive to train
Complexity depends on samples, features, and tree depth

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 64 / 104


Decision Tree Points to Note

Problem Link + Additional Learning:


YouTube Video Link

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 65 / 104


Algorithm-Independent Machine Learning

Many pattern recognition algorithms exist, so people often ask: “Which one is best?”
Some select the algorithm with lower computational cost (faster).
Some select an algorithm based on the data (discrete vs. continuous).
What if there are datasets for which these factors do not matter?
William of Ockham (or Occam) was a 14th-century English philosopher and theologian.
Occam's Razor says that simpler models often generalize better.
If two models perform equally well on training data, Occam's razor suggests the simpler model will likely perform better on unseen data.
In physics, some laws (e.g., conservation of energy) hold regardless of conditions. What about Pattern Recognition? Do we have general principles that hold regardless of the algorithm?
We look for algorithm-independent rules that guide learning and classification.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 66 / 104
Algorithm-Independent Machine Learning

Bayes error is the theoretical minimum error a classifier can achieve.
Even the best classifier has to face this unavoidable classification error.
Example: apples weigh 150–200 g and oranges weigh 180–230 g. There is overlap in the range 180–200 g, so even the best classifier will make mistakes there.
If your classifier's error is close to the Bayes error, it is nearly optimal.
Some techniques or principles in machine learning work regardless of which algorithm you use.
Instead of focusing on one algorithm, you can evaluate, validate, or improve models in a way that applies to all classifiers.
Examples are k-fold cross-validation, bagging, and boosting, which can be applied with any classifier.
Algorithm-independent methods let you measure and improve models reliably, no matter which learning algorithm you use.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 67 / 104
Algorithm-Independent Machine Learning

No classifier is inherently best: performance depends on the type of problem, the data and its distribution, and prior knowledge.
There is no inherent superiority of any classifier.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 68 / 104


No Free Lunch Theorem

No learning algorithm is inherently superior to any other when


averaged over all possible problems.
Apparent superiority arises from:
The nature of the specific problem
Data distribution
Amount of training data
Cost/reward functions
No Free Lunch (NFL) says not to trust universal claims that one algorithm is better than another.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 69 / 104


Implications of No Free Lunch

NFL emphasizes that no algorithm is better for generalization in every case.
Error on unseen data (generalization error) is a better measure than training error.
Many algorithms can fit the training data perfectly, but their generalization differs.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 70 / 104


No Free Lunch Theorem – Formal Equations
Consider learning algorithms P(h|D) trained on a dataset D. Let F be the true target function and h(x) the hypothesis produced.
Many target functions, many hypotheses.
Expected generalization error:

E[E | D] = Σ_{h,F} Σ_{x∉D} P(x) [1 − δ(F(x), h(x))] P(h | D) P(F | D)

δ(F(x), h(x)) = 1 if F(x) = h(x), else 0.
Measures the alignment between the algorithm's hypothesis and the true target.

Off-training-set error for a candidate algorithm Pₖ(h|D):

Eₖ(E | F, n) = Σ_{x∉D} P(x) [1 − δ(F(x), h(x))] Pₖ(h(x) | D)

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 71 / 104


No Free Lunch Theorem
For any two learning algorithms P1 (h|D) and P2 (h|D), the following are
true, independent of the sampling distribution P(x) and the number of
training points n:
1 Uniformly averaged over all target functions F :

E1 (E | F , n) − E2 (E | F , n) = 0
2 For any fixed training set D, uniformly averaged over F :
E1 (E | F , D) − E2 (E | F , D) = 0
3 Uniformly averaged over all priors P(F ):
E1 (E | n) − E2 (E | n) = 0
4 For any fixed training set D, uniformly averaged over P(F ):
E1 (E | D) − E2 (E | D) = 0
P1 (h|D), P2 (h|D): Probability of hypotheses produced by the algorithms after training on D.
D: Training dataset (inputs and outputs).
h(x): Hypothesis (model) produced by algorithm.
F : True target function mapping inputs to outputs.
E1 (E |F , n), E2 (E |F , n): Expected generalization error of the algorithms on F with n training points.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 72 / 104
Understanding

Averaged over all target functions:

Σ_F Σ_D P(D | F) [E1(E | F, n) − E2(E | F, n)] = 0

For a fixed training set D:

Σ_F [E1(E | F, D) − E2(E | F, D)] = 0

Algorithm performance depends on the problem and data distribution.


Focus on evaluation metrics and unseen data performance rather than
training accuracy.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 73 / 104


No Free Lunch Theorem – Illustration

D is the training set; the remaining points form the test set.
In this case the test errors of h1 and h2 are 0.4 and 0.6, respectively.
Clearly h1 is better.
Now think of 25 target functions

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 74 / 104


Ugly Duckling Theorem
Importance of domain knowledge
Without prior assumptions, all patterns are equally similar.
No feature set or representation is inherently “better.”
Feature importance or similarity depends on assumptions about the
problem or prior knowledge.
Helps avoid bias when designing classifiers.
Example:
Attributes: f1 = blind in right eye, f2 = blind in left eye
Person A: {1, 0}, Person B: {0, 1}, Person C: {1, 1}
Without assumptions, mathematically all pairs share the same
number of predicates
Hence, no principled reason to say A&B are more similar than A&C
Theorem Statement:
For a finite set of predicates, the number of shared predicates
between any two distinct patterns is constant.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 75 / 104
Ugly Duckling Theorem Example

Three people A, B, C. Features are f1 = blind in right eye and f2 = blind in left eye.
Feature vectors:
A: {1, 0}, B: {0, 1}, C: {1, 1}
Intuition (considering only f1 and f2):
A&C and B&C seem closer than A&B.
Let's add more predicates (logical statements):
p3: Blind in at least one eye
p4: Blind in exactly one eye
p5: Not blind in both eyes
Shared predicates count:
A&B: 3, A&C: 2, B&C: 2
Without assuming which features matter, no pair is inherently “closer”: all pairs are equally similar when considering all predicates.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 76 / 104


Bias-Variance Trade-off: Overview

There is no universally best classifier; performance depends on the


problem distribution.
Two measures of match between algorithm and problem:
Bias: Accuracy of the model. High bias = poor match.
Variance: Precision of the model. High variance = weak match.
Bias and variance are interdependent (bias-variance trade-off).

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 77 / 104


Bias and Variance in Regression

Setup: Let F(x) be the true function, observed with noise. Estimate it from a dataset D using a model g(x; D).
The expected mean-square error over all training sets D is:

E_D[ (g(x; D) − F(x))² ] = (E_D[g(x; D)] − F(x))²  +  E_D[ (g(x; D) − E_D[g(x; D)])² ]
                                   Bias²                          Variance

With noisy targets, the expected error is Bias² + Variance + Noise.
Explanation of terms:
E_D[·]: Expectation over all training sets D.
Bias²: Square of the difference between the average prediction and the true function.
Variance: How much predictions vary across different training sets.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 78 / 104
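A quick Monte Carlo sketch of this decomposition in Python: fit a model of fixed form on many training sets drawn from a noisy F(x) and estimate bias² and variance at a test point (the target function, noise level, and polynomial degree are all made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
F = lambda x: np.sin(2 * np.pi * x)    # "true" function (illustrative)
x_test, sigma, n, trials, degree = 0.3, 0.2, 20, 2000, 1

preds = []
for _ in range(trials):
    x = rng.uniform(0, 1, n)
    y = F(x) + rng.normal(0, sigma, n)    # one training set D
    coeffs = np.polyfit(x, y, degree)     # g(x; D): least-squares polynomial fit
    preds.append(np.polyval(coeffs, x_test))

preds = np.array(preds)
bias_sq = (preds.mean() - F(x_test)) ** 2   # (E_D[g] - F)^2
variance = preds.var()                      # E_D[(g - E_D[g])^2]
print(bias_sq, variance)   # a rigid (degree-1) model: noticeable bias, small variance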


Bias-Variance Dilemma in Regression
(Figure: four regression models fit to three training sets D1, D2, D3: (a) a fixed linear g(x); (b) a better-matched fixed g(x); (c) a learned cubic g(x) = a0 + a1x + a2x² + a3x³; (d) a learned linear g(x) = a0 + a1x. The bottom row shows the resulting bias and variance for each model.)

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 79 / 104


Bias-Variance Dilemma in Regression

Low bias: Model fits the data well on average.
Low variance: Model predictions do not change much across datasets.
Trade-off: Flexible models → lower bias but higher variance; rigid models → higher bias, lower variance.
Example models:
a) Fixed linear model → high bias, zero variance
b) Better fixed model → lower bias, zero variance
c) Cubic model, trainable → low bias, moderate variance
d) Linear model, trainable → intermediate bias and variance

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 80 / 104


Bias and Variance in Classification

Two-class problem: y ∈ {0, 1}, with true discriminant function:

F(x) = Pr[y = 1 | x] = 1 − Pr[y = 0 | x]

Estimate g(x; D) by minimizing the mean-square error:

E_D[ (g(x; D) − y)² ]

Boundary error: the probability that the prediction differs from the Bayes classifier:

Pr[g(x; D) ≠ y_B] = ∫_{−∞}^{1/2} p(g(x; D)) dg,   if F(x) ≥ 1/2
Pr[g(x; D) ≠ y_B] = ∫_{1/2}^{∞} p(g(x; D)) dg,    if F(x) < 1/2

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 81 / 104


Boundary Bias and Variance in Classification

Assume p(g(x; D)) is Gaussian:

Pr[g(x; D) ≠ y_B] = Φ( sgn[F(x) − 1/2] · (1/2 − E_D[g(x; D)]) / √Var[g(x; D)] )

Explanation of terms:
Φ[t]: Standard normal cumulative distribution function
sgn[F(x) − 1/2]: Sign indicating which side of the decision boundary
E_D[g(x; D)] − 1/2: Boundary bias
Var[g(x; D)]: Variance
Key insight: In classification, variance usually dominates bias; low variance is crucial for accurate classification.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 82 / 104
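A small Python sketch of the boundary-error expression under the Gaussian assumption, with Φ built from math.erf; the values of F(x), E_D[g], and Var[g] are illustrative:

import math

def std_normal_cdf(t):
    # Phi(t) for the standard normal distribution
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def boundary_error(F_x, mean_g, var_g):
    # Pr[g(x;D) disagrees with the Bayes decision at x], assuming g(x;D) is Gaussian
    t = math.copysign(1.0, F_x - 0.5) * (0.5 - mean_g) / math.sqrt(var_g)
    return std_normal_cdf(t)

# Illustrative point: the Bayes classifier says class 1 (F(x) = 0.8) and the
# estimator is centred on the correct side of 1/2 (E_D[g] = 0.6).
print(boundary_error(F_x=0.8, mean_g=0.6, var_g=0.01))  # ~0.16: low variance keeps boundary errors rare
print(boundary_error(F_x=0.8, mean_g=0.6, var_g=0.25))  # ~0.42: high variance dominates and errors grow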


Bias-Variance Dilemma in Classification
(Figure: a two-dimensional Gaussian problem fit with three model classes: (a) full covariance matrices Σi (low bias, high variance); (b) diagonal Σi (intermediate); (c) identity Σi (high bias, low variance). Each is trained on three datasets D1, D2, D3; the lower rows show the learned decision boundaries and the error histograms.)

Figure 9.5: The (boundary) bias-variance tradeoff in classification can be illustrated with a two-dimensional Gaussian problem. The figure at the top shows the (true) decision boundary of the Bayes classifier. The nine figures in the middle show nine different learned decision boundaries. Each row corresponds to a different training set of n = 8 points selected randomly from the true distributions and labeled according...
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 83 / 104
Practical Implications

Flexible models (many parameters) → lower bias, higher variance
Rigid models (few parameters) → higher bias, lower variance
Large training sets reduce variance
Accurate prior knowledge reduces bias
Matching model complexity to the unknown true distribution is
critical for minimizing generalization error

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 84 / 104


Jackknife Resampling: Motivation
Estimating error for statistics beyond the mean (e.g., median, mode,
percentiles).
For the mean of a dataset D = {x1, x2, . . . , xn}:

µ̂ = (1/n) Σ_{i=1}^{n} xᵢ

Variance of the mean:

σ̂² = 1/(n(n − 1)) Σ_{i=1}^{n} (xᵢ − µ̂)²

For statistics like the median or mode, error estimation is not


straightforward.
Jackknife and Bootstrap resampling help generalize variance
estimation to any statistic.
Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 85 / 104
Leave-One-Out Concept

Remove one data point at a time and compute the statistic.


Leave-one-out mean:

µ(i) = 1/(n − 1) Σ_{j≠i} xⱼ = (n x̄ − xᵢ)/(n − 1)

This is the mean of the dataset if the i-th point is left out.
Repeat for all i = 1, . . . , n to get n leave-one-out estimates.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 86 / 104


Jackknife Estimate of the Mean

The jackknife estimate of the mean is the average of all leave-one-out means:

µ(·) = (1/n) Σ_{i=1}^{n} µ(i)

Interestingly, µ̂ = µ(·) , so the mean estimate remains the same.


The power of jackknife is in estimating variance for any statistic.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 87 / 104


Jackknife Estimate of Variance

Variance of the estimate:

Var[µ̂] = ((n − 1)/n) Σ_{i=1}^{n} (µ(i) − µ(·))²

Equivalent to the traditional variance for the mean.


Advantage: can be generalized to other statistics.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 88 / 104


Generalization to Any Statistic

For any statistic θ̂ (median, mode, percentiles):

θ̂(i) = θ̂(x1, . . . , xi−1, xi+1, . . . , xn)

θ̂(·) = (1/n) Σ_{i=1}^{n} θ̂(i)

Compute the statistic leaving out each data point in turn.
Jackknife variance:

Var[θ̂] = ((n − 1)/n) Σ_{i=1}^{n} (θ̂(i) − θ̂(·))²

This method provides an error estimate for any statistic.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 89 / 104


Jackknife Bias Estimate: Introduction

Bias of an estimator:
bias = θ − E[θ̂]

Measures the difference between the true value θ and the expected
value of the estimator θ̂.
Jackknife can estimate bias for any statistic, not just the mean.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 90 / 104


Jackknife Bias: Computation

Sequentially delete each point xi from the dataset D, compute the leave-one-out estimates θ̂(i), and average them to obtain θ̂(·).
Jackknife estimate of bias:

bias_jack = (n − 1) (θ̂(·) − θ̂)

Bias-corrected estimate of θ:

θ̃ = θ̂ − bias_jack = n θ̂ − (n − 1) θ̂(·)

Benefit: Provides an approximately unbiased estimate of the true statistic.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 91 / 104
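A generic jackknife sketch in Python for an arbitrary statistic (the sample data are made up; the median and the mean are used as example statistics):

import statistics

def jackknife(data, stat):
    # Return (estimate, jackknife bias, jackknife variance) of stat on data
    n = len(data)
    theta_hat = stat(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]   # leave-one-out estimates theta_(i)
    theta_dot = sum(loo) / n
    bias = (n - 1) * (theta_dot - theta_hat)
    var = (n - 1) / n * sum((t - theta_dot) ** 2 for t in loo)
    return theta_hat, bias, var

data = [3.1, 4.2, 2.8, 5.0, 3.7, 4.4, 2.9, 3.3]
print(jackknife(data, statistics.median))
print(jackknife(data, statistics.mean))   # for the mean, the variance matches the usual estimate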


Jackknife Variance Estimate: Introduction

Traditional variance:

Var[θ̂] = E[ (θ̂ − E[θ̂])² ]

Measures how much the estimator θ̂ varies across different samples.


Jackknife provides an analogous way to estimate variance for any
statistic.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 92 / 104


Jackknife Variance: Computation

Jackknife variance estimate:

Var_jack[θ̂] = ((n − 1)/n) Σ_{i=1}^{n} (θ̂(i) − θ̂(·))²

θ̂(i) = statistic computed leaving out the i-th data point.
θ̂(·) = (1/n) Σ_{i=1}^{n} θ̂(i) = average of the leave-one-out estimates.

Provides a reliable estimate of the variance for arbitrary statistics.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 93 / 104


Jackknife Example: Mode Estimation

Dataset: D = {0, 10, 10, 10, 20, 20} (n = 6)


Goal: Estimate the mode of the dataset using jackknife.
Standard mode: θ̂ = 10 (most frequent value).
Jackknife considers leave-one-out resampling to provide bias and
variance estimates.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 94 / 104


Leave-One-Out Mode Estimates

Leave-one-out estimates: θ̂(i)


Compute mode after removing each data point:

θ̂(1) = 10, θ̂(2,3,4) = 15, θ̂(5,6) = 10

Note: When two peaks are equal, the mode is taken as the midpoint.
Jackknife estimate of the mode:

θ̂(·) = (1/6) Σ_{i=1}^{6} θ̂(i) = (10 + 15 + 15 + 15 + 10 + 10)/6 = 12.5

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 95 / 104


Interpretation of Jackknife Mode

Standard mode θ̂ = 10 ignores skew in the distribution.


Jackknife estimate θ̂(·) = 12.5 accounts for the full skew.
The difference indicates the bias in the naive mode estimate.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 96 / 104


Jackknife Bias of the Mode

Bias estimate:

biasjack = (n − 1)(θ̂(·) − θ̂) = 5(12.5 − 10) = 12.5

This shows how much the naive mode underestimates the “true center” when all points are considered.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 97 / 104


Jackknife Variance of the Mode

Variance estimate:

Var_jack[θ̂] = ((n − 1)/n) Σ_{i=1}^{n} (θ̂(i) − θ̂(·))²

= (5/6) [ (10 − 12.5)² + 3·(15 − 12.5)² + 2·(10 − 12.5)² ] = 31.25

Standard deviation: √31.25 ≈ 5.6
Twice this width can be used as a tolerance range for the mode.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 98 / 104
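The mode example above can be reproduced with a short Python script; the mode-with-midpoint-on-ties rule follows the convention stated on the previous slides:

from collections import Counter

def mode_midpoint(data):
    # Mode of data; if several values tie for the highest count, return their midpoint
    counts = Counter(data)
    top = max(counts.values())
    modes = [v for v, c in counts.items() if c == top]
    return sum(modes) / len(modes)

D = [0, 10, 10, 10, 20, 20]
n = len(D)
theta_hat = mode_midpoint(D)                                   # 10
loo = [mode_midpoint(D[:i] + D[i + 1:]) for i in range(n)]     # [10, 15, 15, 15, 10, 10]
theta_dot = sum(loo) / n                                       # 12.5
bias = (n - 1) * (theta_dot - theta_hat)                       # 12.5
var = (n - 1) / n * sum((t - theta_dot) ** 2 for t in loo)     # 31.25
print(theta_hat, theta_dot, bias, var)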


Summary: Jackknife Mode Example

Dataset: D = {0, 10, 10, 10, 20, 20}


Mode: θ̂ = 10
Jackknife estimate: θ̂(·) = 12.5
Bias: 12.5, Variance: 31.25, Std. deviation: 5.6
Visualization: a histogram with a red bar indicating ±2√Var_jack shows that the traditional mode lies within the tolerance of the jackknife estimate.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 99 / 104


Bootstrap Resampling: Introduction

Bootstrap: A resampling method to estimate statistics and their error.


Create a bootstrap dataset by randomly selecting n points from the
original dataset D with replacement.
Some points may be duplicated; some may be omitted.
Repeat this process independently B times to get B bootstrap
datasets.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 100 / 104


Bootstrap Estimate of a Statistic

Notation: θ̂∗ (b) = estimate of statistic θ on bootstrap sample b.


Bootstrap estimate of θ:

θ̂*(·) = (1/B) Σ_{b=1}^{B} θ̂*(b)

Essentially the mean of all B bootstrap estimates.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 101 / 104


Bootstrap Bias Estimate

Bias of the statistic:

bias_boot = θ̂*(·) − θ̂

Measures the difference between the bootstrap mean and the original
estimate.
Can be applied to complex statistics like the trimmed mean.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 102 / 104


Bootstrap Variance Estimate

Variance of the statistic:

Var_boot[θ̂] = (1/B) Σ_{b=1}^{B} ( θ̂*(b) − θ̂*(·) )²

For the mean, as B → ∞, Var_boot converges to the traditional variance of the mean.
Larger B → more accurate estimate; smaller B → faster computation but a noisier estimate.
Advantage over jackknife: B can be adjusted based on computational resources.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 103 / 104
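A minimal bootstrap sketch in Python; the dataset, the statistic (median), and B are illustrative:

import random
import statistics

def bootstrap(data, stat, B=1000, seed=0):
    # Return the bootstrap estimate, bias, and variance of stat on data
    rng = random.Random(seed)
    theta_hat = stat(data)
    boot = [stat([rng.choice(data) for _ in data]) for _ in range(B)]  # B resamples with replacement
    theta_star = sum(boot) / B
    bias = theta_star - theta_hat
    var = sum((t - theta_star) ** 2 for t in boot) / B
    return theta_star, bias, var

data = [3.1, 4.2, 2.8, 5.0, 3.7, 4.4, 2.9, 3.3]
print(bootstrap(data, statistics.median))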


Bootstrap vs Jackknife

Jackknife: Leave-one-out, requires exactly n repetitions,


deterministic.
Bootstrap: Random sampling with replacement, B repetitions,
adjustable.
Bootstrap works well for statistics that are difficult to analyze
analytically (e.g., median, trimmed mean, percentiles).
Jackknife is simpler but may underestimate variance for highly skewed
statistics.

Dr. Jayakrishnan Ananadakrishnan 24ASD641 Pattern Recognition 104 / 104
