Pattern Recognition
Classification vs Clustering
[Figure: classification assigns samples to known categories “A” and “B”; clustering groups samples into unknown categories (unsupervised classification)]
What is a Pattern?
• A pattern could be an object or event.
• Typically represented by a vector x of numbers:
  x = (x1, x2, …, xn)
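As a small illustration (the feature names and values here are invented, anticipating the fish example below), such a pattern can be stored directly as a numeric vector:

```python
import numpy as np

# One pattern (e.g., a single fish), represented as a feature vector
# x = (length in cm, lightness, width in cm) -- illustrative values only
x = np.array([23.5, 0.62, 7.1])

print(x.shape[0])   # n = 3 features describe this pattern
```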
• Problem Analysis:
  ▪ Set up a camera and take some sample images to extract features (the pipeline: Sensing → Segmentation → Feature Extraction)
  ▪ Consider features such as length, lightness, width, number and shape of fins, position of mouth, etc.
Length As A Discriminator
[Figure: length as a single discriminating feature]
Width And Lightness
[Figure: lightness and width as a pair of features]
• Treat the features as an n-tuple (here, a two-dimensional vector)
• Create a scatter plot
• Draw a line (a linear decision boundary) separating the two classes
• However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input: the issue of generalization!
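A minimal sketch of this point, using invented synthetic data (the class means, spreads, and feature names are assumptions): fit a linear boundary on training samples, then judge it on novel samples it has never seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: (lightness, width) for two fish classes
salmon   = rng.normal([3.0, 10.0], 1.0, size=(50, 2))
sea_bass = rng.normal([6.0, 14.0], 1.0, size=(50, 2))

X = np.vstack([salmon, sea_bass])
y = np.hstack([-np.ones(50), np.ones(50)])        # class labels -1 / +1

# Fit a linear decision boundary w.x + b = 0 by least squares on the labels
A = np.hstack([X, np.ones((100, 1))])             # append a bias column
w = np.linalg.lstsq(A, y, rcond=None)[0]

# Generalization check: evaluate on fresh ("novel") samples, not the training set
test = np.vstack([rng.normal([3.0, 10.0], 1.0, size=(20, 2)),
                  rng.normal([6.0, 14.0], 1.0, size=(20, 2))])
test_y = np.hstack([-np.ones(20), np.ones(20)])
pred = np.sign(np.hstack([test, np.ones((40, 1))]) @ w)
print("test accuracy:", np.mean(pred == test_y))
```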
x1  x2 | x1 and x2 | x1 or x2 | x1 xor x2
 0   0 |     0     |    0     |     0
 0   1 |     0     |    1     |     1
 1   0 |     0     |    1     |     1
 1   1 |     1     |    1     |     0
[Figures: the four input points plotted in the (x1, x2) plane for AND, OR, and XOR]
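AND and OR are linearly separable, so each can be realized by a single threshold unit, while XOR cannot. A small sketch checking this (the particular weights, thresholds, and brute-force search range are illustrative assumptions):

```python
from itertools import product

def threshold_unit(w1, w2, T):
    """Single threshold unit: fires (1) when w1*x1 + w2*x2 >= T."""
    return lambda x1, x2: int(w1 * x1 + w2 * x2 >= T)

# Hand-picked weights/thresholds (assumed for illustration)
AND = threshold_unit(1, 1, 2)   # fires only when both inputs are 1
OR  = threshold_unit(1, 1, 1)   # fires when at least one input is 1

for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, AND(x1, x2), OR(x1, x2), x1 ^ x2)

# No single unit reproduces XOR: a brute-force search over small integer
# weights and thresholds finds no solution (XOR is not linearly separable)
found = any(
    all((w1 * x1 + w2 * x2 >= T) == bool(x1 ^ x2)
        for x1, x2 in product([0, 1], repeat=2))
    for w1 in range(-3, 4) for w2 in range(-3, 4) for T in range(-3, 4)
)
print("single unit realizing XOR found:", found)  # False
```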
Pattern Recognition
• Pattern Recognition involves the development, study, and application of a broad range of tools and procedures for analyzing raw data
– Machine Learning
– Data Mining
– Image Processing
– ...
Patterns
• PR algorithms classify “patterns” into “classes”
• Classes have a:
  – Prior probability P(ci)
  – Class-conditional density p(x|ci)
• Example:
  – Four coins: penny, nickel, dime, and quarter
  – Measurements: weight, color, size, . . .
  – A given weight has a certain likelihood of belonging to the class of pennies, nickels, dimes, or quarters
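A minimal sketch of how the prior P(ci) and the class-conditional density p(x|ci) combine via Bayes' rule to classify a measured weight; the coin weights, spreads, and equal priors below are illustrative assumptions, not real mint data.

```python
import math

# Hypothetical coin classes: (mean weight in grams, std dev, prior P(ci))
CLASSES = {
    "penny":   (2.5, 0.1, 0.25),
    "nickel":  (5.0, 0.1, 0.25),
    "dime":    (2.3, 0.1, 0.25),
    "quarter": (5.7, 0.1, 0.25),
}

def gaussian(x, mu, sd):
    """Class-conditional density p(x | ci), modeled here as a Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def classify(weight):
    # Posterior is proportional to p(x | ci) * P(ci); pick the largest
    scores = {c: gaussian(weight, mu, sd) * prior
              for c, (mu, sd, prior) in CLASSES.items()}
    return max(scores, key=scores.get)

print(classify(2.45))   # "penny"  (2.45 g is closest to the penny mean)
print(classify(5.60))   # "quarter"
```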
Key Challenges
• Intra-class variability
• Inter-class variability
License Plate Recognition
• A license plate reading system automatically detects and reads the license plates on cars
Face Detection
Land Cover Classification
(from aerial or satellite images)
Gestalt Laws
1. Law of Proximity:
   – Elements that are closer together will be perceived as a coherent object.
   – On the top, there appear to be three horizontal rows, while on the bottom, the grouping appears to be columns.
Gestalt Laws
2. Law of Similarity:
   – Elements that look similar will be perceived as part of the same form.
   – There seems to be a triangle in the square.
Neisser example
• Look for the “X”
OOPOPOPOP
POPPOPPPO
OOPPOXPOP
OOPOPOPOP
POPPOPPPO
Neisser example
• Look for the “X”
NNZNZNZNZ
ZNZZNZZNN
NNNZNXNZN
NNZNZNZNZ
ZNZZNZZNN
Feature Vector
• What makes a “good” feature vector?
  – The quality of a feature vector is related to its ability to discriminate examples from different classes
    • Examples from the same class should have similar feature values
    • Examples from different classes should have different feature values
[Figure: a feature space where the classes form well-separated clusters (good features) vs. one where they overlap (bad features)]
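One common way to make this concrete is a Fisher-style separation score: the squared distance between class means divided by the within-class spread. The sketch below uses invented synthetic data purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def separation(class_a, class_b):
    """Fisher-style score: squared distance between class means divided by
    the sum of within-class variances (higher = more discriminative)."""
    gap = np.linalg.norm(class_a.mean(axis=0) - class_b.mean(axis=0)) ** 2
    spread = class_a.var(axis=0).sum() + class_b.var(axis=0).sum()
    return gap / spread

# "Good" feature: classes are far apart relative to their spread
good_a = rng.normal(0.0, 1.0, size=(100, 1))
good_b = rng.normal(6.0, 1.0, size=(100, 1))

# "Bad" feature: the classes overlap heavily
bad_a = rng.normal(0.0, 1.0, size=(100, 1))
bad_b = rng.normal(0.5, 1.0, size=(100, 1))

print("good feature score:", separation(good_a, good_b))  # large
print("bad feature score: ", separation(bad_a, bad_b))    # close to 0
```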
Feature Properties
• Features fall into two broad categories:
  – Linearly separable
  – Non-linearly separable
[Figures: a linearly separable feature space, where a straight line splits the two classes, and a non-linearly separable one, where no straight line does]
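As a sketch of the distinction, the classic perceptron rule converges on a linearly separable problem (AND) but never settles on a non-linearly separable one (XOR); the learning rate and epoch count below are arbitrary choices.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic perceptron rule on labels in {0, 1}; returns (weights, bias, errors)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = int(xi @ w + b >= 0)
            update = lr * (target - pred)
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:          # converged: the data is linearly separable
            break
    return w, b, errors

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

_, _, err_and = train_perceptron(X, np.array([0, 0, 0, 1]))  # AND: separable
_, _, err_xor = train_perceptron(X, np.array([0, 1, 1, 0]))  # XOR: not separable

print("remaining errors on AND:", err_and)  # 0
print("remaining errors on XOR:", err_xor)  # > 0, never converges
```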
The vertical face-finding part of Rowley, Baluja, and Kanade’s system.
Figure from “Rotation invariant neural-network based face detection,” H. A. Rowley, S. Baluja, and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998. Copyright 1998, IEEE.
LeNet is used to classify handwritten digits. Notice that the test error rate is not the same as the training error rate, because the test set consists of items not in the training set. Not all classification schemes necessarily have small test error when they have small training error.
Figure from “Gradient-Based Learning Applied to Document Recognition,” Y. LeCun et al., Proc. IEEE, 1998. Copyright 1998, IEEE.
Applications
• Pattern Recognition plays a role in a wide
range of computer tasks:
– Image Analysis
– Fingerprint Analysis
– Weather Forecasting
– Medical Diagnosis
– Computer Security
– Lie Detection
– ...
Simple Pattern Recognition Problem
• Consider the problem of recognizing the capital letters L, P, O, E, Q:
  – Determine a sufficient set of features
  – Design a tree-structured classifier (a code sketch follows below)

Line features (counts of vertical, horizontal, oblique, and curved strokes):
Character  Vertical  Horizontal  Oblique  Curved
L             1          1          0        0
P             1          0          0        1
O             0          0          0        1
E             1          3          0        0
Q             0          0          1        1

Tree-structured classifier:
Start: V > 0?
  y: C > 0?
    y: P
    n: H > 1?
      y: E
      n: L
  n: O > 0?
    y: Q
    n: O
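A minimal sketch of this tree as code (the function name and feature ordering are assumptions for illustration); it reproduces the feature table above exactly.

```python
# Features per character: V = vertical, H = horizontal, O = oblique, C = curved strokes
FEATURES = {
    "L": (1, 1, 0, 0),
    "P": (1, 0, 0, 1),
    "O": (0, 0, 0, 1),
    "E": (1, 3, 0, 0),
    "Q": (0, 0, 1, 1),
}

def classify(v, h, o, c):
    # Tree-structured classifier from the slide above
    if v > 0:
        if c > 0:
            return "P"
        return "E" if h > 1 else "L"
    return "Q" if o > 0 else "O"

# The tree reproduces the feature table exactly
for letter, feats in FEATURES.items():
    assert classify(*feats) == letter
print("all five letters classified correctly")
```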
A Simple Neuron
• Here the top row is 2 errors away from a ‘T’ and 3 errors away from an ‘H’, so the top output is black.
• The middle row is 1 error away from both ‘T’ and ‘H’, so the output is random.
• The bottom row is 1 error away from ‘T’ and 2 away from ‘H’, therefore the output is black.
• Since the input resembles a ‘T’ more than an ‘H’, the output of the network is in favor of a ‘T’.
XOR
[Network diagram: inputs x1 and x2 feed two hidden threshold units, which feed one output threshold unit; the weights and thresholds (T = ?) are to be determined so that the output equals x1 xor x2]

x1  x2 | x1 xor x2
 0   0 |     0
 0   1 |     1
 1   0 |     1
 1   1 |     0
XOR
[Network diagram (solution): hidden unit 1 receives x1 with weight 1 and x2 with weight −1; hidden unit 2 receives x1 with weight −1 and x2 with weight 1; each hidden unit feeds the output unit with weight 1; all three units use threshold T = 1]

x1  x2 | x1 xor x2
 0   0 |     0
 0   1 |     1
 1   0 |     1
 1   1 |     0
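A minimal sketch of this network, assuming the standard wiring consistent with the weights and thresholds shown (each unit outputs 1 when its weighted input sum reaches T = 1):

```python
def step(net, T):
    """Threshold (step) activation: 1 if the net input reaches the threshold T."""
    return 1 if net >= T else 0

def xor_net(x1, x2, T=1):
    # h1 fires only for (1, 0), h2 fires only for (0, 1); the output unit ORs them
    h1 = step(1 * x1 + (-1) * x2, T)
    h2 = step((-1) * x1 + 1 * x2, T)
    return step(1 * h1 + 1 * h2, T)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # reproduces the XOR truth table
```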
Hamming Network
• Hamming distance of two vectors x and y of dimension n:
  – The number of bits in disagreement.
  – In bipolar representation:
      x^T y = Σi xi yi = a − d
    where a is the number of bits on which x and y agree and d is the number of bits on which they differ.
  – Since d = n − a (the Hamming distance):
      x^T y = 2a − n
      a = 0.5 (x^T y + n)
      −d = a − n = 0.5 (x^T y + n) − n = 0.5 x^T y − 0.5 n
  – So the (negative) distance between x and y can be determined from x^T y and n.
Hamming Network
• A Hamming network computes −d between an input vector i and each of P stored vectors i1, …, iP of dimension n:
  – n input nodes and P output nodes, one for each stored vector ip, whose output equals −d(i, ip)
  – Weights and biases:
      W = 0.5 [i1^T; … ; iP^T],   Θ = 0.5 [−n; … ; −n]
  – Output of the net:
      o = W i + Θ = 0.5 [i1^T i − n; … ; iP^T i − n]
    where ok = 0.5 (ik^T i − n) is the negative distance between i and ik.
• Example:
  – Three stored vectors:
      i1 = ( 1, −1, −1,  1,  1)
      i2 = (−1,  1, −1,  1, −1)
      i3 = ( 1, −1,  1, −1,  1)
  – Input vector: i = (1, 1, 1, −1, −1)
  – Distances: (4, 3, 2)
  – Output vector:
      o1 = 0.5[(1,−1,−1,1,1)·(1,1,1,−1,−1) − 5] = 0.5[(1 − 1 − 1 − 1 − 1) − 5] = −4
      o2 = 0.5[(−1,1,−1,1,−1)·(1,1,1,−1,−1) − 5] = 0.5[(−1 + 1 − 1 − 1 + 1) − 5] = −3
      o3 = 0.5[(1,−1,1,−1,1)·(1,1,1,−1,−1) − 5] = 0.5[(1 − 1 + 1 + 1 − 1) − 5] = −2
  – If we want the vector with the smallest distance to i to win, put a Maxnet on top of the Hamming net. The result is an associative memory: an input pattern recalls the stored vector that is closest to it (see the sketch below).
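A minimal sketch of the Hamming net on this example (NumPy, with argmax standing in for the Maxnet); it reproduces the outputs (−4, −3, −2) computed above.

```python
import numpy as np

# Stored bipolar vectors (rows) from the example above
stored = np.array([
    [ 1, -1, -1,  1,  1],   # i1
    [-1,  1, -1,  1, -1],   # i2
    [ 1, -1,  1, -1,  1],   # i3
])
n = stored.shape[1]

# Hamming net parameters: W = 0.5 * stored vectors, bias = -0.5 * n
W = 0.5 * stored
theta = -0.5 * n * np.ones(stored.shape[0])

i = np.array([1, 1, 1, -1, -1])    # input pattern
o = W @ i + theta                  # o_k = -Hamming distance(i, i_k)
print(o)                           # [-4. -3. -2.]

# A Maxnet would select the largest output; argmax plays that role here
winner = int(np.argmax(o))
print("recalled stored vector:", stored[winner])   # i3, the closest one
```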