
Pattern Recognition

What is pattern recognition?


• A pattern is an object, process or event that
can be given a name.
• A pattern class (or category) is a set of
patterns sharing common attributes and
usually originating from the same source.
• During recognition (or classification), given objects are
assigned to prescribed classes.
• A classifier is a machine which performs
classification.
Pattern Recognition/Classification
• Assign an object or an event (pattern) to one
of several known categories (or classes).

Category “A”        Category “B”

Classification vs Clustering

• Classification (known categories): supervised classification
• Clustering (unknown categories): unsupervised classification
What is a Pattern?
• A pattern could be an object or event.
• Typically, a pattern is represented by a vector x of numbers:

x = [x1, x2, ..., xn]^T

Examples: biometric patterns, hand gesture patterns


Pattern Class

• A collection of “similar” objects.

Female class Male class


Main Objectives

• Find a “way” to separate the data belonging to different classes.
• Given “new” data, assign them to the closest category.
Gender Classification
Recognition or Understanding?
• Which of these images are most scenic?
• How can we develop a system to automatically determine scenic
beauty? (Hint: feature combination)
• Solutions to such problems require good feature extraction and
good decision theory.
Image Processing Example
• Sorting fish: incoming fish are sorted according to species using
optical sensing (sea bass or salmon?)

• Problem analysis:
▪ Set up a camera and take some sample images to extract features
▪ Consider features such as length, lightness, width, number and
shape of fins, position of mouth, etc.

Processing pipeline: Sensing → Segmentation → Feature Extraction
Length As A Discriminator

• Conclusion: Length is a poor discriminator


Add Another Feature

• Lightness is a better feature than length because it reduces the
misclassification error.
• Can we combine features in such a way that we improve
performance?
• Threshold decision boundary and cost relationship
– Move our decision boundary toward smaller values of
lightness in order to minimize the cost (reduce the number
of sea bass that are classified as salmon!); see the sketch below.
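
As a rough illustration of this cost-driven threshold choice, here is a minimal Python sketch. The lightness readings and cost figures are made up for illustration only; what comes from the slide is the idea that misclassifying sea bass as salmon is penalized more heavily, which pushes the chosen threshold toward smaller lightness values.

import numpy as np

# Hypothetical lightness readings for each species (illustrative, not lecture data).
salmon_lightness   = np.array([2.0, 2.4, 2.8, 3.1, 3.5])
sea_bass_lightness = np.array([3.3, 3.8, 4.2, 4.6, 5.0])

# Assumed costs: calling a sea bass "salmon" costs more than the reverse.
COST_BASS_AS_SALMON = 2.0
COST_SALMON_AS_BASS = 1.0

def total_cost(threshold):
    """Classify 'lightness < threshold' as salmon and sum the misclassification costs."""
    bass_errors   = np.sum(sea_bass_lightness < threshold)   # sea bass called salmon
    salmon_errors = np.sum(salmon_lightness >= threshold)    # salmon called sea bass
    return COST_BASS_AS_SALMON * bass_errors + COST_SALMON_AS_BASS * salmon_errors

# Sweep candidate thresholds and keep the cheapest one.
candidates = np.linspace(1.5, 5.5, 401)
best = min(candidates, key=total_cost)
print(f"cost-minimizing threshold: {best:.2f}")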

Task of decision theory

• Adopt the lightness and add the width of the fish:

Fish: x^T = [x1, x2], where x1 = lightness and x2 = width
Width And Lightness
• Treat the features as an N-tuple (here a two-dimensional vector)
• Create a scatter plot
• Draw a line (e.g., fit by regression) separating the two classes
• However, our satisfaction is premature
because the central aim of designing a
classifier is to correctly classify novel input

Issue of generalization!

Ideally, the best decision boundary should be the one which provides
optimal performance, such as the one shown on the next slide.
Overfitting and underfitting

Problem: how rich a class of classifiers q(x; θ) to use.

underfitting          good fit          overfitting

Problem of generalization: a small empirical risk Remp does not imply a
small true expected risk R.
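
A minimal numerical sketch of this point, using synthetic 1-D data (a sine target assumed here purely for illustration) and polynomial models of increasing richness: the empirical (training) risk keeps falling as the model gets richer, while the risk estimated on held-out data eventually rises again.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (illustrative data, not from the slides).
def target(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 15)
y_train = target(x_train) + rng.normal(0, 0.2, x_train.size)
x_test  = rng.uniform(0, 1, 200)
y_test  = target(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 4, 12):          # too simple, about right, too flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_risk = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_risk  = np.mean((np.polyval(coeffs, x_test)  - y_test)  ** 2)
    print(f"degree {degree:2d}: empirical risk {train_risk:.3f}, held-out risk {test_risk:.3f}")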
Generalization And Risk

• How much can we trust isolated data points?
• Optimal decision surface is a line
• Optimal decision surface is still a line
Linearly Separable

x1  x2 | x1 and x2 | x1 or x2 | x1 xor x2
 0   0 |     0     |    0     |     0
 0   1 |     0     |    1     |     1
 1   0 |     0     |    1     |     1
 1   1 |     1     |    1     |     0

A data set is linearly separable if you can separate one example type
from the other.

Which of these are linearly separable?

[Three plots of the truth tables above in the (x1, x2) plane, one each
for AND, OR, and XOR.]
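
One way to check separability in code is to run the classic perceptron learning rule, which is guaranteed to converge only on linearly separable data. The perceptron is a standard tool, not something introduced on these slides; this sketch simply uses it to confirm that AND and OR are separable while XOR is not.

import numpy as np

# Truth tables from the slide.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "and": np.array([0, 0, 0, 1]),
    "or":  np.array([0, 1, 1, 1]),
    "xor": np.array([0, 1, 1, 0]),
}

def perceptron_separable(X, y, epochs=100, lr=1.0):
    """Run the perceptron rule; it stops updating only if a separating line exists."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, ti in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != ti:
                w += lr * (ti - pred) * xi
                b += lr * (ti - pred)
                errors += 1
        if errors == 0:          # a separating line has been found
            return True
    return False

for name, y in targets.items():
    print(name, "linearly separable:", perceptron_separable(X, y))
# Expected: and/or -> True, xor -> False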
Pattern Recognition
• Pattern Recognition involves the
development, study and application of a
broad range of tools and procedures for
analyzing raw data
– Machine Learning
– Data Mining
– Image Processing
– ...
Patterns
• PR algorithms classify “patterns” into “classes”

• Patterns have “measurements” or “features”

• Each class has:
– A likelihood (prior probability) P(ci)
– A class-conditional density P(x|ci)

• Example:
– Four coins – penny, nickel, dime, and quarter
– Measurements – weight, color, size, . . .
– A given weight has a certain likelihood of belonging to the penny,
nickel, dime, or quarter class, as illustrated in the sketch below
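
A minimal Bayes-rule sketch of the coin example. The priors, mean weights, and standard deviations below are hypothetical numbers chosen only to make the example runnable; the slide does not specify any densities.

import numpy as np

# Hypothetical priors and Gaussian class-conditional densities for coin weight (grams).
classes = ["penny", "nickel", "dime", "quarter"]
prior       = np.array([0.25, 0.25, 0.25, 0.25])      # P(c_i)
mean_weight = np.array([2.5, 5.0, 2.27, 5.67])        # mean of P(x | c_i)
std_weight  = np.array([0.1, 0.1, 0.1, 0.1])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify(weight):
    """Pick the class with the largest posterior, P(c_i | x) proportional to P(x | c_i) P(c_i)."""
    posterior = gaussian_pdf(weight, mean_weight, std_weight) * prior
    return classes[int(np.argmax(posterior))]

print(classify(2.4))   # likely "penny" under these assumed densities
print(classify(5.6))   # likely "quarter"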
Key Challenges

• Intra-class variability

The letter “T” in different typefaces

• Inter-class similarity

Letters/Numbers that look similar


Pattern Recognition Applications

License Plate Recognition
License Plate Reading System
• Automatically detect and read the license plates on cars

• Modules: (i) acquisition, (ii) enhancement, (iii) segmentation,
(iv) character recognition
• Should work in real time
Biometric Recognition
Fingerprint Classification

Face Detection
Land Cover Classification
(from aerial or satellite images)
Gestalt Laws

1. Law of Proximity:
– Elements that are
closer together will be
perceived as a
coherent object.
– On the top, there appear to be three horizontal rows, while on
the bottom the grouping appears to be columns.
Gestalt Laws

2. Law of Similarity:
– Elements that look
similar will be
perceived as part of
the same form.
– There seems to be a
triangle in the square.
Neisser example
• Look for the “X”

OOPOPOPOP
POPPOPPPO
OOPPOXPOP
OOPOPOPOP
POPPOPPPO
Neisser example
• Look for the “X”

NNZNZNZNZ
ZNZZNZZNN
NNNZNXNZN
NNZNZNZNZ
ZNZZNZZNN
Feature Vector
• What makes a “good” feature vector?
– The quality of a feature vector is related to its
ability to discriminate examples from different
classes
• Examples from the same class should have similar
feature values
• Examples from different classes have different feature
values

[Scatter plots contrasting good features (classes form separate clusters)
and bad features (classes overlap).]
Feature Properties
• Features fall into two broad categories
– Linearly separable
– Non-linearly separable
[Scatter plots illustrating linear separability and non-linear
separability.]
The vertical face-finding part of Rowley, Baluja and Kanade's system
Figure from “Rotation invariant neural-network based face detection,” H.A. Rowley,
S. Baluja and T. Kanade, Proc. Computer Vision and Pattern Recognition, 1998,
copyright 1998, IEEE
LeNet is used to classify handwritten digits. Notice that the
test error rate is not the same as the training error rate, because
the test set consists of items not in the training set. Not all
classification schemes necessarily have small test error when they
have small training error.
Figure from “Gradient-Based Learning Applied to Document
Recognition”, Y. LeCun et al., Proc. IEEE, 1998, copyright 1998, IEEE
Applications
• Pattern Recognition plays a role in a wide
range of computer tasks:
– Image Analysis
– Fingerprint Analysis
– Weather Forecasting
– Medical Diagnosis
– Computer Security
– Lie Detection
– ...
Simple Pattern Recognition Problem

• Consider the problem of recognizing the capital letters: L, P, O, E, Q
– Determine a sufficient set of features
– Design a tree-structured classifier

Line features:

Character | Vertical | Horizontal | Oblique | Curved
    L     |    1     |     1      |    0    |    0
    P     |    1     |     0      |    0    |    1
    O     |    0     |     0      |    0    |    1
    E     |    1     |     3      |    0    |    0
    Q     |    0     |     0      |    1    |    1

Tree-structured classifier (V, H, O, C = counts of vertical, horizontal,
oblique, and curved lines):

Start: V > 0?
  no  -> O > 0?   yes -> Q,   no -> O
  yes -> C > 0?   yes -> P,   no -> H > 1?   yes -> E,   no -> L
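
As a sanity check, here is a direct Python transcription of this tree. The variable names V, H, O, C are shorthand introduced here for the four stroke counts in the feature table above.

# Features: counts of vertical (V), horizontal (H), oblique (O) and curved (C) strokes.
def classify_letter(V, H, O, C):
    if V > 0:
        if C > 0:
            return "P"
        return "E" if H > 1 else "L"
    else:
        return "Q" if O > 0 else "O"

# Feature table from the slide: (V, H, O, C)
features = {"L": (1, 1, 0, 0), "P": (1, 0, 0, 1), "O": (0, 0, 0, 1),
            "E": (1, 3, 0, 0), "Q": (0, 0, 1, 1)}

for letter, f in features.items():
    assert classify_letter(*f) == letter
print("All five letters classified correctly.")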
A Simple Neuron

• An artificial neuron is a device with many inputs and one output.
• The neuron has two modes of operation: the training mode and the
using mode.
A Simple Neuron
• In the training mode, the neuron can be trained to fire (or
not), for particular input patterns.
• In the using mode, when a taught input pattern is detected at
the input, its associated output becomes the current output. If
the input pattern does not belong in the taught list of input
patterns, the firing rule is used to determine whether to fire or
not.
• The firing rule is an important concept in neural networks and
accounts for their high flexibility. A firing rule determines how
one calculates whether a neuron should fire for any input
pattern. It applies to all input patterns, not only the ones on
which the node was trained previously.
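
The slides do not spell out a particular firing rule, but a common choice in this kind of tutorial is a Hamming-distance rule: fire if the input is closer to a taught "fire" pattern than to a taught "no-fire" pattern. A minimal sketch of that assumed rule, with made-up 3-bit taught patterns:

def hamming(a, b):
    """Number of positions where two binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def firing_rule(x, fire_patterns, no_fire_patterns):
    """Fire if x is strictly closer to a taught 'fire' pattern than to any taught
    'no-fire' pattern; return None (undecided/random) on a tie."""
    d_fire = min(hamming(x, p) for p in fire_patterns)
    d_no   = min(hamming(x, p) for p in no_fire_patterns)
    if d_fire < d_no:
        return 1
    if d_fire > d_no:
        return 0
    return None

# Taught patterns (illustrative, not from the slides).
fire    = [(1, 1, 1)]
no_fire = [(0, 0, 0)]
print(firing_rule((1, 1, 0), fire, no_fire))   # closer to (1,1,1) -> 1
print(firing_rule((1, 0, 0), fire, no_fire))   # closer to (0,0,0) -> 0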
Pattern Recognition

• Suppose a network is trained to recognize the patterns T and H.
The associated output patterns are all black and all white,
respectively, as shown in the accompanying figure.
Pattern Recognition

Since the input pattern looks more like a ‘T’, the network sees the
input as closely resembling ‘T’ and outputs the pattern that
represents a ‘T’.
Pattern Recognition

The input pattern here closely resembles ‘H’, with a slight
difference. The network in this case classifies it as an ‘H’ and
outputs the pattern representing an ‘H’.
Pattern Recognition

• Here the top row is 2 errors away from a ‘T’ and 3 errors away
from an ‘H’, so the top output is black.
• The middle row is 1 error away from both T and H, so the
output is random.
• The bottom row is 1 error away from T and 2 away from H,
therefore the output is black.
• Since the input resembles a ‘T’ more than an ‘H’ the output of
the network is in favor of a ‘T’.
XOR

[Network diagram: inputs x1 and x2 feed two threshold units, which in
turn feed an output unit computing Output = x1 xor x2; the weights and
thresholds T are left as “?” to be determined.]

x1  x2 | x1 xor x2
 0   0 |     0
 0   1 |     1
 1   0 |     1
 1   1 |     0
XOR

[Network diagram: the same network with all thresholds T = 1 and
connection weights of +1 and −1 as labelled.]

x1  x2 | x1 xor x2
 0   0 |     0
 0   1 |     1
 1   0 |     1
 1   1 |     0
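
The labelled weights and thresholds are consistent with the standard two-hidden-unit threshold solution. Assuming that assignment (hidden unit 1 with input weights (1, −1), hidden unit 2 with (−1, 1), output unit with weights (1, 1), all thresholds 1), this small sketch checks that the network reproduces the XOR truth table:

def step(net, threshold=1):
    """Binary threshold unit: fire when the weighted sum reaches the threshold."""
    return 1 if net >= threshold else 0

def xor_net(x1, x2):
    h1 = step( 1 * x1 + -1 * x2)   # fires only for (1, 0)
    h2 = step(-1 * x1 +  1 * x2)   # fires only for (0, 1)
    return step(1 * h1 + 1 * h2)   # ORs the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
# Prints the XOR truth table: 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0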
Hamming Network
• Hamming distance of two vectors x and y of dimension n:
– the number of bits in disagreement.
– In bipolar representation:

x^T y = Σi xi·yi = a − d
where: a is the number of bits in agreement between x and y
       d is the number of bits differing between x and y
d = n − a   (the Hamming distance)
x^T y = a − (n − a) = 2a − n
a = 0.5(x^T y + n)
−d = a − n = 0.5(x^T y + n) − n = 0.5(x^T y) − 0.5n

– The (negative) distance between x and y can therefore be
determined from x^T y and n.
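
A quick numerical check of the relation −d = 0.5(x^T y − n) for bipolar vectors, using the stored vector i1 and the input i from the example that follows:

import numpy as np

def neg_hamming_from_dot(x, y):
    """For bipolar (+1/-1) vectors: -d = 0.5 * (x . y - n)."""
    n = len(x)
    return 0.5 * (np.dot(x, y) - n)

x = np.array([ 1, -1, -1,  1,  1])
y = np.array([ 1,  1,  1, -1, -1])
print(neg_hamming_from_dot(x, y))   # -4.0
print(np.sum(x != y))               # direct Hamming distance: 4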
Hamming Network
• A Hamming network computes −d between an input vector i and each of
P stored vectors i1, ..., iP of dimension n
– n input nodes, P output nodes, one for each stored vector ip,
whose output = −d(i, ip)
– Weights and biases:

W = 0.5 [ i1^T ; ... ; iP^T ],    θ = 0.5 [ −n ; ... ; −n ]

– Output of the net:

o = W·i + θ = 0.5 [ i1^T·i − n ; ... ; iP^T·i − n ]

where ok = 0.5(ik^T·i − n) is the negative distance between i and ik.

• Example:
– Three stored vectors:  i1 = ( 1, −1, −1,  1,  1)
                         i2 = (−1,  1, −1,  1, −1)
                         i3 = ( 1, −1,  1, −1,  1)
– Input vector:          i  = ( 1,  1,  1, −1, −1)
– Hamming distances: (4, 3, 2)
– Output vector:
o1 = 0.5[(1,−1,−1,1,1)·(1,1,1,−1,−1) − 5] = 0.5[(1 − 1 − 1 − 1 − 1) − 5] = −4
o2 = 0.5[(−1,1,−1,1,−1)·(1,1,1,−1,−1) − 5] = 0.5[(−1 + 1 − 1 − 1 + 1) − 5] = −3
o3 = 0.5[(1,−1,1,−1,1)·(1,1,1,−1,−1) − 5] = 0.5[(1 − 1 + 1 + 1 − 1) − 5] = −2
– If we want the vector with the smallest distance to i to win, put a
Maxnet on top of the Hamming net. We then have an associative memory:
an input pattern recalls the stored vector that is closest to it.
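
Putting the pieces together, a small NumPy sketch of the Hamming net with the three stored vectors above; it reproduces the outputs (−4, −3, −2), and taking the maximum output (the job a Maxnet would do) recalls the closest stored vector:

import numpy as np

# Stored bipolar vectors from the example.
stored = np.array([
    [ 1, -1, -1,  1,  1],   # i1
    [-1,  1, -1,  1, -1],   # i2
    [ 1, -1,  1, -1,  1],   # i3
])
n = stored.shape[1]

W     = 0.5 * stored                 # weight matrix
theta = 0.5 * -n * np.ones(len(stored))   # biases

i = np.array([1, 1, 1, -1, -1])      # input vector

o = W @ i + theta                    # o_k = 0.5 * (i_k . i - n) = -d(i, i_k)
print(o)                             # [-4. -3. -2.]

# Picking the maximum output recalls the closest stored vector (here i3).
print("closest stored vector:", stored[np.argmax(o)])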
