Introduction to Pattern Recognition
Chapter 1 (Duda et al.)
What is a Pattern?
What is Pattern Recognition?
What is a Pattern? (cont’d)
• Loan/Credit card applications
  • Income, # of dependents, mortgage amount → credit-worthiness classification.
• Dating services
  • Age, hobbies, income → “desirability” classification.
• Web documents
  • Key-word based descriptions (e.g., documents containing “football”, “NFL”) → document classification.
Pattern Class
• A collection of “similar” objects.
[Figure: example images of the “female” and “male” pattern classes]
How do we model a Pattern Class?
• Typically, using a statistical model:
  • a probability density function (e.g., Gaussian); see the sketch below.
[Figure: Gaussian class-conditional densities for the “male” and “female” classes in a gender-classification example]
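A minimal sketch of this idea in Python, assuming a single made-up “height” feature and illustrative Gaussian parameters (none of these numbers come from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustrative class-conditional models p(x | class); the feature
# ("height" in cm) and all parameters are assumptions for this sketch.
p_x_given_female = lambda x: gaussian_pdf(x, 165.0, 7.0)
p_x_given_male   = lambda x: gaussian_pdf(x, 178.0, 7.5)

x = 172.0  # a new pattern
print("male" if p_x_given_male(x) > p_x_given_female(x) else "female")
```

Comparing the two class-conditional densities at x (equal priors assumed here) is exactly the kind of decision the density plot on this slide depicts.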
How do we model a Pattern Class? (cont’d)
• Key challenges:
  • Intra-class variability
  • Inter-class variability
Classification vs Clustering
• Classification (known categories): supervised classification (recognition).
• Clustering (unknown categories): unsupervised classification. The two settings are contrasted in the sketch below.
[Figure: points from Category “A” and Category “B”, shown under both classification and clustering]
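A minimal numpy contrast on synthetic data: in classification the category labels are given, while in clustering a procedure such as this bare-bones 2-means loop must discover the groups on its own (all data and parameters here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal([0.0, 0.0], 0.5, (50, 2))   # points from Category "A"
B = rng.normal([3.0, 3.0], 0.5, (50, 2))   # points from Category "B"
X = np.vstack([A, B])

# Classification (supervised): labels are known at training time.
y = np.array([0] * 50 + [1] * 50)

# Clustering (unsupervised): no labels; 2-means alternates between
# assigning points to the nearest centroid and re-estimating centroids.
centroids = X[rng.choice(len(X), size=2, replace=False)]
for _ in range(10):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    centroids = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                          else centroids[k] for k in range(2)])
print(centroids)  # should land near (0, 0) and (3, 3)
```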
Pattern Recognition Applications

Handwriting Recognition

License Plate Recognition

Biometric Recognition

Fingerprint Classification

Face Detection

Gender Classification

Autonomous Systems

Medical Applications

Land Classification (from aerial or satellite images)

“Hot” Applications
• Recommendation systems
  • Amazon, Netflix
• Targeted advertising

The Netflix Prize
Main Classification Approaches
x: input vector (pattern); y: class label
• Generative
  – Model the joint probability, p(x, y)
  – Make predictions by using Bayes’ rule to calculate p(y|x)
  – Pick the most likely label y
• Discriminative
  – Estimate p(y|x) directly (e.g., learn a direct map from inputs x to the class labels y)
  – Pick the most likely label y
A sketch of the generative route follows below.
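A minimal sketch of the generative approach in Python, using made-up priors and 1-D class-conditional Gaussians for the fish example that follows; a discriminative model would instead fit p(y|x) directly (e.g., logistic regression) without ever modeling p(x, y):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = {"salmon": 0.6, "sea bass": 0.4}                  # p(y), assumed
params = {"salmon": (11.0, 1.0), "sea bass": (14.0, 1.5)}  # (mu, sigma) of p(x|y), assumed

def posterior(x):
    """Bayes' rule: p(y|x) = p(x|y) p(y) / sum over y' of p(x|y') p(y')."""
    joint = {y: gaussian_pdf(x, *params[y]) * priors[y] for y in priors}
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

post = posterior(12.5)
print(max(post, key=post.get), post)  # pick the most likely label
```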
Complexity of PR – An Example
Problem: sorting incoming fish on a conveyor belt.
Assumption: two kinds of fish:
(1) sea bass
(2) salmon
Pre-processing Step (Example)
Feature Extraction
• Assume a fisherman told us that a sea bass is generally longer than a salmon.
“Length” Histograms
[Figure: length histograms for sea bass and salmon, with decision threshold l*]
“Average Lightness” Histograms
• Consider a different feature, such as “average lightness”.
[Figure: lightness histograms for the two classes, with decision threshold x*]
• It seems easier to choose the threshold x*, but we still cannot make a perfect decision; a simple threshold search is sketched below.
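A minimal sketch of picking a 1-D threshold x* by brute force: sweep candidate thresholds and keep the one that minimizes empirical error. The lightness values below are synthetic; note that even the best threshold leaves errors because the histograms overlap:

```python
import numpy as np

rng = np.random.default_rng(1)
salmon  = rng.normal(4.0, 1.0, 100)   # synthetic "average lightness" values
seabass = rng.normal(7.0, 1.0, 100)

def error(t):
    # Decision rule: classify as sea bass if lightness > t, else salmon.
    return (np.sum(seabass <= t) + np.sum(salmon > t)) / 200.0

candidates = np.linspace(0.0, 11.0, 500)
x_star = min(candidates, key=error)
print(f"x* = {x_star:.2f}, training error = {error(x_star):.2%}")
```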
Multiple Features
• To improve recognition accuracy, we might have to use more than one feature at a time.
• Single features might not yield the best performance.
• Using combinations of features might yield better performance.
x = (x1, x2), where x1: lightness and x2: width
• How many features should we choose?
Classification
• Partition the feature space into two regions by finding the decision boundary that minimizes the error (see the sketch below).
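A minimal sketch of finding such a boundary, here a straight line fitted by least squares to synthetic (lightness, width) data; least squares is just one illustrative way to place the boundary, not a method the slides prescribe:

```python
import numpy as np

rng = np.random.default_rng(2)
salmon  = rng.normal([4.0, 3.0], 0.8, (100, 2))   # (lightness, width), synthetic
seabass = rng.normal([7.0, 5.0], 0.8, (100, 2))
X = np.vstack([salmon, seabass])
y = np.array([-1.0] * 100 + [1.0] * 100)

# Augment with a bias term and solve for w in the least-squares sense.
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

pred = np.sign(Xb @ w)   # decision rule: which side of the line w.x + b = 0
print("training error:", np.mean(pred != y))
```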
PR System – Two Phases
[Figure: block diagram of the training phase and the test phase]
Sensors & Preprocessing
• Sensing:
  • Use a sensor (camera or microphone) for data capture.
  • PR performance depends on the bandwidth, resolution, sensitivity, and distortion of the sensor.
• Pre-processing (sketched below):
  • Removal of noise in the data.
  • Segmentation (i.e., isolation of patterns of interest from the background).
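A toy pre-processing sketch on a synthetic 1-D signal: moving-average smoothing for noise removal, then a simple threshold to segment the pattern of interest from the background (real pipelines are task-specific; this only illustrates the two steps named above):

```python
import numpy as np

rng = np.random.default_rng(3)
signal = np.zeros(200)
signal[80:120] = 1.0                        # the "pattern of interest"
noisy = signal + rng.normal(0.0, 0.3, 200)  # additive sensor noise

kernel = np.ones(9) / 9                     # moving-average (noise removal)
smoothed = np.convolve(noisy, kernel, mode="same")

mask = smoothed > 0.5                       # segmentation by thresholding
idx = np.flatnonzero(mask)
print("segmented span:", idx[0], "to", idx[-1])
```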
Training/Test data
• How do we know that we have collected an adequately large and representative set of examples for training/testing the system?
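There is no general guarantee, but a standard sanity check is to hold out part of the collected examples and measure performance on them. A minimal shuffle-and-split sketch on synthetic data (the 70/30 ratio is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))              # 100 collected patterns
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic labels

idx = rng.permutation(len(X))              # shuffle before splitting
train, test = idx[:70], idx[70:]
X_train, y_train = X[train], y[train]
X_test,  y_test  = X[test],  y[test]
print(len(X_train), "training examples,", len(X_test), "test examples")
```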
Feature Extraction
• How to choose a good set of features?
• Discriminative features
How Many Features?
• Does adding more features always improve performance?
  • It might be difficult and computationally expensive to extract certain features.
  • Correlated features might not improve performance.
  • “Curse” of dimensionality.
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to a worsening of performance.
• Divide each of the input features into a number of intervals, so that the value of a feature can be specified approximately by saying in which interval it lies. The number of resulting cells grows exponentially with the number of features, and each cell needs training examples to be characterized (see the sketch below).
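The cell-counting argument in a few lines: with M intervals per feature, a d-dimensional feature space has M**d cells, so the data requirement grows exponentially with d (M = 10 is an arbitrary illustrative choice):

```python
M = 10  # intervals per feature
for d in (1, 2, 3, 5, 10):
    print(f"d = {d:2d}: {M**d:>14,} cells to populate with examples")
```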
Missing Features
• Certain features might be missing (e.g., due to occlusion).
• How should we train the classifier with missing features?
• How should the classifier make the best decision with missing features?
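One simple (and crude) option at decision time is imputation: fill each missing value with that feature's training mean. More principled approaches marginalize the missing features out of the class-conditional densities. A minimal sketch with made-up numbers:

```python
import numpy as np

X_train = np.array([[5.0, 2.0],     # (lightness, width) training patterns,
                    [6.0, 3.0],     # purely illustrative values
                    [7.0, 2.5]])
feature_means = X_train.mean(axis=0)

x = np.array([np.nan, 2.8])         # lightness occluded, width observed
x_filled = np.where(np.isnan(x), feature_means, x)
print(x_filled)                     # missing entry replaced by the training mean
```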
Complexity
• We can get perfect classification performance on the training data by choosing sufficiently complex models.
• Complex models are tuned to the particular training samples rather than to the characteristics of the true model; this is known as overfitting.
More on model complexity
• Consider the following 10 sample points (blue circles), generated with some noise.
• The green curve is the true function that generated the data.
More on model complexity (cont’d)
Polynomial curve fitting: polynomials of various orders, shown as red curves, fitted to the set of 10 sample points.
More on complexity (cont’d)
Polynomial curve fitting: 9th-order polynomials fitted to 15 and 100 sample points.
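A numerical sketch of the same phenomenon, assuming the familiar sin(2πx) ground truth used in the standard version of this figure: a 9th-order polynomial can interpolate all 10 noisy samples (near-zero training error), fitting the noise rather than the function, while with more samples the high-order fit is tamed:

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_and_report(n, order):
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)  # noisy samples
    coeffs = np.polyfit(x, t, deg=order)                 # polynomial fit
    rms = np.sqrt(np.mean((np.polyval(coeffs, x) - t) ** 2))
    print(f"n = {n:3d}, order = {order}: training RMS error = {rms:.4f}")

for n in (10, 15, 100):
    fit_and_report(n, order=9)
```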
Ensembles of Classifiers
• Performance can be improved using a “pool” of classifiers (see the sketch below).
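A minimal majority-vote sketch over a pool of toy threshold rules (the rules and numbers are invented for illustration; real ensembles combine independently trained classifiers, as in bagging or boosting):

```python
import numpy as np

def majority_vote(pool, x):
    """Combine a pool of binary classifiers (labels 0/1) by majority vote."""
    votes = np.array([clf(x) for clf in pool])
    return int(votes.sum() > len(votes) / 2)

pool = [
    lambda x: int(x[0] > 5.0),          # toy rule on feature 0 ("lightness")
    lambda x: int(x[1] > 4.0),          # toy rule on feature 1 ("width")
    lambda x: int(x[0] + x[1] > 9.0),   # toy rule on their sum
]
print(majority_vote(pool, np.array([6.2, 3.5])))  # -> 1 (two of three vote 1)
```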
PR System (cont’d)
• Post-processing:
• Exploit context to improve performance.
Cost of misclassifications

Cost of misclassifications (cont’d)
Computational Complexity
• How does an algorithm scale with the number of:
  • features
  • patterns
  • categories
• Consider tradeoffs between computational complexity and performance.
Would it be possible to build a “general purpose” PR system?
• Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks.
• It is very difficult to design a system that is capable of performing a variety of classification tasks.
  • Different decision tasks may require different features.
  • Different features might yield different solutions.
  • Different tradeoffs exist for different tasks.