Deep learning for visual recognition
Tues April 23
Kristen Grauman
UT Austin
Last time
• Supervised classification continued
• Nearest neighbors
• Support vector machines
• HOG pedestrians example
• Kernels
• Multi-class from binary classifiers
Recall: Examples of kernel functions
• Linear: K(x_i, x_j) = x_i^T x_j

• Gaussian RBF: K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))

• Histogram intersection: K(x_i, x_j) = Σ_k min(x_i(k), x_j(k))

• Kernels go beyond vector space data


• Kernels also exist for “structured” input spaces like
sets, graphs, trees…
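For concreteness, the vector-space kernels recalled above can be written in a few lines of NumPy. This is a minimal sketch with my own function names; the histogram-intersection version assumes its inputs are non-negative histograms.

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi^T xj
    return np.dot(xi, xj)

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    # K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def histogram_intersection_kernel(xi, xj):
    # K(xi, xj) = sum_k min(xi(k), xj(k)); xi, xj are (non-negative) histograms
    return np.sum(np.minimum(xi, xj))
```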
Discriminative classification with sets of features?
• Each instance is unordered set of vectors
• Varying number of vectors per instance

Slide credit: Kristen Grauman


Partially matching sets of features

Optimal match: O(m³)

Greedy match: O(m² log m)
Pyramid match: O(m)

(m = number of points)

We introduce an approximate matching kernel that


makes it practical to compare large sets of features
based on their partial correspondences.

[Previous work: Indyk & Thaper, Bartal, Charikar, Agarwal & Varadarajan, …]
Slide credit: Kristen Grauman
Pyramid match: main idea

Feature space partitions serve to “match” the local descriptors within successively wider regions.

(Figure: partitions of the descriptor space)

Slide credit: Kristen Grauman


Pyramid match: main idea

Histogram intersection
counts number of possible
matches at a given
partitioning.
Slide credit: Kristen Grauman
Pyramid match

K(X, Y) = Σ_i w_i N_i, where N_i measures the number of newly matched pairs at level i and w_i reflects the difficulty of a match at level i

• For similarity, weights inversely proportional to bin size (or may be learned)
• Normalize these kernel values to avoid favoring large sets

[Grauman & Darrell, ICCV 2005] Slide credit: Kristen Grauman


Pyramid match
Optimal match: O(m³)
Pyramid match: O(mL)

optimal partial
matching
The Pyramid Match Kernel: Efficient
Learning with Sets of Features. K.
Grauman and T. Darrell. Journal of
Machine Learning Research (JMLR), 8
(Apr): 725--760, 2007.
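As a rough sketch of the mechanism (not the authors' implementation), the following computes a pyramid match between two 1-D point sets: bins double in width at each level, newly matched pairs are counted by the change in histogram intersection, and weights are inversely proportional to bin size. The 1-D simplification and all names are mine.

```python
import numpy as np

def pyramid_match(X, Y, num_levels=5, diameter=32.0):
    """Pyramid-match-style similarity between two 1-D point sets (values in [0, diameter))."""
    score, prev_intersection = 0.0, 0.0
    for i in range(num_levels):
        bin_width = 2.0 ** i                                 # bins widen as the level gets coarser
        edges = np.arange(0.0, diameter + bin_width, bin_width)
        hx, _ = np.histogram(X, bins=edges)
        hy, _ = np.histogram(Y, bins=edges)
        intersection = float(np.minimum(hx, hy).sum())       # possible matches at this level
        new_matches = intersection - prev_intersection       # pairs matched only now
        score += new_matches / (2.0 ** i)                    # weight ~ 1 / bin size
        prev_intersection = intersection
    return score

def normalized_pyramid_match(X, Y, **kw):
    # Normalize by the self-similarities so large sets are not automatically favored.
    return pyramid_match(X, Y, **kw) / np.sqrt(pyramid_match(X, X, **kw) * pyramid_match(Y, Y, **kw))
```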
BoW Issue:
No spatial layout preserved!

Too much? Too little?

Slide credit: Kristen Grauman


Spatial pyramid match
• Make a pyramid of bag-of-words histograms.
• Provides some loose (global) spatial layout
information

[Lazebnik, Schmid & Ponce, CVPR 2006]


Spatial pyramid match
• Make a pyramid of bag-of-words histograms.
• Provides some loose (global) spatial layout
information

Sum over PMKs computed in image coordinate space, one per word.

[Lazebnik, Schmid & Ponce, CVPR 2006]
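One common way to realize the pyramid of bag-of-words histograms is sketched below, assuming each local feature already comes with a pixel location and a visual-word index (all names are mine; this is an illustration rather than the authors' code). Two images can then be compared by weighted histogram intersection of the resulting vectors.

```python
import numpy as np

def spatial_pyramid_histogram(xs, ys, words, vocab_size, img_w, img_h, num_levels=3):
    """Concatenate per-cell bag-of-words histograms; level l uses a 2**l x 2**l grid."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    words = np.asarray(words, dtype=int)              # visual-word index per feature
    parts = []
    for level in range(num_levels):
        grid = 2 ** level
        cell_w, cell_h = img_w / grid, img_h / grid
        col = np.clip((xs / cell_w).astype(int), 0, grid - 1)
        row = np.clip((ys / cell_h).astype(int), 0, grid - 1)
        for r in range(grid):
            for c in range(grid):
                in_cell = (row == r) & (col == c)
                parts.append(np.bincount(words[in_cell], minlength=vocab_size))
    return np.concatenate(parts)
```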


Spatial pyramid match
• Can capture scene categories well: texture-like patterns but with some variability in the positions of all the local pieces
• Sensitive to global shifts of the view

(Figure: confusion table)
Today
• (Deep) Neural networks
• Convolutional neural networks
Traditional Image Categorization: Training phase

(Pipeline: training images + training labels → image features → classifier training → trained classifier)

Slide credit: Jia-Bin Huang


Traditional Image Categorization: Testing phase

(Training: training images + training labels → image features → classifier training → trained classifier)
(Testing: test image → image features → trained classifier → prediction, e.g. “Outdoor”)

Slide credit: Jia-Bin Huang
Features have been key

SIFT [Lowe IJCV 04], HOG [Dalal and Triggs CVPR 05], SPM [Lazebnik et al. CVPR 06], Textons,
and many others: SURF, MSER, LBP, Color-SIFT, Color histogram, GLOH, …
Learning a Hierarchy of Feature Extractors

• Each layer of hierarchy extracts features from output of previous layer
• All the way from pixels → classifier
• Layers have the (nearly) same structure
• Train all layers jointly

(Diagram: image/video pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier → image/video labels)

Slide: Rob Fergus
Learning Feature Hierarchy
Goal: Learn useful higher-level features from images

Feature representation [Lee et al., ICML 2009; CACM 2011]:
• Pixels (input data)
• 1st layer: “Edges”
• 2nd layer: “Object parts”
• 3rd layer: “Objects”

Slide: Rob Fergus


Learning Feature Hierarchy
• Better performance

• Other domains (unclear how to hand engineer):


– Kinect
– Video
– Multi-spectral

• Feature computation time


– Dozens of features regularly used [e.g., MKL]
– Getting prohibitive for large datasets (tens of seconds per image)

Slide: R. Fergus
Biological neuron and Perceptrons

A biological neuron | An artificial neuron (Perceptron): a linear classifier

Slide credit: Jia-Bin Huang


Simple, Complex and Hypercomplex cells

David H. Hubel and Torsten Wiesel

Suggested a hierarchy of feature detectors


in the visual cortex, with higher level features
responding to patterns of activation in lower
level cells, and propagating activation
upwards to still higher level cells.

Source: David Hubel's Eye, Brain, and Vision
Slide credit: Jia-Bin Huang
Hubel/Wiesel Architecture and Multi-layer Neural Network

Hubel and Wiesel’s architecture | Multi-layer Neural Network: a non-linear classifier

Slide credit: Jia-Bin Huang


Neuron: Linear Perceptron
• Inputs are feature values
• Each feature has a weight
• Sum is the activation
• If the activation is:
  – Positive, output +1
  – Negative, output -1

Slide credit: Pieter Abbeel and Dan Klein
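In code, this rule is just a weighted sum followed by a sign, plus the classic update toward misclassified examples. A minimal NumPy sketch (names mine):

```python
import numpy as np

def perceptron_predict(w, f):
    """Output +1 if the activation w . f(x) is positive, otherwise -1."""
    activation = np.dot(w, f)       # sum of weight * feature value
    return 1 if activation > 0 else -1

def perceptron_update(w, f, y, lr=1.0):
    """If the example (features f, label y in {-1, +1}) is misclassified, nudge w toward it."""
    if perceptron_predict(w, f) != y:
        w = w + lr * y * f
    return w
```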


Two-layer perceptron network

Slide credit: Pieter Abbeel and Dan Klein




Learning w
• Training examples
• Objective: a misclassification loss
• Procedure:
  – Gradient descent / hill climbing

Slide credit: Pieter Abbeel and Dan Klein


Hill climbing
• Simple, general idea:
  – Start wherever
  – Repeat: move to the best neighboring state
  – If no neighbors better than current, quit
  – Neighbors = small perturbations of w
• What’s bad?
  – Complete?
  – Optimal?

Slide credit: Pieter Abbeel and Dan Klein
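A sketch of this procedure applied to the weight vector w, with small random perturbations as neighbors and the number of training misclassifications as the objective. All names and constants below are my own choices.

```python
import numpy as np

def misclassifications(w, X, y):
    # X: (n, d) feature matrix, y: (n,) labels in {-1, +1}
    preds = np.where(X @ w > 0, 1, -1)
    return int(np.sum(preds != y))

def hill_climb(X, y, num_iters=1000, step=0.1, num_neighbors=20, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])                     # start wherever
    best = misclassifications(w, X, y)
    for _ in range(num_iters):
        # Neighbors = small perturbations of w
        neighbors = w + step * rng.normal(size=(num_neighbors, X.shape[1]))
        losses = [misclassifications(n, X, y) for n in neighbors]
        if min(losses) >= best:                         # no neighbor is better: quit
            break
        best = min(losses)
        w = neighbors[int(np.argmin(losses))]           # move to the best neighboring state
    return w
```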


Two-layer perceptron network

Slide credit: Pieter Abbeel and Dan Klein




Two-layer neural network

Slide credit: Pieter Abbeel and Dan Klein


Neural network properties
• Theorem (Universal function approximators): A two-layer network with a sufficient number of neurons can approximate any continuous function to any desired accuracy

• Practical considerations:
  – Can be seen as learning the features
  – Large number of neurons
  – Danger for overfitting
  – Hill-climbing procedure can get stuck in bad local optima

Approximation by Superpositions of a Sigmoidal Function, 1989. Slide credit: Pieter Abbeel and Dan Klein
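For reference, the kind of two-layer network the theorem refers to: one layer of sigmoid units feeding a linear output. A sketch in my own notation; with enough hidden units, sums of shifted and scaled sigmoids like this can approximate any continuous function on a bounded domain.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_net(x, W1, b1, w2, b2):
    """y = w2 . sigmoid(W1 x + b1) + b2 : one hidden layer, linear output."""
    h = sigmoid(W1 @ x + b1)    # hidden-layer activations
    return w2 @ h + b2          # scalar output
```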
Today
• (Deep) Neural networks
• Convolutional neural networks
Significant recent impact on the field

Big labeled datasets + deep learning + GPU technology
(Chart: ImageNet top-5 error (%) over successive challenge years)

Slide credit: Dinesh Jayaraman
Convolutional Neural Networks
(CNN, ConvNet, DCN)

• CNN = a multi-layer neural network with


– Local connectivity:
• Neurons in a layer are only connected to a small region
of the layer before it
– Share weight parameters across spatial positions:
• Learning shift-invariant filter kernels

Image credit: A. Karpathy


Jia-Bin Huang and Derek Hoiem, UIUC
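A quick back-of-the-envelope count makes the benefit of local connectivity and weight sharing concrete; the layer sizes below are just an example I picked, not from the slides.

```python
# Example: a 32x32x3 input feeding 64 output maps / units.
in_h, in_w, in_c = 32, 32, 3
out_maps, k = 64, 5                                   # 64 filters of size 5x5

fully_connected = (in_h * in_w * in_c) * out_maps     # every unit connected to every pixel
conv_shared = out_maps * (k * k * in_c + 1)           # one small shared kernel per map (+ bias)

print(fully_connected)   # -> 196608
print(conv_shared)       # -> 4864
```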
LeNet [LeCun et al. 1998]

Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
(LeNet-1 from 1993)
Jia-Bin Huang and Derek Hoiem, UIUC
What is a Convolution?
• Weighted moving sum

(Figure: an input feature map convolved with a filter to produce an activation map)

slide credit: S. Lazebnik
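A direct (slow but clear) NumPy sketch of this weighted moving sum. As in most CNN libraries, it is really cross-correlation, i.e. the kernel is not flipped.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel and take a weighted sum at each position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # weighted moving sum
    return out
```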
Convolutional Neural Networks

Layer stack (bottom to top): input image → convolution (learned) → non-linearity → spatial pooling → normalization → feature maps

slide credit: S. Lazebnik


Convolutional Neural Networks
Convolution (learned): filters applied to the input feature map

slide credit: S. Lazebnik
Convolutional Neural Networks
Non-linearity: Rectified Linear Unit (ReLU)

slide credit: S. Lazebnik


Convolutional Neural Networks
Spatial pooling: max pooling, a non-linear down-sampling that provides translation invariance

slide credit: S. Lazebnik
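A minimal NumPy sketch of two of these stages: the ReLU non-linearity from the previous slide and 2x2 max pooling (names mine).

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: elementwise max(0, x)
    return np.maximum(0, x)

def max_pool_2x2(feature_map):
    """Non-linear down-sampling: keep the max of each 2x2 block.
    Small shifts of the input tend not to change the output (translation invariance)."""
    H, W = feature_map.shape
    H2, W2 = H // 2, W // 2
    blocks = feature_map[:H2 * 2, :W2 * 2].reshape(H2, 2, W2, 2)
    return blocks.max(axis=(1, 3))
```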


Engineered vs. learned features

Convolutional filters are trained in a supervised manner by back-propagating classification error.

Engineered pipeline: Image → Feature extraction → Pooling → Classifier → Label
Learned pipeline: Image → Convolution/pool (×5) → Dense (×3) → Label

Jia-Bin Huang and Derek Hoiem, UIUC
SIFT Descriptor
Lowe [IJCV 2004]

Image pixels → apply oriented filters → spatial pool (sum) → normalize to unit length → feature vector

slide credit: R. Fergus
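A heavily simplified sketch in that spirit (this is not Lowe's actual descriptor): gradient-orientation responses stand in for the oriented filters, per-cell sums do the spatial pooling, and the result is normalized to unit length.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, num_orient=8):
    """Crude SIFT-flavored descriptor for a square grayscale patch."""
    gy, gx = np.gradient(patch.astype(float))            # oriented-filter-like responses
    mag = np.hypot(gx, gy)
    orient = np.arctan2(gy, gx) % (2 * np.pi)
    obin = np.minimum((orient / (2 * np.pi) * num_orient).astype(int), num_orient - 1)

    H, W = patch.shape
    desc = np.zeros((grid, grid, num_orient))
    cell_h, cell_w = H / grid, W / grid
    for i in range(H):                                    # spatial pool: sum per cell
        for j in range(W):
            r = min(int(i / cell_h), grid - 1)
            c = min(int(j / cell_w), grid - 1)
            desc[r, c, obin[i, j]] += mag[i, j]

    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)          # normalize to unit length
```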
Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]

SIFT features → filter with visual words → max → multi-scale spatial pool (sum) → classifier

slide credit: R. Fergus
Visualizing what was learned
• What do the learned filters look like?

Typical first layer filters


https://www.wired.com/2012/06/google-x-neural-network/
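As one way to look at such filters today, assuming PyTorch and torchvision are installed: load an ImageNet-pretrained AlexNet and plot the weights of its first convolution layer (64 filters of size 11x11 over RGB). This is a hedged sketch, not part of the lecture.

```python
import matplotlib.pyplot as plt
import torchvision

# Load an ImageNet-pretrained AlexNet and grab the first conv layer's weights.
model = torchvision.models.alexnet(pretrained=True)   # newer torchvision: weights=...
filters = model.features[0].weight.detach()           # shape: (64, 3, 11, 11)

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.ravel(), filters):
    f = (f - f.min()) / (f.max() - f.min())            # rescale to [0, 1] for display
    ax.imshow(f.permute(1, 2, 0))                      # CHW -> HWC
    ax.axis("off")
plt.show()
```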
Application: ImageNet

• ~14 million labeled images, 20k classes

• Images gathered from Internet

• Human labels via Amazon Mechanical Turk

[Deng et al. CVPR 2009]

https://sites.google.com/site/deeplearningcvpr2014 Slide: R. Fergus


AlexNet
• Similar framework to LeCun’98 but:
• Bigger model (7 hidden layers, 650,000 units, 60,000,000 params)
• More data (10⁶ vs. 10³ images)
• GPU implementation (50x speedup over CPU)
• Trained on two GPUs for a week

A. Krizhevsky, I. Sutskever, and G. Hinton,


ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
Jia-Bin Huang and Derek Hoiem, UIUC
ImageNet Classification Challenge

AlexNet

http://image-net.org/challenges/talks/2016/ILSVRC2016_10_09_clsloc.pdf
Industry Deployment
• Used in Facebook, Google, Microsoft
• Image Recognition, Speech Recognition, ….
• Fast at test time

Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face


Verification, CVPR’14
Slide: R. Fergus
Recap
• Neural networks / multi-layer perceptrons
– View of neural networks as learning hierarchy of
features
• Convolutional neural networks
– Architecture of network accounts for image
structure
– “End-to-end” recognition from pixels
– Together with big (labeled) data and lots of computation → major success on benchmarks, image classification and beyond
