
10707

Deep Learning: Spring 2021


Andrej Risteski
Machine Learning Department

Lecture 4: Convolutional architectures
Used Resources

Disclaimer: Some material in this lecture was borrowed from:

Hugo Larochelle’s class on Neural Networks:


https://sites.google.com/site/deeplearningsummerschool2016/

Rob Fergus’ CIFAR/MLSS tutorial on ConvNets:


http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf

Marc'Aurelio Ranzato’s CVPR 2014 tutorial on Convolutional Nets


https://sites.google.com/site/lsvrtutorialcvpr14/home/deeplearning
Neural networks for vision
The prototypical task in vision is object recognition: given an input
image, identify what kind of object it contains.

Are feedforward networks the right architecture for this?


Desiderata for networks for vision
֍ Inputs are very high-dimensional: 150 x 150 pixels = 22500 inputs,
or 3 x 22500 if RGB pixels instead of grayscale.
֍ Should leverage the spatial locality (in the pixel sense) of data
֍ Build in invariance to natural variations: translation, illumination,
etc.

Convolutional architectures are designed for this:


֍ Local connectivity (reflects spatial locality and decreases # params)
֍ Parameter sharing (further decreases # params)
֍ Convolution
֍ Pooling / subsampling hidden units
Local Connectivity
Use local connectivity of hidden units
֍ Each hidden unit is connected only to a sub-region (patch) of the input image.

֍ It is connected to all channels: 1 if grayscale, 3 (R, G, B) if a color image.

Why this is a good idea:

֍ A fully connected layer has a lot of parameters to fit, which requires a lot of training data.

֍ Image data isn’t arbitrary: neighboring pixels are “meaningfully related” – e.g., a unit meant to be a “dog nose” detector only needs to look at a small patch of pixels.
Decrease in # of parameters
Fully connected: 200x200 image, 40K hidden units → ~2B parameters! (40,000 inputs x 40,000 units)
Convolutional: 200x200 image, 40K hidden units, window size 10x10 → ~4M parameters! (40,000 units x 100 weights each)
Parameter Sharing
The prior approach makes the weights sensitive to translations: e.g., we might learn weights for a nose detector at one image location, but not learn weights for a nose detector at another.
Parameter Sharing
Share a matrix of parameters across some units
֍ Units that are organized into the same “feature map” share parameters

֍ Hidden units within a feature map cover different positions in the image

Wij is the matrix connecting the ith input channel with the jth feature map (in the figure, same color = same matrix of connections).
Desiderata for networks for vision
Our goal is to design neural networks that are specifically adapted
for such problems

֍ Must deal with very high-dimensional inputs: 150 x 150 pixels =


22500 inputs, or 3 x 22500 if RGB pixels
֍ Can exploit the 2D topology of pixels (or 3D for video data)
֍ Can build in invariance to certain variations: translation,
illumination, etc.

Convolutional networks leverage these ideas


֍ Local connectivity
֍ Parameter sharing
֍ Convolution
֍ Pooling / subsampling hidden units
Discrete Convolution
Each feature map forms a 2D grid of features.
It can be computed with a discrete convolution (∗) of a kernel matrix kij with the input channels:

    y_j = g_j tanh( Σ_i k_ij ∗ x_i )

- xi is the ith channel of the input
- kij is the convolution kernel
- gj is a learned scaling factor
- yj is the hidden layer

A bias term can also be added.

Jarrett et al. 2009
Discrete Convolution

Example: a 3x3 input convolved with a 2x2 kernel. The kernel is applied with rows and columns flipped, then slid across the input:

1 x 0 + 0.5 x 80 + 0.25 x 20 + 0 x 40 = 45
1 x 80 + 0.5 x 40 + 0.25 x 40 + 0 x 0 = 110
1 x 20 + 0.5 x 40 + 0.25 x 0 + 0 x 0 = 40
1 x 40 + 0.5 x 0 + 0.25 x 0 + 0 x 40 = 40
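As a minimal NumPy sketch (the input and kernel values are reconstructed from the arithmetic above, so treat them as illustrative):

```python
import numpy as np

def conv2d_valid(x, k):
    """Discrete 2D convolution: correlate with the row/column-flipped kernel."""
    kf = np.flip(k)  # flip rows and columns
    H = x.shape[0] - k.shape[0] + 1
    W = x.shape[1] - k.shape[1] + 1
    y = np.zeros((H, W))
    for r in range(H):
        for c in range(W):
            y[r, c] = np.sum(x[r:r + k.shape[0], c:c + k.shape[1]] * kf)
    return y

x = np.array([[0., 80., 40.], [20., 40., 0.], [0., 0., 40.]])  # input (reconstructed)
k = np.array([[0., 0.25], [0.5, 1.0]])                         # kernel, before flipping
print(conv2d_valid(x, k))  # [[ 45. 110.] [ 40.  40.]]
```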


Example of a convolution
Adding non-linearity
With a non-linearity, we get a detector of a feature at any position in
the image:
Example of ReLU non-linearity

From Rob Fergus tutorial (http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf)
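Concretely, a minimal sketch (NumPy; the bias value is an illustrative threshold) of how a ReLU turns the convolution output from the worked example into a position-wise feature detector:

```python
import numpy as np

y = np.array([[45., 110.], [40., 40.]])  # convolution output from the worked example
bias = -50.0                              # illustrative threshold
h = np.maximum(0.0, y + bias)             # ReLU
print(h)  # [[ 0. 60.] [ 0.  0.]] -> the detector fires at exactly one position
```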


Padding
Can use “zero padding” to allow the kernel to go over the borders of the image.
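A minimal sketch (NumPy/SciPy assumed) of zero padding applied to the earlier example before convolving:

```python
import numpy as np
from scipy.signal import convolve2d

x = np.array([[0., 80., 40.], [20., 40., 0.], [0., 0., 40.]])
k = np.array([[0., 0.25], [0.5, 1.0]])

x_pad = np.pad(x, 1)                    # one border of zeros on every side
y = convolve2d(x_pad, k, mode="valid")  # 4x4 output instead of the unpadded 2x2
print(y.shape)                          # (4, 4)
```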
The picture so far

From Rob Fergus tutorial (http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf)


Desiderata for networks for vision
Our goal is to design neural networks that are specifically adapted
for such problems

֍ Must deal with very high-dimensional inputs: 150 x 150 pixels =


22500 inputs, or 3 x 22500 if RGB pixels
֍ Can exploit the 2D topology of pixels (or 3D for video data)
֍ Can build in invariance to certain variations: translation,
illumination, etc.

Convolutional networks leverage these ideas


֍ Local connectivity
֍ Parameter sharing
֍ Convolution
֍ Pooling / subsampling hidden units
Pooling
Pool hidden units in the same neighborhood.
Pooling is performed in non-overlapping neighborhoods (subsampling):

    y_ijk = max_(p,q) x_i,(j+p),(k+q)

- xi is the ith channel of the input
- xi,j,k is the value of the ith feature map at position j,k
- p is the vertical index in the local neighborhood
- q is the horizontal index in the local neighborhood
- yijk is the pooled / subsampled layer

Jarrett et al. 2009
Pooling
Pool hidden units in the same neighborhood.
An alternative to “max” pooling is “average” pooling:

    y_ijk = (1/m²) Σ_(p,q) x_i,(j+p),(k+q)

- xi is the ith channel of the input
- xi,j,k is the value of the ith feature map at position j,k
- p is the vertical index in the local neighborhood
- q is the horizontal index in the local neighborhood
- yijk is the pooled / subsampled layer
- m is the neighborhood height/width

Jarrett et al. 2009
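A minimal NumPy sketch of both operations (non-overlapping m x m neighborhoods; the reshape trick assumes the map dimensions are divisible by m):

```python
import numpy as np

def pool2d(x, m, mode="max"):
    """Non-overlapping m x m pooling of a 2D feature map."""
    H, W = x.shape
    blocks = x.reshape(H // m, m, W // m, m)  # split into m x m neighborhoods
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))           # "average" pooling

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2))           # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```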
Example: Pooling
Illustration of pooling/subsampling operation

Why pooling?
֍ Introduces invariance to local translations
֍ Reduces the number of hidden units in hidden layer
Example: Pooling

Can we make the detection robust to the exact location of the eye?
Example: Pooling
By “pooling” (e.g., taking max) filter
responses at different locations we gain
robustness to the exact spatial location of
features.
Translation Invariance
Illustration of local translation invariance

Both images result in the same feature map after pooling/subsampling


Convolutional Network
Convolutional neural network alternates between
convolutional and pooling layers

From Yann LeCun’s slides


Generating Additional Examples
Elastic Distortions
Can add “elastic” deformations (useful in character recognition).

We can do this by applying a “distortion field” to the image: a distortion field specifies where to displace each pixel value.

Bishop’s book
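A minimal sketch of such a distortion field in the style of Simard et al. (SciPy assumed; alpha and sigma are illustrative knobs for displacement strength and smoothness):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(img, alpha=8.0, sigma=3.0, seed=0):
    """Apply a random, smoothed displacement field to a 2D image."""
    rng = np.random.default_rng(seed)
    # Random per-pixel displacements, smoothed so nearby pixels move together.
    dx = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, img.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]),
                             indexing="ij")
    # Resample the image at the displaced coordinates (bilinear interpolation).
    return map_coordinates(img, [rows + dy, cols + dx], order=1, mode="reflect")
```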
Conv Nets: Examples
Optical Character Recognition, House Number and Traffic Sign
classification
Conv Nets: Examples
Pedestrian detection

Sermanet et al. “Pedestrian detection with unsupervised multi-stage..” CVPR 2013


Conv Nets: Examples
Object Detection

Sermanet et al. “OverFeat: Integrated recognition, localization” arxiv 2013


Girshick et al. “Rich feature hierarchies for accurate object detection” arxiv 2013
Szegedy et al. “DNN for object detection” NIPS 2013
ImageNet Dataset
~14 million images, 20k classes

Examples of “Hammer”

Deng et al. “Imagenet: a large scale hierarchical image database” CVPR 2009
Important Breakthroughs
Deep Convolutional Nets for Vision (Supervised)
Krizhevsky, A., Sutskever, I. and Hinton, G. E., ImageNet Classification with Deep
Convolutional Neural Networks, NIPS, 2012.

~14 million images, 20k classes


Architecture

How can we select the “right” architecture?

Manual tuning of features is now replaced with manual tuning of architectures:

֍ Depth
֍ Width
֍ Parameter count
How to Choose Architecture

Many hyper-parameters:
Number of layers, number of feature maps

֍ Cross Validation

֍ Grid Search (need lots of GPUs)

֍ Smarter Strategies:
  - Random search [Bergstra & Bengio JMLR 2012] (see the sketch below)
  - Bayesian Optimization
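A rough sketch of random search (the search space and the train_and_eval function are hypothetical stand-ins for your own setup):

```python
import random

# Hypothetical search space; train_and_eval is an assumed user-supplied function
# that trains a model with the given hyper-parameters and returns validation error.
space = {
    "n_layers":       lambda: random.randint(2, 8),
    "n_feature_maps": lambda: random.choice([16, 32, 64, 128]),
    "learning_rate":  lambda: 10 ** random.uniform(-4, -1),
}

best_err, best_cfg = float("inf"), None
for _ in range(20):  # 20 random trials
    cfg = {name: sample() for name, sample in space.items()}
    err = train_and_eval(**cfg)
    if err < best_err:
        best_err, best_cfg = err, cfg
```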
Famous architectures
AlexNet
8 layers total, ~60 million parameters.
Trained on the Imagenet dataset [Deng et al. CVPR’09].
18.2% top-5 error.

Architecture (bottom to top): Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: ablation study
Remove the top fully connected layer (Layer 7):
- drops ~16 million parameters
- only 1.1% drop in performance!

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: ablation study
Remove both fully connected layers (Layers 6 and 7):
- drops ~50 million parameters
- 5.7% drop in performance!

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: ablation study
Remove the upper feature extractor layers (Layers 3 and 4):
- drops ~1 million parameters
- 3% drop in performance.

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: ablation study
Remove both the upper feature extractor layers and the fully connected layers (Layers 3, 4, 6 and 7):
- 33.5% drop in performance!

Depth of the network is the key.

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: intermediate features

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: translation invariance

[From Rob Fergus’ CIFAR 2016 tutorial]



AlexNet: scaling invariance

[From Rob Fergus’ CIFAR 2016 tutorial]


AlexNet: rotation invariance

[From Rob Fergus’ CIFAR 2016 tutorial]


GoogLeNet
Issue: the multiscale nature of images.

https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202

A larger kernel is good for global features, a smaller kernel for local features.

Idea: have multiple kernels of different sizes at any given level.


GoogLeNet

22-layer model that uses the so-called inception module.

[Going Deeper with Convolutions, Szegedy et al., arXiv:1409.4842, 2014]


GoogLeNet
GoogLeNet inception module:

֍ Multiple filter scales at each layer (1x1, 3x3, and 5x5 filters)

[Going Deeper with Convolutions, Szegedy et al., arXiv:1409.4842, 2014]


GoogLeNet
GoogLeNet inception module:

֍ Multiple filter scales at each layer (1x1, 3x3, and 5x5 filters)

֍ Dimensionality reduction (1x1 convolutions) to keep computational requirements down

[Going Deeper with Convolutions, Szegedy et al., arXiv:1409.4842, 2014]
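A minimal PyTorch sketch of an inception module (branch channel counts are illustrative, not the paper’s exact configuration; the pooling branch follows the paper’s figure):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an inception module: parallel 1x1 / 3x3 / 5x5 branches whose
    outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)             # 1x1 branch
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 32, 1),          # 1x1 reduction first
                                nn.Conv2d(32, 64, 3, padding=1))  # then 3x3
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))          # pooling branch
    def forward(self, x):
        # Padding keeps spatial size identical in every branch, so we can concatenate.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

out = InceptionModule(192)(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 192, 28, 28]) -> 64 + 64 + 32 + 32 channels
```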


GoogLeNet

֍ Width of inception modules ranges from 256 filters (in early modules)
to 1024 in top inception modules.
֍ Can remove fully connected layers on top completely
֍ Number of parameters is reduced to 5 million
֍ 6.7% top-5 validation error on ImageNet

[Going Deeper with Convolutions, Szegedy et al., arXiv:1409.4842, 2014]


Residual Networks
Really, really deep convnets do not train well; e.g., on CIFAR10, a 56-layer plain network attains higher training error than a 20-layer one.

Reason: gradients involve multiplications of a number of matrices proportional to the depth.
Vanishing/exploding gradients: gradients get very small/very large.

Key idea: introduce an “identity shortcut” connection, skipping one or more layers, so a block outputs F(x) + x rather than F(x).

Intuition: the network can easily simulate a shallower network (at initialization, F is not too far from the 0 map), so performance should not degrade by going deeper.

[He, Zhang, Ren, Sun, CVPR 2016]
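A minimal PyTorch sketch of such a block (layer sizes illustrative, not the paper’s exact configuration):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of a basic residual block: output = F(x) + x."""
    def __init__(self, ch):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # The identity shortcut skips the two conv layers; if F is near the
        # zero map at initialization, the block starts out close to identity.
        return self.relu(self.f(x) + x)
```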


Residual Networks
With ensembling, residual networks achieve 3.57% top-5 test error on ImageNet.

[He, Zhang, Ren, Sun, CVPR 2016]


Dense Convolutional Networks
In ResNets, information from earlier layers is carried forward only implicitly, through addition.
Idea: explicitly forward the output of each layer to *all* future layers (by concatenation).

Intuition: helps with vanishing gradients; encourages reuse of features (and hence reduces parameter count).

(Figure: full architecture for Imagenet.)

[Huang, Liu, Weinberger, van der Maaten, CVPR 2017]
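A minimal PyTorch sketch of a dense block (growth rate and depth are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of a dense block: each layer sees the concatenation of all
    earlier outputs, and the block returns everything concatenated."""
    def __init__(self, in_ch, growth_rate=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.BatchNorm2d(in_ch + i * growth_rate), nn.ReLU(),
                          nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1))
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # concat all previous outputs
        return torch.cat(feats, dim=1)
```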
Debugging hints
֍ Check gradients numerically by finite differences

֍ Visualize features (feature maps need to be uncorrelated and have high variance)

Good training: hidden units are sparse across samples.

[From Marc'Aurelio Ranzato, CVPR 2014 tutorial]


Debugging hints
֍ Check gradients numerically by finite differences

֍ Visualize features (feature maps need to be uncorrelated and have high variance)

Bad training: many hidden units ignore the input and/or exhibit strong correlations.

[From Marc'Aurelio Ranzato, CVPR 2014 tutorial]


Debugging hints
֍ Check gradients numerically by finite differences

֍ Visualize features (feature maps need to be uncorrelated and have high variance)

֍ Visualize parameters: learned features should exhibit structure and should be uncorrelated

֍ Measure error on both training and validation set

֍ Test on a small subset of the data and check that the error → 0

[From Marc'Aurelio Ranzato, CVPR 2014 tutorial]
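A minimal NumPy sketch of gradient checking by central finite differences (the toy loss at the bottom is illustrative):

```python
import numpy as np

def grad_check(f, grad_f, x, eps=1e-5):
    """Compare an analytic gradient against central finite differences."""
    num = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps
        num.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)  # central difference
    ana = grad_f(x)
    # Relative error; should be tiny (~1e-8 or less) if backprop is correct.
    return np.max(np.abs(num - ana) / np.maximum(1e-8, np.abs(num) + np.abs(ana)))

# Usage on a toy loss f(x) = ||x||^2 with gradient 2x:
x = np.random.randn(5)
print(grad_check(lambda v: np.sum(v ** 2), lambda v: 2 * v, x))
```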


When it does not work
Training diverges:
֍ Learning rate may be too large → decrease the learning rate
֍ Backprop is buggy → do numerical gradient checking

Parameters collapse / loss is minimized but accuracy is low:
֍ Check the loss function: is it appropriate for the task you want to solve?
֍ Does it have degenerate solutions?

Network is underperforming:
֍ Compute flops and number of parameters → if too small, make the net larger
֍ Visualize hidden units/params → fix optimization

Network is too slow:
֍ GPU, distributed framework, make the net smaller

[From Marc'Aurelio Ranzato, CVPR 2014 tutorial]
