
Convolutional Neural Networks

Eunbyung Park
Assistant Professor
Department of Artificial Intelligence
Eunbyung Park (silverbottlep.github.io)
Convolution
1D Convolution
• Convolution is a mathematical operation on two functions (𝑓, 𝑔) that produces a third function 𝑓 ∗ 𝑔

$$(f * g)(t) := \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau$$

$$(f * g)[t] := \sum_{\tau} f[t - \tau]\, g[\tau]$$
1D Convolution
• Flip the filter and slide: $(f * g)[t] := \sum_{\tau} f[t - \tau]\, g[\tau]$
• (figure: signal 𝑓 = [1, 3, 2, −1], filter 𝑔 = [1, 2, 1], input zero-padded at both ends)
• Window by window:
  0⋅1 + 0⋅2 + 1⋅1 = 1
  0⋅1 + 1⋅2 + 3⋅1 = 5
  1⋅1 + 3⋅2 + 2⋅1 = 9
  3⋅1 + 2⋅2 + (−1)⋅1 = 6
  2⋅1 + (−1)⋅2 + 0⋅1 = 0
• Result: 𝑓 ∗ 𝑔 = [1, 5, 9, 6, 0]
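The flip-and-slide procedure can be sketched in a few lines of plain Python (a minimal illustration, not the lecture's own code; it computes the full discrete convolution, whose first five values are the windows shown above):

```python
def conv1d(f, g):
    """Full discrete convolution: (f*g)[t] = sum_tau f[t - tau] * g[tau]."""
    n = len(f) + len(g) - 1
    out = []
    for t in range(n):
        s = 0
        for tau in range(len(g)):
            if 0 <= t - tau < len(f):   # f is zero outside its support
                s += f[t - tau] * g[tau]
        out.append(s)
    return out

print(conv1d([1, 3, 2, -1], [1, 2, 1]))  # [1, 5, 9, 6, 0, -1]
```

The first five outputs match the walkthrough; the full convolution also emits one trailing value (−1) where only the last sample of 𝑓 overlaps the filter.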
1D Convolution
• Example (animation: Convolution - Wikipedia)

1D Convolution
• Gaussian filter (figure: 𝑓(𝑡), 𝑔(𝑡), and the smoothed signal 𝑓(𝑡) ∗ 𝑔(𝑡))
2D Convolution

$$(f * g)(s, t) := \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(s - \tau_1, t - \tau_2)\, g(\tau_1, \tau_2)\, d\tau_1\, d\tau_2$$

$$(f * g)[s, t] := \sum_{\tau_1} \sum_{\tau_2} f[s - \tau_1, t - \tau_2]\, g[\tau_1, \tau_2]$$
2D Convolution
• One input channel, e.g. grayscale image
• Padding=1, stride=1
• (figure: a 3×3 filter slides over the zero-padded input, producing one output value per window: 16, then 28, then 24, …)

Zero-padded input:
0 0 0 0 0 0 0
0 1 3 2 3 3 0
0 3 1 2 1 1 0
0 3 3 3 1 2 0
0 2 2 1 2 1 0
0 2 3 2 1 2 0
0 0 0 0 0 0 0
2D Convolution
• (animation: File:2D Convolution Animation.gif - Wikimedia Commons)

2D Convolution
• 2D Gaussian filter (example: 2-D Gaussian filtering of images - MATLAB imgaussfilt (mathworks.com))
Convolutional Neural Networks
Convolutional Neural Network
• (figure: the input image is convolved with a set of filters, each producing one feature map)
2D Convolution
• Input_channel=1, output_channel=1, padding=1, stride=1
• (figure: the filter slides one step at a time over the zero-padded input)

Input (zero-padded):        Filter:       Output:
0 0 0 0 0 0 0
0 1 3 2 3 3 0               1 3 2         16 28 24 28 16
0 3 1 2 1 1 0               1 3 3         27 41 38 37 21
0 3 3 3 1 2 0          ∗    3 1 1    =    33 40 33 25 18
0 2 2 1 2 1 0                             32 40 37 29 17
0 2 3 2 1 2 0                             25 27 21 20 12
0 0 0 0 0 0 0
2D Convolution
• Input_channel=1, output_channel=1, padding=1, stride=2
• (figure: the filter moves two steps at a time, so the output shrinks to 3×3)

Input (zero-padded):        Filter:       Output:
0 0 0 0 0 0 0
0 1 3 2 3 3 0               1 3 2         16 24 16
0 3 1 2 1 1 0               1 3 3         33 33 18
0 3 3 3 1 2 0          ∗    3 1 1    =    25 21 12
0 2 2 1 2 1 0
0 2 3 2 1 2 0
0 0 0 0 0 0 0
2D Convolution
• Input_channel=1, output_channel=1, padding=2, stride=1
• With two rings of zero padding around the same 5×5 input, the output grows to 7×7:

Filter:       Output:
1 3 2          1  4  8 14 12 12  9
1 3 3          6 16 28 24 28 16  6
3 1 1         14 27 41 38 37 21 10
              17 33 40 33 25 18  6
              14 32 40 37 29 17  9
              10 25 27 21 20 12  3
               4 12 15 11  9  7  2
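All three configurations above can be reproduced with a small pure-Python sketch (illustrative only; like the slides, it does not flip the filter, i.e. it computes cross-correlation, the usual deep-learning convention):

```python
def conv2d(x, k, padding=0, stride=1):
    """Slide kernel k over a zero-padded 2D input x (cross-correlation)."""
    kh, kw = len(k), len(k[0])
    # zero-pad the input on all four sides
    w = len(x[0]) + 2 * padding
    p = [[0] * w for _ in range(padding)]
    p += [[0] * padding + row + [0] * padding for row in x]
    p += [[0] * w for _ in range(padding)]
    oh = (len(p) - kh) // stride + 1
    ow = (len(p[0]) - kw) // stride + 1
    return [[sum(p[i * stride + a][j * stride + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

x = [[1, 3, 2, 3, 3],
     [3, 1, 2, 1, 1],
     [3, 3, 3, 1, 2],
     [2, 2, 1, 2, 1],
     [2, 3, 2, 1, 2]]
k = [[1, 3, 2],
     [1, 3, 3],
     [3, 1, 1]]

print(conv2d(x, k, padding=1, stride=1)[0])  # [16, 28, 24, 28, 16]
print(conv2d(x, k, padding=1, stride=2))     # [[16, 24, 16], [33, 33, 18], [25, 21, 12]]
```

Changing `padding` and `stride` reproduces the 5×5, 3×3, and 7×7 outputs of the three slides.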
2D Convolution
• Input_channel=3, output_channel=1, padding=1, stride=1
• (figure: each of the three zero-padded input channels is convolved with its own 3×3 filter, and the three results are summed into a single output feature map)
2D Convolution
• Input_channel=3, output_channel=2, padding=1, stride=1
• (figure: two independent sets of three 3×3 filters; each set is applied to all three input channels and summed, producing two output feature maps)
2D Convolution
• Input_channel=64, output_channel=64, kernel_size=3, padding=1, stride=1
• (figure: the weight tensor has shape 64×64×3×3; each of the 64 output channels is produced by one 64×3×3 filter spanning all input channels)
Convolutions in PyTorch
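The layer above maps directly onto `torch.nn.Conv2d`. A minimal sketch (the 56×56 input size is an arbitrary example, not from the slides):

```python
import torch
import torch.nn as nn

# 64 input channels -> 64 output channels, 3x3 kernel, padding=1, stride=1
conv = nn.Conv2d(in_channels=64, out_channels=64,
                 kernel_size=3, padding=1, stride=1)

x = torch.randn(1, 64, 56, 56)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                   # padding=1 keeps the spatial size: (1, 64, 56, 56)

# weight tensor shape is (out_channels, in_channels, kH, kW) = (64, 64, 3, 3)
print(conv.weight.shape)
```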
Max Pooling
• Takes the maximum value within each window
• Used to reduce the size of feature maps
• Example) stride=2, padding=1
• (figure: a pooling window, 3×3 here, slides over the zero-padded input two steps at a time; the first three window maxima are 3, 3, 3)

Input:
1 3 2 3 3
3 1 2 1 1
3 3 3 1 2
2 2 1 2 1
2 3 2 1 2
Max Pooling in PyTorch
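The example above can be reproduced with `torch.nn.MaxPool2d`. The window size is not stated on the slide; a 3×3 window with stride=2, padding=1 reproduces the values shown (for this all-positive input, the padded border never wins the max):

```python
import torch
import torch.nn as nn

# assumption: 3x3 window; stride=2, padding=1 as in the slide example
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

x = torch.tensor([[1, 3, 2, 3, 3],
                  [3, 1, 2, 1, 1],
                  [3, 3, 3, 1, 2],
                  [2, 2, 1, 2, 1],
                  [2, 3, 2, 1, 2]], dtype=torch.float32)

y = pool(x.unsqueeze(0).unsqueeze(0))  # add batch and channel dims
print(y.squeeze())                     # 3x3 map; top row is 3, 3, 3
```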
AlexNet
• (figure: the AlexNet architecture; dropout is applied in the large fully connected layers)

Understanding AlexNet | LearnOpenCV
Fully Connected Layer vs Convolutional Layer
• Translation equivariance and parameter sharing
• Fully connected: flatten the 32×64×64 feature map (32 ⋅ 64 ⋅ 64 = 131072 values), apply 𝑊𝑥, then reshape back to 32×64×64
• 𝑊 ∈ ℝ^{131072×131072}

Fully Connected Layer vs Convolutional Layer
• Translation equivariance and parameter sharing
• Convolutional: a 3×3 filter bank mapping 32 channels to 32 channels is shared across all 64×64 spatial positions
• 𝑊 ∈ ℝ^{32×32×3×3}
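The parameter counts implied by the two weight shapes, as a quick sanity check (bias terms omitted):

```python
# fully connected layer on a flattened 32x64x64 feature map
n = 32 * 64 * 64                 # 131072 input (and output) values
fc_params = n * n                # W in R^{131072 x 131072}

# 3x3 convolution, 32 input channels -> 32 output channels, shared everywhere
conv_params = 32 * 32 * 3 * 3    # W in R^{32 x 32 x 3 x 3}

print(fc_params)                 # 17179869184  (~17.2 billion weights)
print(conv_params)               # 9216
```

Parameter sharing cuts the weight count by more than six orders of magnitude here, which is the point of the comparison.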
Visualization of Learned Filter
• First layer conv filters

CS231n Convolutional Neural Networks for Visual Recognition


Visualization of Learned Feature Maps

Applied Deep Learning - Part 4: Convolutional Neural Networks | by Arden Dertat | Towards Data Science
ImageNet Large Scale Visual Recognition Challenge

(ILSVRC)
ILSVRC
• ImageNet is an image database organized according to the WordNet
hierarchy (nouns)
• 1000 object classes
• About 1.2M training images, 50K validation images, 100K test images

• The ImageNet Large Scale Visual Recognition Challenge (ILSVRC)


• Eight years of history (2010–2017)
• It was one of the most powerful driving forces behind deep learning research
Classification Results

Beyond ILSVRC workshop 2017 (image-net.org)


Classification Results
• Human top-5 error: 0.05
• (figure: winning entries by year: AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet, Trimps-Soushen (Inception + WRN), SENet)

Beyond ILSVRC workshop 2017 (image-net.org)

AlexNet
• The winner of ILSVRC 2012
• It changed the entire computer vision research
ZFNet
• The winner of ILSVRC 2013
• The network architecture was developed using visualization techniques
• Visualizing and Understanding Convolutional Networks, Zeiler et al, ECCV 2014
• Reduced the 1st-layer filter size from 11×11 to 7×7
• Reduced the 1st-layer stride from 4 to 2
VGGNet

Architecture comparison of AlexNet, VGGNet, ResNet, Inception, DenseNet | by Khush Patel | Towards Data Science
GoogLeNet
• Winner of ILSVRC 2014
• Also called 'Inception'
• (figure: inception module: parallel convolution and max-pooling branches whose outputs are concatenated)

Going deeper with convolutions, Szegedy et al, CVPR 2015
GoogLeNet
• Inception module

Going deeper with convolutions, Szegedy et al, CVPR 2015


GoogLeNet
• ILSVRC 2014 classification results

Going deeper with convolutions, Szegedy et al, CVPR 2015


ResNet
• The winner of ILSVRC 2015
• Residual building block

Deep residual learning for image recognition, He et al, CVPR 2016
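A hedged sketch of the residual building block in PyTorch (channel count and input size are illustrative; the paper's blocks also include batch normalization, which is kept here):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x, where F is two 3x3 conv layers (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # the shortcut connection

block = ResidualBlock(64)
x = torch.randn(2, 64, 8, 8)
print(block(x).shape)               # (2, 64, 8, 8): shape is preserved
```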


ResNet

Deep residual learning for image recognition, He et al, CVPR 2016


ResNet
• Training on ImageNet

Deep residual learning for image recognition, He et al, CVPR 2016


ResNet
• ILSVRC 2015 classification results

Deep residual learning for image recognition, He et al, CVPR 2016


The Key Ingredients of Training CNNs
Drop-out/Drop-path
Dropout
• Turning off neurons with a given probability (e.g. 0.5)
• Every iteration, a new network architecture emerges

Improving neural networks by preventing co-adaptation of feature detectors, Hinton et al, arXiv 2012
Dropout
• A simple way to train deep neural
networks for improving generalization
performance
• Avoiding co-adaptations: a hidden unit
cannot rely on other hidden units being
present
• Model averaging

Improving neural networks by preventing co-adaptation of feature detectors, Hinton et al, arXiv 2012
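A minimal sketch of dropout in the common "inverted" formulation (scaling survivors at training time so no rescaling is needed at test time); this is an illustration, not the paper's own code:

```python
import random

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p; scale survivors by 1/(1-p)
    so the expected activation is unchanged (inverted dropout)."""
    if not training or p == 0.0:
        return list(x)              # identity at test time
    return [0.0 if random.random() < p else v / (1.0 - p) for v in x]

random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5))                   # roughly half zeroed
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, training=False))   # unchanged
```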
Stochastic Depth (a.k.a DropPath)
• Train short networks during training and use deep networks at test time
• During training, randomly drop a subset of layers and bypass them with the identity function

Deep Networks with Stochastic Depth, Huang et al, ECCV 2016


Stochastic Depth (a.k.a DropPath)
• Linearly decaying ‘drop probability’
• Later layers will be dropped more frequently

Deep Networks with Stochastic Depth, Huang et al, ECCV 2016


Stochastic Depth (a.k.a DropPath)

Deep Networks with Stochastic Depth, Huang et al, ECCV 2016
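The rule for a single block can be sketched as follows (`f` stands for the block's residual function; the keep/scale behavior follows the paper's description, but the code itself is an illustration):

```python
import random

def stochastic_depth_block(x, f, p_survive, training=True):
    """During training, keep the residual branch f with probability
    p_survive; otherwise the block reduces to the identity shortcut.
    At test time, always use the branch, scaled by p_survive."""
    if training:
        if random.random() < p_survive:
            return x + f(x)          # block survives
        return x                     # block dropped: identity only
    return x + p_survive * f(x)      # expected value at test time

double = lambda v: 2 * v
print(stochastic_depth_block(1.0, double, p_survive=1.0))                   # 3.0
print(stochastic_depth_block(1.0, double, p_survive=0.5, training=False))   # 2.0
```

With the linearly decaying schedule, `p_survive` is close to 1 for early blocks and smallest for the last block.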


Normalization Methods
Batch Normalization
• Normalizing training sets
• (figure: scatter plot of features 𝑥₁ and 𝑥₂)

Deep Learning Specialization - DeepLearning.AI


Batch Normalization
• Subtracting the mean

$$\mu_1 = \frac{1}{N} \sum_{i=1}^{N} x_1^{(i)}, \qquad x_1^{(i)} := x_1^{(i)} - \mu_1$$

(figure: the scatter of (𝑥₁, 𝑥₂) shifts to be centered at the origin)

Deep Learning Specialization - DeepLearning.AI


Batch Normalization
• Divide by the standard deviation

$$\sigma_1^2 = \frac{1}{N} \sum_{i=1}^{N} \big(x_1^{(i)}\big)^2, \qquad x_1^{(i)} := x_1^{(i)} / \sigma_1$$

so that $x_1 \sim N(0, 1)$ and $x_2 \sim N(0, 1)$

Deep Learning Specialization - DeepLearning.AI


Batch Normalization
• Standardization

$$z = \frac{x - \mu}{\sigma}$$

where 𝜇 is the mean and 𝜎 the standard deviation, so $z \sim N(0, 1)$
Batch Normalization
• When inputs are un-normalized, the loss surface is more skewed (elongated)
• The scales of the input features are very different from each other
Batch Normalization
• Normalizing inputs (and hidden units) based on mini-batch statistics
• Compute the mean and variance over the current mini-batch
• At test time the batch may be too small for reliable statistics (e.g. batch size 1), so use the mean and variance accumulated during training

Batch normalization: accelerating deep network training by reducing internal covariate shift, Ioffe et al, ICML 2015
Batch Normalization

Batch normalization: accelerating deep network training by reducing internal covariate shift, Ioffe et al, ICML 2015
Batch Normalization in CNN

𝑀 = 128: the number of input channels
𝐷_𝐹 = 64: the size of a feature map
𝐵 = 32: the mini-batch size

𝜇 ∈ ℝ^?, 𝜎² ∈ ℝ^?
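In a CNN, batch-norm statistics are computed per channel, over the batch and both spatial dimensions. A small pure-Python sketch with tiny sizes (B=2, C=3 channels, 2×2 feature maps; the numbers are made up):

```python
def channel_mean_var(x):
    """x is a nested list with shape (B, C, H, W); returns the per-channel
    mean and variance, each a list of length C."""
    B, C = len(x), len(x[0])
    means, variances = [], []
    for c in range(C):
        # gather every value of channel c across the batch and both spatial dims
        vals = [v for b in range(B) for row in x[b][c] for v in row]
        m = sum(vals) / len(vals)
        means.append(m)
        variances.append(sum((v - m) ** 2 for v in vals) / len(vals))
    return means, variances

x = [[[[1, 2], [3, 4]], [[0, 0], [0, 0]], [[5, 5], [5, 5]]],
     [[[5, 6], [7, 8]], [[0, 0], [0, 0]], [[5, 5], [5, 5]]]]

mu, var = channel_mean_var(x)
print(mu)    # one value per channel: [4.5, 0.0, 5.0]
print(var)   # [5.25, 0.0, 0.0]
```

The statistics have one entry per channel, independent of the feature-map size and the batch size.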
Why Does Batch Normalization Work?
1. Normalization usually makes loss surface less ‘skewed’
2. BN may reduce the internal covariate shift
• [1502.03167] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (arxiv.org)

3. BN makes loss surface smoother


• [1805.11604] How Does Batch Normalization Help Optimization? (arxiv.org)
Layer Normalization
• Batch normalization is dependent on the mini-batch size
• What if the network is so large that only small mini-batch sizes are possible?
• It is not obvious how to apply batch normalization to RNNs
• It adds noticeable computation

$$\mu^l = \frac{1}{H} \sum_{i=1}^{H} a_i^l, \qquad \sigma^l = \sqrt{\frac{1}{H} \sum_{i=1}^{H} \big(a_i^l - \mu^l\big)^2}$$

𝐻: the number of hidden units in a layer

Layer normalization, Ba et al, arXiv 2016
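The two statistics above translate directly into code (a minimal illustration; real implementations add a small ε and learnable gain/bias parameters):

```python
import math

def layer_norm(a, eps=1e-5):
    """Normalize one layer's activations a (length H) using the mean and
    std computed over the hidden units, not over the batch."""
    H = len(a)
    mu = sum(a) / H
    sigma = math.sqrt(sum((v - mu) ** 2 for v in a) / H)
    return [(v - mu) / (sigma + eps) for v in a]

h = layer_norm([1.0, 2.0, 3.0, 4.0])
print(h)   # zero mean, unit variance across the hidden units
```

Because the normalization axis is the layer itself, the result is the same whether the batch holds one example or a thousand.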


Other Normalization Methods

Group normalization, Wu et al, ECCV 2018