Deep learning
Dr. Aissa Boulmerka
[email protected] 2023-2024
CHAPTER 8
CONVOLUTIONAL NEURAL NETWORKS (CNNS)
“FOUNDATIONS OF CNN”
Computer Vision Problems
- Image Classification: Cat? (0/1)
- Object detection
- Neural Style Transfer
Deep Learning on large images
Image Classification: Cat? (0/1)
A fully connected network takes the image as a feature vector x = (x1, x2, …, xn). A small 64×64×3 image already gives n = 12,288 features; a 1000×1000×3 image gives n = 1000 × 1000 × 3 = 3M.
With 1000 units in the first hidden layer, the weight matrix alone has 3M × 1000 ⇒ 3B parameters!
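A quick sanity check of this count in Python (my own sketch, not from the slides):

# Weights of a fully connected first layer on a 1000x1000 RGB image.
n_x = 1000 * 1000 * 3        # 3M input features
n_hidden = 1000              # units in the first hidden layer
print(n_x * n_hidden)        # 3,000,000,000 weights in W[1] alone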
Computer Vision Problem
Edge detection: find the vertical edges and the horizontal edges in an image.
Vertical edge detection
6×6 image:
3 0 1 2 7 4
1 5 8 9 3 1
2 7 2 5 1 3
0 1 3 1 7 8
4 2 1 6 2 8
2 4 5 2 3 9
∗ 3×3 filter (kernel):
1 0 -1
1 0 -1
1 0 -1
= 4×4 output:
 -5  -4   0   8
-10  -2   2   3
  0  -2  -4  -7
 -3  -2  -3 -16
Each output value is the sum of the element-wise products of the filter with one 3×3 patch of the image; the filter slides one position at a time across the image.
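A minimal NumPy sketch of this operation (my own illustration, not code from the slides); it reproduces the 4×4 output above:

import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" convolution as used in deep learning: no kernel flip, no padding.
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the filter with one f x f patch, then sum
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.array([[3, 0, 1, 2, 7, 4],
                  [1, 5, 8, 9, 3, 1],
                  [2, 7, 2, 5, 1, 3],
                  [0, 1, 3, 1, 7, 8],
                  [4, 2, 1, 6, 2, 8],
                  [2, 4, 5, 2, 3, 9]])
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])
print(conv2d_valid(image, vertical))   # first row: [-5. -4.  0.  8.]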
Vertical edge detection
6×6 image (bright left half, dark right half):
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
10 10 10 0 0 0
∗ 3×3 vertical filter:
1 0 -1
1 0 -1
1 0 -1
= 4×4 output:
0 30 30 0
0 30 30 0
0 30 30 0
0 30 30 0
The band of 30s in the middle of the output marks the vertical edge down the centre of the image.
Vertical edge detection examples
Light-to-dark transition: a 6×6 image whose rows are all 10 10 10 0 0 0, convolved with the vertical filter above, gives a 4×4 output whose rows are all 0 30 30 0.
Dark-to-light transition: a 6×6 image whose rows are all 0 0 0 10 10 10 gives an output whose rows are all 0 -30 -30 0.
The sign of the response distinguishes light-to-dark edges from dark-to-light edges.
Vertical and Horizontal Edge Detection
Vertical filter:      Horizontal filter:
1 0 -1                 1  1  1
1 0 -1                 0  0  0
1 0 -1                -1 -1 -1
Horizontal edge example, 6×6 image:
10 10 10  0  0  0
10 10 10  0  0  0
10 10 10  0  0  0
 0  0  0 10 10 10
 0  0  0 10 10 10
 0  0  0 10 10 10
∗ horizontal filter =
 0   0   0   0
30  10 -10 -30
30  10 -10 -30
 0   0   0   0
The intermediate values (10, -10) come from 3×3 patches that straddle the staggered edge.
Learning to detect edges
Hand-designed filters:
Sobel filter:   Scharr filter:
1 0 -1           3  0  -3
2 0 -2          10  0 -10
1 0 -1           3  0  -3
Instead of hand-picking the nine numbers, treat them as parameters and learn them by backpropagation:
6×6 image (as above) ∗
w1 w2 w3
w4 w5 w6
w7 w8 w9
= 4×4 output
The network can then learn a vertical, horizontal, or angled edge detector, whichever best fits the data.
Padding
Without padding, convolution shrinks the image: 6×6 ∗ 3×3 = 4×4.
n×n with padding p:
output = (n + 2p − f + 1) × (n + 2p − f + 1)
With p = 1: (6 + 2 − 3 + 1) × (6 + 2 − 3 + 1) = 6×6
Valid and Same convolutions
“Valid”: no padding.
n×n ∗ f×f → (n − f + 1) × (n − f + 1)
6×6 ∗ 3×3 → 4×4
“Same”: pad so that output size is the same as the input size.
(n + 2p − f + 1) × (n + 2p − f + 1) = n × n ⟹ p = (f − 1)/2
f is usually odd:
3×3 ⟹ p = 1
5×5 ⟹ p = 2
7×7 ⟹ p = 3
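A small sketch of "same" padding (mine; it reuses conv2d_valid from the earlier sketch):

import numpy as np

def same_padding(f):
    # "same" output size at stride 1 requires p = (f - 1) / 2, hence odd f
    return (f - 1) // 2

image = np.random.randn(6, 6)
p = same_padding(3)                    # p = 1
padded = np.pad(image, p)              # zero-pad every side: 6x6 -> 8x8
print(conv2d_valid(padded, np.ones((3, 3))).shape)   # (6, 6), same as the input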
Strided convolution
7×7 image:
2 3 7 4 6 2 9
6 6 9 8 7 4 3
3 4 8 3 8 9 7
7 8 3 6 6 3 4
4 2 1 8 3 4 6
3 2 4 1 9 8 3
0 1 3 9 2 1 4
∗ 3×3 filter, stride s = 2:
 3 4 4
 1 0 2
-1 0 3
= 3×3 output:
91 100  88
69  91 117
44  72  74
(The filter moves two positions at a time, both horizontally and vertically.)
n×n image, padding p, stride s:
output = ⌊(n + 2p − f)/s + 1⌋ × ⌊(n + 2p − f)/s + 1⌋, where ⌊z⌋ = floor(z)
Here: (7 + 0 − 3)/2 + 1 = 4/2 + 1 = 3
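Extending the earlier sketch with stride and padding (again my own illustration); it reproduces the 3×3 output above:

import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    # Output side length: floor((n + 2*pad - f) / stride) + 1.
    image = np.pad(image, pad)         # zero-padding on every side
    n, f = image.shape[0], kernel.shape[0]
    m = (n - f) // stride + 1          # n already includes 2*pad here
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r+f, c:c+f] * kernel)
    return out

image = np.array([[2, 3, 7, 4, 6, 2, 9],
                  [6, 6, 9, 8, 7, 4, 3],
                  [3, 4, 8, 3, 8, 9, 7],
                  [7, 8, 3, 6, 6, 3, 4],
                  [4, 2, 1, 8, 3, 4, 6],
                  [3, 2, 4, 1, 9, 8, 3],
                  [0, 1, 3, 9, 2, 1, 4]])
kernel = np.array([[3, 4, 4],
                   [1, 0, 2],
                   [-1, 0, 3]])
print(conv2d(image, kernel, stride=2))
# [[ 91. 100.  88.]
#  [ 69.  91. 117.]
#  [ 44.  72.  74.]]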
Summary of convolutions
n × n image, f × f filter, padding p, stride s
Output size: ⌊(n + 2p − f)/s + 1⌋ × ⌊(n + 2p − f)/s + 1⌋
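The same formula as a one-line helper (my naming):

def conv_output_size(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1, applied to height and width independently
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))          # 4  ("valid" convolution)
print(conv_output_size(6, 3, p=1))     # 6  ("same" convolution)
print(conv_output_size(7, 3, s=2))     # 3  (the strided example above)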
Technical note on cross-correlation vs. convolution
Convolution in a math textbook first flips the filter on both the horizontal and vertical axes:
 3 4 5          7 9 -1
 1 0 2    ⟶    2 0  1
-1 9 7          5 4  3
and only then slides the flipped filter over the image, e.g. the 6×6 image:
2 3 7 4 6 2
6 6 9 8 7 4
3 4 8 3 8 9
7 8 3 6 6 3
4 2 1 8 3 4
3 2 4 1 9 8
The flip is what makes true convolution associative: (A ∗ B) ∗ C = A ∗ (B ∗ C).
Deep learning omits the flip, so what we call “convolution” is strictly cross-correlation; by convention the literature still calls it convolution.
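The relationship in code (my sketch, assuming SciPy is available):

import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.random.randint(0, 10, size=(6, 6))
kernel = np.array([[3, 4, 5],
                   [1, 0, 2],
                   [-1, 9, 7]])

# Textbook convolution equals cross-correlation with the kernel flipped on both axes.
flipped = np.flip(kernel)              # [[7, 9, -1], [2, 0, 1], [5, 4, 3]]
assert np.array_equal(convolve2d(image, kernel, mode="valid"),
                      correlate2d(image, flipped, mode="valid"))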
Convolutions on RGB images
6×6×3 image ∗ 3×3×3 filter = 4×4 output
Dimensions are Height × Width × #channels; the filter must have the same number of channels as the image, and each output value sums over all 3 × 3 × 3 = 27 products.
Multiple filters
6×6×3 image ∗ 3×3×3 vertical-edge filter → 4×4
6×6×3 image ∗ 3×3×3 horizontal-edge filter → 4×4
Stacking the two 4×4 maps gives a 4×4×2 output volume.
Summary: n × n × n_c ∗ f × f × n_c → (n − f + 1) × (n − f + 1) × n_c′, where n_c′ = number of filters
6×6×3 ∗ 3×3×3 → 4×4×2
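A NumPy sketch of convolution over volumes with several filters (helper names are mine):

import numpy as np

def conv_volume(image, filters):
    # image: (n, n, n_c); filters: (n_f, f, f, n_c) -> output: (n-f+1, n-f+1, n_f)
    n = image.shape[0]
    n_f, f = filters.shape[0], filters.shape[1]
    m = n - f + 1
    out = np.zeros((m, m, n_f))
    for k in range(n_f):               # one output channel per filter
        for i in range(m):
            for j in range(m):
                # each value sums over all f * f * n_c products
                out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
    return out

image = np.random.randn(6, 6, 3)
filters = np.random.randn(2, 3, 3, 3)      # e.g. a vertical and a horizontal edge filter
print(conv_volume(image, filters).shape)   # (4, 4, 2)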
Example of a layer
The 6×6×3 input a[0] is convolved with two 3×3×3 filters; each 4×4 map gets its bias added and goes through a ReLU:
a[0] ∗ filter 1 ⟶ ReLU(4×4 map + b1)
a[0] ∗ filter 2 ⟶ ReLU(4×4 map + b2)
Stacked, the two maps form a[1] of shape 4×4×2. This is one layer of a network, in the familiar form:
z[1] = W[1] a[0] + b[1]
a[1] = g(z[1])
a[0] ⟶ a[1]
6×6×3 → 4×4×2
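The forward pass of one CONV layer, reusing conv_volume from the sketch above:

import numpy as np

def conv_layer(a_prev, W, b):
    # z = conv(a_prev, W) + b, a = ReLU(z); one bias per filter, broadcast over the map
    z = conv_volume(a_prev, W) + b.reshape(1, 1, -1)
    return np.maximum(z, 0)            # ReLU

a0 = np.random.randn(6, 6, 3)          # a[0]
W1 = np.random.randn(2, 3, 3, 3)       # two 3x3x3 filters
b1 = np.random.randn(2)                # one bias per filter
print(conv_layer(a0, W1, b1).shape)    # (4, 4, 2) = a[1]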
Number of parameters in one layer
If you have 10 filters that are 𝟑 × 𝟑 × 𝟑 in one layer of a
neural network, how many parameters does that layer have?
⋯
1 2 10
3×3×3
27 parameters + 1 bias
=> 28 parameters
280 parameters
20
Summary of notation
If layer l is a convolution layer:
f[l] = filter size
p[l] = padding
s[l] = stride
n_c[l] = number of filters
Input: n_H[l−1] × n_W[l−1] × n_c[l−1]
Output: n_H[l] × n_W[l] × n_c[l], with
n_H[l] = ⌊(n_H[l−1] + 2p[l] − f[l]) / s[l]⌋ + 1 (and likewise for n_W[l])
Each filter is: f[l] × f[l] × n_c[l−1]
Activations: a[l] → n_H[l] × n_W[l] × n_c[l]; for a batch of m examples, A[l] → m × n_H[l] × n_W[l] × n_c[l]
Weights: f[l] × f[l] × n_c[l−1] × n_c[l]
Bias: n_c[l]
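A helper tying the notation together (my naming; the parameter count follows the 10-filter example above):

def conv_layer_stats(n_h, n_w, n_c_prev, f, p, s, n_c):
    # Output shape and learnable-parameter count of one CONV layer.
    out_h = (n_h + 2 * p - f) // s + 1
    out_w = (n_w + 2 * p - f) // s + 1
    n_params = (f * f * n_c_prev + 1) * n_c    # weights + one bias per filter
    return (out_h, out_w, n_c), n_params

print(conv_layer_stats(6, 6, 3, f=3, p=0, s=1, n_c=10))   # ((4, 4, 10), 280)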
Example ConvNet
Input: 39×39×3 (n_H[0] = n_W[0] = 39, n_c[0] = 3)
Layer 1: f[1] = 3, s[1] = 1, p[1] = 0, 10 filters ⟶ 37×37×10 (n_H[1] = n_W[1] = 37, n_c[1] = 10)
Layer 2: f[2] = 5, s[2] = 2, p[2] = 0, 20 filters ⟶ 17×17×20 (n_H[2] = n_W[2] = 17, n_c[2] = 20)
Layer 3: f[3] = 5, s[3] = 2, p[3] = 0, 40 filters ⟶ 7×7×40
Flatten the 7×7×40 = 1960 values into a vector and feed it to a logistic regression or softmax output to get ŷ.
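A hypothetical tf.keras rendering of this architecture (the slides do not prescribe a framework; "valid" padding assumed throughout):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(39, 39, 3)),
    tf.keras.layers.Conv2D(10, kernel_size=3, strides=1, activation="relu"),   # 37x37x10
    tf.keras.layers.Conv2D(20, kernel_size=5, strides=2, activation="relu"),   # 17x17x20
    tf.keras.layers.Conv2D(40, kernel_size=5, strides=2, activation="relu"),   # 7x7x40
    tf.keras.layers.Flatten(),                                                 # 1960 units
    tf.keras.layers.Dense(10, activation="softmax"),                           # y_hat
])
model.summary()   # prints the shapes listed above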
Types of layers in a convolutional network
- Convolution (CONV)
- Pooling (POOL)
- Fully connected (FC)
Pooling layer: Max pooling
4×4 input:
1 3 2 1
2 9 1 1
1 3 2 3
5 6 1 2
⟶ 2×2 output (hyperparameters: f = 2, s = 2):
9 2
6 3
Each output value is the max over one 2×2 region of the input.
Pooling layer: Max pooling
5×5 input (each channel of a 5×5×2 volume is pooled independently):
1 3 2 1 3
2 9 1 1 5
1 3 2 3 2
8 3 5 1 0
5 6 1 2 9
⟶ 3×3 output (hyperparameters: f = 3, s = 1):
9 9 5
9 9 5
8 6 9
Output size per side: (n − f)/s + 1 = (5 − 3)/1 + 1 = 3, so 5×5×2 ⟶ 3×3×2.
Pooling layer: Average pooling
4×4 input:
1 3 2 1
2 9 1 1
1 4 2 3
5 6 1 2
⟶ 2×2 output (f = 2, s = 2):
3.75 1.25
4    2
Same size formula: (n − f)/s + 1.
Summary of pooling
Hyperparameters:
f : filter size
s : stride
Max or average pooling
Input n_H × n_W × n_c ⟶ output ⌊(n_H − f)/s + 1⌋ × ⌊(n_W − f)/s + 1⌋ × n_c
No parameters to learn!
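A single-channel pooling sketch (my helper; apply it channel by channel for volumes):

import numpy as np

def pool2d(a, f, s, mode="max"):
    # Max or average pooling on one channel; output side is (n - f) // s + 1.
    op = np.max if mode == "max" else np.mean
    m = (a.shape[0] - f) // s + 1
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            out[i, j] = op(a[i*s:i*s+f, j*s:j*s+f])
    return out

a = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
print(pool2d(a, f=2, s=2))               # [[9. 2.] [6. 3.]]
print(pool2d(a, f=2, s=2, mode="avg"))   # averages over the same 2x2 regions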
Neural network example (LeNet-5 style)
Input 32×32×3
CONV1 (f = 5, s = 1, 8 filters) ⟶ 28×28×8
POOL1 (f = 2, s = 2) ⟶ 14×14×8           (CONV1 + POOL1 = Layer 1)
CONV2 (f = 5, s = 1, 16 filters) ⟶ 10×10×16
POOL2 (f = 2, s = 2) ⟶ 5×5×16            (CONV2 + POOL2 = Layer 2)
Flatten ⟶ 400 ⟶ FC3 (120 units) ⟶ FC4 (84 units) ⟶ Softmax (10 outputs)
Overall: CONV-POOL-CONV-POOL-FC-FC-SOFTMAX
Neural network example
                   Activation shape   Activation size   # parameters
Input              (32, 32, 3)        3,072             0
CONV1 (f=5, s=1)   (28, 28, 8)        6,272             608
POOL1              (14, 14, 8)        1,568             0
CONV2 (f=5, s=1)   (10, 10, 16)       1,600             3,216
POOL2              (5, 5, 16)         400               0
FC3                (120, 1)           120               48,120
FC4                (84, 1)            84                10,164
Softmax            (10, 1)            10                850
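A tf.keras sketch of this network (my rendering, not the original LeNet-5); model.summary() reproduces the activation shapes and parameter counts in the table:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 5, strides=1, activation="relu"),    # CONV1: (5*5*3+1)*8 = 608
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),          # POOL1: no parameters
    tf.keras.layers.Conv2D(16, 5, strides=1, activation="relu"),   # CONV2: (5*5*8+1)*16 = 3,216
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),          # POOL2: no parameters
    tf.keras.layers.Flatten(),                                     # 5*5*16 = 400
    tf.keras.layers.Dense(120, activation="relu"),                 # FC3: 400*120+120 = 48,120
    tf.keras.layers.Dense(84, activation="relu"),                  # FC4: 120*84+84 = 10,164
    tf.keras.layers.Dense(10, activation="softmax"),               # 84*10+10 = 850
])
model.summary()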
Why convolutions
Convolving a 32×32×3 image with six 5×5 filters (f = 5) gives a 28×28×6 output.
Per filter: 5 × 5 = 25 weights + 1 bias = 26 parameters
6 filters ⇒ 6 × 26 = 156 parameters
A fully connected layer between the same two volumes would connect 32 × 32 × 3 = 3,072 inputs to 28 × 28 × 6 = 4,704 outputs: 3,072 × 4,704 ≈ 14M parameters.
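The comparison in a few lines of Python (numbers from the slide):

n_in = 32 * 32 * 3                 # 3,072 input values
n_out = 28 * 28 * 6                # 4,704 output values
print(n_in * n_out)                # 14,450,688 weights if fully connected
print(6 * (5 * 5 + 1))             # 156 parameters for six 5x5 conv filters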
Why convolutions
Recall the vertical edge example: the 6×6 image of 10s and 0s, convolved with the 3×3 vertical filter, gives the 4×4 output of 0s and 30s.
Parameter sharing: a feature detector (such as a vertical edge detector) that's useful in one part of the image is probably useful in another part of the image.
Sparsity of connections: in each layer, each output value depends only on a small number of inputs (here, a 3×3 patch, i.e. 9 of the 36 pixels).
Putting it together
Training set: (x(1), y(1)), …, (x(m), y(m))
Cost: J = (1/m) Σ_{i=1..m} ℒ(ŷ(i), y(i))
Use gradient descent to optimize the parameters to reduce J.
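A hedged sketch of this step with tf.keras, reusing the LeNet-style model defined above (the loss and learning rate are my assumptions):

import numpy as np
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),   # gradient descent
              loss="sparse_categorical_crossentropy",                  # L(y_hat, y)
              metrics=["accuracy"])

# Dummy data with the right shapes: m examples of 32x32x3 images, labels in {0, ..., 9}.
m = 64
x_train = np.random.randn(m, 32, 32, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(m,))
model.fit(x_train, y_train, epochs=1)   # one pass of minimizing J = (1/m) * sum of losses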