This document discusses the foundations of Convolutional Neural Networks (CNNs) and their application in computer vision problems such as image classification and object detection. It covers key concepts including convolution operations, edge detection, padding, and the architecture of CNNs, along with examples of how filters are applied to images. Additionally, it explains the parameters involved in CNN layers and provides a summary of the notation used in the context of convolution layers.


Deep learning

Dr. Aissa Boulmerka


[email protected]

2023-2024

CHAPTER 8
CONVOLUTIONAL NEURAL NETWORKS (CNNS)
“FOUNDATIONS OF CNN”

Computer Vision Problems

- Image classification: Cat? (0/1)
- Object detection
- Neural style transfer
Deep Learning on large images

Image classification (Cat? 0/1) on a 64 × 64 × 3 image gives an input vector x = (x1, x2, …, xn) of 64 × 64 × 3 = 12,288 features, which a fully connected network can handle. On a 1000 × 1000 × 3 image, however, the input has 1000 × 1000 × 3 = 3M features; a first fully connected layer with just 1000 hidden units would already need 3M × 1000 = 3B parameters!
Computer Vision Problem

Edge detection: given an image, detect vertical edges and horizontal edges.
Vertical edge detection

Convolution: a 6 × 6 image convolved with a 3 × 3 filter (kernel) gives a 4 × 4 output.

    3  0  1  2  7  4
    1  5  8  9  3  1       1  0  -1         -5   -4    0    8
    2  7  2  5  1  3   ∗   1  0  -1    =   -10   -2    2    3
    0  1  3  1  7  8       1  0  -1          0   -2   -4   -7
    4  2  1  6  2  8                        -3   -2   -3  -16
    2  4  5  2  3  9        3 × 3
                           Filter (kernel)       4 × 4
        6 × 6

Each output value is computed by placing the 3 × 3 filter on a 3 × 3 patch of the image, multiplying element-wise and summing the 9 products, then sliding the filter one position at a time over the whole image.
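A minimal NumPy sketch (illustrative, not from the slides) of this operation, which reproduces the 4 × 4 output above:

    import numpy as np

    def conv2d_valid(image, kernel):
        """'Valid' convolution of a square 2-D image with a square 2-D kernel (no padding, stride 1)."""
        n, f = image.shape[0], kernel.shape[0]
        out = np.zeros((n - f + 1, n - f + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # element-wise product of the current patch with the filter, then sum
                out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
        return out

    image = np.array([[3, 0, 1, 2, 7, 4],
                      [1, 5, 8, 9, 3, 1],
                      [2, 7, 2, 5, 1, 3],
                      [0, 1, 3, 1, 7, 8],
                      [4, 2, 1, 6, 2, 8],
                      [2, 4, 5, 2, 3, 9]])
    vertical_edge = np.array([[1, 0, -1],
                              [1, 0, -1],
                              [1, 0, -1]])
    print(conv2d_valid(image, vertical_edge))
    # [[ -5.  -4.   0.   8.]
    #  [-10.  -2.   2.   3.]
    #  [  0.  -2.  -4.  -7.]
    #  [ -3.  -2.  -3. -16.]]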
Vertical edge detection

    10 10 10  0  0  0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0   ∗   1  0  -1   =   0  30  30   0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0                      0  30  30   0
    10 10 10  0  0  0

The image is bright on the left and dark on the right; the filter produces a strong positive band (30) exactly where the light-to-dark vertical edge lies.
Vertical edge detection examples

Light-to-dark transition:

    10 10 10  0  0  0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0   ∗   1  0  -1   =   0  30  30   0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0                      0  30  30   0
    10 10 10  0  0  0

Dark-to-light transition (the sign of the response flips):

     0  0  0 10 10 10
     0  0  0 10 10 10       1  0  -1       0 -30 -30   0
     0  0  0 10 10 10   ∗   1  0  -1   =   0 -30 -30   0
     0  0  0 10 10 10       1  0  -1       0 -30 -30   0
     0  0  0 10 10 10                      0 -30 -30   0
     0  0  0 10 10 10
Vertical and Horizontal Edge Detection

    Vertical filter        Horizontal filter
    1  0  -1                1   1   1
    1  0  -1                0   0   0
    1  0  -1               -1  -1  -1

Applying the horizontal filter to an image whose top-left and bottom-right quadrants are bright:

    10 10 10  0  0  0
    10 10 10  0  0  0        1   1   1        0   0    0    0
    10 10 10  0  0  0   ∗    0   0   0   =   30  10  -10  -30
     0  0  0 10 10 10       -1  -1  -1       30  10  -10  -30
     0  0  0 10 10 10                         0   0    0    0
     0  0  0 10 10 10
Learning to detect edges

Hand-designed filters:

    1  0  -1       1  0  -1        3   0   -3
    1  0  -1       2  0  -2       10   0  -10
    1  0  -1       1  0  -1        3   0   -3
    Basic          Sobel filter   Scharr filter

Instead of hand-picking these 9 numbers, treat them as parameters w1, …, w9 and learn them by backpropagation:

    3  0  1  2  7  4
    1  5  8  9  3  1       w1  w2  w3
    2  7  2  5  1  3   ∗   w4  w5  w6   =   …
    0  1  3  1  7  8       w7  w8  w9
    4  2  1  6  2  8
    2  4  5  2  3  9
Padding

Convolving an n × n image with an f × f filter shrinks the output (6 × 6 ∗ 3 × 3 → 4 × 4) and uses border pixels far less than central ones. Padding the image with a border of p zeros before convolving avoids this.

With padding p, the output size is:
    (n + 2p − f + 1) × (n + 2p − f + 1)

With n = 6, f = 3, p = 1:
    (6 + 2 − 3 + 1) × (6 + 2 − 3 + 1) = 6 × 6
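A small NumPy sketch (an illustration, not from the slides) of zero-padding followed by the same "valid" convolution, showing that p = 1 keeps a 6 × 6 image at 6 × 6 with a 3 × 3 filter:

    import numpy as np

    def conv2d(image, kernel, p=0):
        """Zero-pad the image with a border of width p, then apply a 'valid' convolution."""
        padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
        n, f = padded.shape[0], kernel.shape[0]
        out = np.zeros((n - f + 1, n - f + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(padded[i:i+f, j:j+f] * kernel)
        return out

    image = np.random.rand(6, 6)
    kernel = np.random.rand(3, 3)
    print(conv2d(image, kernel, p=0).shape)   # (4, 4): n - f + 1
    print(conv2d(image, kernel, p=1).shape)   # (6, 6): n + 2p - f + 1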
Valid and Same convolutions

"Valid": no padding.
    n × n ∗ f × f → (n − f + 1) × (n − f + 1)
    6 × 6 ∗ 3 × 3 → 4 × 4

"Same": pad so that the output size is the same as the input size.
    (n + 2p − f + 1) × (n + 2p − f + 1) = n × n  ⟹  p = (f − 1) / 2

    f is usually odd:
    3 × 3 ⟹ p = 1
    5 × 5 ⟹ p = 2
    7 × 7 ⟹ p = 3
Strided convolution

Convolving a 7 × 7 image with a 3 × 3 filter using stride s = 2 (the filter moves two positions at a time, horizontally and vertically):

    2  3  7  4  6  2  9
    6  6  9  8  7  4  3        3  4  4        91  100   83
    3  4  8  3  8  9  7   ∗    1  0  2   =    69   91  127
    7  8  3  6  6  3  4       -1  0  3        44   72   74
    4  2  1  8  3  4  6
    3  2  4  1  9  8  3        3 × 3          3 × 3
    0  1  3  9  2  1  4

        7 × 7

n × n image, padding p, f × f filter, stride s:
    output size = (⌊(n + 2p − f) / s⌋ + 1) × (⌊(n + 2p − f) / s⌋ + 1),   where ⌊z⌋ = floor(z)

Here: ⌊(7 + 0 − 3) / 2⌋ + 1 = ⌊4 / 2⌋ + 1 = 3
Summary of convolutions

n × n image, f × f filter, padding p, stride s.

Output size:
    (⌊(n + 2p − f) / s⌋ + 1) × (⌊(n + 2p − f) / s⌋ + 1)
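A one-line helper (an illustrative sketch, not part of the slides) that encodes this formula:

    from math import floor

    def conv_output_size(n, f, p=0, s=1):
        """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
        return floor((n + 2 * p - f) / s) + 1

    print(conv_output_size(6, 3))            # 4  ("valid": 6x6 * 3x3)
    print(conv_output_size(6, 3, p=1))       # 6  ("same" padding)
    print(conv_output_size(7, 3, p=0, s=2))  # 3  (strided example above)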
Technical note on cross-correlation vs. convolution

In a math textbook, convolution first flips the filter horizontally and vertically, then slides the flipped filter over the image:

    2  3  7  4  6  2
    6  6  9  8  7  4        3  4  5               7  9  -1
    3  4  8  3  8  9   ∗    1  0  2    (flip ⟶    2  0   1 ,  then slide and sum)
    7  8  3  6  6  3       -1  9  7               5  4   3
    4  2  1  8  3  4
    3  2  4  1  9  8

Flipping is what gives convolution its associativity property:
    (A ∗ B) ∗ C = A ∗ (B ∗ C)

What deep learning calls "convolution" skips the flip; strictly speaking it is cross-correlation, but by convention it is still called convolution.
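A short SciPy illustration (my own sketch, not from the course) of the distinction: convolve2d flips the kernel, while correlate2d, which is what CNN layers actually compute, does not:

    import numpy as np
    from scipy.signal import convolve2d, correlate2d

    image = np.random.randint(0, 10, (6, 6))
    kernel = np.array([[3, 4, 5],
                       [1, 0, 2],
                       [-1, 9, 7]])

    # True (textbook) convolution: the kernel is flipped in both axes before sliding.
    conv = convolve2d(image, kernel, mode="valid")
    # Cross-correlation: the kernel is used as-is; this is what CNN "convolution" layers compute.
    corr = correlate2d(image, kernel, mode="valid")

    # Cross-correlating with the flipped kernel reproduces true convolution.
    print(np.array_equal(conv, correlate2d(image, np.flip(kernel), mode="valid")))  # True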
Convolutions on RGB images

    6 × 6 × 3   ∗   3 × 3 × 3   =   4 × 4

Dimensions are height × width × #channels. The number of channels in the filter must match the number of channels in the image; at each position the 27 products are summed into a single number, so the result has only one channel.
Convolutions on RGB image

    6 × 6 × 3   ∗   3 × 3 × 3   =   4 × 4   (as above)
Multiple filters

Applying two different 3 × 3 × 3 filters (for example one detecting vertical edges and one detecting horizontal edges) to the same 6 × 6 × 3 image gives two 4 × 4 feature maps, which are stacked into a 4 × 4 × 2 output volume.

Summary:   n × n × n_c  ∗  f × f × n_c  →  (n − f + 1) × (n − f + 1) × n_c′
           6 × 6 × 3    ∗  3 × 3 × 3    →  4 × 4 × 2

where n_c is the number of channels and n_c′ is the number of filters.
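A NumPy sketch (illustrative only) of convolving an RGB volume with several 3-D filters and stacking the results:

    import numpy as np

    def conv_volume(image, filters):
        """image: (n, n, n_c); filters: (n_f, f, f, n_c). Returns (n-f+1, n-f+1, n_f)."""
        n, _, n_c = image.shape
        n_f, f, _, _ = filters.shape
        out = np.zeros((n - f + 1, n - f + 1, n_f))
        for k in range(n_f):                       # one feature map per filter
            for i in range(n - f + 1):
                for j in range(n - f + 1):
                    # 3-D patch times 3-D filter, summed over height, width and channels
                    out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
        return out

    image = np.random.rand(6, 6, 3)
    filters = np.random.rand(2, 3, 3, 3)           # e.g. a vertical and a horizontal edge filter
    print(conv_volume(image, filters).shape)       # (4, 4, 2)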
Example of a layer

Convolving the 6 × 6 × 3 input a[0] with two 3 × 3 × 3 filters (the weights W[1]) gives two 4 × 4 maps; a bias is added to each map and a ReLU is applied:

    a[0]  (6 × 6 × 3)
        ∗ filter 1 (3 × 3 × 3)  ⟶  ReLU( · + b1 )  →  4 × 4
        ∗ filter 2 (3 × 3 × 3)  ⟶  ReLU( · + b2 )  →  4 × 4

Stacked, these form a[1] of size 4 × 4 × 2. This is exactly one layer of a neural network:

    z[1] = W[1] a[0] + b[1]
    a[1] = g(z[1])

    a[0] ⟶ a[1]
    (6 × 6 × 3) ⟶ (4 × 4 × 2)
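A self-contained sketch (assumed shapes, not from the slides) of this forward step, z[1] = W[1] ∗ a[0] + b[1] followed by a[1] = ReLU(z[1]):

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    def conv_layer_forward(a_prev, W, b):
        """a_prev: (n, n, n_c); W: (f, f, n_c, n_filters); b: (n_filters,)."""
        n = a_prev.shape[0]
        f, _, _, n_filters = W.shape
        z = np.zeros((n - f + 1, n - f + 1, n_filters))
        for k in range(n_filters):
            for i in range(n - f + 1):
                for j in range(n - f + 1):
                    z[i, j, k] = np.sum(a_prev[i:i+f, j:j+f, :] * W[:, :, :, k]) + b[k]
        return relu(z)

    a0 = np.random.rand(6, 6, 3)
    W1 = np.random.randn(3, 3, 3, 2)    # two 3x3x3 filters
    b1 = np.random.randn(2)
    print(conv_layer_forward(a0, W1, b1).shape)   # (4, 4, 2)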
Number of parameters in one layer

If you have 10 filters that are 3 × 3 × 3 in one layer of a neural network, how many parameters does that layer have?

Each filter has 3 × 3 × 3 = 27 weights plus 1 bias, i.e. 28 parameters.
With 10 filters: 10 × 28 = 280 parameters.

Note that this count does not depend on the size of the input image, which is part of what makes convolutional layers so economical.
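A tiny helper (illustrative) for this count:

    def conv_layer_params(f, n_c_prev, n_filters):
        """Weights (f * f * n_c_prev per filter) plus one bias per filter."""
        return n_filters * (f * f * n_c_prev + 1)

    print(conv_layer_params(3, 3, 10))   # 280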
Summary of notation

If layer l is a convolution layer:

    f[l] = filter size
    p[l] = padding
    s[l] = stride
    n_c[l] = number of filters

    Input:  n_H[l−1] × n_W[l−1] × n_c[l−1]
    Output: n_H[l] × n_W[l] × n_c[l]

    n_H[l] = ⌊(n_H[l−1] + 2 p[l] − f[l]) / s[l]⌋ + 1   (and similarly for n_W[l])

    Each filter is:  f[l] × f[l] × n_c[l−1]
    Activations:     a[l] → n_H[l] × n_W[l] × n_c[l]
                     A[l] → m × n_H[l] × n_W[l] × n_c[l]   (for a batch of m examples)
    Weights:         f[l] × f[l] × n_c[l−1] × n_c[l]
    Bias:            n_c[l]
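An illustrative helper (my own naming, not from the slides) that turns this notation into concrete shapes:

    from math import floor

    def conv_layer_shapes(n_h_prev, n_w_prev, n_c_prev, f, p, s, n_filters):
        """Return output activation, weight and bias shapes for one conv layer."""
        n_h = floor((n_h_prev + 2 * p - f) / s) + 1
        n_w = floor((n_w_prev + 2 * p - f) / s) + 1
        return {
            "activation": (n_h, n_w, n_filters),        # a[l]
            "weights": (f, f, n_c_prev, n_filters),     # W[l]
            "bias": (n_filters,),                       # b[l]
        }

    # First layer of the example ConvNet on the next slide: 39x39x3 input, f=3, s=1, p=0, 10 filters
    print(conv_layer_shapes(39, 39, 3, f=3, p=0, s=1, n_filters=10))
    # {'activation': (37, 37, 10), 'weights': (3, 3, 3, 10), 'bias': (10,)}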
Example ConvNet

Input: 39 × 39 × 3   (n_H[0] = n_W[0] = 39, n_c[0] = 3)

    Layer 1: f[1] = 3, s[1] = 1, p[1] = 0, 10 filters  →  37 × 37 × 10
    Layer 2: f[2] = 5, s[2] = 2, p[2] = 0, 20 filters  →  17 × 17 × 20
    Layer 3: f[3] = 5, s[3] = 2, p[3] = 0, 40 filters  →   7 × 7 × 40

The final 7 × 7 × 40 = 1960 values are flattened into a vector and fed to a logistic or softmax unit to produce ŷ.
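A hedged tf.keras sketch of this architecture (the layer choices below are my reading of the slide; the Dense(1, sigmoid) head corresponds to the logistic output, and a k-class softmax head would use Dense(k, activation="softmax")):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(39, 39, 3)),
        tf.keras.layers.Conv2D(10, kernel_size=3, strides=1, padding="valid", activation="relu"),  # 37x37x10
        tf.keras.layers.Conv2D(20, kernel_size=5, strides=2, padding="valid", activation="relu"),  # 17x17x20
        tf.keras.layers.Conv2D(40, kernel_size=5, strides=2, padding="valid", activation="relu"),  # 7x7x40
        tf.keras.layers.Flatten(),                                                                 # 1960 units
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.summary()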
Types of layer in a convolutional network

- Convolution (CONV)

- Pooling (POOL)

- Fully connected (FC)

Pooling layer: Max pooling

    1  3  2  1
    2  9  1  1      max pool       9  2
    1  3  2  3     ─────────→      6  3
    5  6  1  2
                                  2 × 2
      4 × 4

Hyperparameters: f = 2, s = 2. Each output is the maximum of the corresponding 2 × 2 region of the input.
Pooling layer: Max pooling

    1  3  2  1  3
    2  9  1  1  5      max pool       9  9  5
    1  3  2  3  2     ─────────→      9  9  5
    8  3  5  1  0                     8  6  9
    5  6  1  2  9
                                      3 × 3
       5 × 5

Hyperparameters: f = 3, s = 1, so the output size per channel is (n − f)/s + 1 = (5 − 3)/1 + 1 = 3. Pooling is applied to each channel independently, so a 5 × 5 × 2 input gives a 3 × 3 × 2 output.
Pooling layer: Average pooling

    1  3  2  1
    2  9  1  1      average pool      3.75  1.25
    1  4  2  3     ───────────→       4     2
    5  6  1  2

Here f = 2, s = 2, output size (n − f)/s + 1 = 2. Each output is the average of the corresponding 2 × 2 region.
Summary of pooling

Hyperparameters (fixed, not learned):
    f : filter size
    s : stride
    max or average pooling

Input n_H × n_W × n_c  →  output (⌊(n_H − f)/s⌋ + 1) × (⌊(n_W − f)/s⌋ + 1) × n_c

No parameters to learn!
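A compact NumPy max-pooling sketch (illustrative; it covers the f and s values used on these slides) that reproduces the 2 × 2 example above:

    import numpy as np

    def max_pool(a, f=2, s=2):
        """Max pooling applied independently to each channel of a (n_H, n_W, n_c) volume."""
        n_h, n_w, n_c = a.shape
        out_h, out_w = (n_h - f) // s + 1, (n_w - f) // s + 1
        out = np.zeros((out_h, out_w, n_c))
        for i in range(out_h):
            for j in range(out_w):
                patch = a[i*s:i*s+f, j*s:j*s+f, :]
                out[i, j, :] = patch.max(axis=(0, 1))   # one max per channel
        return out

    a = np.array([[1, 3, 2, 1],
                  [2, 9, 1, 1],
                  [1, 3, 2, 3],
                  [5, 6, 1, 2]], dtype=float).reshape(4, 4, 1)
    print(max_pool(a, f=2, s=2)[:, :, 0])   # [[9. 2.] [6. 3.]]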
Neural network example (inspired by LeNet-5)

    Input:   32 × 32 × 3
    CONV1:   f = 5, s = 1, 8 filters    →  28 × 28 × 8
    POOL1:   f = 2, s = 2 (max)         →  14 × 14 × 8      Layer 1
    CONV2:   f = 5, s = 1, 16 filters   →  10 × 10 × 16
    POOL2:   f = 2, s = 2 (max)         →   5 × 5 × 16      Layer 2
    Flatten                             →  400
    FC3                                 →  120
    FC4                                 →  84
    Softmax (10 outputs)

Overall pattern: CONV-POOL-CONV-POOL-FC-FC-SOFTMAX
Neural network example

                          Activation shape    Activation size    # parameters
    Input                 (32, 32, 3)         3,072              0
    CONV1 (f=5, s=1)      (28, 28, 8)         6,272              608
    POOL1                 (14, 14, 8)         1,568              0
    CONV2 (f=5, s=1)      (10, 10, 16)        1,600              3,216
    POOL2                 (5, 5, 16)          400                0
    FC3                   (120, 1)            120                48,120
    FC4                   (84, 1)             84                 10,164
    Softmax               (10, 1)             10                 850
Why convolutions

Convolving a 32 × 32 × 3 image with 6 filters of size f = 5 gives a 28 × 28 × 6 output.

Convolutional layer: 5 × 5 = 25 weights + 1 bias = 26 parameters per filter, so 6 × 26 = 156 parameters. (This count ignores the 3 input channels; including them gives (5 × 5 × 3 + 1) × 6 = 456 parameters, still tiny.)

Fully connected layer between the same two volumes: 32 × 32 × 3 = 3072 input units fully connected to 28 × 28 × 6 = 4704 output units would need 3072 × 4704 ≈ 14M parameters.
Why convolutions

    10 10 10  0  0  0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0   ∗   1  0  -1   =   0  30  30   0
    10 10 10  0  0  0       1  0  -1       0  30  30   0
    10 10 10  0  0  0                      0  30  30   0
    10 10 10  0  0  0

Parameter sharing: a feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image, so the same 9 filter weights are reused at every position.

Sparsity of connections: in each layer, each output value depends only on a small number of inputs (here, a single 3 × 3 patch of the image).
Putting it together

Training set: (x^(1), y^(1)), …, (x^(m), y^(m)).

Cost:
    J = (1/m) Σ_{i=1}^{m} ℒ(ŷ^(i), y^(i))

Use gradient descent to optimize the parameters (all the filters W and biases b) so as to reduce J.
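A minimal tf.keras training sketch (illustrative; the data below is a random placeholder, and the architecture mirrors the LeNet-style example above). The loss is the per-example ℒ, and fit minimizes J with mini-batch gradient descent:

    import numpy as np
    import tensorflow as tf

    # Placeholder data: m = 128 examples of shape 32x32x3 with 10 classes.
    x_train = np.random.rand(128, 32, 32, 3).astype("float32")
    y_train = np.random.randint(0, 10, size=(128,))

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(8, 5, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(16, 5, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="relu"),
        tf.keras.layers.Dense(84, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=2, batch_size=32)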
References
 Andrew Ng. Deep Learning. Coursera.
 Geoffrey Hinton. Neural Networks for Machine Learning.
 Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
 MIT Deep Learning 6.S191 (http://introtodeeplearning.com/)
