DL Lecture Part 2
A two-week lecture

Introduction to
Convolutional Neural Networks
Part 1
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 1
CNNs for Image Classification
output
Image source: twitter.com/cats
p(y=cat)
Object Detection
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).
Object Segmentation
Figure 2. Mask R-CNN results on the COCO test set. These results are based on ResNet-101 [15], achieving a mask AP of 35.7 and
running at 5 fps. Masks are shown in color, and bounding box, category, and confidences are also shown.
He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969. 2017.
Face Recognition
x^[1]
Similarity / Distance Score
x^[2]
Lecture Overview
1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch
Why Image Classification is Hard
Traditional Approaches
Traditional Approaches
Sasaki, K., Hashimoto, M., & Nagata, N. (2016). Person Invariant Classification of Subtle Facial Expressions Using Coded Movement Direction of
Keypoints. In Video Analytics. Face and Facial Expression Recognition and Audience Measurement (pp. 61-72). Springer, Cham.
Traditional Approaches
Lecture Overview
1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch
Main Concepts Behind
Convolutional Neural Networks
Convolutional Neural Networks
[LeNet-5 diagram: INPUT 32x32 → C1: feature maps 6@28x28 → S2: f. maps 6@14x14 (pooling) → C3: f. maps 16@10x10 → S4: f. maps 16@5x5 → C5: layer 120 → F6: layer 84 → OUTPUT 10]
Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.
Hidden Layers
Convolutional Neural Networks
• Size of the resulting layers
• Number of feature detectors
• The subsampling layers (S2, S4) are nowadays called "pooling"
• The last layers are basically a fully-connected multi-layer perceptron; LeNet-5 used an MSE loss (nowadays it is better to use a fully-connected layer + softmax + cross-entropy)
• "Feature detectors" (weight matrices) that are being reused ("weight sharing") are also called "kernels" or "filters"
Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner: Gradient Based Learning Applied to Document Recognition,
Proceedings of IEEE, 86(11):2278–2324, 1998.
Weight Sharing
A "feature detector" (filter, kernel) slides over the inputs to generate
a feature map
The 9 pixels currently covered by the kernel are referred to as its "receptive field"; the resulting outputs form a "feature map":

∑_{j=1}^{9} w_j x_j
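The sliding-window sum above can be sketched in a few lines; a minimal NumPy illustration (hypothetical helper names, not the lecture's own code):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one kernel over the image ('valid' cross-correlation).
    The same weights are reused at every position: weight sharing."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    fmap = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # image[i:i+kh, j:j+kw] is the current receptive field
            fmap[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return fmap

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0   # a simple averaging kernel
fmap = conv2d(image, kernel)
# fmap.shape → (3, 3); each entry equals sum_{j=1}^{9} w_j x_j for one window
```

A deep learning framework would vectorize this loop, but the arithmetic is exactly the weighted sum shown on the slide.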
Multiple "feature detectors" (kernels) are used to create multiple feature maps:

∑_{j=1}^{9} w_j^{(1)} x_j,   ∑_{j=1}^{9} w_j^{(2)} x_j,   ∑_{j=1}^{9} w_j^{(3)} x_j
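With several kernels, the same sliding-window computation simply runs once per kernel, yielding one feature map each; an illustrative NumPy sketch (names are hypothetical):

```python
import numpy as np

def conv2d_multi(image, kernels):
    """Apply K kernels to one input, producing K feature maps."""
    kh, kw = kernels.shape[1:]
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    fmaps = np.empty((len(kernels), oh, ow))
    for k, kernel in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                fmaps[k, i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return fmaps

rng = np.random.default_rng(0)
image = rng.standard_normal((28, 28))
kernels = rng.standard_normal((3, 3, 3))  # three 3x3 kernels
fmaps = conv2d_multi(image, kernels)
# fmaps.shape → (3, 26, 26): one feature map per kernel
```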
Size Before and After Convolutions
With input width W, kernel size K, padding P, and stride S, the output size O of a convolution is

O = (W - K + 2P) / S + 1
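The formula can be checked against the LeNet-5 layer sizes shown earlier; a small hypothetical helper:

```python
def conv_output_size(W, K, P=0, S=1):
    """Output size O = (W - K + 2P) / S + 1 along one spatial dimension."""
    return (W - K + 2 * P) // S + 1

# LeNet-5: 32x32 input, 5x5 kernels, 2x2 pooling with stride 2
conv_output_size(32, 5)        # → 28  (C1: feature maps 6@28x28)
conv_output_size(28, 2, 0, 2)  # → 14  (S2: f. maps 6@14x14)
conv_output_size(14, 5)        # → 10  (C3: f. maps 16@10x10)
```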
Kernel Dimensions and Trainable Parameters
Backpropagation in CNNs
Pooling Layers Can Help With Local Invariance
Note that typical pooling layers do not have any learnable parameters
Downside: information is lost. This may not matter much for classification, but it can matter for applications where relative position is important (such as face recognition).
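A minimal sketch of 2x2 max pooling, assuming a NumPy array as the feature map (illustrative, not the lecture's code):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep only the largest value in each window.
    Note: no learnable parameters are involved."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fmap[i*stride:i*stride + size,
                             j*stride:j*stride + size].max()
    return out

a = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 0.],
              [0., 0., 5., 6.],
              [0., 0., 7., 8.]])
pooled = max_pool(a)
# → [[4., 0.], [0., 8.]]; moving the 4 by one pixel inside its 2x2 window
#   would leave the output unchanged (local invariance), but its exact
#   position is discarded (the information loss mentioned above)
```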
1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch
What a CNN Can See
Simple example: vertical edge detector
What a CNN Can See
Simple example: vertical edge detector
What a CNN Can See
Simple example: horizontal edge detector
Layer 1
Layer 2
What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional
networks. In European conference on computer vision (pp. 818-833). Springer, Cham.
Layer 2
Layer 3
What a CNN Can See
Which patterns from the training set activate the feature map?
Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
Layer 3
Layer 4   Layer 5
Fig. 2. Visualization of features in a fully trained model. For layers 2-5 we show the top
Lecture Overview
Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020 3
[Animation: convolution with no padding, no strides]
Padding jargon
• "valid" convolution: no padding (the feature map may shrink)
• "same" convolution: padding chosen such that the output size is equal to the input size
• Common kernel size conventions: 3x3, 5x5, 7x7 (sometimes 1x1 in later layers to reduce the number of channels)
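Both conventions can be verified numerically with the output-size formula O = (W - K + 2P)/S + 1; a small hypothetical sketch:

```python
def output_size(W, K, P, S=1):
    # O = (W - K + 2P) / S + 1
    return (W - K + 2 * P) // S + 1

W = 28
for K in (3, 5, 7):
    # "valid": no padding, the feature map shrinks
    assert output_size(W, K, P=0) == W - K + 1
    # "same": P = (K - 1) / 2 for odd kernels at stride 1 preserves the size
    assert output_size(W, K, P=(K - 1) // 2) == W
```

This is why the common 3x3, 5x5, and 7x7 kernels pair naturally with padding 1, 2, and 3 for "same" convolutions.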
Lecture Overview
Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. 16
CNN Architectures Illustrated
1. LeNet-5
2. AlexNet
3. VGG-16
4. ResNet-50
5. Inception-v1
6. Inception-v3
7. Xception
8. Inception-v4
9. Inception-ResNets
10. ResNeXt-50
Source: https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
[Architecture diagrams: LeNet-5, AlexNet, VGG-16, ResNet-50, Inception-v1, ...]
Source: https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
Legend
1. LeNet-5 (1998)
2. AlexNet (2012)
Input: 227x227x3
• AlexNet essentially stacked a few more layers onto LeNet-5.
• They were among the first to use Rectified Linear Units (ReLUs) as activation functions.
• Trained on two GTX 580 GPUs for between 5 and 6 days.
3.VGG-16 (2014)
Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. 23
3.VGG-16 (2014)
PyTorch implementation:
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/vgg16.ipynb
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
4.ResNet-50 (2015)
Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. 26
4. ResNet-50 (2015)
“with the network depth increasing, accuracy gets saturated and then degrades rapidly.”
• Microsoft Research addressed this problem using skip connections.
• ResNet comes in 34-, 50-, 101-, and up to 152-layer variants, without compromising generalisation power.
• The 152-layer variant was trained on a cluster of 8 GPUs for 2 to 3 weeks.
• Among the first to use batch normalisation.
• ResNet-50 has ~26M parameters.
• The basic building blocks of ResNets are the conv and identity blocks.
Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Microsoft. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). GitHub code from keras-team.
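A minimal PyTorch sketch of an identity block with a skip connection (illustrative and simplified relative to the actual ResNet-50 blocks in the linked notebooks):

```python
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Sketch of a residual 'identity block': the skip connection adds
    the block's input back onto the conv branch's output."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        shortcut = x                       # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + shortcut)   # add, then the final non-linearity

block = IdentityBlock(64)
out = block(torch.randn(1, 64, 8, 8))
# output shape equals input shape, so such blocks can be stacked deeply
```

Because the block can fall back to (approximately) the identity function, adding more of them does not degrade accuracy the way plain stacked layers do.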
4.ResNet-50 (2015)
4.ResNet-50 (2015)
PyTorch implementations of the previous slides:
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-blocks.ipynb
Further PyTorch implementations:
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-34.ipynb
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-152.ipynb
5.Inception-v1/ GoogLeNet (2014)
Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. 30
5.Inception-v1/ GoogLeNet (2014)
"In this paper, we will focus on an efficient deep neural network architecture for computer vision,
codenamed Inception, which derives its name from the Network in network paper by Lin et al [12] in
conjunction with the famous “we need to go deeper” internet meme"
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9. 2015.
5.Inception-v1/ GoogLeNet (2014)
Full architecture
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions."
In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9. 2015.
5.Inception-v1 (2014)
Appendix: Network In Network (2014)
• Recalling from convolution: an output pixel is a linear combination of the weights in a filter and the current sliding window.
• NiN proposes a mini neural network with one hidden layer in place of this linear combination.
• These "MLPconv" layers are equivalent to 1×1 convolutions and are a main feature of the Inception architectures.
• Global average pooling: take the average of each feature map and feed the resulting vector into the softmax layer.
Paper: Network In Network. Min Lin, Qiang Chen, Shuicheng Yan. National University of Singapore. arXiv preprint, 2013.
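A NumPy sketch of why a 1×1 convolution is just a per-pixel linear layer across channels, plus global average pooling (shapes are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
fmaps = rng.standard_normal((256, 14, 14))  # 256 channels, 14x14 spatial
w = rng.standard_normal((64, 256))          # weights of 64 1x1 kernels

# A 1x1 convolution mixes the channels at each pixel independently,
# i.e., it is a linear layer applied per spatial location:
reduced = np.einsum('oc,chw->ohw', w, fmaps)
# reduced.shape → (64, 14, 14): channels reduced from 256 to 64

# Global average pooling: average each feature map down to one number
gap = reduced.mean(axis=(1, 2))
# gap.shape → (64,): this vector would feed the softmax layer
```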
Lecture Overview
● Key idea:
  ✦ Feature extraction layers may be generally useful
  ✦ Use a pre-trained model (e.g., pre-trained on ImageNet)
  ✦ Freeze the weights: only train the last layer (or last few layers)
● Related approach: fine-tuning, i.e., training a pre-trained network on your smaller dataset
Example 3 - Feature Extractor (I)
Etc …..
Example 3 - Feature Extractor (V)
[Diagram: 224x224 image → two Conv3-64 layers + max pool → two Conv3-128 layers + max pool → four Conv3-256 layers → ... → fully-connected activations a(2), a(3)]
Which Layers to Replace & Train
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video
classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision
and Pattern Recognition (pp. 1725-1732).
Transfer Learning
PyTorch implementation: https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/vgg16-transferlearning.ipynb
Transfer Learning
PyTorch implementation: https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/vgg16-transferlearning.ipynb
Freeze
Replace
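A minimal PyTorch sketch of the freeze-and-replace recipe; a tiny stand-in model is used here instead of the pre-trained VGG-16 from the linked notebook so the example stays self-contained:

```python
import torch.nn as nn

# Tiny stand-in for a pre-trained network (e.g., torchvision's VGG-16)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1000),                # original 1000-class ImageNet head
)

# Freeze all "pre-trained" weights ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the last layer with a fresh head for the new task;
# newly created modules have requires_grad=True, so only this head trains.
model[-1] = nn.Linear(8, 10)

trainable = [p for p in model.parameters() if p.requires_grad]
# → 2 tensors: the new head's weight and bias
```

The optimizer would then be constructed over `trainable` only, so the frozen feature-extraction layers stay untouched.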
Extra: Useful tools to visualize DL architectures
● Netron
● Tensorboard
● PyTorchViz
● plot_model API by Keras
Lecture 13
Introduction to
Convolutional Neural Networks
Part 3
STAT 479: Deep Learning, Spring 2019
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat479-ss2019/
[Figure 1 diagram: Temporal 3D ConvNet (T3D) built from a 3DConv stem, 3D DenseBlocks, Transition layers, and Temporal Transition Layers (parallel 1*1*T, 3*3*T1, 3*3*T2, 3*3*T3 convolutions with concatenation and average pooling), ending in an action label]
Figure 1: Temporal 3D ConvNet (T3D). Our Temporal Transition Layer (TTL) is applied to our DenseNet3D. T3D uses
video clips as input. The 3D feature-maps from the clips are densely propagated throughout the network. The TTL operates
on the different temporal depths, thus allowing the model to capture the appearance and temporal information from the short, mid, and long-range terms. The output of the network is a video-level prediction.
Diba, Ali, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, and Luc Van Gool. "Temporal 3d convnets: New architecture and transfer learning for video classification." arXiv preprint arXiv:1711.08200 (2017).
Also very popular for Medical Imaging (MRI, CT scans ...)
h⇥w⇥c
ConvNets and 3D Inputs
(concatenated
word embeddings)
This Is my great sentence
https://pytorch.org/docs/stable/nn.html#conv1d
Example sentence: "wait for the video and do n't rent it"
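A NumPy sketch of the 1D convolution idea (cf. torch.nn.Conv1d): the kernel spans all embedding dimensions at once and slides along the sequence axis only (names and sizes are illustrative):

```python
import numpy as np

def conv1d(seq, kernel):
    """Slide a kernel along the length axis of a (channels, length) input;
    the kernel covers all channels at each step."""
    c, k = kernel.shape
    steps = seq.shape[1] - k + 1
    return np.array([np.sum(seq[:, t:t + k] * kernel) for t in range(steps)])

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((50, 18))  # 50-dim embedding per token
kernel = rng.standard_normal((50, 3))       # one kernel spanning 3 positions
out = conv1d(embeddings, kernel)
# out.shape → (16,): one activation per window position along the sequence
```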
Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
Pre-Trained Models for Text
https://modelzoo.co/model/pytorch-nlp
Introduction to
Recurrent Neural Networks
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
Lecture Slides:
https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L14-rnns
A Classic Approach for Text Classification:
Bag-of-Words Model

"Raw" training dataset:
x^[1] = "The sun is shining"
x^[2] = "The weather is sweet"
x^[3] = "The sun is shining, the weather is sweet, and one and one is two"

vocabulary = {
    'and': 0,
    'is': 1,
    'one': 2,
    'shining': 3,
    'sun': 4,
    'sweet': 5,
    'the': 6,
    'two': 7,
    'weather': 8,
}

Training set as design matrix:
     [[0 1 0 1 1 0 1 0 0]
X =   [0 1 0 0 0 1 1 0 1]
      [2 3 2 1 1 1 2 1 1]]

Class labels: y = [0, 1, 0]

training → Classifier (e.g., logistic regression, MLP, ...)
Ex.: https://github.com/rasbt/python-machine-learning-book-3rd-edition/tree/master/ch08
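The vocabulary and design matrix above can be reproduced in a few lines of plain Python (a sketch, not the book's own code):

```python
import re

def bag_of_words(docs):
    """Build a sorted vocabulary, then count token occurrences per document."""
    tokenize = lambda doc: re.findall(r"\w+", doc.lower())
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = [[0] * len(vocab) for _ in docs]
    for row, doc in enumerate(docs):
        for tok in tokenize(doc):
            X[row][index[tok]] += 1
    return vocab, X

docs = ["The sun is shining",
        "The weather is sweet",
        "The sun is shining, the weather is sweet, and one and one is two"]
vocab, X = bag_of_words(docs)
# vocab → ['and', 'is', 'one', 'shining', 'sun', 'sweet', 'the', 'two', 'weather']
# X[2]  → [2, 3, 2, 1, 1, 1, 2, 1, 1]
```

Note that word order is discarded entirely, which is exactly the limitation that motivates the sequence models in the remainder of this lecture.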
1D CNNs for text (and other sequence data)
[Diagram: the character sequence "The sun is shining ..." fed into a 1D CNN]
Lecture Overview
Sequential data is not i.i.d.
Applications:
Working with Sequential Data
• Text classification
• Speech recognition (acoustic modeling)
• Language translation
• ...
→ sequence modeling

Fig 8. Displays the actual data and the predicted data from the four models for each stock index in Year 1 from 2010.10.01 to 2011.09.30. https://doi.org/10.1371/journal.pone.0180944.g008
Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PloS one 12, no. 7 (2017): e0180944.
Overview
Networks we used previously: feedforward neural networks
This lecture: Recurrent Neural Networks (RNNs), which add a recurrent edge that feeds activations back into the network at each time step t
Overview
Different Types of Sequence Modeling Tasks
Different Types of Sequence Modeling Tasks
Ex.: sentiment analysis (many-to-one), where the input is some text and the output is a class label.
https://www.kdnuggets.com/images/sentiment-fig-1-689.jpg
Different Types of Sequence Modeling Tasks
Ex.: image captioning (one-to-many), where the input is an image and the output is a text description of that image.
https://towardsdatascience.com/image-captioning-in-deep-learning-9cd23fb4d8d2
Different Types of Sequence Modeling Tasks
Many-to-many: Both inputs and outputs are sequences. Can be direct
or delayed.
https://static-01.hindawi.com/articles/mpe/volume-2018/3125879/figures/3125879.fig.001.svgz