
Introduction to Deep Learning:
a two-week lecture
Part 2

Presented by: Dr. Jeaneth Machicao, October 2020


[email protected]
Course overview: STAT 453: Deep Learning, Spring 2020
by Prof. Sebastian Raschka

Part 1: Introduction
● Introduction to deep learning
● The brief history of deep learning
● Single-layer neural networks: the perceptron
● Motivation: use cases
● Hands-on

Part 2: Mathematical and computational foundations
● Linear algebra and calculus for deep learning
● Parameter optimization with gradient descent
● Automatic differentiation & PyTorch

Part 3: Introduction to neural networks
● Multinomial logistic regression
● Multilayer perceptrons
● Regularization
● Input normalization and weight initialization
● Learning rates and advanced optimization algorithms

Part 4: DL for computer vision and language modeling
● Introduction to convolutional neural networks 1-2
○ CNN architectures illustrated
● Introduction to recurrent neural networks 1-2

Part 5: Deep generative models
● Autoencoders
● Autoregressive models
● Variational autoencoders
● Normalizing flow models
● Generative adversarial networks
● Evaluating generative models

http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
https://github.com/rasbt/stat453-deep-learning-ss20

• Course playlists on YouTube:
Prof. Dalcimar Casanova: https://www.youtube.com/watch?v=0VD_2t6EdS4&list=PL9At2PVRU0ZqVArhU9QMyI3jSe113_m2-
Prof. Sebastian Raschka: https://www.youtube.com/watch?v=e_I0q3mmfw4&list=PLTKMiZHVd_2JkR6QtQEnml7swCnFBtq4P
Overview of our two-week lecture!

1: Introduction
● Introduction to deep learning
● The brief history of deep learning
● Single-layer neural networks: the perceptron
● Motivation: use cases
● Hands-on (report)

2: Mathematical and computational foundations
● Linear algebra and calculus for deep learning
● Parameter optimization with gradient descent
● Automatic differentiation & PyTorch

3: Introduction to neural networks
● Multinomial logistic regression
● Multilayer perceptrons
● Regularization
● Input normalization and weight initialization
● Learning rates and advanced optimization algorithms

4: DL for computer vision and language modeling
● Introduction to convolutional neural networks 1-2
○ CNN architectures illustrated
● Introduction to recurrent neural networks 1-2
● Deliver report of the hands-on

http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
https://github.com/rasbt/stat453-deep-learning-ss20

• Course playlists on YouTube: Prof. Dalcimar Casanova, Prof. Sebastian Raschka
Lecture 12

Introduction to
Convolutional Neural Networks
Part 1
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L12-cnns

CNNs for Image Classification

An image goes in; the network outputs class probabilities, e.g., p(y = cat).

Image sources: twitter.com; https://www.pinterest.com/pin/244742560974520446
Object Detection

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).

Object Segmentation

[Figure 2 from the paper: Mask R-CNN results on the COCO test set (crowds, sheep, street scenes, a dinner table, etc., each instance labeled with its class and confidence, e.g., person 1.00, sheep .99). These results are based on ResNet-101, achieving a mask AP of 35.7 and running at 5 fps. Masks are shown in color; bounding boxes, categories, and confidences are also shown.]

He, Kaiming, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." In Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969. 2017.
Face Recognition

A Siamese neural network: two input images x^[1] and x^[2] are passed through the same (weight-shared) network, and a similarity/distance score between their embeddings is computed.
Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

Why Image Classification is Hard

Different lighting, contrast, viewpoints, etc. Or even simple translation.

This is hard for traditional methods like multilayer perceptrons, because the prediction is essentially based on a weighted sum of raw pixel intensities: a shifted image produces a very different input vector.

Image sources: twitter.com; https://www.123rf.com/photo_76714328_side-view-of-tabby-cat-face-over-white.html
Traditional Approaches

a) Use hand-engineered features

Sasaki, K., Hashimoto, M., & Nagata, N. (2016). Person Invariant Classification of Subtle Facial Expressions Using Coded Movement Direction of
Keypoints. In Video Analytics. Face and Facial Expression Recognition and Audience Measurement (pp. 61-72). Springer, Cham.

Traditional Approaches

b) Preprocess images (centering, cropping, etc.)

Image Source: https://www.tokkoro.com/2827328-cat-animals-nature-feline-park-green-trees-grass.html

Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

Main Concepts Behind Convolutional Neural Networks

• Sparse connectivity: A single element in the feature map is connected to only a small patch of pixels. (This is very different from connecting to the whole input image, as in multilayer perceptrons.)

• Parameter sharing: The same weights are used for different patches of the input image.

• Many layers: Local patterns extracted in early layers are combined into global patterns in deeper layers.
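To make sparse connectivity and parameter sharing concrete, here is a minimal PyTorch sketch (the layer sizes are my own illustration, not from the slides) comparing the parameter count of a small convolutional layer with that of a fully connected layer on the same input:

```python
import torch.nn as nn

# One 5x5 filter on a 1-channel 28x28 image: the same 25 weights (+1 bias)
# are reused at every spatial position -> 26 parameters total.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5)
print(sum(p.numel() for p in conv.parameters()))  # 26

# A fully connected layer mapping the same image to a 24x24 output
# connects every pixel to every output unit -> 452,160 parameters.
fc = nn.Linear(28 * 28, 24 * 24)
print(sum(p.numel() for p in fc.parameters()))  # 452160
```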
Convolutional Neural Networks

[LeNet-5 architecture: INPUT 32x32 → C1: feature maps 6@28x28 (convolutions) → S2: feature maps 6@14x14 (subsampling, i.e., pooling) → C3: feature maps 16@10x10 (convolutions) → S4: feature maps 16@5x5 (subsampling) → C5: layer, 120 units (full connection) → F6: layer, 84 units (full connection) → OUTPUT, 10 units (Gaussian connections).]

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, 1998.
Hidden Layers

[Same LeNet-5 diagram: the convolution/subsampling stages act as an "automatic feature extractor"; the fully connected stages form a "regular classifier".]
Hidden Layers

Each "bunch" of feature maps represents one hidden layer in the neural network. Counting the fully connected layers, this network has 5 layers.
Convolutional Neural Networks

Annotations on the LeNet-5 diagram:
● Labels like "6@28x28" give the number of feature detectors and the size of the resulting layers.
● "Feature detectors" (weight matrices) are reused across the image ("weight sharing"); they are also called "kernels" or "filters".
● "Subsampling" is nowadays called "pooling".
● The final stage is essentially a multilayer perceptron. The original output was basically a fully-connected layer + MSE loss; nowadays it is better to use a fully-connected layer + softmax + cross-entropy.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, 1998.
Weight Sharing

A "feature detector" (filter, kernel) slides over the inputs to generate a feature map:

$\sum_{j=1}^{9} w_j x_j$

The patch of input pixels that a feature-map value is computed from is referred to as its "receptive field".

Rationale: a feature detector that works well in one region may also work well in another region. Plus, weight sharing is a nice reduction in the number of parameters to fit.
Multiple "feature detectors" (kernels) are used to create multiple feature maps, one per kernel:

$\sum_{j=1}^{9} w_j^{(1)} x_j, \qquad \sum_{j=1}^{9} w_j^{(2)} x_j, \qquad \sum_{j=1}^{9} w_j^{(3)} x_j$
Size Before and After Convolutions

Feature map size:

$O = \dfrac{W - K + 2P}{S} + 1$

where $O$ = output width, $W$ = input width, $K$ = kernel width, $P$ = padding, $S$ = stride.
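As a quick sanity check, a small sketch (with example values of my choosing) that evaluates the formula and verifies it against an actual PyTorch convolution:

```python
import torch
import torch.nn as nn

def conv_output_size(W, K, P=0, S=1):
    """O = (W - K + 2P) / S + 1"""
    return (W - K + 2 * P) // S + 1

# 32x32 input, 5x5 kernel, no padding, stride 1 -> 28x28
# (this matches LeNet-5's INPUT 32x32 -> C1 6@28x28)
print(conv_output_size(32, 5))  # 28

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
print(conv(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 6, 28, 28])
```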
Kernel Dimensions and Trainable Parameters

For a grayscale image with a 5x5 feature detector (kernel), we have the following dimensions (number of parameters to learn).

What do you think is the output size for this 28x28 image?
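For reference, a worked answer under the usual assumptions (no padding, stride 1): the formula above gives O = (28 − 5 + 2·0)/1 + 1 = 24, i.e., a 24x24 feature map, and the 5x5 kernel on a 1-channel grayscale image contributes 5 · 5 · 1 + 1 = 26 trainable parameters (weights plus bias) per output feature map.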
Backpropagation in CNNs

Same overall concept as before: the multivariable chain rule, but now with an additional weight-sharing constraint.
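Making the constraint explicit (a standard derivation, not spelled out on the slide): if a 1D feature map is computed as $z_i = \sum_j w_j x_{i+j}$, the shared weight $w_j$ contributes to every output position $i$, so its gradient is a sum over all positions:

```latex
\frac{\partial \mathcal{L}}{\partial w_j}
  = \sum_i \frac{\partial \mathcal{L}}{\partial z_i}\,
           \frac{\partial z_i}{\partial w_j}
  = \sum_i \frac{\partial \mathcal{L}}{\partial z_i}\, x_{i+j}
```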
Pooling Layers Can Help With Local Invariance

Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. ISBN: 978-1789955750

Downside: information is lost. This may not matter for classification, but it does for applications where relative position is important (like face recognition).

In practice for CNNs: some image preprocessing is still recommended.
Note that typical pooling layers do not have any learnable parameters.
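A minimal PyTorch sketch (toy input of my own) showing 2x2 max pooling and confirming that the layer has no learnable parameters:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.tensor([[[[1., 2., 0., 1.],
                    [3., 4., 1., 0.],
                    [0., 1., 5., 6.],
                    [1., 0., 7., 8.]]]])
print(pool(x).squeeze())
# tensor([[4., 1.],
#         [1., 8.]])
print(list(pool.parameters()))  # [] -- nothing to learn
```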
Lecture Overview

1. Image Classification
2. Convolutional Neural Network Basics
3. CNN Architectures
4. What a CNN Can See
5. CNNs in PyTorch

What a CNN Can See

Simple examples: vertical and horizontal edge detectors (from classical computer vision research).

A CNN can learn whatever filters it finds best for optimizing the objective (e.g., minimizing a particular loss to achieve good classification accuracy).
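To illustrate, a small sketch applying a classical Sobel kernel (my choice of edge detector; the slides do not specify the exact filter) as a fixed, unlearned convolution:

```python
import torch
import torch.nn.functional as F

# Sobel kernel for vertical edges: responds to left/right intensity changes
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

# Toy image: dark left half, bright right half -> one vertical edge
img = torch.cat([torch.zeros(1, 1, 6, 3), torch.ones(1, 1, 6, 3)], dim=3)
print(F.conv2d(img, sobel_x).squeeze())
# large values in the two center columns (the edge), zeros elsewhere
```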
What a CNN Can See

Which patterns from the training set activate the feature map?

[Fig. 4 from the paper: evolution of a randomly chosen subset of model features through training. Each layer's features (layers 1-5) are displayed in a different block; within each block, a randomly chosen subset of features is shown at epochs 1, 2, 5, 10, 20, 30, 40, 64. The visualization shows the strongest activation (across all training examples) for a given feature map, projected down to pixel space using the deconvnet approach.]

Method: backpropagate strong activation signals in hidden layers to the input images, then apply "unpooling" to map the values to the original pixel space for visualization.

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
What a CNN Can See

Which patterns from the training set activate the feature map?

[Visualizations for Layers 1-5; Fig. 2 of the paper shows, for layers 2-5, the top activations of features in a fully trained model.]

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
Lecture Overview

1. Padding (control output size in addition to stride)
2. Spatial Dropout and BatchNorm
3. Considerations for CNNs on GPUs
4. Common Architectures
○ LeNet-5
○ AlexNet
○ VGG-16
○ ResNet-50
○ Inception-v1
5. Transfer learning
Padding

[Animation: convolution with no padding, no strides.]

Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020
Padding jargon
• "valid" convolution: no padding (the feature map may shrink)
• "same" convolution: padding chosen so that the output size equals the input size (see the sketch below)
• Common kernel size conventions: 3x3, 5x5, 7x7 (sometimes 1x1 in later layers to reduce channels)

Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020
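A minimal sketch of the two conventions (hypothetical layer sizes):

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 28, 28)

# "valid" convolution: no padding, the feature map shrinks
valid = nn.Conv2d(1, 8, kernel_size=3, padding=0)
print(valid(x).shape)  # torch.Size([1, 8, 26, 26])

# "same" convolution: padding keeps output size == input size (stride 1)
same = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # recent PyTorch also accepts padding='same'
print(same(x).shape)   # torch.Size([1, 8, 28, 28])
```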
Lecture Overview

1. Padding (control output size in addition to stride)
2. Spatial Dropout and BatchNorm
3. Considerations for CNNs on GPUs
4. Common Architectures
○ LeNet-5
○ AlexNet
○ VGG-16
○ ResNet-50
○ Inception-v1
5. Transfer learning
Common Architectures Revisited

We will discuss some additional common CNN architectures, since the field has evolved quite a bit since 2012 ...

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.
CNN Architectures Illustrated

1. LeNet-5
2. AlexNet
3. VGG-16
4. ResNet-50
5. Inception-v1
6. Inception-v3
7. Xception
8. Inception-v4
9. Inception-ResNets
10. ResNeXt-50

[Architecture diagrams for LeNet-5, AlexNet, VGG-16, ResNet-50, and Inception-v1, with a legend.]

Source: https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
1. LeNet-5 (1998)

● ~60,000 parameters.
● One of the simplest architectures: ("5" layers) 2 convolutional and 3 fully-connected layers.
● Sub-sampling layers with trainable weights (aka average-pooling layers); trainable pooling weights are not current practice in designing CNNs.
● This architecture has become the standard "template": stacking convolutions and pooling layers, and ending the network with one or more fully-connected layers (see the sketch below).

Paper: Gradient-Based Learning Applied to Document Recognition. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Proceedings of the IEEE (1998).
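A compact PyTorch sketch of the LeNet-5 layer stack (my own rendering: max pooling and ReLU stand in for the original trainable subsampling and tanh/Gaussian output, in line with current practice):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 6@28x28 (C1)
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 6@14x14 (S2)
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16@10x10 (C3)
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 16@5x5 (S4)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # C5
            nn.ReLU(),
            nn.Linear(120, 84),               # F6
            nn.ReLU(),
            nn.Linear(84, num_classes),       # OUTPUT
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.zeros(1, 1, 32, 32)).shape)            # torch.Size([1, 10])
print(sum(p.numel() for p in model.parameters()))        # ~61,700 (the "~60,000 parameters" above)
```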
2. AlexNet (2012)

● Input: 227x227x3.
● ~60M parameters.
● 8 layers: 5 convolutional and 3 fully-connected.
● AlexNet essentially stacked a few more layers onto LeNet-5.
● Among the first to use Rectified Linear Units (ReLUs) as activation functions.
● Trained on two GTX 580 GPUs for 5 to 6 days.

Paper: ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. University of Toronto, Canada. NeurIPS 2012.
3. VGG-16 (2014)

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

● CNNs were starting to get deeper and deeper.
● Creators: Visual Geometry Group (VGG), University of Oxford, UK.
● 13 convolutional and 3 fully-connected layers, carrying over the ReLU tradition from AlexNet.
● Stacks more layers onto AlexNet and uses smaller filter sizes (2x2 and 3x3).
● ~138M parameters; ~500MB of storage space.
● The contribution of this paper is the design of deeper networks (roughly twice as deep as AlexNet).

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan, Andrew Zisserman. University of Oxford, UK. arXiv preprint arXiv:1409.1556 (2014).

PyTorch implementation:
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/vgg16.ipynb
4. ResNet-50 (2015)

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

"with the network depth increasing, accuracy gets saturated and then degrades rapidly."
● Microsoft Research addressed this degradation problem using skip connections.
● ResNet variants with 34, 50, 101, and up to 152 layers, without compromising generalisation power.
● ~26M parameters (ResNet-50).
● Among the first to use batch normalisation.
● The 152-layer model was trained on a cluster of 8 GPUs for 2 to 3 weeks.
● The basic building blocks of ResNets are the convolutional and identity blocks (see the sketch below).

Paper: Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Microsoft. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). GitHub code from keras-team.

PyTorch implementations of the residual blocks and full networks:
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-blocks.ipynb
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-34.ipynb
https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/resnet-152.ipynb
(Can be substantially improved with more hyperparameter tuning.)
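A minimal sketch of the identity-block idea behind the skip connections (a simplified version of my own; the blocks in the linked notebooks are more elaborate):

```python
import torch
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Residual block: output = ReLU(F(x) + x), with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # the "+ x" is the skip connection

block = IdentityBlock(64)
print(block(torch.zeros(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])
```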
5. Inception-v1 / GoogLeNet (2014)

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

"In this paper, we will focus on an efficient deep neural network architecture for computer vision, codenamed Inception, which derives its name from the Network in network paper by Lin et al [12] in conjunction with the famous 'we need to go deeper' internet meme."

[Full architecture diagram.]

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9. 2015.
5. Inception-v1 (2014)

"The main hallmark of this architecture is the improved utilisation of the computing resources inside the network."
● 22-layer architecture with 5M parameters.
● Uses the Network In Network approach via "Inception modules": dense modules/blocks instead of simply stacking convolutional layers.
● Each module combines 3 ideas (see the sketch below):
1) Parallel towers of convolutions with different filters, followed by concatenation; this captures different features at 1x1, 3x3 and 5x5, "clustering" them.
2) 1x1 convolutions for dimensionality reduction (to avoid computational bottlenecks).
3) Two auxiliary classifiers, discarded at inference time.
● Curiosity: the name Inception comes from the movie.

Paper: Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Google, University of Michigan, University of North Carolina. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
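A minimal sketch of idea 1, the parallel towers (simplified: real Inception modules also place 1x1 reductions inside the 3x3 and 5x5 towers, per idea 2):

```python
import torch
import torch.nn as nn

class MiniInception(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 towers, concatenated along the channel axis."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.tower1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.tower3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.tower5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.tower1(x), self.tower3(x), self.tower5(x)], dim=1)

m = MiniInception(64, 32)
print(m(torch.zeros(1, 64, 28, 28)).shape)  # torch.Size([1, 96, 28, 28])
```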
Appendix: Network In Network (2014)

● Recalling from convolution: an output pixel is a linear combination of the weights in a filter and the current sliding window.
● NiN proposes a mini neural network with 1 hidden layer instead of this linear combination.
● This one-hidden-layer network inside a CNN, a.k.a. MLPconv, is equivalent to 1x1 convolutions; MLP convolutional layers are the main feature adopted by the Inception architectures (see the sketch below).
● Global average pooling: take the average of each feature map and feed the resulting vector directly into the softmax layer.

Paper: Network In Network. Min Lin, Qiang Chen, Shuicheng Yan. National University of Singapore. arXiv preprint, 2013.
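A small sketch of both NiN ideas, with illustrative shapes:

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 256, 7, 7)

# 1x1 convolution: a per-pixel MLP across channels (256 -> 64), no spatial mixing
reduce = nn.Conv2d(256, 64, kernel_size=1)
print(reduce(x).shape)          # torch.Size([1, 64, 7, 7])

# Global average pooling: one number per feature map, fed to the classifier
gap = nn.AdaptiveAvgPool2d(1)
print(gap(x).flatten(1).shape)  # torch.Size([1, 256])
```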
[Recap: architecture diagrams for LeNet-5, AlexNet, VGG-16, ResNet-50, and Inception-v1.]

Source: https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
Lecture Overview

1. Padding (control output size in addition to stride)
2. Spatial Dropout and BatchNorm
3. Considerations for CNNs on GPUs
4. Common Architectures
○ LeNet-5
○ AlexNet
○ VGG-16
○ ResNet-50
○ Inception-v1
5. Transfer learning
Transfer Learning

Key idea:
✦ Feature extraction layers may be generally useful.
✦ Use a pre-trained model (e.g., pre-trained on ImageNet).
✦ Freeze the weights: only train the last layer (or last few layers). (See the sketch below.)

Related approach: fine-tuning, i.e., training a pre-trained network on your smaller dataset.

Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020
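A minimal sketch of the freeze-and-replace recipe (using torchvision's VGG16 for illustration, with the pretrained=True API of the course's era; newer torchvision versions use a weights= argument instead):

```python
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet
model = models.vgg16(pretrained=True)

# Freeze all weights
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier layer with a fresh one for our task
num_classes = 10  # hypothetical target dataset
model.classifier[6] = nn.Linear(4096, num_classes)

# Only the new layer's parameters will be updated during training
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.6.weight', 'classifier.6.bias']
```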
Example 3 - Feature Extractor

● Pre-trained VGG19 model:

Image 224x224 → Two Conv3-64 + Max-pool → Two Conv3-128 + Max-pool → Four Conv3-256 + Max-pool → Four Conv3-512 + Max-pool → Four Conv3-512 + Max-pool → 3 FC

[Slides I-IV repeat this diagram, stepping through the pipeline stage by stage.]

[Slide V: the network truncated after the Conv3-256 block feeds a small fully-connected network (Layer 1: inputs x1, x2 plus bias +1; Layer 2: hidden units a1(2), a2(2) plus bias +1; Layer 3: output a1(3)).]
Which Layers to Replace & Train

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1725-1732).

Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020
Transfer Learning

PyTorch implementation: https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L13-cnns-part2/code/vgg16-transferlearning.ipynb

[In the notebook, the feature-extraction layers are frozen ("Freeze") and the final classifier layers are replaced ("Replace").]

Adapted from: Sebastian Raschka. STAT 453: Intro to Deep Learning and Generative Models. SS 2020
Extra: Useful tools to visualize DL architectures
● Netron
● Tensorboard
● PyTorchViz
● plot_model API by Keras

Lecture 13

Introduction to
Convolutional Neural Networks
Part 3
STAT 479: Deep Learning, Spring 2019
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat479-ss2019/



Additional Concepts to Wrap Up the
Intro to Convolutional Neural Networks



ConvNets and 3D Inputs

[Figure 1 from the paper: Temporal 3D ConvNet (T3D). The Temporal Transition Layer (TTL) is applied to DenseNet3D. T3D uses video clips as input; the 3D feature maps from the clips are densely propagated throughout the network. The TTL operates on different temporal depths, allowing the model to capture appearance and temporal information at short, mid, and long ranges. The output of the network is a video-level prediction.]

Diba, Ali, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, and Luc Van Gool. "Temporal 3D ConvNets: New architecture and transfer learning for video classification." arXiv preprint arXiv:1711.08200 (2017).

Also very popular for medical imaging (MRI, CT scans, ...).
ConvNets and 3D Inputs

Same concept as before, except that we now have 3D images and kernels:

$X \in \mathbb{R}^{n_1 \times n_2 \times c_{in}}, \qquad W \in \mathbb{R}^{m_1 \times m_2 \times c_{in} \times c_{out}}, \qquad b \in \mathbb{R}^{c_{out}}$
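A minimal 3D-convolution sketch over a video-like input (illustrative shapes: batch, channels, frames, height, width):

```python
import torch
import torch.nn as nn

# Batch of 2 RGB clips, 16 frames of 64x64 each
clips = torch.zeros(2, 3, 16, 64, 64)

# The 3D kernel slides over time as well as height and width
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=(3, 5, 5))
print(conv3d(clips).shape)  # torch.Size([2, 8, 14, 60, 60])
```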


ConvNets for Text with 1D Convolutions

We can think of text as an image with width 1 (concatenated word embeddings), e.g., "This is my great sentence".

https://pytorch.org/docs/stable/nn.html#conv1d
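A minimal Conv1d sketch over a toy embedded sentence (the dimensions are my own illustration):

```python
import torch
import torch.nn as nn

# A 5-word sentence, each word as a 50-dimensional embedding.
# Conv1d expects (batch, channels = embedding_dim, sequence_length).
sentence = torch.zeros(1, 50, 5)

# A kernel of width 3 slides over word positions (a "trigram detector")
conv1d = nn.Conv1d(in_channels=50, out_channels=16, kernel_size=3)
print(conv1d(sentence).shape)  # torch.Size([1, 16, 3])
```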


CNNs for Text (with 2D Convolutions)

Good results have also been achieved by representing a sentence as a matrix of word vectors and applying 2D convolutions (where each filter uses a different kernel size).

[Figure 1 from the paper: model architecture with two channels for an example sentence ("wait for the video and do n't rent it"): an n x k representation of the sentence with static and non-static channels, a convolutional layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output. The sentence is represented as the concatenation of its word vectors, $x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n$.]

Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.
Pre-Trained Models for Text

https://modelzoo.co/model/pytorch-nlp



Lecture 14

Introduction to
Recurrent Neural Networks
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Lecture Slides:
https://github.com/rasbt/stat453-deep-learning-ss20/tree/master/L14-rnns

A Classic Approach for Text Classification: Bag-of-Words Model

"Raw" training dataset:
x^[1] = "The sun is shining"
x^[2] = "The weather is sweet"
x^[3] = "The sun is shining, the weather is sweet, and one and one is two"

vocabulary = {
    'and': 0,
    'is': 1,
    'one': 2,
    'shining': 3,
    'sun': 4,
    'sweet': 5,
    'the': 6,
    'two': 7,
    'weather': 8,
}

Training set as design matrix (word counts per document):

X = [[0 1 0 1 1 0 1 0 0],
     [0 1 0 0 0 1 1 0 1],
     [2 3 2 1 1 1 2 1 1]]

training class labels: y = [0, 1, 0]

These features are then fed into a classifier (e.g., logistic regression, MLP, ...).
Ex.: https://github.com/rasbt/python-machine-learning-book-3rd-edition/tree/master/ch08
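A minimal sketch reproducing the vocabulary and design matrix with scikit-learn's CountVectorizer (the course example repository linked above takes the same approach):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The sun is shining",
        "The weather is sweet",
        "The sun is shining, the weather is sweet, and one and one is two"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
print(vectorizer.vocabulary_)  # {'and': 0, 'is': 1, ..., 'weather': 8}
print(X.toarray())
# [[0 1 0 1 1 0 1 0 0]
#  [0 1 0 0 0 1 1 0 1]
#  [2 3 2 1 1 1 2 1 1]]
```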
1D CNNs for Text (and Other Sequence Data)

[Illustration: the characters of "The sun is shining ..." laid out as a 1D sequence, with a 1D kernel sliding along it.]
Lecture Overview

RNNs and Sequence Modeling Tasks
Backpropagation Through Time
Long Short-Term Memory (LSTM)
Many-to-One Word RNNs
Generating Text with Character RNNs
Attention Mechanisms and Transformers
Sequential data is not i.i.d.

Figure: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019.
Applications: Working with Sequential Data

• Text classification
• Speech recognition (acoustic modeling)
• Language translation
• Stock market predictions
• DNA or amino acid/protein sequence modeling
• ...

[Figure: the KEGRU model of Shen et al., which splits a DNA sequence into k-mers, learns k-mer embeddings with pre-trained word2vec, feeds them to a GRU network to capture long-range dependencies, and produces predictions via a dense layer.]
Shen, Zhen, Wenzheng Bao, and De-Shuang Huang. "Recurrent Neural Network for Predicting Transcription Factor Binding Sites." Scientific Reports 8, no. 1 (2018): 15270.

[Figure: actual vs. predicted data from four models for each stock index, Year 1, 2010.10.01 to 2011.09.30.]
Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PLoS ONE 12, no. 7 (2017): e0180944.
Overview

Networks we used previously are also called feedforward neural networks. A Recurrent Neural Network (RNN) adds a recurrent edge: the hidden state at time step t is fed back into the network at the next time step.

Figures: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019.
Different Types of Sequence Modeling Tasks

Figure: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019. Based on: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
Different Types of Sequence Modeling Tasks

Many-to-one: the input data is a sequence, but the output is a fixed-size vector, not a sequence.

Ex.: sentiment analysis; the input is some text, and the output is a class label. (A minimal sketch follows below.)

Image: https://www.kdnuggets.com/images/sentiment-fig-1-689.jpg
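A minimal many-to-one sketch in PyTorch (toy dimensions of my choosing): the final hidden state of an LSTM summarizes the sequence and feeds a classifier:

```python
import torch
import torch.nn as nn

class ManyToOneRNN(nn.Module):
    def __init__(self, emb_dim=50, hidden=64, num_classes=2):
        super().__init__()
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):               # x: (batch, seq_len, emb_dim)
        _, (h_n, _) = self.rnn(x)       # h_n: (1, batch, hidden), last time step
        return self.fc(h_n.squeeze(0))  # class logits: (batch, num_classes)

model = ManyToOneRNN()
print(model(torch.zeros(4, 12, 50)).shape)  # torch.Size([4, 2])
```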
Different Types of Sequence Modeling Tasks

One-to-many: the input data is in a standard format (not a sequence); the output is a sequence.

Ex.: image captioning, where the input is an image and the output is a text description of that image.

Image: https://towardsdatascience.com/image-captioning-in-deep-learning-9cd23fb4d8d2
Different Types of Sequence Modeling Tasks

Many-to-many: both inputs and outputs are sequences; the mapping can be direct or delayed.

Ex.: video captioning, i.e., describing a sequence of images via text (direct); translating one language into another (delayed).

Image: https://static-01.hindawi.com/articles/mpe/volume-2018/3125879/figures/3125879.fig.001.svgz
