
Deep Learning For Computer Vision

Lecture, summer semester 2024 (SS 2024)
Prof. Dr.-Ing. Rainer Stiefelhagen, Dr. Saquib Sarfraz, Dr.-Ing. Alina Roitberg
Computer Vision for HCI Lab – cv:hci, Institut für Anthropomatik & Robotik
ACCESS@KIT – Zentrum für digitale Barrierefreiheit und Assistive Technologien
Institut für Anthropomatik und Robotik, Fakultät für Informatik

KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association (www.kit.edu)
Lecture 2
NEURAL NETWORK BASICS

Next lecture: live demo

This lecture: neural network basics

Next lecture:
Generalization, Overfitting and How to Approach it.
Live demo: training neural networks in Pytorch
Homework: watch the Pytorch introduction video
https://youtu.be/I1WcY1gX8PM

Lecture Outline

Motivation: end-to-end learning vs. handcrafted features


Single-Layer Perceptron and Deep Neural Networks
Forward-pass
Backpropagation

Image source: https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092

Lecture Outline

Optimization Strategies:
Batch and Stochastic Gradient Descent
Momentum and Nesterov Accelerated Gradient
Adaptive Learning Rate
Activation Functions
Loss Functions

Image source: http://cs231n.stanford.edu/slides/2016/winter1516_lecture3.pdf

Traditional computer vision

Features are not learned!

Pipeline: input data (pixels) → hand-crafted feature representation → learning algorithm (e.g., SVM) → prediction ("Stapler")

Popular computer vision features

SIFT

HoG

Gabor filters
and many others…
SURF, LBP, color histograms, GLOH

Learning feature hierarchy (1)

Learn a feature hierarchy end-to-end: from image pixels to classifier output

Pipeline: image / video pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier

Train all layers jointly!

Learning feature hierarchy (2)

Learn a hierarchical representation of data

Feature hierarchy
Fill in the representation gap in recognition:

Input layer: pixels
1st layer: "Edges"
2nd layer: "Object parts"
3rd layer: "Objects"

Lee et al., Unsupervised learning of hierarchical representations with convolutional deep belief networks. CACM 2011.

Taxonomy of feature learning
Supervised, shallow: Logistic Regression, Support Vector Machines, Perceptron
Supervised, deep: Supervised Deep Neural Net, Supervised Convolutional Neural Net, Supervised Recurrent Neural Net, Semi-supervised deep learning

Unsupervised, shallow: Sparse coding, Denoising Autoencoder, Restricted Boltzmann machine, Hierarchical Sparse coding
Unsupervised, deep: Stacked De-noising Autoencoder, Deep Belief Nets, Deep Boltzmann machines, Unsupervised deep learning, Self-supervised deep learning
Train/Test/Validation Splits in Machine Learning

Training set
A training data set is a set of examples used during the learning process to fit the parameters (e.g., the weights) of, for example, a classifier.
For supervised deep learning, the annotations of the training samples are used in the loss calculation to provide supervision.
For self-supervised deep learning, no annotations are used during training.
Test set
A test data set is a data set that is independent of the training data set, but that follows a similar probability distribution as the training data set.
Validation set
A validation data set is a set of examples used to tune the hyperparameters (e.g., the architecture) of a classifier. It is sometimes also called the development set or the "dev set".

Single-Layer Perceptron (1)

Inspired by information processing in biological nervous systems

The output of a biological neuron is a spike train.

Figure: spike train of an electrosensory pyramidal neuron in a fish (Eigenmannia)
Image source: physics.gu.se/~frtbm/joomla/media/mydocs/NeuralNetworks.pdf

Single-Layer Perceptron (2)

Inspired by information processing in biological nervous systems

Both the biological and the artificial neuron follow the pattern: input → process → output.

Figure: the inputs x1 … xn are weighted by w1 … wn, summed together with a bias b, and passed through an activation function f to produce the output y.
Image source: physics.gu.se/~frtbm/joomla/media/mydocs/NeuralNetworks.pdf

Single-Layer Perceptron (3)

A very old model (McCulloch-Pitts neuron, 1943)

Signal processing: weighted sum of the inputs, followed by an activation function $f$:

$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$
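To make the weighted-sum-plus-activation concrete, here is a minimal Python/NumPy sketch; the weight, bias, and input values and the binary step activation are made up for illustration and are not from the slides.

```python
import numpy as np

def perceptron(x, w, b, f):
    """Single perceptron: weighted sum of the inputs plus bias, passed through activation f."""
    return f(np.dot(w, x) + b)

# Binary step activation (one classic choice for the original perceptron)
step = lambda v: 1.0 if v >= 0 else 0.0

# Illustrative values only
x = np.array([0.5, -1.0, 2.0])   # inputs x1..x3
w = np.array([0.4, 0.3, -0.2])   # weights w1..w3
b = 0.1                          # bias

y = perceptron(x, w, b, step)
print(y)  # 0.0, since 0.2 - 0.3 - 0.4 + 0.1 = -0.4 < 0
```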
Single-Layer Perceptron (4)

Non-linear activation function $f$ (e.g. binary step, Sigmoid, ReLU)

Without an activation function ($f(x) = x$) the network is a linear regression model.
Multi-Layer Perceptron (1)

Combining layers lets us represent non-linear functions

A single-layer perceptron cannot implement simple functions such as XOR
(Explanation and proof: computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html)

Figure: a network with an input layer (x1, x2, x3 and bias b), a hidden layer, and an output layer producing y1, y2.

Truth tables:
AND: (0,0)→0  (0,1)→0  (1,0)→0  (1,1)→1
OR:  (0,0)→0  (0,1)→1  (1,0)→1  (1,1)→1
XOR: (0,0)→0  (0,1)→1  (1,0)→1  (1,1)→0   XOR is not linearly separable!
Multi-Layer Perceptron (2)

Combining layers lets us represent non-linear functions

Forward propagation: $y = f(Wx + b)$
In the following slides, we will use the ReLU activation function: $f(x) = \max(0, x)$

Figure: the input of each node is computed as a weighted sum of the outputs of the previous layer (plus bias), then passed through $f$.

Alternative graphical representations

Neural Networks

A simple network with two hidden layers:

$\mathbf{x}$ – input layer (pixels)
$\mathbf{h}^1$ – first layer hidden units
$\mathbf{h}^2$ – second layer hidden units
$\mathbf{o}$ – output layer

Forward propagation (1)

Compute the network output given the input

$\mathbf{x} \in \mathbb{R}^{D}$, $W^{1} \in \mathbb{R}^{N_1 \times D}$, $\mathbf{b}^{1} \in \mathbb{R}^{N_1}$, $\mathbf{h}^{1} \in \mathbb{R}^{N_1}$

$\mathbf{h}^{1} = \max(0, W^{1}\mathbf{x} + \mathbf{b}^{1})$

The non-linearity $u = \max(0, v)$ is the Rectified Linear Unit (more on this later).

Forward propagation (2)

Compute the network output given the input

$\mathbf{h}^{2} = \max(0, W^{2}\mathbf{h}^{1} + \mathbf{b}^{2})$

$\mathbf{o} = \max(0, W^{3}\mathbf{h}^{2} + \mathbf{b}^{3})$
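A minimal NumPy sketch of this forward pass; the layer sizes are arbitrary placeholders and the weights are randomly initialized purely for illustration.

```python
import numpy as np

def relu(v):
    return np.maximum(0, v)

rng = np.random.default_rng(0)

D, N1, N2, C = 8, 16, 16, 3                    # input dim, hidden sizes, classes (placeholders)
W1, b1 = rng.standard_normal((N1, D)) * 0.1, np.zeros(N1)
W2, b2 = rng.standard_normal((N2, N1)) * 0.1, np.zeros(N2)
W3, b3 = rng.standard_normal((C, N2)) * 0.1, np.zeros(C)

x = rng.standard_normal(D)                     # one input sample

h1 = relu(W1 @ x + b1)                         # h1 = max(0, W1 x + b1)
h2 = relu(W2 @ h1 + b2)                        # h2 = max(0, W2 h1 + b2)
o  = relu(W3 @ h2 + b3)                        # o  = max(0, W3 h2 + b3)
print(o.shape)                                 # (3,)
```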

Alternative graphical representations

Reasoning (1)

Q: Why can’t the mapping between layers be linear?

A: A composition of linear functions is itself linear, so the whole network would collapse to a (linear) regression model.

Q: What does a hidden unit do?

A: It can be thought of as a classifier or feature detector.

Q: How many layers? How many hidden units?

A: These hyper-parameters are best set using cross-validation. In general, wider and deeper networks allow for more complicated "function" mappings.

Reasoning (2)

Q: Why do we need many layers?


A: Data with hierarchical structure is well exploited with a hierarchical
model architecture where intermediate features can be re-used.

[1 0 1 0 0 0 0 1 1 0 0 … ] truck

[0 1 0 1 0 0 0 1 0 1 1 … ] motorbike

Evolution of Network depth

Why are neural networks becoming deeper?

Wide (shallow) neural networks tend to memorize the data and are prone to overfitting
Deep networks learn features at various levels of abstraction and generalize better

Image: medium.com/@Lidinwise/the-revolution-of-depth-facf174924f5

What is a good network for classification?

Figure: x → h1 = max(0, W1 x) → h2 = max(0, W2 h1) → o = max(0, W3 h2) → Softmax → z, compared in the loss with the label y = [0 0 ... 0 1 0 ... 0 0] (1 at index k).

The network output z should match the expected output y (ground truth / label)

• In this case y is a one-hot-encoded vector
• 1 at the correct class index, 0 everywhere else

For classification, a good model has: predicted class = ground truth
(k – index of the correct class, c – number of all classes)

Softmax Layer with 3 classes
Example with 3 classes: the network θ maps the input x through h1, h2 to the logits o = [1, 2, 0.5]; the softmax layer turns them into z = [0.23, 0.63, 0.14], which the loss compares with the label y = [0 0 1].

θ: the network (weights and biases)
o: output of the last linear layer, also called logits
z: softmax output, class "probabilities"
y: ground-truth label

Softmax Layer - Equations
(Same 3-class example as on the previous slide: logits o = [1, 2, 0.5], softmax output z = [0.23, 0.63, 0.14], label y = [0 0 1].)

Softmax: probability that x belongs to class $c_k$:

$z_k = \dfrac{e^{o_k}}{\sum_j e^{o_j}}$

Used in most classification networks; take argmax(z) to get the predicted class index.

Image Classifier (in the next lectures...)
x: image

The same setup with a CNN (e.g., AlexNet) as the network θ: the image is mapped to logits o = [1, 2, 0.5], which the softmax turns into z = [0.23, 0.63, 0.14], interpreted as P("human"), P("dog"), P("cat"); the loss compares z with the label y = [0 0 1].

Softmax: probability that x belongs to class $c_k$: $z_k = \frac{e^{o_k}}{\sum_j e^{o_j}}$
Used in most classification networks; argmax(z) gives the predicted class index.

Loss modeling

Probability of class $k$ given input image $\mathbf{x}$ (softmax):

$p(c_k = 1 \mid \mathbf{x}) = \dfrac{e^{o_k}}{\sum_j e^{o_j}}$

Loss: negative log-likelihood (per sample $\mathbf{x}$):

$L(\mathbf{x}, y; \boldsymbol{\theta}) = -\sum_j y_j \log p(c_j \mid \mathbf{x})$

A good model is

$\boldsymbol{\theta}^{*} = \arg\min_{\boldsymbol{\theta}} \sum_n L(\mathbf{x}^{n}, y^{n}; \boldsymbol{\theta})$
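A small NumPy sketch of these two formulas; the logits reuse the 3-class example values from the earlier slides, and the helper names are my own.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())            # subtract the max for numerical stability
    return e / e.sum()

def nll_loss(p, y):
    """Negative log-likelihood: L = -sum_j y_j * log p(c_j | x)."""
    return -np.sum(y * np.log(p))

o = np.array([1.0, 2.0, 0.5])          # logits from the 3-class example
y = np.array([0.0, 0.0, 1.0])          # one-hot label

p = softmax(o)                         # ~ [0.23, 0.63, 0.14]
print(p.round(2), nll_loss(p, y))      # loss = -log(0.14) ~ 1.96
```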

Training

Model θ: $W^{n}$ and $\mathbf{b}^{n}$ for every layer $n$

Calculate $L(\mathbf{x}, y; \boldsymbol{\theta})$

Update all $W^{n}$ and $\mathbf{b}^{n}$ based on the loss gradient:

$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \dfrac{\partial L}{\partial \boldsymbol{\theta}}$

How to compute the gradients? Backpropagation!
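A hedged sketch of this training step in PyTorch; the layer sizes, data, and learning rate are placeholders, and the full workflow is covered in the live demo of the next lecture.

```python
import torch

model = torch.nn.Sequential(              # Wn, bn for every layer n
    torch.nn.Linear(8, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 3))
loss_fn = torch.nn.CrossEntropyLoss()     # softmax + negative log-likelihood combined

x = torch.randn(4, 8)                     # dummy mini-batch
y = torch.tensor([0, 2, 1, 2])            # dummy labels

loss = loss_fn(model(x), y)               # calculate L(x, y; theta)
loss.backward()                           # backpropagation computes dL/dtheta

eta = 0.01
with torch.no_grad():                     # theta <- theta - eta * dL/dtheta
    for p in model.parameters():
        p -= eta * p.grad
        p.grad = None
```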

Backpropagation (1)

Network: x → h1 = max(0, W1 x) → h2 = max(0, W2 h1) → o = max(0, W3 h2) → Softmax → z, compared with the one-hot label y = [0 0 ... 0 1 0 ... 0 0] in the loss L.

Computational graph: the input x, the weights W1, W2, W3 and the biases b1, b2, b3 feed into h1, h2, o, z and finally L (together with y).

• For a simple node: if d is a function of a and c, then ∂d/∂a and ∂d/∂c can be computed directly.
• How to compute ∂L/∂Wⁿ and ∂L/∂bⁿ for all layers?

Backpropagation (2): how to compute ∂L/∂Wⁿ and ∂L/∂bⁿ?

Forward (black arrows):
$\mathbf{h}^1 = \max(0, W^1\mathbf{x} + \mathbf{b}^1)$, $\mathbf{h}^2 = \max(0, W^2\mathbf{h}^1 + \mathbf{b}^2)$, $\mathbf{o} = \max(0, W^3\mathbf{h}^2 + \mathbf{b}^3)$,
$z_k = \dfrac{e^{o_k}}{\sum_j e^{o_j}}$, $L = -\sum_j y_j \log z_j$

Computational graph: x, W1, b1 → h1; W2, b2 → h2; W3, b3 → o → z → L (together with the label y).

Softmax derivative (see the link below); combined with the negative log-likelihood loss it gives $\partial L / \partial \mathbf{o} = \mathbf{z} - \mathbf{y}$.
Softmax derivative: https://towardsdatascience.com/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1

Backpropagation (3)–(7): backward pass with the chain rule (red arrows)

These steps traverse the same computational graph backwards: starting from the loss, first $\partial L/\partial \mathbf{z}$ and $\partial L/\partial \mathbf{o}$ are computed (using the softmax derivative above), then $\partial L/\partial W^3$, $\partial L/\partial \mathbf{b}^3$ and $\partial L/\partial \mathbf{h}^2$, then $\partial L/\partial W^2$, $\partial L/\partial \mathbf{b}^2$ and $\partial L/\partial \mathbf{h}^1$, and finally $\partial L/\partial W^1$ and $\partial L/\partial \mathbf{b}^1$. At each step, the local derivative is multiplied with the gradient arriving from the layer above (chain rule).

Softmax derivative: https://towardsdatascience.com/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1
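To make the chain-rule steps explicit, here is a small NumPy sketch that backpropagates through the last layer of such a network. It uses the standard result that, for softmax combined with the negative log-likelihood, ∂L/∂o = z − y; the layer sizes and values are illustrative, and the ReLU on the output layer is omitted for simplicity.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

rng = np.random.default_rng(1)
h2 = np.abs(rng.standard_normal(5))      # activations of the last hidden layer (post-ReLU)
W3 = rng.standard_normal((3, 5)) * 0.1   # last-layer weights
b3 = np.zeros(3)
y  = np.array([0.0, 1.0, 0.0])           # one-hot label

# Forward
o = W3 @ h2 + b3
z = softmax(o)
L = -np.sum(y * np.log(z))

# Backward with the chain rule
dL_do  = z - y                           # softmax + negative log-likelihood derivative
dL_dW3 = np.outer(dL_do, h2)             # dL/dW3 = dL/do * do/dW3
dL_db3 = dL_do                           # dL/db3 = dL/do * do/db3
dL_dh2 = W3.T @ dL_do                    # gradient propagated further back to earlier layers
print(dL_dW3.shape, dL_db3.shape, dL_dh2.shape)
```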

Creating non-linear functions

1 input & 1 output


100 hidden units / layer

Optimization with Gradient Descent

Modify the network weights $\boldsymbol{\theta}$ based on the gradient of the loss function:

$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \dfrac{\partial L}{\partial \boldsymbol{\theta}}$

Figure: depending on the sign of the loss gradient, a weight is increased, decreased, or left unchanged (at a minimum).

Stochastic Gradient Descent

Also called mini-batch Gradient Descent

Approximate the full sum over the training set with a mini-batch of examples (e.g. 32, 64, …)

Allows "incremental" training in batches

Mini-batch GD will almost certainly converge to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces

towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3
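A minimal sketch of the mini-batch loop in Python; `grad_fn`, the data arrays, and all hyperparameter values are placeholders, and only the batching logic is the point here.

```python
import numpy as np

def sgd(theta, data, labels, grad_fn, eta=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: approximate the full-sum gradient with a random mini-batch."""
    n = len(data)
    for _ in range(epochs):
        order = np.random.permutation(n)                 # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            g = grad_fn(theta, data[idx], labels[idx])   # gradient on the mini-batch only
            theta = theta - eta * g
    return theta
```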
Challenges of Gradient Descent

Choosing the right learning rate

Weight update: $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \frac{\partial L}{\partial \boldsymbol{\theta}}$
The learning rate $\eta$ is a crucial hyperparameter
If $\eta$ is too large, the loss will fluctuate around the minimum or, in the worst case, diverge
If $\eta$ is too small, training will converge very slowly

https://medium.com/@hiromi_suenaga/deep-learning-2-part-1-lesson-1-602f73869197
towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10
Challenges of Gradient Descent

Non-convex loss functions

GD will certainly (batch GD) or almost certainly (mini-batch GD) converge to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces

Most error functions are highly non-convex
Numerous suboptimal local minima are a big problem!

Image source: www.kdnuggets.com/2016/06/visual-explanation-backpropagation-algorithm-neural-networks.html

„Classic“ or Batch Gradient Descent

Given a model with parameters $\boldsymbol{\theta}$, a training dataset with $n$ examples, and $L_i$ denoting the loss for the $i$-th example.
In Batch Gradient Descent, we try to minimize $L(\boldsymbol{\theta}) = \sum_{i=1}^{n} L_i(\boldsymbol{\theta})$
In other words, we need to build the full sum over all examples to update $\boldsymbol{\theta}$

Batch GD is guaranteed to converge to the global minimum for convex error surfaces and to a local minimum for non-convex surfaces
However: the full sum is expensive when $n$ is large! (→ memory size!)
towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3
Momentum

Momentum hyperparameter $\mu$ (usually 0.9)

Takes the gradients from previous steps into account:

$\Delta_t = \mu \Delta_{t-1} - \eta \dfrac{\partial L(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}$

$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \Delta_t$

Accelerates if the gradient keeps pointing in the same direction (→ faster convergence) and reduces the updates if the gradient changes direction (→ fewer fluctuations)
cs231n.github.io/neural-networks-3/
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International
conference on machine learning. 2013.
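The same momentum update written as a small NumPy sketch; `grad` stands for the current mini-batch gradient and the default hyperparameter values are just examples.

```python
import numpy as np

def momentum_step(theta, delta_prev, grad, eta=0.01, mu=0.9):
    """Momentum: delta_t = mu * delta_{t-1} - eta * dL/dtheta, then theta <- theta + delta_t."""
    delta = mu * delta_prev - eta * grad
    return theta + delta, delta

# Usage sketch: carry delta across steps, starting from zeros
theta, delta = np.zeros(4), np.zeros(4)
theta, delta = momentum_step(theta, delta, grad=np.ones(4))
```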
Annealing the learning rate
■ Gradually decreasing the learning rate helps training

Example: learning rate decrease during ResNet training

The starting learning rate is 0.1; it is reduced to 0.01 at 80 epochs and then to 0.001 at 160 epochs
The first reduction brings a clear improvement; the effect of the second reduction is small

cs231n.github.io/neural-networks-3/
Example source: https://github.com/gcr/torch-residual-networks
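A small sketch of such a step schedule; the boundaries (80, 160 epochs) and decay factors follow the ResNet example above, and the function is written by hand here even though deep learning frameworks provide ready-made schedulers.

```python
def step_lr(epoch, base_lr=0.1):
    """Step decay as in the ResNet example: 0.1 -> 0.01 at epoch 80 -> 0.001 at epoch 160."""
    if epoch >= 160:
        return base_lr * 0.01
    if epoch >= 80:
        return base_lr * 0.1
    return base_lr

print(step_lr(10), step_lr(100), step_lr(200))  # 0.1 0.01 0.001
```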
Nesterov accelerated gradient

■ Modification of the Momentum update

■ Instead of computing the gradient at the current position (red dot), calculate the gradient at the approximate future position (green arrow head) and then update:

$\Delta_t = \mu \Delta_{t-1} - \eta \dfrac{\partial L(\boldsymbol{\theta} + \mu \Delta_{t-1})}{\partial (\boldsymbol{\theta} + \mu \Delta_{t-1})}$

$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \Delta_t$
cs231n.github.io/neural-networks-3/
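A matching sketch of the Nesterov variant: the gradient is evaluated at the look-ahead position θ + μΔ. Here `grad_fn` is a placeholder for a function returning the loss gradient at a given parameter vector.

```python
def nesterov_step(theta, delta_prev, grad_fn, eta=0.01, mu=0.9):
    """Nesterov momentum: evaluate the gradient at the approximate future position."""
    lookahead = theta + mu * delta_prev
    delta = mu * delta_prev - eta * grad_fn(lookahead)
    return theta + delta, delta
```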

Adagrad

Adapting the learning rate per weight

Each weight $\theta_i$ is updated with its own learning rate, depending on its past gradients:
Weights with large past gradients: learning rate is reduced
Weights with small past gradients: learning rate is increased

$\theta_{t+1,i} = \theta_{t,i} - \dfrac{\eta}{\sqrt{G_{t,ii} + \epsilon}} \, g_{t,i}$

$G_t$ is a diagonal matrix where element $(i, i)$ is the sum of the squares of the gradients of the corresponding weight $\theta_i$ up to time step $t$; $g_{t,i}$ is the current gradient and $\epsilon$ a small smoothing constant.

cs231n.github.io/neural-networks-3/
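A per-weight view of this update as a NumPy sketch; the accumulator `G` holds the diagonal of $G_t$ as a vector, and the hyperparameter values are placeholders.

```python
import numpy as np

def adagrad_step(theta, G, grad, eta=0.01, eps=1e-8):
    """Adagrad: accumulate squared gradients per weight; the effective learning
    rate eta / sqrt(G + eps) shrinks for weights with large past gradients."""
    G = G + grad ** 2
    theta = theta - eta / np.sqrt(G + eps) * grad
    return theta, G
```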

Further optimization algorithms with adaptive learning rate

Adadelta
Extension of Adagrad with a less aggressive learning rate decay
Adadelta restricts the window of accumulated past gradients to some fixed size $w$

Adam
Uses exponentially decaying averages of the first and second moments of the gradients
Similar to Adadelta combined with Momentum
Currently one of the most popular optimization algorithms

Read more on GD optimization algorithms: http://ruder.io/optimizing-gradient-descent/index.html
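In practice these optimizers are rarely implemented by hand; a hedged PyTorch sketch of selecting one of them is shown below, with a dummy model and loss and purely illustrative hyperparameter values.

```python
import torch

model = torch.nn.Linear(8, 3)                       # placeholder model

# Pick one of the discussed optimizers:
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
# opt = torch.optim.Adagrad(model.parameters(), lr=0.01)
# opt = torch.optim.Adadelta(model.parameters())
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 8)).pow(2).mean()       # dummy loss
loss.backward()                                     # backpropagation
opt.step()                                          # apply the parameter update
opt.zero_grad()
```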

Demo: different GD Optimization Strategies

medium.com/datathings/neural-networks-and-backpropagation-explained-in-a-simple-way-f540a3611f5e

Activation functions

Non-linear functions that should be differentiable (since training is done with backpropagation)
"Classic" functions: Sigmoid, Tanh
Modern functions: ReLU, Leaky ReLU, Maxout and many more…

Image source: https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092

Activation functions

Sigmoid function
Formulation: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
Large negative numbers become 0 and large positive numbers become 1
Common in the past, rarely used today

Drawbacks:
Vanishing gradients: the function's gradient at either tail (outputs near 0 or 1) is almost zero
Outputs are not zero-centered, which is undesirable since the data fed to later layers is then not zero-centered either: if the input to a neuron is always positive, the weight gradients become either all positive or all negative, resulting in zig-zagging update dynamics.
http://cs231n.github.io/neural-networks-1

Activation functions

Tanh function
Formulation: $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Large negative numbers become -1 and large positive numbers become 1
Similar to Sigmoid, but in contrast to Sigmoid it is zero-centered

Drawbacks:
Vanishing gradients (see Sigmoid function)

http://cs231n.github.io/neural-networks-1

Activation functions

Rectified Linear Unit (ReLU)

Formulation: $f(x) = \max(0, x)$
Simple, inexpensive operation (only comparison, addition and multiplication)
Efficient gradient propagation: no vanishing gradient for positive inputs

Drawback: the "dying ReLU" problem

A large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate again
If this happens, the gradient flowing through the unit will forever be zero from that point on.

http://cs231n.github.io/neural-networks-1

Activation functions

Rectified Linear Unit (ReLU)

Greatly accelerates the convergence of stochastic gradient descent compared to the sigmoid/tanh functions (e.g. by a factor of 6 in Krizhevsky et al.)

Figure from the Krizhevsky et al. (2012) paper: 6x improvement in convergence with the ReLU unit compared to the tanh unit

http://cs231n.github.io/neural-networks-1
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional
neural networks." Advances in neural information processing systems. 2012.
Activation functions

Leaky ReLU
Formulation: $f(x) = \mathbb{1}(x < 0)\,(\alpha x) + \mathbb{1}(x \ge 0)\,(x)$, where $\mathbb{1}(\cdot)$ is the indicator function
$\alpha$ is a small constant
An attempt to fix the "dying ReLU" problem: instead of the function being zero for $x < 0$, a leaky ReLU has a small negative slope there

http://cs231n.github.io/neural-networks-1
https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
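The activations discussed so far as short NumPy sketches; the default $\alpha = 0.01$ is just an example value.

```python
import numpy as np

sigmoid    = lambda x: 1.0 / (1.0 + np.exp(-x))
tanh       = np.tanh
relu       = lambda x: np.maximum(0.0, x)
leaky_relu = lambda x, alpha=0.01: np.where(x < 0, alpha * x, x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), leaky_relu(x))   # [0. 0. 3.]  [-0.02  0.    3.  ]
```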
Activation functions

Maxout

Formulation: $f(x) = \max(w_1^{T} x + b_1,\ w_2^{T} x + b_2)$
Introduced by Goodfellow et al. (2013)
Generalizes ReLU and Leaky ReLU: e.g. ReLU is Maxout with $w_1 = 0$ and $b_1 = 0$
Fixes the dying ReLU problem

Drawback: doubles the number of parameters per neuron

http://cs231n.github.io/neural-networks-1

Activation functions

In recent years many new activation functions have appeared; many of them are based on ReLU

Overview paper and table source: Mishkin, Dmytro, Nikolay Sergievskiy, and Jiri Matas. "Systematic
evaluation of CNN advances on the ImageNet." arXiv preprint arXiv:1606.02228 (2016).
Activation functions: practical advice

The choice of the activation function matters!

Example: performance comparison with different activation functions on MNIST

Source: https://towardsdatascience.com/exploring-activation-functions-for-neural-networks-73498da59b02

Activation functions: practical advice

What activation function should I use?

Use ReLU! But be aware that "dead" units are possible if the learning rate is not well-adjusted
If this concerns you, try Leaky ReLU or Maxout
Possibly try out Tanh, but expect it to work worse than ReLU or Maxout
Never use sigmoid

http://cs231n.github.io/neural-networks-1
Image source: towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

Loss functions

Quantifies what it means to have a “good” model


This definition depends on the task!

Different types of Loss functions for different tasks, such as:

Classification
Regression
Metric Learning
Reinforcement Learning

Loss functions

Classification

Predicting a discrete class label

We had until now:

Loss: negative log-likelihood (per sample $\mathbf{x}$)

$L(\mathbf{x}, y) = -\sum_j y_j \log p(c_j \mid \mathbf{x})$

Used in various multiclass classification methods for NN training

Hinge Loss: used in Support Vector Machines (SVMs)

$L(x, y) = \sum_i \max(0,\ 1 - x_i y_i)$

Loss functions

Regression

Predicting one or multiple continuous quantities $y_1 \ldots y_n$

Minimize the distance between the predicted values $x_j$ and the true values $y_j$

L1-Loss (Mean Absolute Error):
$L(\mathbf{x}, y) = \sum_j \lvert y_j - x_j \rvert$

L2-Loss (Mean Squared Error):
$L(\mathbf{x}, y) = \sum_j (y_j - x_j)^{2}$
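Both regression losses as short NumPy sketches; the prediction and target values are made-up examples.

```python
import numpy as np

def l1_loss(x, y):
    """Sum of absolute errors between predictions x and targets y."""
    return np.sum(np.abs(y - x))

def l2_loss(x, y):
    """Sum of squared errors between predictions x and targets y."""
    return np.sum((y - x) ** 2)

x = np.array([1.0, 2.5, 0.0])   # predictions
y = np.array([1.5, 2.0, 1.0])   # targets
print(l1_loss(x, y), l2_loss(x, y))   # 2.0 1.5
```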

Loss functions

Metric Learning / Similarity Learning

A model for measuring the distance (or similarity) between objects

Example: Triplet Loss

Input: three images – an Anchor image $x_a$, a Positive example $x_p$ (similar) and a Negative example $x_n$ (dissimilar)

Find a model that produces representations of $\{x_a, x_p, x_n\}$ such that the distance between $x_a$ and $x_p$ is small and the distance between $x_a$ and $x_n$ is large

Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face
recognition and clustering." Proceedings of the IEEE conference on computer vision and pattern
recognition. 2015.

Loss functions

Metric Learning / Similarity Learning

Example: Triplet Loss

Loss function (as in the cited FaceNet paper):

$L = \sum_i \max\bigl(0,\ \lVert f(x_i^{a}) - f(x_i^{p}) \rVert_2^{2} - \lVert f(x_i^{a}) - f(x_i^{n}) \rVert_2^{2} + \alpha\bigr)$

$\alpha$ is the margin constant (usually set to 1)


Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face
recognition and clustering." Proceedings of the IEEE conference on computer vision and pattern
recognition. 2015.
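A hedged NumPy sketch of this loss for a single triplet of embeddings; the embedding values are purely illustrative.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=1.0):
    """max(0, ||f_a - f_p||^2 - ||f_a - f_n||^2 + alpha) for one (anchor, positive, negative) triplet."""
    d_pos = np.sum((f_a - f_p) ** 2)
    d_neg = np.sum((f_a - f_n) ** 2)
    return max(0.0, d_pos - d_neg + alpha)

f_a = np.array([0.1, 0.9])   # anchor embedding
f_p = np.array([0.2, 0.8])   # positive (similar) embedding
f_n = np.array([0.9, 0.1])   # negative (dissimilar) embedding
print(triplet_loss(f_a, f_p, f_n))   # 0.0: the negative is already far enough away
```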

Summary

Simple Multi-Layer Perceptron Neural Networks

Stochastic Gradient Descent and related learning algorithms
Backpropagation
Activation Functions
Loss Functions

Next lecture:
Generalization, Overfitting and How to Approach it.
Live demo: training neural networks in Pytorch
Homework: watch the Pytorch introduction video
https://youtu.be/I1WcY1gX8PM

Reading / Learning

The slides should give a good overview

Learn in teams, explain things to each other

Papers serve as additional reading / more details

You don't have to read all papers; focus on the main papers (see slides and the summary at the end of the lecture)
Please see the references listed in the slides
Papers can easily be googled

Book recommendation for overview / background

I. Goodfellow, Y. Bengio, A. Courville, "Deep Learning", MIT Press – chapters 6-10, fundamentals: Ch. 5

Basics of backprop, chain rule, …: look it up
E.g. very nice video tutorials from 3blue1brown.com