
BLG 527E Machine Learning

FALL 2021-2022
Assoc. Prof. Yusuf Yaslan & Assist. Prof. Ayşe Tosun

Multi Layer Perceptron

Lecture Notes from Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press AND
Coursera Introduction to Machine Learning Course by Duke University
Introduction
• Artificial Neural Networks take their inspiration from the brain.
• Our aim is not to understand the brain per se but to build useful
machines.

First neuron drawing by Santiago Ramón y Cajal


Image from: https://blog.ovhcloud.com/what-does-training-neural-networks-mean/
Image from: https://www.quantamagazine.org/why-the-first-drawings-of-neurons-were-defaced-20170928/
Neural Networks
• Networks of processing units (neurons) with connections
(synapses) between them
• Large number of neurons: 10^10
• Large connectivity: 10^5
• Parallel processing
• Distributed computation/memory
• Robust to noise, failures

The Seasons of Neural Networks

This slide is adopted from the following course: Coursera Introduction to Machine Learning
Course by Duke University
Perceptron

y = \sum_{j=1}^{d} w_j x_j + w_0 = w^T x

w = [w_0, w_1, \ldots, w_d]^T
x = [1, x_1, \ldots, x_d]^T
(Rosenblatt, 1962)

What a Perceptron Does
• Regression: y=wx+w0
• Classification: y=1(wx+w0>0)

[Figure: the perceptron drawn as a graph for regression (linear output), classification (threshold output), and with a sigmoid output; the bias w_0 enters through an extra input x_0 = +1]

y = sigmoid(o) = \frac{1}{1 + \exp(-w^T x)}
K Outputs

Regression:
y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = w_i^T x
y = Wx

Classification:
o_i = w_i^T x
y_i = \frac{\exp o_i}{\sum_k \exp o_k}
choose C_i if y_i = \max_k y_k
Training
• Online (instances seen one by one) vs batch (whole
sample) learning:
• No need to store the whole sample
• Problem may change in time
• Wear and degradation in system components
• Stochastic gradient-descent: Update after a single
pattern
• Generic update rule (LMS rule):

\Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t

Update = LearningFactor × (DesiredOutput − ActualOutput) × Input
Training a Perceptron: Regression
• Regression (Linear output):

E^t(w \mid x^t, r^t) = \frac{1}{2} (r^t - y^t)^2 = \frac{1}{2} (r^t - w^T x^t)^2

\Delta w_j^t = \eta (r^t - y^t) x_j^t

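Below is a minimal NumPy sketch of this stochastic gradient-descent rule for a linear perceptron doing regression. The bias is handled by prepending x_0 = +1; the synthetic data, learning rate, and epoch count are illustrative assumptions, not values from the slides.

```python
import numpy as np

def train_perceptron_regression(X, r, eta=0.01, epochs=100):
    """Online (stochastic) training of a linear perceptron for regression.

    X: (N, d) inputs, r: (N,) targets. A bias input x_0 = +1 is prepended,
    so w has d+1 entries with w[0] playing the role of w_0.
    """
    N, d = X.shape
    X1 = np.hstack([np.ones((N, 1)), X])        # prepend x_0 = +1
    w = np.random.uniform(-0.01, 0.01, d + 1)   # small random initialization
    for _ in range(epochs):
        for t in np.random.permutation(N):      # instances seen one by one
            y = w @ X1[t]                       # y^t = w^T x^t
            w += eta * (r[t] - y) * X1[t]       # Δw_j = η (r^t − y^t) x_j^t
    return w

# Illustrative usage: fit y ≈ 3x + 1 from noisy data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 1))
r = 3 * X[:, 0] + 1 + 0.1 * rng.standard_normal(200)
print(train_perceptron_regression(X, r))        # ≈ [1.0, 3.0]
```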
Classification
• Single sigmoid output
y^t = sigmoid(w^T x^t)
E^t(w \mid x^t, r^t) = -r^t \log y^t - (1 - r^t) \log(1 - y^t)
\Delta w_j^t = \eta (r^t - y^t) x_j^t

• K > 2 softmax outputs
y_i^t = \frac{\exp(w_i^T x^t)}{\sum_k \exp(w_k^T x^t)}
E^t(\{w_i\}_i \mid x^t, r^t) = -\sum_i r_i^t \log y_i^t
\Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t
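The softmax update above can be sketched the same way. Here W stores one weight row per class with the bias folded in as column 0; the one-hot targets R, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def softmax(o):
    o = o - o.max()                      # subtract max for numerical stability
    e = np.exp(o)
    return e / e.sum()

def train_softmax_perceptron(X, R, eta=0.1, epochs=50):
    """Single-layer K-class perceptron trained with the update above.

    X: (N, d) inputs, R: (N, K) one-hot targets, W: (K, d+1) weights
    with the bias as column 0.
    """
    N, d = X.shape
    K = R.shape[1]
    X1 = np.hstack([np.ones((N, 1)), X])
    W = np.random.uniform(-0.01, 0.01, (K, d + 1))
    for _ in range(epochs):
        for t in np.random.permutation(N):
            y = softmax(W @ X1[t])                  # y_i = exp(o_i) / Σ_k exp(o_k)
            W += eta * np.outer(R[t] - y, X1[t])    # Δw_ij = η (r_i − y_i) x_j
    return W
```

A prediction is then the class with the largest y_i, matching the "choose C_i if y_i = max_k y_k" rule.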
Learning Boolean AND

XOR

• No w_0, w_1, w_2 satisfy:

w_0 \le 0
w_2 + w_0 > 0
w_1 + w_0 > 0
w_1 + w_2 + w_0 \le 0

(Minsky and Papert, 1969)

x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
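This decomposition can be wired directly into a two-layer network of threshold units. The weights below are one hand-picked choice that realizes it (hidden unit 1 computes x1 AND ~x2, hidden unit 2 computes ~x1 AND x2, and the output unit ORs them); they are an illustration, not the only solution.

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)                 # threshold unit: 1(a > 0)

def xor_mlp(x1, x2):
    x = np.array([1.0, x1, x2])                  # x_0 = +1 bias input
    # Hidden unit 1 fires for (x1 AND NOT x2), hidden unit 2 for (NOT x1 AND x2)
    W = np.array([[-0.5,  1.0, -1.0],
                  [-0.5, -1.0,  1.0]])
    z = step(W @ x)
    # Output unit implements OR over the two hidden units
    v = np.array([-0.5, 1.0, 1.0])
    return step(v @ np.concatenate(([1.0], z)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_mlp(a, b)))          # prints the XOR truth table
```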
This slide and the following slides are obtained from Nando Freitas' lecture notes.
Multilayer Perceptrons
y_i = v_i^T z = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}

z_h = sigmoid(w_h^T x) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}

(Rumelhart et al., 1986)
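A small NumPy sketch of the forward pass defined by these equations, with biases handled by extra +1 inputs. The weight shapes and the linear (regression) outputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W, V):
    """Forward pass of a single-hidden-layer MLP.

    x: (d,) input; W: (H, d+1) hidden-layer weights (bias in column 0);
    V: (K, H+1) output-layer weights (bias in column 0).
    Returns (y, z) so z can be reused in the backward pass.
    """
    x1 = np.concatenate(([1.0], x))      # x_0 = +1
    z = sigmoid(W @ x1)                  # z_h = sigmoid(w_h^T x)
    z1 = np.concatenate(([1.0], z))      # z_0 = +1
    y = V @ z1                           # y_i = v_i^T z  (linear outputs, regression)
    return y, z
```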


Backpropagation
y_i = v_i^T z = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}

z_h = sigmoid(w_h^T x) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}

\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial z_h} \frac{\partial z_h}{\partial w_{hj}}

Regression

E(W, v \mid X) = \frac{1}{2} \sum_t (r^t - y^t)^2

Forward:
y^t = \sum_{h=1}^{H} v_h z_h^t + v_0
z_h = sigmoid(w_h^T x)

Backward:
\Delta v_h = \eta \sum_t (r^t - y^t) z_h^t
\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}} = -\eta \sum_t \frac{\partial E}{\partial y^t} \frac{\partial y^t}{\partial z_h^t} \frac{\partial z_h^t}{\partial w_{hj}} = \eta \sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
Regression with Multiple Outputs
E(W, V \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}

\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) z_h^t

\Delta w_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) v_{ih} \right] z_h^t (1 - z_h^t) x_j^t

[Figure: MLP with inputs x_j, hidden units z_h (weights w_{hj}), and outputs y_i (weights v_{ih})]
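A minimal sketch of one backpropagation update for the multi-output regression case above. It performs a single-pattern (stochastic) step rather than summing over the whole sample as the batch formulas do; the shapes and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, r, W, V, eta=0.1):
    """One stochastic update of a multi-output regression MLP.

    x: (d,) input, r: (K,) target, W: (H, d+1), V: (K, H+1); biases in column 0.
    """
    x1 = np.concatenate(([1.0], x))
    z = sigmoid(W @ x1)                          # hidden activations z_h
    z1 = np.concatenate(([1.0], z))
    y = V @ z1                                   # linear outputs y_i
    err = r - y                                  # (r_i − y_i)
    dV = eta * np.outer(err, z1)                 # Δv_ih = η (r_i − y_i) z_h
    # Δw_hj = η [Σ_i (r_i − y_i) v_ih] z_h (1 − z_h) x_j
    delta_h = (V[:, 1:].T @ err) * z * (1 - z)
    dW = eta * np.outer(delta_h, x1)
    return W + dW, V + dV
```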
MLP with Two Hidden Layers

This slide and the following slides are adopted from Nando Freitas' lecture notes.
Improving Convergence
• Momentum: At each parameter update, successive \Delta w_i^t values may be so different that large oscillations occur and slow convergence. Here t is the time index: the epoch number in batch learning and the iteration number in online learning.

\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha \Delta w_i^{t-1}

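A one-line sketch of the momentum update, assuming `grad` holds the gradient of E^t with respect to w at the current parameters, and the previous change Δw^{t−1} is carried between calls:

```python
def momentum_update(w, grad, prev_dw, eta=0.1, alpha=0.9):
    """One momentum step: Δw^t = −η ∂E^t/∂w + α Δw^{t−1}."""
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw          # return Δw^t so it can be reused next iteration
```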
Improving Convergence
Adaptive learning rate: In gradient descent, the learning factor
η determines the magnitude of change to be made in the
parameter. It is generally taken between 0.0 and 1.0, mostly
less than or equal to 0.2. It can be made adaptive for faster
convergence, where it is kept large when learning takes place
and is decreased when learning slows down:

\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\eta & \text{otherwise} \end{cases}

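A sketch of this adaptive rule, where the error is re-checked after some number of epochs and η is increased additively while it keeps dropping, otherwise cut multiplicatively. The constants a and b are illustrative assumptions.

```python
def adapt_learning_rate(eta, err_prev, err_new, a=0.01, b=0.5):
    """Δη = +a if the new error is lower, −bη otherwise."""
    if err_new < err_prev:
        return eta + a          # learning is taking place: keep η large
    return eta * (1.0 - b)      # learning slowed down: decrease η
```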
Overfitting/Overtraining
• We know from previous chapters that an overcomplex model
memorizes the noise in the training set and does not generalize to
the validation set.
• Similarly in an MLP, when the number of hidden units is large, the generalization accuracy deteriorates.
Overfitting/Overtraining
Number of weights: H(d + 1) + (H + 1)K

Overfitting/Overtraining
• A similar behavior happens when training is continued too long: as more training epochs are made, the error on the training set decreases, but the error on the validation set starts to increase beyond a certain point.
• Early stopping: Learning should be stopped early to alleviate this
problem of overtraining.
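A sketch of early stopping. The `train_epoch` and `val_error` callbacks are hypothetical placeholders for one epoch of training and for evaluating the error on the validation set; the patience threshold is an assumption.

```python
def train_with_early_stopping(train_epoch, val_error, max_epochs=1000, patience=10):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_err, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                      # one pass over the training set
        err = val_error()                  # error on the validation set
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                          # validation error keeps rising: stop early
    return best_epoch, best_err
```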
Tuning the Network Size
• To find the optimal network size, the most common approach is to
try many different architectures, train them all on the training set,
and choose the one that generalizes best to the validation set.
• Another approach is to incorporate this structural adaptation into
the learning algorithm.
• In the destructive approach, we start with a large network and gradually remove units and/or connections that are not necessary.
• In the constructive approach, we start with a small network and
gradually add units and/or connections to improve performance.
Tuning the Network Size
• Destructive: weight decay

\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i

E' = E + \frac{\lambda}{2} \sum_i w_i^2

• Constructive: growing networks

(Ash, 1989) (Fahlman and Lebiere, 1989)
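A sketch of one weight-decay step as written above, where the λw term (coming from the penalty in E') shrinks each weight toward zero on top of the ordinary gradient step. The learning rate and λ are illustrative assumptions.

```python
def weight_decay_update(w, grad, eta=0.1, lam=1e-3):
    """Δw_i = −η ∂E/∂w_i − λ w_i  (grad holds ∂E/∂w at the current weights)."""
    return w + (-eta * grad - lam * w)
```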

Dimensionality Reduction

Learning Time
• Applications:
• Sequence recognition: Speech recognition
• Sequence reproduction: Time-series prediction
• Sequence association
• Network architectures
• Time-delay networks (Waibel et al., 1989)
• Recurrent networks (Rumelhart et al., 1986)

Recurrent Networks

Recurrent Networks
• If the sequences have a small maximum length, then unfolding in time can be used to convert an arbitrary recurrent network to an equivalent feedforward network.
• The resulting network can be trained with backpropagation, with the additional requirement that all copies of each connection should remain identical.
• The solution is to sum up the different weight changes in time and change the weight by the average.
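A sketch of this tied-weight update after unfolding: each time step contributes its own gradient for the shared weight, and the per-copy changes are averaged before the single shared copy is modified. The learning rate and the list of per-step gradients are assumptions for illustration.

```python
import numpy as np

def tied_weight_update(w, grads_per_step, eta=0.1):
    """Apply one averaged update to a recurrent weight shared across all
    unfolded copies.

    grads_per_step: list of gradients, one per unfolded copy of w.
    """
    avg_grad = np.mean(grads_per_step, axis=0)   # average the per-copy changes
    return w - eta * avg_grad                    # single update keeps copies identical
```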
Unfolding in Time

