Multi Layer Perceptron Annotated
FALL 2021-2022
Assoc. Prof. Yusuf Yaslan & Assist. Prof. Ayşe Tosun
Lecture Notes from Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press and the
Coursera Introduction to Machine Learning Course by Duke University
Introduction
• Artificial Neural Networks take their inspiration from the brain.
• Our aim is not to understand the brain per se but to build useful
machines.
The Seasons of Neural Networks
This slide is adopted from the Coursera Introduction to Machine Learning Course by Duke University.
Perceptron

$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$$
$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$$
$$\mathbf{x} = [1, x_1, \ldots, x_d]^T$$
(Rosenblatt, 1962)
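As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of this linear output with the bias w_0 absorbed into the weight vector through a constant input x_0 = +1; the numbers and variable names are made up for the example.

```python
import numpy as np

def perceptron_output(w, x):
    """Linear perceptron output y = w^T x, where x already includes x_0 = +1."""
    return np.dot(w, x)

# Illustrative weights [w0, w1, w2] and an augmented input [1, x1, x2]
w = np.array([-0.5, 1.0, 2.0])
x = np.array([1.0, 0.3, 0.4])
print(perceptron_output(w, x))   # -0.5 + 1.0*0.3 + 2.0*0.4 = 0.6
```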
What a Perceptron Does
• Regression: y=wx+w0
• Classification: y=1(wx+w0>0)
[Figure: perceptron diagrams for regression, classification, and the sigmoid output; the weights w and bias w_0 are shown, with the bias implemented as a weight on the constant input x0=+1.]
$$y = \text{sigmoid}(o) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$$
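A hedged sketch (illustrative values, not from the slides) showing the same unit used in the three ways above: the raw linear output for regression, a threshold for classification, and the sigmoid as a posterior probability estimate.

```python
import numpy as np

def sigmoid(o):
    return 1.0 / (1.0 + np.exp(-o))

w = np.array([-1.0, 2.0])       # [w0, w]
x = np.array([1.0, 0.8])        # x0 = +1 carries the bias

o = np.dot(w, x)                # regression:      y = wx + w0
y_class = 1 if o > 0 else 0     # classification:  y = 1(wx + w0 > 0)
y_prob = sigmoid(o)             # probability:     sigmoid(w^T x)
print(o, y_class, y_prob)
```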
K Outputs

• Regression:
$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x} \qquad \mathbf{y} = W\mathbf{x}$$

• Classification:
$$o_i = \mathbf{w}_i^T \mathbf{x} \qquad y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$
choose $C_i$ if $y_i = \max_k y_k$
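A minimal sketch (the weight matrix and input are illustrative) of K parallel perceptrons sharing one input, with softmax turning the K linear outputs into class posteriors and the prediction being the class with the largest output.

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())      # shift for numerical stability
    return e / e.sum()

# K = 3 outputs, d = 2 inputs; each row of W is [w_i0, w_i1, w_i2]
W = np.array([[ 0.1,  1.0, -0.5],
              [-0.2,  0.3,  0.8],
              [ 0.0, -1.0,  0.4]])
x = np.array([1.0, 0.5, 0.5])    # x_0 = +1

o = W @ x                        # o_i = w_i^T x
y = softmax(o)                   # y_i = exp(o_i) / sum_k exp(o_k)
print(y, "-> choose C_%d" % np.argmax(y))
```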
Training
• Online (instances seen one by one) vs batch (whole
sample) learning:
• No need to store the whole sample
• Problem may change in time
• Wear and degradation in system components
• Stochastic gradient-descent: Update after a single
pattern
• Generic update rule (LMS rule):
$$\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$
Update = LearningFactor · (DesiredOutput − ActualOutput) · Input
Training a Perceptron: Regression
• Regression (Linear output):
$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}\left(r^t - y^t\right)^2 = \frac{1}{2}\left(r^t - \mathbf{w}^T\mathbf{x}^t\right)^2$$
$$\Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$
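A hedged sketch of this online training rule on synthetic 1-D data (the data, learning rate, and epoch count are illustrative): each instance is seen one at a time and the weights move by η(r − y)x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): r = 2x + 1 + noise
X = np.c_[np.ones(100), rng.uniform(-1, 1, 100)]    # prepend x_0 = +1
r = 2 * X[:, 1] + 1 + 0.1 * rng.standard_normal(100)

w = np.zeros(2)
eta = 0.1
for epoch in range(50):
    for x_t, r_t in zip(X, r):           # online: one pattern at a time
        y_t = w @ x_t                    # linear output
        w += eta * (r_t - y_t) * x_t     # Delta w_j = eta (r - y) x_j
print(w)                                 # should approach roughly [1, 2]
```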
Classification
• Single sigmoid output
$$y^t = \text{sigmoid}(\mathbf{w}^T\mathbf{x}^t)$$
$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = -r^t \log y^t - (1 - r^t)\log(1 - y^t)$$
$$\Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$
• K>2 softmax outputs
$$y_i^t = \frac{\exp \mathbf{w}_i^T\mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T\mathbf{x}^t} \qquad E^t(\{\mathbf{w}_i\}_i \mid \mathbf{x}^t, \mathbf{r}^t) = -\sum_i r_i^t \log y_i^t$$
$$\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$$
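A hedged sketch of online training of a single sigmoid output with the cross-entropy gradient above; the toy data and hyperparameters are illustrative. Note the update has the same form as in regression, only y is now the sigmoid output.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(o):
    return 1.0 / (1.0 + np.exp(-o))

# Toy two-class data (illustrative): label = 1 when x1 + x2 > 0
X = np.c_[np.ones(200), rng.uniform(-1, 1, (200, 2))]   # x_0 = +1
r = (X[:, 1] + X[:, 2] > 0).astype(float)

w = np.zeros(3)
eta = 0.5
for epoch in range(100):
    for x_t, r_t in zip(X, r):
        y_t = sigmoid(w @ x_t)
        w += eta * (r_t - y_t) * x_t     # cross-entropy gradient step
print(w, np.mean((sigmoid(X @ w) > 0.5) == r))   # weights and training accuracy
```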
Learning Boolean AND
XOR
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
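XOR is not linearly separable, so a single perceptron cannot learn it, but the decomposition above can be realized with one hidden layer. A sketch with hand-picked, purely illustrative weights:

```python
import numpy as np

def step(o):
    return (np.asarray(o) > 0).astype(int)

# Hand-picked weights (illustrative); each vector is [w0, w1, w2]
w_h1 = np.array([-0.5,  1.0, -1.0])    # hidden unit 1: x1 AND NOT x2
w_h2 = np.array([-0.5, -1.0,  1.0])    # hidden unit 2: NOT x1 AND x2
v    = np.array([-0.5,  1.0,  1.0])    # output unit: OR of the hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1.0, x1, x2])                    # x_0 = +1
        z = step([w_h1 @ x, w_h2 @ x])                 # hidden layer
        y = int(v @ np.concatenate(([1.0], z)) > 0)    # output layer
        print(x1, "XOR", x2, "=", y)
```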
The slides in this part are obtained from Nando Freitas' lecture notes.
Multilayer Perceptrons
$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$
$$z_h = \text{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\!\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$
$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$
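A minimal sketch (dimensions and initialization are illustrative) of the forward pass these equations describe, with the bias terms handled by prepending a constant +1 to the input and to the hidden vector.

```python
import numpy as np

def sigmoid(o):
    return 1.0 / (1.0 + np.exp(-o))

def mlp_forward(W, V, x):
    """One-hidden-layer MLP forward pass.
    W: H x (d+1) first-layer weights, V: K x (H+1) second-layer weights,
    x: input of length d (the +1 bias inputs are prepended here)."""
    z = sigmoid(W @ np.concatenate(([1.0], x)))   # z_h = sigmoid(w_h^T x)
    y = V @ np.concatenate(([1.0], z))            # y_i = v_i^T z
    return y, z

rng = np.random.default_rng(0)
d, H, K = 4, 3, 2
W = rng.normal(scale=0.1, size=(H, d + 1))
V = rng.normal(scale=0.1, size=(K, H + 1))
print(mlp_forward(W, V, rng.standard_normal(d)))
```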
Regression
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - y^t\right)^2 \qquad y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$$
$$\Delta v_h = \eta \sum_t (r^t - y^t)\, z_h^t$$
Forward: $z_h = \text{sigmoid}(\mathbf{w}_h^T \mathbf{x})$
Backward:
$$\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}} = -\eta \sum_t \frac{\partial E}{\partial y^t}\frac{\partial y^t}{\partial z_h^t}\frac{\partial z_h^t}{\partial w_{hj}} = \eta \sum_t (r^t - y^t)\, v_h\, z_h^t (1 - z_h^t)\, x_j^t$$
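A hedged sketch of backpropagation for this single-output regression case, applied online on a toy target (the data, network size, learning rate, and epoch count are illustrative). Both deltas are computed from the same forward pass before the weights are changed. The multiple-output case on the next slide only replaces the error term in Δw_hj by a sum of the backpropagated errors over the outputs i.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(o):
    return 1.0 / (1.0 + np.exp(-o))

# Toy 1-D regression target (illustrative): r = sin(2x)
X = rng.uniform(-1, 1, (200, 1))
r = np.sin(2 * X[:, 0])

d, H = 1, 8
W = rng.normal(scale=0.5, size=(H, d + 1))     # hidden-layer weights w_hj
v = rng.normal(scale=0.5, size=H + 1)          # output weights v_h (v_0 is the bias)
eta = 0.1

for epoch in range(500):
    for x_t, r_t in zip(X, r):                           # stochastic (per-pattern) updates
        x1 = np.concatenate(([1.0], x_t))                # x_0 = +1
        z = sigmoid(W @ x1)                              # forward: hidden units z_h
        z1 = np.concatenate(([1.0], z))
        y = v @ z1                                       # forward: linear output y
        err = r_t - y
        dv = eta * err * z1                              # Delta v_h  = eta (r - y) z_h
        dW = eta * err * np.outer(v[1:] * z * (1 - z), x1)  # Delta w_hj = eta (r - y) v_h z_h (1 - z_h) x_j
        v += dv
        W += dW

# Training error after learning
Z = sigmoid(np.c_[np.ones(len(X)), X] @ W.T)
print(np.mean((np.c_[np.ones(len(X)), Z] @ v - r) ** 2))
```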
Regression with Multiple Outputs
$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2}\sum_t \sum_i \left(r_i^t - y_i^t\right)^2 \qquad y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$
$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t)\, z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left[\sum_i (r_i^t - y_i^t)\, v_{ih}\right] z_h^t (1 - z_h^t)\, x_j^t$$
MLP with Two Hidden Layers

Momentum:
$$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha\, \Delta w_i^{t-1}$$
Improving Convergence
Adaptive learning rate: In gradient descent, the learning factor
η determines the magnitude of change to be made in the
parameter. It is generally taken between 0.0 and 1.0, mostly
less than or equal to 0.2. It can be made adaptive for faster
convergence, where it is kept large when learning takes place
and is decreased when learning slows down
$$\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\,\eta & \text{otherwise} \end{cases}$$
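A small sketch of this heuristic (the constants a and b and the error sequence are made up for illustration): η grows additively while the error keeps falling and is cut multiplicatively when it rises.

```python
def adapt_learning_rate(eta, err_prev, err_curr, a=0.01, b=0.5):
    """Illustrative constants a and b: increase eta additively while the
    error keeps decreasing, cut it multiplicatively when the error rises."""
    if err_curr < err_prev:
        return eta + a            # Delta eta = +a
    return eta - b * eta          # Delta eta = -b * eta

# Pretend per-epoch training errors from some run (illustrative)
errors = [1.00, 0.80, 0.65, 0.70, 0.55]
eta = 0.1
for e_prev, e_curr in zip(errors, errors[1:]):
    eta = adapt_learning_rate(eta, e_prev, e_curr)
    print(round(eta, 3))
```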
Overfitting/Overtraining
• We know from previous chapters that an overcomplex model
memorizes the noise in the training set and does not generalize to
the validation set.
• Similarly in an MLP, when the number of hidden units is large, the
generalization accuracy deteriorates
Overfitting/Overtraining
Number of weights: H (d+1)+(H+1)K
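For example, an MLP with d = 10 inputs, H = 5 hidden units, and K = 3 outputs has 5·(10+1) + (5+1)·3 = 55 + 18 = 73 weights.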
Overfitting/Overtraining
• A similar behavior happens when training is continued too long: as more
training epochs are made, the error on the training set decreases, but the
error on the validation set starts to increase beyond a certain point.
• Early stopping: Learning should be stopped early to alleviate this
problem of overtraining.
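A minimal sketch of early stopping (the helper callables train_epoch and val_error, the patience rule, and the simulated error curve are all assumptions for illustration): training stops once the validation error has not improved for a given number of epochs.

```python
import numpy as np

def train_with_early_stopping(train_epoch, val_error, max_epochs=60, patience=5):
    """Illustrative early-stopping loop. train_epoch and val_error are assumed
    callables supplied by the caller: one runs a single training epoch, the
    other returns the current validation error."""
    best_err, best_epoch, wait = np.inf, 0, 0
    for epoch in range(max_epochs):
        train_epoch()
        err = val_error()
        if err < best_err:
            best_err, best_epoch, wait = err, epoch, 0   # validation still improving
        else:
            wait += 1
            if wait >= patience:                         # no improvement: stop early
                break
    return best_epoch, best_err

# Simulated validation curve: falls, then rises again (overtraining)
errs = iter(list(np.linspace(1.0, 0.2, 30)) + list(np.linspace(0.25, 0.6, 30)))
print(train_with_early_stopping(lambda: None, lambda: next(errs)))
```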
Tuning the Network Size
• To find the optimal network size, the most common approach is to
try many different architectures, train them all on the training set,
and choose the one that generalizes best to the validation set.
• Another approach is to incorporate this structural adaptation into
the learning algorithm.
• In the destructive approach, we start with a large network and
gradually remove units and/or connections that are not necessary.
• In the constructive approach, we start with a small network and
gradually add units and/or connections to improve performance.
Tuning the Network Size
• Destructive: weight decay
• Constructive: growing networks

Weight decay:
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$$
$$E' = E + \frac{\lambda}{2}\sum_i w_i^2$$
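A short sketch of one weight-decay update (η, λ, and the weights are illustrative); with a zero error gradient the penalty alone shrinks every weight toward zero, which is what prunes unnecessary connections.

```python
import numpy as np

def weight_decay_step(w, grad_E, eta=0.1, lam=0.01):
    """One gradient step with weight decay (eta and lam are illustrative):
    Delta w_i = -eta * dE/dw_i - lam * w_i,
    i.e. gradient descent on the penalized error E' = E + (lam/2) * sum_i w_i^2."""
    return w - eta * grad_E - lam * w

# With a zero error gradient, the penalty alone shrinks the weights each step
w = np.array([1.0, -2.0, 0.5])
for _ in range(3):
    w = weight_decay_step(w, grad_E=np.zeros_like(w))
print(w)   # each step multiplies the weights by (1 - lam)
```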
Dimensionality Reduction
Learning Time
• Applications:
• Sequence recognition: Speech recognition
• Sequence reproduction: Time-series prediction
• Sequence association
• Network architectures
• Time-delay networks (Waibel et al., 1989)
• Recurrent networks (Rumelhart et al., 1986)
Recurrent Networks
Recurrent Networks
If the sequences have a small maximum length, then unfolding in
time can be used to convert an arbitrary recurrent network to an
equivalent feedforward network.