Machine Learning:
Neural Networks
Hajar Mousannif
[email protected]
Acknowledgment
This lecture is inspired by Prof. Andrew Ng’s
Machine Learning class on Coursera
2
Why do we need neural networks?
• Say we have a complex supervised learning
classification problem
– We can use logistic regression with many polynomial
terms
– It works well when you have 1-2 features
3
Why do we need neural networks?
• In a housing example with 100 house features, predict whether a
house will be sold in the next 6 months
– Here, if you included all the quadratic terms (second order)
• There are lots of them (x1², x1x2, x1x3, ..., x1x100)
• For the case of n = 100, you have about 5,000 features
• The number of features grows as O(n²)
• This would be computationally expensive to work with as a feature set
• If you include the cubic terms
– e.g. (x1²x2, x1x2x3, x1x4x23, etc.)
– The number of features grows even faster, as O(n³)
– About 170,000 features for n = 100! (a quick count is sketched after this list)
• Not a good way to build classifiers when n is large
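As a quick illustration of this growth (my own sketch, not from the slides), the number of distinct quadratic and cubic terms can be counted as combinations with repetition:

```python
from math import comb

def num_poly_terms(n, degree):
    """Number of distinct monomials of exactly `degree` in n variables
    (combinations with repetition: C(n + degree - 1, degree))."""
    return comb(n + degree - 1, degree)

n = 100
print(num_poly_terms(n, 2))  # 5050 quadratic terms  -> grows O(n^2)
print(num_poly_terms(n, 3))  # 171700 cubic terms    -> grows O(n^3)
```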
4
Example: Problems where n is large -
computer vision
Computer vision sees a matrix of pixel intensity values 5
Example: Problems where n is large -
computer vision
To build a car detector
• Build a training set of:
- Not cars
- Cars
• Then test against a car
6
Example: Problems where n is large -
computer vision
7
Example: Problems where n is large -
computer vision
• We need a non-linear hypothesis to separate the
classes
• Feature space:
– If we used 50 x 50 pixel greyscale images --> 2,500 pixels, so n = 2,500
– If RGB, then n = 7,500
– If 100 x 100 greyscale images, n = 10,000, and including all the quadratic terms gives roughly 50,000,000 features
• Too big - way too big
– Logistic regression here is not appropriate for large complex
systems
– Neural networks are much better for a complex nonlinear
hypothesis even when feature space is huge
8
Neurons and the brain
• Neural networks (NNs) were originally motivated
by looking at machines which replicate the brain's
functionality
• Build learning systems that mimic the brain
• Used a lot in the 80s and 90s. Popularity
diminished in late 90s
• Recent major resurgence
– NNs are computationally expensive, so only recently
large scale neural networks became
computationally feasible
9
Neurons and the brain
• Auditory cortex --> takes sound signals
• If you cut the wiring from the ear to the auditory cortex
• Re-route optic nerve to the auditory cortex
• Auditory cortex learns to see
• Brain learns by itself how to learn
The “one learning algorithm” hypothesis
10
Neurons and the brain
11
Model representation 1
• Three things to notice
– Cell body
– Number of input wires (dendrites)
– Output wire (axon)
• Simple level
– Neuron gets one or more inputs through
dendrites
– Does processing
– Sends output down axon
• Neurons communicate through electric
spikes
– Pulse of electricity via axon to another
neuron 12
Artificial neural network - representation
of a neuron
• In an artificial neural network, a neuron is a logistic unit
– Feed input via input wires
– Logistic unit does computation
– Sends output down output wires
• This is an artificial neuron with a sigmoid (logistic)
activation function
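A minimal sketch of such a logistic unit in Python/NumPy (the function name and the example numbers are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    """One artificial neuron: weighted sum of the inputs, then the sigmoid.
    x includes the bias input x0 = 1 as its first element."""
    return sigmoid(theta @ x)

x = np.array([1.0, 0.5, -1.2])       # x0 (bias), x1, x2
theta = np.array([0.1, 2.0, -0.5])   # weights, including the bias weight
print(logistic_unit(x, theta))
```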
13
Artificial neural network – Model
representation
• Often good to include an x0 input - the bias unit (equal
to 1)
• Ɵ vector may also be called the weights of a model
• Below we have a group of neurons strung together
14
Artificial neural network – Model
representation
• Here, the inputs are x1, x2 and x3
– We could also call the inputs the activations of the first layer,
i.e. (a1(1), a2(1) and a3(1))
• First layer is the input layer
• Final layer is the output layer - produces value computed by a
hypothesis
• Middle layer(s) are called the hidden layers
15
Neural networks - notation
• ai(j) - activation of unit i in layer j
• Ɵ(j) - matrix of parameters controlling the function
mapping from layer j to layer j + 1
• If the network has sj units in layer j and sj+1 units in layer j + 1,
then Ɵ(j) will be of dimension [sj+1 x (sj + 1)]
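To make the dimension rule concrete, here is a small sketch (layer sizes chosen to match the example network shown later in these slides) that prints the shape of each Ɵ(j):

```python
# Layer sizes s1, s2, ..., sL (not counting bias units)
layer_sizes = [3, 5, 5, 4]

# Theta(j) maps layer j to layer j+1 and has shape [s_{j+1} x (s_j + 1)]
for j, (s_j, s_next) in enumerate(zip(layer_sizes, layer_sizes[1:]), start=1):
    print(f"Theta({j}) shape: {s_next} x {s_j + 1}")
# Theta(1): 5 x 4, Theta(2): 5 x 6, Theta(3): 4 x 6
```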
16
Neural networks – Model representation
The activation value on each hidden unit (e.g. a1(2)) is equal to the
sigmoid function applied to the linear combination of inputs
– Three input units
• So Ɵ(1) is the matrix of parameters governing the mapping of the input units to
hidden units
– Ɵ(1) here is a [3 x 4] dimensional matrix
– Three hidden units
• Then Ɵ(2) is the matrix of parameters governing the mapping of the hidden layer
to the output layer
– Ɵ(2) here is a [1 x 4] dimensional matrix (i.e. a row vector)
– One output unit
17
Neural networks – Model representation
• Ɵab(c)
– a = ranges from 1 to the
number of units in layer c+1
– b = ranges from 0 to the
number of units in layer c
– c is the layer you're moving
FROM
For example, Ɵ13(1) means:
1 - we're mapping to node 1 in layer 2
3 - we're mapping from node 3 in layer 1
(1) - we're mapping from layer 1
18
Neural networks - Exercise
Compute the activation values on each layer
19
Neural networks - Solution
Example of network, with the associated calculations :
20
Model Representation II
Objective: carry out the computation efficiently through a
vectorized implementation.
- Some additional terms:
z1(2) = Ɵ10(1)x0 + Ɵ11(1)x1 + Ɵ12(1)x2 + Ɵ13(1)x3
z2(2) = Ɵ20(1)x0 + Ɵ21(1)x1 + Ɵ22(1)x2 + Ɵ23(1)x3
z3(2) = Ɵ30(1)x0 + Ɵ31(1)x1 + Ɵ32(1)x2 + Ɵ33(1)x3
- Activation values become:
a1(2) = g(z1(2))
a2(2) = g(z2(2))
a3(2) = g(z3(2))
21
Model Representation II
• We can vectorize the computation of the
neural network as follows:
– z(2) = Ɵ(1)x
– a(2) = g(z(2))
• z(2) is a 3x1 vector, and a(2) is also a 3x1 vector
• g() applies the sigmoid (logistic) function
element-wise to each member of the z(2) vector
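A minimal NumPy sketch of this single vectorized step (the weight values are random, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta1 = np.random.randn(3, 4)        # Theta(1): 3 hidden units x (3 inputs + bias)
x = np.array([1.0, 0.2, -0.7, 1.5])   # [x0 = 1, x1, x2, x3]

z2 = Theta1 @ x     # z(2) = Theta(1) x   -> 3x1 vector
a2 = sigmoid(z2)    # a(2) = g(z(2))      -> sigmoid applied element-wise
```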
22
Model Representation II
• To make the notation for the input layer consistent:
– a(1) = x
• a(1) is the vector of activations in the input layer
• Obviously the "activation" for the input layer is just the
input!
– a(1) is the vector of inputs
– a(2) is the vector of values calculated by the g(z(2)) function
• We also need the bias unit a0(2) = 1 for the final
hypothesis calculation
23
Model Representation II
This process is called
forward propagation
– Start off with activations
of input unit
• i.e. the x vector as input
– Forward propagate and
calculate the activation
of each layer
sequentially
– The equations above give a
vectorized version of this
implementation
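A minimal sketch of the full forward-propagation loop, assuming sigmoid activations and a list of weight matrices (the function name and example sizes are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Thetas):
    """Propagate a single input vector x through the network.
    Thetas[l] maps layer l+1 to layer l+2 (0-indexed list)."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a0 = 1
        a = sigmoid(Theta @ a)          # a(l+1) = g(Theta(l) a(l))
    return a                            # activations of the output layer

# Example: 3 inputs -> 3 hidden units -> 1 output
Thetas = [np.random.randn(3, 4), np.random.randn(1, 4)]
print(forward_propagate(np.array([0.5, -1.0, 2.0]), Thetas))
```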
24
Neural networks learning their own features
• Diagram below looks a lot like logistic regression
• Layer 3 is a logistic regression node. The hypothesis output =
g(Ɵ10(2)a0(2) + Ɵ11(2)a1(2) + Ɵ12(2)a2(2) + Ɵ13(2)a3(2))
• This is just logistic regression
– The only difference is that, instead of the input feature vector, the features
fed in are the values calculated by the hidden layer
25
Neural networks learning their own features
• The features a1(2), a2(2), and a3(2) are calculated/learned - not
original features
• The mapping from layer 1 to layer 2 (i.e. the calculations which
generate the a(2) features) is determined by another set of
parameters - Ɵ(1)
• So instead of being constrained by the original input
features, a neural network can learn its own features to
feed into logistic regression
• If we compare this to plain logistic regression, you
would have to design your own features to define the
best way to classify or describe something
26
Other architectures
• Other architectures (topologies) are possible:
– More/less nodes per layer
– More layers
27
Practice 1
Compute a1(3)
28
Practice 1 - Solution
Compute a1(3)
29
Practice 2
30
Practice 2 - Solution
31
Practice 3
32
Practice 3 - Solution
33
Neural Network example 1: AND function
• Ɵ10(1) = -30
• Ɵ11(1) = 20
• Ɵ12(1) = 20
• So hƟ(x) = g(-30 + 20x1 + 20x2) (checked numerically below)
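A small sketch verifying that these weights implement AND (the sigmoid saturates to roughly 0 or 1 for inputs of magnitude 10 or more):

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def and_unit(x1, x2):
    # h(x) = g(-30 + 20*x1 + 20*x2), using the weights above
    return sigmoid(-30 + 20 * x1 + 20 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(and_unit(x1, x2)))  # prints the AND truth table: 0, 0, 0, 1
```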
34
Neural Network example 2: NOT function
• Ɵ10(1) = 10
• Ɵ11(1) = -20
• So hƟ(x) = g(10 - 20x1)
• Negation is achieved by putting a large
negative weight in front of the variable you
want to negate
35
Neural Network example 3: XNOR function
• XNOR is short for NOT XOR, i.e. NOT an exclusive or
• The XNOR truth table is:
x1 | x2 | XNOR
 0 |  0 |  1
 0 |  1 |  0
 1 |  0 |  0
 1 |  1 |  1
• Can you find a neural network representation of the XNOR
function?
• Hint: structure the network so that the inputs which produce a
positive output are:
– x1 AND x2 (i.e. both true)
– OR (NOT x1) AND (NOT x2) (i.e. both false)
36
Neural Network example 3: XNOR function
37
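The original slide shows the network as a figure; as a sketch of one standard construction (the weights follow the AND and NOT examples above, so this is an illustration rather than the only possible answer), combine an AND unit, a (NOT x1) AND (NOT x2) unit, and an OR output unit:

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def xnor(x1, x2):
    a1 = sigmoid(-30 + 20 * x1 + 20 * x2)    # hidden unit 1: x1 AND x2
    a2 = sigmoid(10 - 20 * x1 - 20 * x2)     # hidden unit 2: (NOT x1) AND (NOT x2)
    return sigmoid(-10 + 20 * a1 + 20 * a2)  # output unit: a1 OR a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))   # reproduces the XNOR truth table
```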
Practice 1
38
Practice 1- Solution
39
Practice 2
40
Practice 2- Solution
41
Multiclass classification
• Multiclass classification is when you distinguish
between more than two categories.
• Example: recognizing pedestrian, car, motorbike, or
truck requires building a neural network with four
output units.
• Previously we had written y as an integer in {1, 2, 3, 4}
• Now we represent y as a vector of four numbers
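A short sketch of this representation (the class ordering here is an arbitrary choice for illustration):

```python
import numpy as np

classes = ["pedestrian", "car", "motorbike", "truck"]

def one_hot(label):
    """Represent y as a vector of four numbers: 1 for the true class, 0 elsewhere."""
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

print(one_hot("car"))  # [0. 1. 0. 0.]
```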
42
Neural network cost function
• Focus on application of NNs for classification problems
• Training set is {(x(1), y(1)), (x(2), y(2)), (x(3), y(3)), ..., (x(m), y(m))}
• L = number of layers in the network
• sl = number of units (not counting bias unit) in layer l
L = 4
s1 = 3
s2 = 5
s3 = 5
s4 = 4 43
Cost function for neural networks
• The (regularized) logistic regression cost function is
as follows:
• For neural networks our cost function is a
generalization of the equation above
• Instead of one output we generate K outputs (one per class)
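For reference (the slide's formula image is not reproduced here; this is the standard regularized form used in Ng's course, with K output units):

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[ y_k^{(i)} \log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\big(1-(h_\Theta(x^{(i)}))_k\big)\Big]
          + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \big(\Theta_{ji}^{(l)}\big)^2
```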
44
Cost function for neural networks
• We want to find parameters Ɵ which minimize J(Ɵ)
• To do so we can use one of the algorithms already
described such as:
– Gradient descent
– Advanced optimization algorithms
• For this, we need to compute J(Ɵ) and the partial derivatives ∂J(Ɵ)/∂Ɵij(l)
45
Remember
• Ɵ(j) is of dimension [sj+1 x (sj + 1)]
– the network has sj units in layer j and sj+1 units in layer
j+1
• The partial derivative term ∂J(Ɵ)/∂Ɵij(l) is a REAL number (not a
vector or a matrix)
• It is the partial derivative of J(Ɵ) with respect to a single
parameter Ɵij(l), taken from the 3-way indexed set of
parameters
• How to compute this partial derivative term? 46
Gradient Computation
• We've already described forward propagation
• This is the algorithm which takes your neural network and
the initial input and pushes the input through the network
47
Back propagation Algorithm
• Back propagation takes the output you got from your network, compares
it to the real value (y) and calculates how wrong the parameters were
• Using the calculated error, it back-calculates the error associated with
each unit from the preceding layer
• This goes on until you reach the input layer (where obviously there is no
error)
• These "error" measurements for each unit can be used to calculate
the partial derivatives
48
Back propagation Algorithm
• For each node we calculate δj(l) - this is the error of node j in layer l
• We can first calculate δj(4) = aj(4) - yj
= [activation of the unit] - [the actual value observed in the training example]
• Instead of focusing on each node, let's think about this as a
vectorized problem: δ(4) = a(4) - y
– δ(4) is the vector of errors for the 4th layer
– a(4) is the vector of activation values for the 4th layer
49
Back propagation Algorithm
• With δ(4) calculated, we can determine the error terms for the other
layers: δ(3) = (Ɵ(3))T δ(4) .* g'(z(3))
• If we do the calculus: g'(z(3)) = a(3) .* (1 - a(3))
• So, more simply: δ(3) = (Ɵ(3))T δ(4) .* (a(3) .* (1 - a(3)))
(.* is the element-wise multiplication between the two vectors)
• If we ignore regularization (and skip a fairly involved
derivation), we get: ∂J(Ɵ)/∂Ɵij(l) = aj(l) δi(l+1)
50
Putting it all together !
When j = 0 (the bias weights) there is no regularization term 51
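The full algorithm on the original slide is a figure; below is a sketch of the usual accumulation step (function and variable names are mine, sigmoid activations assumed), which also shows why the j = 0 (bias) column gets no regularization:

```python
import numpy as np

def backprop_gradients(X, Y, Thetas, lam):
    """Accumulate Delta over all m examples, then form D = dJ/dTheta.
    X: (m, n) inputs; Y: (m, K) one-hot targets; Thetas: list of weight matrices."""
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]
    for x, y in zip(X, Y):
        # Forward pass, keeping the bias-augmented activation of every layer
        activations = []
        a = x
        for T in Thetas:
            a = np.concatenate(([1.0], a))          # add the bias unit a0 = 1
            activations.append(a)
            a = 1.0 / (1.0 + np.exp(-(T @ a)))      # a(l+1) = g(Theta(l) a(l))
        delta = a - y                               # delta(L) = a(L) - y
        # Backward pass: accumulate gradients and propagate delta
        for l in range(len(Thetas) - 1, -1, -1):
            Deltas[l] += np.outer(delta, activations[l])
            if l > 0:
                a_prev = activations[l]             # bias-augmented a(l+1)
                delta = (Thetas[l].T @ delta) * a_prev * (1 - a_prev)
                delta = delta[1:]                   # drop the bias unit's delta
    D = []
    for T, Delta in zip(Thetas, Deltas):
        reg = lam * T
        reg[:, 0] = 0.0                             # no regularization when j = 0
        D.append((Delta + reg) / m)
    return D
```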
Back propagation intuition
In the example, we will use two features: x1 and x2
52
Back propagation intuition
With our input data present we use forward
propagation
53
Back propagation intuition
• The sigmoid function applied to the z values gives the
activation values.
• Below we show exactly how the z value is calculated for an
example
54
Back propagation intuition
• Back propagation is doing something very similar to forward
propagation, but backwards
• Below we have the cost function if there is a single output (i.e.
binary classification)
• This function cycles over each example, so the cost for one
example really boils down to this:
• We can think about a δ term on a unit as the "error" of cost for
the activation value associated with a unit
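Since the slide's formula images are not reproduced here, the standard forms (single output, ignoring regularization) are:

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[ y^{(i)}\log h_\Theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})\big)\Big]
\qquad
\mathrm{cost}(i) = -\,y^{(i)}\log h_\Theta(x^{(i)}) - \big(1-y^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})\big)
```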
55
Back propagation intuition
• So for the output layer, back propagation sets the δ value
as [a - y]
– Difference between activation and actual value
• We then propagate these values backwards
56
Back propagation intuition
Looking at another example to see how we actually calculate
the delta value
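The worked figure is not reproduced here; as an illustration of the general pattern (ignoring the g'(z) factor shown earlier, as the intuition slides do), the δ of a hidden unit is roughly the weighted sum of the δ values of the units it feeds into, for example δ2(2) ≈ Ɵ12(2) δ1(3) + Ɵ22(2) δ2(3).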
57
Practice 1
58
Practice 1- Solution
59
Practice 2
60
Practice 2- Solution
61
Practice 3
62
Practice 3 - Solution
63
How to tune the weights? (Learning)
Implementation in Python
A step-by-step tutorial:
https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
65