Lectures on Neural Networks and
Their Applications
Dr. P. K. UPADHYAY
Dept. of Electrical & Electronics Engineering
Outline
◼ The Biological Neural Network
◼ Artificial Neural Networks
◼ Learning Process
◼ Activation Functions
◼ Hebb’s Network and Limitations
◼ Perceptron and its Limitations
◼ Adaline
◼ Multi-layer Networks
◼ Backpropagation Algorithm
The Biological Inspiration
▪ The central theme of ANNs is borrowed from biological neural networks (BNNs).
▪ The information-processing cells of the brain are the neurons.
▪ The information is carried in the form of electrical signals.
The Structure of Neurons
A neuron has a cell body, a branching input
structure (the dendrites) and a branching output structure
(the axon)
▪ Axons connect to dendrites via synapses.
▪ Electro-chemical signals are propagated
from the dendritic input, through the cell
body, and down the axon to other neurons
4
The Structure of Neurons
[Figure: a biological neuron with the nucleus, cell body, dendrites, axon and synapse labelled, alongside the analogous ANN unit.]
▪ Communication between neurons is facilitated by the release of small packets of chemicals into the synaptic gap.
▪ A single neuron may communicate with as many as 100,000 other neurons.
Brain and Machine
▪ The Brain
▪ Pattern Recognition
▪ Association
▪ Complexity
▪ Noise Tolerance
▪ The Machine
▪ Calculation
▪ Precision
▪ Logic
Features of the Brain
▪ A highly complex, nonlinear and parallel processing system
▪ Ten billion (10^10) neurons
▪ 60 trillion connections (synapses)
▪ Neuron switching time ~10^-3 s
▪ Face recognition in ~0.1 s
▪ On average, each neuron has several thousand connections
▪ Hundreds of operations per second
▪ High degree of parallel computation
▪ Distributed representations
Artificial Neural Networks
Definition:
Artificial Neural Networks are massively parallel adaptive networks of simple nonlinear computing elements, called neurons, which are intended to abstract and model some of the functionality of the human nervous system in an attempt to partially capture some of its computational strength.
Artificial Neural Networks
History
▪ 1938 - Rashevsky initiated the study of neurodynamics.
▪ 1943 - McCulloch & Pitts are generally recognised as the designers of the first neural network.
▪ 1949 - First learning rule given by Hebb.
▪ 1958 - Invention of the Perceptron by Rosenblatt.
▪ 1960 - Widrow and Hoff introduced the Adaline.
▪ 1961 - Rosenblatt proposed the backpropagation scheme.
▪ 1980s - Re-emergence of ANNs: multi-layer networks.
The Structure of Neurons
▪ A neuron only fires if its input signal
exceeds a certain amount (the threshold)
in a short time period.
▪ Nature of Synapses:
▪ Excitatory (+)
▪ Inhibitory (-)
▪ No Connection (0)
Properties of Artificial Neural
Nets (ANNs)
▪ Nonlinearity
▪ Input-Output mapping
▪ Adaptivity
▪ Fault Tolerance
▪ Evidential Response
▪ VLSI Implementability
Learning
Learning is a process by which the free parameters of a neural network are adapted through a continuing process of stimulation by the environment in which the network is embedded:
• The neural network is stimulated by an environment.
• As a result of this stimulation, it undergoes changes.
• It then responds in a new way to the environment.
Taxonomy of the Learning Process
The learning process splits into learning algorithms (rules) and learning paradigms.
Learning Algorithms (Rules):
• Hebbian learning
• Error-correction learning
• Boltzmann learning
• Competitive learning
Learning Paradigms:
• Supervised learning
• Reinforcement learning
• Self-organised learning
Supervised Learning
◼ Training and test data sets
◼ Training set: input and target pairs
Nature of Input Signal:
▪ Binary {0, 1}
▪ Bipolar {-1, +1}
▪ Analog [0, 1], [-1, 1]
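The binary and bipolar codings listed above are interchangeable via the mapping x ↦ 2x - 1. A minimal Python sketch (the function names are my own):

```python
def to_bipolar(pattern):
    """Map a binary {0, 1} pattern to its bipolar {-1, +1} equivalent."""
    return [2 * x - 1 for x in pattern]

def to_binary(pattern):
    """Map a bipolar {-1, +1} pattern back to binary {0, 1}."""
    return [(x + 1) // 2 for x in pattern]
```

Bipolar coding is often preferred for Hebbian and perceptron training because a 0 input contributes nothing to the weight update, whereas -1 does.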
Types of Layers
▪ The input layer
▪ Introduces input values into the network.
▪ No activation function or other processing.
▪ The hidden layer(s)
▪ Perform classification of features.
▪ In theory, two hidden layers are sufficient to solve any problem.
▪ In practice, additional layers may make features easier to learn.
▪ The output layer
▪ Functionally just like the hidden layers.
▪ Outputs are passed on to the world outside the neural network.
Activation Functions
▪ Transform a neuron’s net input into its output.
▪ Features of activation functions:
▪ A squashing effect is required.
▪ Prevents accelerating growth of activation levels through the network.
[Figure: a neuron with inputs x0 = 1, x1, …, xn, weights w1, …, wn and output y.]
Activation Function / Sigmoid Function
Standard activation functions:
▪ The hard-limiting threshold function
▪ Corresponds to the biological paradigm: the neuron either fires or not.
▪ Sigmoid functions ('S'-shaped curves)
▪ The logistic function: φ(x) = 1 / (1 + e^(-ax))
▪ The hyperbolic tangent (symmetrical): φ(x) = (1 - e^(-ax)) / (1 + e^(-ax))
▪ Both functions have a simple differential form.
▪ Only the shape is important.
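Both sigmoid functions, and the simple differential form of the logistic, can be sketched in Python (a minimal illustration; the function names are my own):

```python
import math

def logistic(x, a=1.0):
    """Logistic sigmoid 1 / (1 + e^(-a*x)); squashes input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a * x))

def sym_sigmoid(x, a=1.0):
    """Symmetrical sigmoid (1 - e^(-a*x)) / (1 + e^(-a*x)); range (-1, 1).
    Algebraically identical to tanh(a*x / 2)."""
    return (1.0 - math.exp(-a * x)) / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    """The 'simple differential form': a * s * (1 - s), with s = logistic(x, a)."""
    s = logistic(x, a)
    return a * s * (1.0 - s)
```

The product form of the derivative is what makes these functions convenient for gradient-based training: the slope is obtained from the already-computed output.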
Linear Separability
[Figure: points of classes A and B scattered in the (X1, X2) plane, separated by a linear decision boundary.]
Decision Boundaries
▪ In simple cases, divide the feature space by drawing a hyperplane across it.
▪ This is known as a decision boundary.
▪ Discriminant function: returns different values on opposite sides of the boundary (a straight line in two dimensions).
▪ Problems which can be classified in this way are linearly separable.
HEBB’S NETWORK
ALGORITHM
❖ Initialize all weights and bias: wi = 0, b = 0
❖ Set activations for input units: xi = si
❖ Set activation for the output unit: y = t
❖ Adjust the weights and bias:
wi(new) = wi(old) + xi·y
b(new) = b(old) + y
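The steps above can be sketched in Python (a minimal illustration; the function and variable names are my own). With bipolar inputs and targets, training on the AND function gives weights (2, 2) and bias -2:

```python
def hebb_train(samples):
    """Train a single Hebb neuron: w_i += x_i * y and b += y per pattern."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for x, y in samples:
        for i in range(n):
            w[i] += x[i] * y
        b += y
    return w, b

# Bipolar AND: target is +1 only when both inputs are +1.
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = hebb_train(and_samples)  # w = [2, 2], b = -2
```

Thresholding w·x + b at zero then reproduces the AND truth table, which is why the bipolar coding (rather than binary) is used here.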
HEBB’S NETWORK
Epoch : One presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four input patterns being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
Error : The amount by which the network’s output differs from the target value. For example, if we required the network to output 0 and it output 1, then Error = -1.
HEBB’S NETWORK
❖ Hebb’s net can be used for different logic functions.

For AND:
A B | Output
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

❖ Limitations:
▪ Cannot learn any pattern for which the target is zero (the update xi·y is then zero).
▪ Only linearly separable patterns can be classified.
Perceptron
◼ Linear threshold unit (LTU)
[Figure: inputs x0 = -1, x1, …, xn with weights w1, …, wn feeding a threshold unit θ with output y.]

v = Σ (i = 0 to n) wi·xi, i.e. v(n) = wᵀ(n)·x(n)

y(x) = 1 if Σ (i = 0 to n) wi·xi ≥ 0
     = -1 otherwise
Perceptron Learning Rule
ALGORITHM
▪ Initialize all weights and bias: w(0) = 0, b(0) = 0
▪ Set the learning rate η (0 < η ≤ 1)
▪ Set activations for input units: xi = si
▪ Set activation for the output unit: y = t
▪ Compute the response of the output unit:
v = b + Σ (i = 1 to n) wi·xi
y = 1 if v > θ
  = 0 if -θ ≤ v ≤ θ
  = -1 if v < -θ
Perceptron Learning Rule
▪ Update the weights and bias in case of error:
▪ If y ≠ t:
wi(new) = wi(old) + η·t·xi
b(new) = b(old) + η·t
A Perceptron for the AND Function
(bipolar inputs and targets, η = 1, θ = 0)

First epoch (weights start at w1 = w2 = b = 0):

 x1  x2   1 | Net(v) | Out | Target | Δw1 Δw2  Δb | w1  w2   b
  1   1   1 |    0   |  0  |    1   |  1   1    1 |  1   1   1
  1  -1   1 |    1   |  1  |   -1   | -1   1   -1 |  0   2   0
 -1   1   1 |    2   |  1  |   -1   |  1  -1   -1 |  1   1  -1
 -1  -1   1 |   -3   | -1  |   -1   |  0   0    0 |  1   1  -1

Second epoch:

  1   1   1 |    1   |  1  |    1   |  0   0    0 |  1   1  -1
  1  -1   1 |   -1   | -1  |   -1   |  0   0    0 |  1   1  -1
 -1   1   1 |   -1   | -1  |   -1   |  0   0    0 |  1   1  -1
 -1  -1   1 |   -3   | -1  |   -1   |  0   0    0 |  1   1  -1
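The worked table can be reproduced with a short Python sketch of the perceptron learning rule (names are my own; θ = 0 and η = 1 as above). Training converges in the second epoch with weights (1, 1) and bias -1:

```python
def perceptron_train(samples, eta=1, theta=0, max_epochs=20):
    """Perceptron learning rule with threshold theta and learning rate eta."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            v = b + sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if v > theta else (-1 if v < -theta else 0)
            if y != t:  # update only on error
                for i in range(n):
                    w[i] += eta * t * x[i]
                b += eta * t
                changed = True
        if not changed:  # converged: a full epoch with no weight change
            break
    return w, b

and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = perceptron_train(and_samples)  # w = [1, 1], b = -1
```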
Decision Surface of a Perceptron
[Figure: left, a linearly separable set of + and - points split by a single line; right, a non-linearly separable (XOR-like) arrangement that no single line can split.]
Different Non-Linearly Separable Problems
Types of decision regions by network structure:
▪ Single-layer: half plane bounded by a hyperplane.
▪ Two-layer: convex open or closed regions.
▪ Three-layer: arbitrary regions (complexity limited by the number of nodes).
[Figure columns: the exclusive-OR problem, classes with meshed regions, and the most general region shapes achievable by each structure.]
Adaline
▪ A Perceptron-like system.
▪ Uses the LMS (least mean squares) algorithm.
▪ Minimizes the mean squared error between the desired response and the net input v.
[Figure: inputs x1, …, xn weighted by w1, …, wn produce v; the error e = desired response - v drives learning; the thresholded output is y = 1 if v ≥ θ, -1 if v < θ.]
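The distinguishing feature of Adaline is that the LMS update uses the continuous error (t - v) before thresholding, not the thresholded output. A minimal Python sketch (names and the learning-rate/epoch choices are my own):

```python
def adaline_train(samples, eta=0.1, epochs=100):
    """Adaline / LMS: adjust weights against the continuous error (t - v),
    minimizing the mean squared error."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in samples:
            v = b + sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                w[i] += eta * (t - v) * x[i]
            b += eta * (t - v)
    return w, b

def adaline_output(w, b, x, theta=0.0):
    """Thresholded output: 1 if v >= theta, else -1."""
    v = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if v >= theta else -1

# Bipolar AND again; LMS settles near the least-squares solution
# (w1, w2, b) ~ (0.5, 0.5, -0.5).
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = adaline_train(and_samples)
```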
Multilayer Perceptron (MLP)
[Figure: an input layer receiving the input signals (external stimuli), a hidden layer, and an output layer producing the output values.]
Back-Propagation
▪ A training procedure which allows multi-layer
feedforward Neural Networks to be trained;
▪ Can theoretically perform “any” input-output
mapping;
▪ Can learn to solve linearly inseparable
problems.
Activation functions and
training
▪ For feed-forward networks:
▪ A continuous function can be differentiated
allowing gradient-descent.
▪ Back-propagation is an example of a
gradient-descent technique.
Gradient Descent Learning Rule
▪ Consider a linear unit without threshold and with continuous output o (not just -1, +1):
o = w0 + w1·x1 + … + wn·xn
▪ Train the wi’s so that they minimize the squared error
E[w1, …, wn] = ½ Σ (d ∈ D) (td - od)²
where D is the set of training examples.
Gradient Descent
[Figure: one gradient-descent step on the error surface, from (w1, w2) to (w1 + Δw1, w2 + Δw2).]

Δw = -η ∇E[w]
Δwi = -η ∂E/∂wi
∂E/∂wi = ∂/∂wi ½ Σd (td - od)²
       = ∂/∂wi ½ Σd (td - Σi wi·xid)²
       = Σd (td - od)(-xid)
Gradient Descent
Gradient-Descent(training_examples, η)
Each training example is a pair <(x1, …, xn), t>, where (x1, …, xn) is the vector of input values and t is the target output value; η is the learning rate (e.g. 0.1).
▪ Initialize each wi to some small random value.
▪ Until the termination condition is met, do:
▪ Initialize each Δwi to zero.
▪ For each <(x1, …, xn), t> in training_examples, do:
▪ Input the instance (x1, …, xn) to the linear unit and compute the output o.
▪ For each linear-unit weight wi, do: Δwi = Δwi + η(t - o)·xi
▪ For each linear-unit weight wi, do: wi = wi + Δwi
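The procedure above (batch gradient descent for a linear unit) translates directly into Python. This is a minimal sketch: the dataset is a hypothetical, exactly linear target t = 2·x1 - 3·x2 + 1, so the minimum-squared-error weights recover those coefficients:

```python
def gradient_descent(training_examples, eta=0.05, epochs=500):
    """Batch gradient descent for a linear unit o = w0 + w1*x1 + ... + wn*xn.
    Accumulates delta_wi = eta * sum_d (t_d - o_d) * x_i,d before updating."""
    n = len(training_examples[0][0])
    w = [0.0] * (n + 1)          # w[0] is the bias weight w0 (with x0 = 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for x, t in training_examples:
            xs = [1.0] + list(x)                    # prepend x0 = 1
            o = sum(wi * xi for wi, xi in zip(w, xs))
            for i in range(n + 1):
                delta[i] += eta * (t - o) * xs[i]   # accumulate over the batch
        for i in range(n + 1):
            w[i] += delta[i]                        # one update per epoch
    return w

data = [((x1, x2), 2 * x1 - 3 * x2 + 1)
        for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
w = gradient_descent(data)  # w -> [1, 2, -3]
```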
Comparison: Perceptron and Gradient Descent Rule
▪ The perceptron learning rule is guaranteed to succeed if:
▪ the training examples are linearly separable, and
▪ the learning rate is sufficiently small.
▪ The linear-unit training rule uses gradient descent:
▪ guaranteed to converge to the hypothesis with minimum squared error,
▪ given a sufficiently small learning rate,
▪ even when the training data contain noise,
▪ even when the training data are not separable by H.
Multi-Layer Networks
[Figure: a fully connected multi-layer network with an input layer, a hidden layer and an output layer.]
Sigmoid Unit
[Figure: inputs x0 = 1, x1, …, xn with weights w0, w1, …, wn feeding a sigmoid unit with output o.]

net = Σ (i = 0 to n) wi·xi
o = σ(net) = 1 / (1 + e^(-net))

σ(x) is the sigmoid function 1 / (1 + e^(-x)), with derivative
dσ(x)/dx = σ(x)·(1 - σ(x))

Derive gradient-descent rules to train:
• a single sigmoid unit:
∂E/∂wi = -Σd (td - od)·od·(1 - od)·xid
• multilayer networks of sigmoid units: backpropagation.
Backpropagation Algorithm
The error signal is given by: ej(n) = dj(n) - yj(n)
The instantaneous sum of squared errors is: E(n) = ½ Σ (j ∈ C) ej²(n)
The average squared error is: Eav = (1/N) Σ (n = 1 to N) E(n)
The net input of neuron j is: vj(n) = Σ (i = 0 to p) wji(n)·yi(n)
The output of neuron j is: yj(n) = φj(vj(n))
By the chain rule,
∂E(n)/∂wji(n) = [∂E(n)/∂ej(n)]·[∂ej(n)/∂yj(n)]·[∂yj(n)/∂vj(n)]·[∂vj(n)/∂wji(n)]
where
∂E(n)/∂ej(n) = ej(n);  ∂ej(n)/∂yj(n) = -1;  ∂yj(n)/∂vj(n) = φ′j(vj(n));  ∂vj(n)/∂wji(n) = yi(n)
Backpropagation Algorithm
Combining the partial derivatives:
∂E(n)/∂wji(n) = -ej(n)·φ′j(vj(n))·yi(n)
The weight correction is
Δwji(n) = -η·∂E(n)/∂wji(n),  i.e.  Δwji(n) = η·δj(n)·yi(n)
where the local gradient is
δj(n) = -∂E(n)/∂vj(n) = ej(n)·φ′j(vj(n))
Case: when neuron j is a hidden node:
δj(n) = -[∂E(n)/∂yj(n)]·[∂yj(n)/∂vj(n)] = -[∂E(n)/∂yj(n)]·φ′j(vj(n))
with E(n) = ½ Σ (k ∈ C) ek²(n), where neuron k is an output node.
Backpropagation Algorithm
∂E(n)/∂yj(n) = Σk ek(n)·∂ek(n)/∂yj(n) = Σk ek(n)·[∂ek(n)/∂vk(n)]·[∂vk(n)/∂yj(n)]
Since ek(n) = dk(n) - yk(n) = dk(n) - φk(vk(n)),
∂ek(n)/∂vk(n) = -φ′k(vk(n))
Since vk(n) = Σ (j = 0 to q) wkj(n)·yj(n),
∂vk(n)/∂yj(n) = wkj(n)
Backpropagation Algorithm
Substituting back:
∂E(n)/∂yj(n) = -Σk ek(n)·φ′k(vk(n))·wkj(n) = -Σk δk(n)·wkj(n)
Hence, for a hidden neuron j:
δj(n) = φ′j(vj(n))·Σk δk(n)·wkj(n)
In words:
weight correction Δwji(n) = learning-rate parameter η × local gradient δj(n) × input signal of neuron j, yi(n)
When a momentum factor α is included:
Δwji(n) = α·Δwji(n - 1) + η·δj(n)·yi(n)
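The delta equations above can be checked numerically. Below is a minimal Python sketch (the 2-2-1 network size, the example weights and all names are my own illustrative choices) that computes the gradients of E = ½e² via the two delta rules, δj = ej·φ′j at the output and δj = φ′j·Σk δk·wkj at a hidden node:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(w_hidden, w_out, x):
    """w_hidden: two hidden neurons, each [bias, w1, w2]; w_out: [bias, w1, w2]."""
    xs = [1.0] + x
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, xs))) for row in w_hidden]
    y = sigmoid(sum(wi * hi for wi, hi in zip(w_out, [1.0] + h)))
    return h, y

def backprop_gradients(w_hidden, w_out, x, d):
    """Return dE/dw for E = 0.5 * (d - y)^2, so dE/dw = -delta * input."""
    h, y = forward(w_hidden, w_out, x)
    e = d - y
    delta_out = e * y * (1.0 - y)                 # e_j * phi'_j(v_j)
    grad_out = [-delta_out * yi for yi in [1.0] + h]
    grads_hidden = []
    for j in range(len(w_hidden)):
        # hidden delta: phi'_j * sum_k delta_k * w_kj
        delta_j = h[j] * (1.0 - h[j]) * delta_out * w_out[j + 1]
        grads_hidden.append([-delta_j * xi for xi in [1.0] + x])
    return grads_hidden, grad_out
```

A finite-difference check (perturb each weight, recompute E) confirms the analytic gradients, which is a standard sanity test for any backpropagation implementation.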
Generalized Neuron Models
[Figure: a simple summation (Σ) neuron model and a simple product (Π) neuron model.]

Summation-type neural network (Σ-ANN)

Product-type neuron (Π-neuron)
It consists of a product function at the aggregation level and a sigmoid function at the activation level.

Product-summation-type neural network (Π-Σ-ANN)
This is a network in which Π-neurons are used in the hidden layer and Σ-neurons in the output layer.
Factors Affecting the Performance of Artificial Neural Network Models
ANN performance depends mainly upon the following factors:
1. Network complexity
2. Problem complexity
3. Learning complexity
Network complexity broadly depends on:
a. Neuron complexity
b. Number of neurons in each layer
c. Number of layers
d. Number and type of interconnecting weights
DESIGNING A NEURAL NETWORK
▪ Use the minimal size of network.
▪ Determine the number of inputs and outputs.
▪ Determine the number of hidden units, e.g. with the rule of thumb
H = P / (10 × (M + N))
where
P = number of training examples
N = number of inputs
M = number of outputs
▪ Complete the network.
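The rule of thumb above is easy to wrap in a helper (a sketch; the function name is my own, and the result is only a starting point, not a guarantee):

```python
def hidden_units(p, n, m):
    """Rule-of-thumb estimate of hidden units: H = P / (10 * (M + N)),
    where p = training examples, n = inputs, m = outputs."""
    return p / (10 * (m + n))

# e.g. 1000 training examples, 4 inputs, 1 output -> 20 hidden units
```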
GUIDELINES TO PREVENT OVERTRAINING
▪ Select a proper number of hidden layers.
▪ Select a proper number of perceptrons in each layer.
▪ Present patterns in random order.
▪ Stop training early, before the network overfits.
▪ Introduce noise into the training patterns.
PRUNING A NEURAL NETWORK
▪ If a unit has the same output throughout, remove the unit and add its weighted output to the bias of the units connected to it.
▪ If two units in a layer have almost the same output, remove one unit and add its weight to the weight of the other unit.
▪ If two units have nearly opposite outputs, remove one unit, subtract its weight from the weight of the other unit, and add its weight to the bias of the other unit.
▪ Remove all units with relatively small weights.
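The first rule can be sketched in Python for a single output unit (a minimal illustration; all names and the tolerance are my own). Folding weight × constant output into the bias leaves the network's output unchanged:

```python
def prune_constant_units(hidden_outputs, out_weights, out_bias, tol=1e-6):
    """Remove hidden units whose output is (nearly) the same for every pattern,
    folding weight * constant_output into the bias of the next layer.
    hidden_outputs[j] is the list of unit j's outputs over all patterns."""
    keep, bias = [], out_bias
    for j, outputs in enumerate(hidden_outputs):
        if max(outputs) - min(outputs) <= tol:       # unit is constant
            bias += out_weights[j] * outputs[0]      # fold into the bias
        else:
            keep.append(j)
    new_hidden = [hidden_outputs[j] for j in keep]
    new_weights = [out_weights[j] for j in keep]
    return new_hidden, new_weights, bias

# Hypothetical example: hidden unit 1 outputs 0.7 for every pattern.
hidden = [[0.2, 0.9, 0.4], [0.7, 0.7, 0.7]]
out_w, out_b = [1.5, -2.0], 0.5
new_hidden, new_w, new_b = prune_constant_units(hidden, out_w, out_b)
```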
APPLICATION BASED POPULAR
NETWORKS
▪ BAM- Bidirectional Associative Memory
▪ BSB- Brain State in a Box
▪ CCN- Cascade Correlation
▪ CPN- Counter Propagation
▪ GRNN-Generalized Regression Neural
Network
▪ LVQ- Learning Vector Quantization
▪ MLFF with BP- Multilayer Feed Forward
with Backpropagation
▪ NLN - Neuro-Logic Network
APPLICATION BASED POPULAR
NETWORKS (contd.)
▪ PNN- Probabilistic Neural Network
▪ RBF-Radial Basis Function
▪ RNN- Recurrent Neural Network
▪ RCE- Reduced Coulomb Energy
▪ SOFM- Self Organizing Feature Map
▪ Hopfield
▪ MADLINE- Multiple ADALINE
▪ HAMMING
▪ Perceptron
CATEGORIZATION OF NEURAL NETWORKS BY APPLICATION TYPE
▪ Associative Memory: AM, ART, BAM, BSB, Hopfield, MLFF with BP
▪ Optimization: ADALINE, Boltzmann, Hopfield, MLFF with BP, RNN, SOFM
▪ Classification: ADALINE, ART, CCN, CPN, GRNN, LVQ, MLFF with BP, RBF, RCE, SOFM
▪ Pattern Recognition: ART, CCN, CPN, GRNN, LVQ, MLFF with BP, Neocognitron, RBF, RCE, SOFM
▪ General Mapping: CCN, GRNN, MADALINE, MLFF with BP, RBF, RNN, SOFM
▪ Prediction: ADALINE, CCN, GRNN, MLFF with BP
RESOURCES
❖ Neural Networks, Simon Haykin (Macmillan College Publishing Company)
❖ Artificial Neural Networks, Kishan Mehrotra, Sanjay Ranka & C. K. Mohan (Penram International)
❖ Neural Networks: A Classroom Approach, Satish Kumar (TMH)
❖ Pattern Recognition with Neural Networks in C++, A. S. Pandya, Robert B. Macy (IEEE Press)
❖ Pattern Recognition in C++, Rao and Rao

Thank You!