PRINCIPLES OF SOFT COMPUTING
Dr. Ajay Bhardwaj
The Perceptron
◼ Frank Rosenblatt, an American psychologist,
proposed the classical perceptron model (1958)
◼ A more general computational model than
McCulloch–Pitts neurons
◼ Main differences: Introduction of numerical
weights for inputs and a mechanism for learning
these weights
◼ Inputs are no longer limited to Boolean values
◼ Refined and carefully analyzed by Minsky and
Papert (1969) - their model is referred to as the
perceptron model here.
Principles of Soft Computing (SRM University-AP) 2
The Perceptron
The perceptron computes a weighted sum of its inputs and fires when that sum reaches the threshold θ:
Y = 1 if Σ(i=1..n) wi·xi ≥ θ, and Y = 0 otherwise
A more accepted convention treats the threshold as a weight w0 = -θ on a fixed input x0 = 1.
Rewriting the above:
Y = 1 if Σ(i=0..n) wi·xi ≥ 0, and Y = 0 otherwise
Difference between Perceptron and McCulloch–Pitts neurons
The weights (including threshold) can be learned and the inputs can be real valued.
How does the perceptron learn its classification
tasks?
◼ This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron.
◼ The initial weights are randomly assigned, usually in the range [-0.5, 0.5],
and then updated to obtain the output consistent with the training examples.
The perceptron learning rule
◼ If at iteration p, the actual output is Y(p) and the desired output is Yd (p), then
the error is given by:
e(p) = Yd(p) - Y(p), where p = 1, 2, 3, . . .
Iteration p here refers to the pth training data presented to the perceptron.
◼ If the error, e(p), is positive, we need to increase perceptron output Y(p), but
if it is negative, we need to decrease Y(p). This leads to the perceptron learning rule:
wi(p + 1) = wi(p) + α·xi(p)·e(p), where p = 1, 2, 3, . . .
α (alpha) is the learning rate, a positive constant less than unity.
Perceptron’s training algorithm
Step 1: Initialization
Set initial weights w1, w2,…, wn and threshold θ to random numbers in the range
[-0.5, 0.5].
Perceptron’s training algorithm
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p),…, xn(p) and desired
output Yd (p).
Calculate the actual output at iteration p = 1:
Y(p) = step[ Σ(i=1..n) xi(p)·wi(p) - θ ]
where n is the number of the perceptron inputs, and step is a step activation
function.
Perceptron’s training algorithm
◼ Step 3: Weight training
◼ Update the weights of the perceptron:
wi(p + 1) = wi(p) + ∆wi(p)
where ∆wi(p) is the weight correction at iteration p.
◼ The weight correction is computed by the delta rule:
∆wi(p) = α·xi(p)·e(p)
Perceptron’s training algorithm
◼ Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until
convergence.
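The four steps above can be sketched in Python for the AND function treated in the next example. The learning rate, fixed threshold, and iteration cap below are illustrative assumptions, not values from the slides.

```python
import random

def step(net, theta):
    """Step activation: fire (1) when the net input reaches the threshold."""
    return 1 if net >= theta else 0

def train_perceptron(samples, alpha=0.1, theta=0.2, epochs=50, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the weights randomly in [-0.5, 0.5]
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    for _ in range(epochs):
        converged = True
        for x, yd in samples:
            # Step 2: activation with the step function
            y = step(sum(wi * xi for wi, xi in zip(w, x)), theta)
            e = yd - y                      # error e(p) = Yd(p) - Y(p)
            if e != 0:
                converged = False
            # Step 3: delta rule  w_i <- w_i + alpha * x_i * e
            w = [wi + alpha * xi * e for wi, xi in zip(w, x)]
        if converged:                       # Step 4: repeat until convergence
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
print([step(w[0] * x1 + w[1] * x2, 0.2) for (x1, x2), _ in AND])
```

Since AND is linearly separable, the perceptron convergence theorem guarantees the loop terminates with weights that classify all four patterns correctly.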
Example: AND function using Perceptron
If y ≠ t, then the weights are updated; otherwise they are left unchanged.
Final weights for AND gate
Numerical
Perceptron for AND logic
This table shows the weight calculation for one iteration only.
Assignment:
Calculate the final weights generated after the next 4 iterations for each epoch.
ADALINE (Adaptive Linear Neuron)
◼ ADALINE was proposed by Widrow and Hoff (1960)
◼ Adaline is a single layer neural network with multiple
nodes where each node accepts multiple inputs and
generates one output.
◼ ADALINE network is very similar to the perceptron,
except that its transfer function is linear, instead of
hard-limiting.
◼ Both the ADALINE and the perceptron suffer from the
same inherent limitation: they can only solve linearly
separable problems.
◼ For example, ADALINE networks have been used in long-distance phone lines
for adaptive echo cancellation.
Important points about Adaline are as follows:
• It uses bipolar activation function.
• It uses delta rule for training to minimize the Mean-Squared Error (MSE)
between the actual output and the desired/target output.
• The weights and the bias are adjustable.
Training Algorithm
Step 8 − Test for the stopping condition, which is met when there is no change in the weights, or when the largest
weight change during training is smaller than the specified tolerance.
Example: Design the OR Gate using Adaline
Initial weights w1 = w2 = b = 0.1, learning rate α = 0.1:

x1   x2    t     yin      (t-yin)    ∆w1       ∆w2       ∆b        w1       w2       b        (t-yin)^2
1    1     1     0.3      0.7        0.07      0.07      0.07      0.17     0.17     0.17     0.49
1    -1    1     0.17     0.83       0.083     -0.083    0.083     0.253    0.087    0.253    0.69
-1   1     1     0.087    0.913      -0.0913   0.0913    0.0913    0.1617   0.1783   0.3443   0.83
-1   -1    -1    0.0043   -1.0043    0.1004    0.1004    -0.1004   0.2621   0.2787   0.2439   1.01
This is epoch 1, where the total squared error is 0.49 + 0.69 + 0.83 + 1.01 = 3.02.
So more epochs will run until the total squared error becomes less than or equal to the chosen threshold, here 2.
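The epoch-1 table above can be reproduced with a short sketch of the delta (LMS) rule. The initial weights (0.1 each) and α = 0.1 are taken from the table header; the function name is illustrative.

```python
def adaline_epoch(samples, w1, w2, b, alpha):
    """Run one epoch of the delta (LMS) rule; return weights and total squared error."""
    total_error = 0.0
    for (x1, x2), t in samples:
        y_in = b + x1 * w1 + x2 * w2      # linear (not hard-limited) net output
        e = t - y_in                      # error before the update
        w1 += alpha * e * x1              # delta rule: dw_i = alpha * (t - y_in) * x_i
        w2 += alpha * e * x2
        b  += alpha * e                   # bias treated as a weight on input 1
        total_error += e ** 2             # accumulate squared error over the epoch
    return w1, w2, b, total_error

OR = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w1, w2, b, err = adaline_epoch(OR, 0.1, 0.1, 0.1, 0.1)
# Matches the table's last row: w1 = 0.2621, w2 = 0.2787, b = 0.2439, total error = 3.02
print(round(w1, 4), round(w2, 4), round(b, 4), round(err, 2))
```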
Example:
MADALINE (Multiple Adaptive Linear Neuron)
• Architecture
• The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the Adaline layer, and 1 neuron of the
Madaline layer.
• The Adaline layer can be considered as the hidden layer as it is between the input layer and the output layer, i.e. the Madaline
layer.
By now we know that only the weights and bias between the input and the Adaline layer are to be adjusted, and the weights
and bias between the Adaline and the Madaline layer are fixed.
Example: Using the Madaline network, implement XOR function with bipolar inputs and targets.
Assume the required parameter for the training of the network.
• Initially, the weights and biases are set as follows, with learning rate α = 0.5:
[w11 w21 b1] = [0.05 0.2 0.3]
[w12 w22 b2] = [0.1 0.2 0.15]
[v1 v2 v3] = [0.5 0.5 0.5]
• For the first input/output pair from the training data:
x1 = 1, x2 = 1, t = -1, α = 0.5
• Net input to the hidden units:
zin1 = b1 + x1·w11 + x2·w21 = 0.3 + 0.05·1 + 0.2·1 = 0.55
zin2 = b2 + x1·w12 + x2·w22 = 0.15 + 0.1·1 + 0.2·1 = 0.45
•Apply the activation function f(z) to the net input
z1 = f(zin1) = f(0.55) = 1
z2 = f(zin2) = f(0.45) = 1
• Computation for the output layer:
yin = b3 + z1·v1 + z2·v2 = 0.5 + 1·0.5 + 1·0.5 = 1.5
y = f(yin) = f(1.5) = 1
• Since y = 1 is not equal to t = -1, update the weights and biases:
wij(new) =wij(old) + α(t-zinj)xi
bj(new) = bj(old) + α(t-zinj)
• w11(new) = w11(old) + α(t-zin1)x1 = 0.05 + 0.5(-1-0.55) * 1 = -0.725
w12(new) = w12(old) + α(t-zin2)x1 = 0.1 + 0.5(-1-0.45) * 1 = -0.625
b1(new) = b1(old) + α(t-zin1) = 0.3 + 0.5(-1-0.55) = -0.475
w21(new) = w21(old) + α(t-zin1)x2 = 0.2 + 0.5(-1-0.55) * 1 = -0.575
w22(new) = w22(old) + α(t-zin2)x2 = 0.2 + 0.5(-1-0.45) * 1 = -0.525
b2(new) = b2(old) + α(t-zin2) = 0.15 + 0.5(-1-0.45) = -0.575
So, after the first training pair of epoch 1, the weights are:
[w11 w21 b1] = [-0.725 -0.575 -0.475]
[w12 w22 b2] = [-0.625 -0.525 -0.575]
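The hand computation above for the first training pair can be checked with a short sketch; the variable names mirror the slide's notation, and the update rule is the MRI-style rule as given above.

```python
def f(z):
    """Bipolar step activation used by the Adaline units."""
    return 1 if z >= 0 else -1

x1, x2, t, alpha = 1, 1, -1, 0.5
w11, w21, b1 = 0.05, 0.2, 0.3
w12, w22, b2 = 0.1, 0.2, 0.15
v1, v2, b3 = 0.5, 0.5, 0.5

# Forward pass through the two hidden Adaline units and the output unit
z_in1 = b1 + x1 * w11 + x2 * w21            # 0.55
z_in2 = b2 + x1 * w12 + x2 * w22            # 0.45
y = f(b3 + f(z_in1) * v1 + f(z_in2) * v2)   # f(1.5) = 1, but t = -1

# y != t, so update the hidden weights/biases toward the target
# (rule as given above: w_new = w_old + alpha * (t - z_in) * x)
if y != t:
    w11 += alpha * (t - z_in1) * x1
    w21 += alpha * (t - z_in1) * x2
    b1  += alpha * (t - z_in1)
    w12 += alpha * (t - z_in2) * x1
    w22 += alpha * (t - z_in2) * x2
    b2  += alpha * (t - z_in2)

print([round(v, 3) for v in (w11, w21, b1)])   # [-0.725, -0.575, -0.475]
print([round(v, 3) for v in (w12, w22, b2)])   # [-0.625, -0.525, -0.575]
```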
ADALINE algorithm
Assignment
ADALINE for AND function: Consider bipolar inputs and targets
Assignment:
Train a MADALINE network for the XOR function: Consider bipolar
inputs and targets.
Backpropagation Network
Why a different type of neural network architecture?
◼ To answer this question, let us first consider the case of a single-layer
neural network with two inputs, as shown below.
Back-propagation neural network
Back-propagation training algorithm
◼ Step 1: Initialization: Set all the weights and threshold levels of the
network to random numbers uniformly distributed inside a small range.
◼ Step 2: Activation: Activate the back-propagation neural network by
applying inputs x1(p), x2(p),…, xn(p) and desired outputs yd,1(p), yd,2(p),…,
yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
yj(p) = sigmoid[ Σ(i=1..n) xi(p)·wij(p) - θj ]
where n is the number of inputs of neuron j in the hidden layer, and sigmoid is
the sigmoid activation function.
Back-propagation training algorithm
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
yk(p) = sigmoid[ Σ(j=1..m) xjk(p)·wjk(p) - θk ]
where m is the number of inputs of neuron k in the output layer, and xjk(p) = yj(p)
is the output of hidden neuron j.
Back-propagation training algorithm
Step 3: Weight training
◼ Update the weights in the back-propagation network propagating backward
the errors associated with output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
δk(p) = yk(p)·[1 - yk(p)]·ek(p)
where ek(p) = yd,k(p) - yk(p)
Calculate the weight corrections:
∆wjk(p) = α·yj(p)·δk(p)
Update the weights at the output neurons:
wjk(p + 1) = wjk(p) + ∆wjk(p)
Back-propagation training algorithm
Step 3: Weight training (continued)
◼ (b) Calculate the error gradient for the neurons in the hidden layer:
δj(p) = yj(p)·[1 - yj(p)]·Σ(k=1..l) δk(p)·wjk(p)
where l is the number of neurons in the output layer.
◼ Calculate the weight corrections:
∆wij(p) = α·xi(p)·δj(p)
◼ Update the weights at the hidden neurons:
wij(p + 1) = wij(p) + ∆wij(p)
Back-propagation training algorithm
Step 4: Iteration
◼ Increase iteration p by one, go back to Step 2 and repeat the
process until the selected error criterion is satisfied.
◼ Visualization
◼ [Link]
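Steps 1–4 above can be sketched end-to-end for the XOR problem treated in the next slides. The network size (2 inputs, 2 hidden sigmoid neurons, 1 output), learning rate, seed, and epoch count are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(42)
# Step 1: small random weights; thresholds are weights on a fixed -1 input
w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]  # hidden j: [w1j, w2j, theta_j]
w_o = [rng.uniform(-0.5, 0.5) for _ in range(3)]                      # output: [w35, w45, theta_5]
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
alpha = 0.5

def forward(x):
    yh = [sigmoid(w[0] * x[0] + w[1] * x[1] - w[2]) for w in w_h]
    yo = sigmoid(w_o[0] * yh[0] + w_o[1] * yh[1] - w_o[2])
    return yh, yo

def sse():
    """Sum of squared errors over the training set."""
    return sum((yd - forward(x)[1]) ** 2 for x, yd in data)

e0 = sse()
for _ in range(5000):
    for x, yd in data:                              # Step 2: activation
        yh, yo = forward(x)
        # Step 3(a): output-layer gradient
        delta_o = yo * (1 - yo) * (yd - yo)
        # Step 3(b): hidden-layer gradients, propagated back through w_o
        delta_h = [yh[j] * (1 - yh[j]) * delta_o * w_o[j] for j in range(2)]
        # Weight corrections (delta rule); threshold inputs are -1
        for j in range(2):
            w_o[j] += alpha * yh[j] * delta_o
        w_o[2] += alpha * -1 * delta_o
        for j in range(2):
            w_h[j][0] += alpha * x[0] * delta_h[j]
            w_h[j][1] += alpha * x[1] * delta_h[j]
            w_h[j][2] += alpha * -1 * delta_h[j]
# Step 4: a full implementation would test an error criterion each epoch;
# here we simply report that the squared error dropped during training
print(round(e0, 4), "->", round(sse(), 4))
```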
Three-layer network for solving the Exclusive-OR
operation
We do not know the
target at Z1 and Z2.
Exclusive-OR operation-Numerical
◼ The effect of the threshold applied to a neuron in the hidden or output layer is
represented by its weight, θ, connected to a fixed input equal to -1.
◼ The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.
◼ We consider a training set where inputs x1 and x2 are equal to 1 and desired
output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are
calculated as:
y3 = sigmoid(x1·w13 + x2·w23 - θ3) = sigmoid(0.5 + 0.4 - 0.8) = 0.5250
y4 = sigmoid(x1·w14 + x2·w24 - θ4) = sigmoid(0.9 + 1.0 + 0.1) = 0.8808
Exclusive-OR operation-Numerical
◼ Now the actual output of neuron 5 in the output layer is determined as:
y5 = sigmoid(y3·w35 + y4·w45 - θ5) = sigmoid(-0.6300 + 0.9689 - 0.3) = 0.5097
◼ Thus, the following error is obtained:
e = yd,5 - y5 = 0 - 0.5097 = -0.5097
Exclusive-OR operation-Numerical
◼ The next step is weight training. To update the weights and threshold levels
in our network, we propagate the error, e, from the output layer backward to
the input layer.
◼ First, we calculate the error gradient for neuron 5 in the output layer:
δ5 = y5·(1 - y5)·e = 0.5097·(1 - 0.5097)·(-0.5097) = -0.1274
◼ Then we determine the weight corrections assuming that the learning rate
parameter, α, is equal to 0.1:
∆w35 = α·y3·δ5 = 0.1·0.5250·(-0.1274) = -0.0067
∆w45 = α·y4·δ5 = 0.1·0.8808·(-0.1274) = -0.0112
∆θ5 = α·(-1)·δ5 = 0.1·(-1)·(-0.1274) = 0.0127
◼ Next, we calculate the error gradients for neurons 3 and 4 in the hidden layer:
δ3 = y3·(1 - y3)·δ5·w35 = 0.5250·(1 - 0.5250)·(-0.1274)·(-1.2) = 0.0381
δ4 = y4·(1 - y4)·δ5·w45 = 0.8808·(1 - 0.8808)·(-0.1274)·1.1 = -0.0147
◼ We then determine the corresponding weight corrections:
∆w13 = α·x1·δ3 = 0.1·1·0.0381 = 0.0038, ∆w23 = 0.0038, ∆θ3 = -0.0038
∆w14 = α·x1·δ4 = 0.1·1·(-0.0147) = -0.0015, ∆w24 = -0.0015, ∆θ4 = 0.0015
Exclusive-OR operation-Numerical
◼ At last, we update all weights and thresholds:
w13 = 0.5 + 0.0038 = 0.5038, w14 = 0.9 - 0.0015 = 0.8985
w23 = 0.4 + 0.0038 = 0.4038, w24 = 1.0 - 0.0015 = 0.9985
w35 = -1.2 - 0.0067 = -1.2067, w45 = 1.1 - 0.0112 = 1.0888
θ3 = 0.8 - 0.0038 = 0.7962, θ4 = -0.1 + 0.0015 = -0.0985, θ5 = 0.3 + 0.0127 = 0.3127
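The hand-calculated forward and backward pass above can be re-derived numerically. This is a sketch using the initial weights and thresholds of this example; variable names (t3, d5, …) are illustrative shorthand.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1 = x2 = 1
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3            # thresholds theta_3, theta_4, theta_5
alpha, yd5 = 0.1, 0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)  # sigmoid(0.1) = 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)  # sigmoid(2.0) = 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)  # 0.5097
e = yd5 - y5                            # -0.5097

# Backward pass: output gradient, then hidden gradients
d5 = y5 * (1 - y5) * e                  # -0.1274
d3 = y3 * (1 - y3) * d5 * w35           #  0.0381
d4 = y4 * (1 - y4) * d5 * w45           # -0.0147

# Weight corrections for the output neuron (threshold input is -1)
dw35, dw45, dt5 = alpha * y3 * d5, alpha * y4 * d5, alpha * -1 * d5
print(round(y5, 4), round(d5, 4), round(dw35, 4), round(dw45, 4), round(dt5, 4))
```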
Final results of three-layer network learning
Accelerated learning in multilayer neural networks
◼ We can also accelerate training by including a momentum term in the delta
rule:
∆wjk(p) = β·∆wjk(p - 1) + α·yj(p)·δk(p)
where β is a positive number (0 ≤ β < 1) called the momentum constant.
Typically, the momentum constant is set to 0.95. This equation is called the
generalised delta rule.
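The generalised delta rule can be sketched for a single weight; the gradient values in the usage example are illustrative, chosen only to show the acceleration effect.

```python
def momentum_update(w, prev_dw, y, delta, alpha=0.1, beta=0.95):
    """Return (new_weight, new_change) for one weight under the generalised delta rule."""
    dw = beta * prev_dw + alpha * y * delta   # momentum term plus the plain delta-rule term
    return w + dw, dw

# A constant gradient direction accelerates under momentum:
w, dw = 0.0, 0.0
steps = []
for _ in range(3):
    w, dw = momentum_update(w, dw, y=1.0, delta=0.1)
    steps.append(round(dw, 4))
print(steps)  # successive changes grow: [0.01, 0.0195, 0.0285]
```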
Accelerated learning in multilayer neural networks
Learning with adaptive learning rate
To accelerate the convergence and yet avoid the danger of instability, we can
apply two heuristics:
◼ Heuristic 1
If the change of the sum of squared errors has the same algebraic sign for
several consecutive epochs, then the learning rate parameter, α, should be
increased.
◼ Heuristic 2
If the algebraic sign of the change of the sum of squared errors alternates for
several consecutive epochs, then the learning rate parameter, α, should be
decreased.
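The two heuristics can be sketched as a small learning-rate scheduler. The growth/decay factors (1.05 and 0.7) and the window size are illustrative assumptions; the slides do not specify them.

```python
def adapt_learning_rate(alpha, sse_history, grow=1.05, shrink=0.7, window=3):
    """Adjust alpha from the last few changes in the sum of squared errors (SSE)."""
    if len(sse_history) < window + 1:
        return alpha                      # not enough history yet
    # Epoch-to-epoch changes of the SSE over the last `window` transitions
    changes = [b - a for a, b in zip(sse_history[-window - 1:], sse_history[-window:])]
    signs = [c > 0 for c in changes]
    if all(signs) or not any(signs):      # Heuristic 1: same sign -> increase alpha
        return alpha * grow
    return alpha * shrink                 # Heuristic 2: alternating sign -> decrease alpha

steady = adapt_learning_rate(0.1, [5.0, 4.0, 3.2, 2.9])   # steadily falling SSE
mixed = adapt_learning_rate(0.1, [5.0, 4.0, 4.5, 4.1])    # oscillating SSE
print(round(steady, 3), round(mixed, 3))  # 0.105 0.07
```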
Advantages of ANN
◼ ANNs exhibit mapping capabilities: they can map input patterns to their
associated output patterns.
◼ Thus, an ANN architecture can be trained with known examples of a problem
before it is tested for its inference capabilities on unknown instances of the
problem. In other words, it can identify new objects it was not previously
trained on.
◼ ANNs possess the capability to generalize. This gives them the power to be
applied where an exact mathematical model of the problem is not possible.
◼ ANNs are robust and fault-tolerant systems. They can therefore recall full
patterns from incomplete, partial or noisy patterns. ANNs can process
information in parallel, at high speed and in a distributed manner. Thus a
massively parallel distributed processing system, made up of highly
interconnected (artificial) neural computing elements with the ability to learn
and acquire knowledge, is possible.
Group 1
Group 2
Group 3
Group 4
Group 5
Train a MADALINE network for the XOR function: Consider bipolar
inputs and targets.
Group 6
Group 7
Group 8