Multi Layer Perceptron
The multilayer perceptron (MLP) belongs to the class of feedforward networks, meaning that information flows through the network nodes exclusively in the forward direction, from the input layer toward the output layer.
Figure: Multilayer perceptron with an input layer, three hidden layers, and an output layer.
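As a hedged illustration of this forward-only flow, the following sketch (assuming NumPy and sigmoid activations; the layer sizes are arbitrary choices, not taken from the figure) propagates an input through an input layer, three hidden layers, and an output layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Propagate input x strictly forward, layer by layer."""
    o = x
    for W, b in zip(weights, biases):
        o = sigmoid(W @ o + b)   # the output of each layer feeds only the next layer
    return o

# Assumed layer sizes: 2 inputs, three hidden layers (4, 4, 3), one output node
rng = np.random.default_rng(0)
sizes = [2, 4, 4, 3, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(forward(np.array([0.5, -1.0]), weights, biases))
```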
Backpropagation learning algorithm
The algorithm is based on the gradient descent technique for solving an optimization problem: the minimization of the network cumulative error

$$E_c = \sum_{k=1}^{n} E(k), \qquad E(k) = \frac{1}{2}\,\lVert t(k) - o(k)\rVert^{2} = \frac{1}{2}\sum_{i=1}^{q} \bigl(t_i(k) - o_i(k)\bigr)^{2},$$

where n is the number of training patterns presented to the network for learning purposes, E(k) is the square of the Euclidean norm of the vectorial difference between the k-th target output vector t(k) and the k-th actual output vector o(k) of the network (the factor 1/2 is the usual convention that simplifies the derivative), and the index i represents the i-th neuron of the output layer, which is composed of a total of q neurons.
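A minimal sketch of this cumulative error, assuming the targets and outputs are stored as NumPy arrays (the numbers below are purely illustrative, not taken from the text):

```python
import numpy as np

def cumulative_error(targets, outputs):
    """targets, outputs: arrays of shape (n, q) -- n training patterns, q output neurons."""
    per_pattern = 0.5 * np.sum((targets - outputs) ** 2, axis=1)  # E(k), k = 1..n
    return per_pattern.sum()                                      # E_c = sum over k of E(k)

# Illustrative values only
t = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # target vectors t(k)
o = np.array([[0.8, 0.1], [0.2, 0.7], [0.9, 0.6]])   # actual outputs o(k)
print(cumulative_error(t, o))   # E_c over n = 3 patterns, q = 2 output neurons
```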
The algorithm is designed to update the weights in the direction of the gradient descent of the cumulative error with respect to the weight vector. Training can be performed off-line (all the training patterns are presented to the system at once and the weights are updated from the cumulative gradient) or on-line (training is carried out pattern by pattern, with an update after each pattern).
For a given layer (l), the gradient is taken with respect to the vector w(l) of all interconnection weights between layer (l) and the preceding layer (l − 1):

$$\Delta w^{(l)} = -\eta\,\frac{\partial E}{\partial w^{(l)}},$$

where η denotes the learning rate. The signal tot_i(l),

$$tot_i^{(l)} = \sum_{j} w_{ij}^{(l)}\, o_j^{(l-1)},$$

represents the sum of all signals reaching node (i) at hidden layer (l) coming from the previous layer (l − 1); the node output is o_i^{(l)} = f(tot_i^{(l)}), where f is the activation function.
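The sketch below contrasts the off-line and on-line modes for a single sigmoid layer; the single-layer setting, the learning rate value, and the variable names are assumptions made only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_single_layer(W, x, t):
    """Gradient of E(k) = 0.5 * ||t - o||^2 with respect to W, for o = sigmoid(W @ x)."""
    o = sigmoid(W @ x)
    delta = (o - t) * o * (1.0 - o)   # dE / d tot_i for the sigmoid
    return np.outer(delta, x)         # dE / d w_ij = delta_i * x_j

def train_offline(W, X, T, eta=0.5):
    """Off-line (batch) mode: one update from the gradient of the cumulative error."""
    grad = sum(grad_single_layer(W, x, t) for x, t in zip(X, T))
    return W - eta * grad

def train_online(W, X, T, eta=0.5):
    """On-line mode: the weights are updated immediately after each pattern."""
    for x, t in zip(X, T):
        W = W - eta * grad_single_layer(W, x, t)
    return W

# Illustrative usage: 3 patterns, 2 inputs, 1 output neuron
rng = np.random.default_rng(0)
W0 = rng.standard_normal((1, 2))
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1.0], [1.0], [0.0]])
print(train_offline(W0, X, T))
print(train_online(W0, X, T))
```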
• Using chain rule differentiation we obtain:

$$\Delta w_{ij}^{(l)} = -\eta\,\frac{\partial E}{\partial w_{ij}^{(l)}} = -\eta\,\frac{\partial E}{\partial tot_i^{(l)}}\,\frac{\partial tot_i^{(l)}}{\partial w_{ij}^{(l)}} = \eta\,\delta_i^{(l)}\, o_j^{(l-1)}, \qquad \text{with } \delta_i^{(l)} = -\frac{\partial E}{\partial tot_i^{(l)}}.$$

• For the case where layer (l) is the output layer (L), the error signal can be expressed as:

$$\delta_i^{(L)} = \bigl(t_i - o_i^{(L)}\bigr)\, f'\!\bigl(tot_i^{(L)}\bigr).$$

• Considering the case where f is the sigmoid function, f(x) = 1/(1 + e^{−x}), whose derivative is f'(x) = f(x)(1 − f(x)),
• the error signal becomes expressed as:

$$\delta_i^{(L)} = \bigl(t_i - o_i^{(L)}\bigr)\, o_i^{(L)}\bigl(1 - o_i^{(L)}\bigr).$$

• Propagating the error backward now, for the case where (l) represents a hidden layer (l < L), the expression of Δw_ij(l) keeps the same form:

$$\Delta w_{ij}^{(l)} = \eta\,\delta_i^{(l)}\, o_j^{(l-1)},$$

• where the error signal δ_i(l) is now expressed as a function of the error signals already propagated back from the following layer (l + 1):

$$\delta_i^{(l)} = f'\!\bigl(tot_i^{(l)}\bigr)\sum_{m}\delta_m^{(l+1)}\, w_{mi}^{(l+1)}.$$
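A sketch of the resulting backward pass, assuming sigmoid activations in every layer; the list-based data layout, the function names, and the usage values are illustrative choices, not the text's:

```python
import numpy as np

def backward(outs, Ws, t):
    """
    outs: list [o(0), o(1), ..., o(L)] of layer outputs from a forward pass
          (o(0) is the input pattern), computed with sigmoid activations.
    Ws:   list [W(1), ..., W(L)] of weight matrices, W(l) of shape (n_l, n_{l-1}).
    t:    target vector for this pattern.
    Returns the error signals [delta(1), ..., delta(L)].
    """
    L = len(Ws)
    deltas = [None] * L
    o_L = outs[-1]
    # Output layer: delta_i(L) = (t_i - o_i) * f'(tot_i), with f' = o * (1 - o) for the sigmoid
    deltas[L - 1] = (t - o_L) * o_L * (1.0 - o_L)
    # Hidden layers: delta_i(l) = f'(tot_i(l)) * sum_m delta_m(l+1) * w_mi(l+1)
    for l in range(L - 2, -1, -1):
        o_l = outs[l + 1]
        deltas[l] = o_l * (1.0 - o_l) * (Ws[l + 1].T @ deltas[l + 1])
    return deltas

def weight_updates(deltas, outs, eta=0.5):
    """Delta w_ij(l) = eta * delta_i(l) * o_j(l-1), layer by layer."""
    return [eta * np.outer(d, o_prev) for d, o_prev in zip(deltas, outs[:-1])]

# Tiny illustrative usage with arbitrary values (shapes: 2 -> 3 -> 1)
Ws = [np.array([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]), np.array([[0.3, -0.1, 0.2]])]
outs = [np.array([1.0, 0.5])]
for W in Ws:
    outs.append(1.0 / (1.0 + np.exp(-(W @ outs[-1]))))
print(weight_updates(backward(outs, Ws, t=np.array([1.0])), outs))
```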
To illustrate this powerful algorithm, we apply it to the training of the network shown in the figure. Three training pattern pairs are used, with x and t being the input and the target output data, respectively.
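The network of the figure and the three pattern pairs themselves are not reproduced here; the self-contained sketch below uses a hypothetical 2-2-1 network, a hypothetical learning rate, and three made-up (x, t) pairs purely to show how the on-line training loop runs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 2)), rng.standard_normal(2)   # hidden layer (2 nodes)
W2, b2 = rng.standard_normal((1, 2)), rng.standard_normal(1)   # output layer (1 node)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # hypothetical inputs x
T = np.array([[0.0], [1.0], [1.0]])                  # hypothetical targets t
eta = 0.5                                            # assumed learning rate

for epoch in range(2000):                  # on-line (pattern-by-pattern) training
    for x, t in zip(X, T):
        o1 = sigmoid(W1 @ x + b1)          # hidden-layer output
        o2 = sigmoid(W2 @ o1 + b2)         # network output
        d2 = (t - o2) * o2 * (1.0 - o2)    # output-layer error signal delta(2)
        d1 = o1 * (1.0 - o1) * (W2.T @ d2) # hidden-layer error signal delta(1)
        W2 += eta * np.outer(d2, o1); b2 += eta * d2
        W1 += eta * np.outer(d1, x);  b1 += eta * d1

print([sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)[0] for x in X])
```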
Momentum
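The heading above only names the momentum heuristic; a common form adds a fraction of the previous weight change to the current gradient step, as in the hedged sketch below (the coefficient alpha, the learning rate, and the function name are assumptions, not taken from the text):

```python
import numpy as np

def momentum_step(W, grad, prev_delta, eta=0.5, alpha=0.9):
    """Delta w(t) = -eta * dE/dw + alpha * Delta w(t-1); returns the new W and Delta w(t)."""
    delta = -eta * grad + alpha * prev_delta
    return W + delta, delta

# Usage: carry prev_delta across iterations, initialised as np.zeros_like(W).
```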
Effect of Hidden Nodes on Function Approximation
Effect of Training Patterns on Function Approximation
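The figures for these last two headings are not reproduced here; as a rough illustration of both effects, the sketch below (assuming scikit-learn is available) fits sin(x) with a single-hidden-layer MLP while varying the number of hidden nodes and the number of training patterns:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_error(n_hidden, n_patterns):
    """Train on n_patterns samples of sin(x) with n_hidden sigmoid nodes; return test MSE."""
    x = np.linspace(0, 2 * np.pi, n_patterns).reshape(-1, 1)
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="logistic",
                       max_iter=5000, random_state=0)
    net.fit(x, np.sin(x).ravel())
    x_test = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
    return np.mean((net.predict(x_test) - np.sin(x_test).ravel()) ** 2)

for n_hidden in (1, 2, 5, 20):        # effect of the number of hidden nodes
    print("hidden nodes:", n_hidden, " test MSE:", fit_error(n_hidden, 30))
for n_patterns in (5, 10, 30, 100):   # effect of the number of training patterns
    print("patterns:", n_patterns, " test MSE:", fit_error(10, n_patterns))
```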