
The workflow for the neural network design process has seven primary steps. Referenced topics discuss the basic ideas behind steps 2, 3, and 5.

1. Collect data
2. Create the network — Create Neural Network Object
3. Configure the network — Configure Neural Network Inputs and Outputs
4. Initialize the weights and biases
5. Train the network — Neural Network Training Concepts
6. Validate the network
7. Use the network
Data collection in step 1 generally occurs outside the framework of Neural Network Toolbox™ software, but it is
discussed in general terms in Multilayer Neural Networks and Backpropagation Training. The remaining steps,
including steps 4, 6, and 7, are discussed in topics specific to the type of network.

The Neural Network Toolbox software uses the network object to store all of the information that defines a neural
network. This topic describes the basic components of a neural network and shows how they are created and stored
in the network object.

After a neural network has been created, it needs to be configured and then trained. Configuration involves arranging
the network so that it is compatible with the problem you want to solve, as defined by sample data. After the network
has been configured, the adjustable network parameters (called weights and biases) need to be tuned, so that the
network performance is optimized. This tuning process is referred to as training the network. Configuration and
training require that the network be provided with example data. This topic shows how to format the data for
presentation to the network. It also explains network configuration and the two forms of network training: incremental
training and batch training.

===============================================================
Simple Neuron
The fundamental building block for neural networks is the single-input neuron, such as this example.

There are three distinct functional operations that take place in this example neuron. First, the scalar input p is
multiplied by the scalar weight w to form the product wp, again a scalar. Second, the weighted input wp is added to
the scalar bias b to form the net input n. (In this case, you can view the bias as shifting the function f to the left by an
amount b. The bias is much like a weight, except that it has a constant input of 1.) Finally, the net input is passed
through the transfer function f, which produces the scalar output a. The names given to these three processes are:
the weight function, the net input function, and the transfer function.
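The three operations can be traced in a few lines of code. The following is a plain NumPy sketch of the computation, not toolbox code; the logistic sigmoid is used here only as an example transfer function f.

```python
import numpy as np

def simple_neuron(p, w, b, f):
    """Single-input neuron: weight function (w*p), net input (wp + b), transfer f."""
    wp = w * p          # weight function: scalar product of weight and input
    n = wp + b          # net input function: weighted input plus bias
    return f(n)         # transfer function produces the scalar output a

# Example transfer function: sigmoid squashes n into the interval (0, 1)
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

a = simple_neuron(p=2.0, w=0.5, b=-1.0, f=logsig)   # n = 0.5*2 - 1 = 0
```

Here n = 0, so the sigmoid returns 0.5; swapping f for another function changes only the last step.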

For many types of neural networks, the weight function is a product of a weight times the input, but other weight
functions (e.g., the distance between the weight and the input, |w − p|) are sometimes used. (For a list of weight
functions, type help nnweight.) The most common net input function is the summation of the weighted inputs
with the bias, but other operations, such as multiplication, can be used. (For a list of net input functions, type help
nnnetinput.) Introduction to Radial Basis Neural Networks discusses how distance can be used as the weight
function and multiplication can be used as the net input function. There are also many types of transfer functions.
Examples of various transfer functions are in Transfer Functions. (For a list of transfer functions, type help
nntransfer.)

Note that w and b are both adjustable scalar parameters of the neuron. The central idea of neural networks is that
such parameters can be adjusted so that the network exhibits some desired or interesting behavior. Thus, you
can train the network to do a particular job by adjusting the weight or bias parameters.

All the neurons in the Neural Network Toolbox™ software have provision for a bias, and a bias is used in many of the
examples and is assumed in most of this toolbox. However, you can omit a bias in a neuron if you want.

Transfer Functions
Many transfer functions are included in the Neural Network Toolbox software.

Two of the most commonly used functions are shown below.

The following figure illustrates the linear transfer function.

Neurons of this type are used in the final layer of multilayer networks that are used as function approximators. This is
shown in Multilayer Neural Networks and Backpropagation Training.

The sigmoid transfer function shown below takes the input, which can have any value between plus and minus
infinity, and squashes the output into the range 0 to 1.

This transfer function is commonly used in the hidden layers of multilayer networks, in part because it is differentiable.

The symbol in the square to the right of each transfer function graph shown above represents the associated transfer
function. These icons replace the general f in the network diagram blocks to show the particular transfer function
being used.

For a complete list of transfer functions, type help nntransfer. You can also specify your own transfer
functions.

You can experiment with a simple neuron and various transfer functions by running the example program nnd2n1.

Neuron with Vector Input


The simple neuron can be extended to handle inputs that are vectors. A neuron with a single R-element input vector
is shown below. Here the individual input elements

p1,p2,…pR

are multiplied by weights

w1,1,w1,2,…w1,R

and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot product of the (single row)
matrix W and the vector p. (There are other weight functions, in addition to the dot product, such as the distance
between the row of the weight matrix and the input vector, as in Introduction to Radial Basis Neural Networks.)

The neuron has a bias b, which is summed with the weighted inputs to form the net input n. (In addition to the
summation, other net input functions can be used, such as the multiplication that is used in Introduction to Radial
Basis Neural Networks.) The net input n is the argument of the transfer function f.

n=w1,1p1+w1,2p2+…+w1,RpR+b

This expression can, of course, be written in MATLAB® code as

n = W*p + b

However, you will seldom be writing code at this level, for such code is already built into functions to define and
simulate entire networks.
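The MATLAB expression n = W*p + b has a direct counterpart in other array languages. This NumPy sketch (illustrative only, not toolbox code) computes the same net input for a two-element input vector:

```python
import numpy as np

W = np.array([[1.0, -0.8]])    # single-row weight matrix, 1-by-R
p = np.array([[2.0], [1.0]])   # R-element column input vector
b = 0.5                        # scalar bias

n = W @ p + b                  # net input: dot product Wp plus the bias
```

The result is the 1-by-1 net input Wp + b = 1*2 + (−0.8)*1 + 0.5 = 1.7, ready to be passed to a transfer function.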

Abbreviated Notation

The figure of a single neuron shown above contains a lot of detail. When you consider networks with many neurons,
and perhaps layers of many neurons, there is so much detail that the main thoughts tend to be lost. Thus, the authors
have devised an abbreviated notation for an individual neuron. This notation, which is used later in circuits of multiple
neurons, is shown here.
Here the input vector p is represented by the solid dark vertical bar at the left. The dimensions of p are shown below
the symbol p in the figure as R × 1. (Note that a capital letter, such as R in the previous sentence, is used when
referring to the size of a vector.) Thus, p is a vector of R input elements. These inputs postmultiply the single-row, R-
column matrix W. As before, a constant 1 enters the neuron as an input and is multiplied by a scalar bias b. The net
input to the transfer function f is n, the sum of the bias b and the product Wp. This sum is passed to the transfer
function f to get the neuron's output a, which in this case is a scalar. Note that if there were more than one neuron, the
network output would be a vector.

A layer of a network is defined in the previous figure. A layer includes the weights, the multiplication and summing
operations (here realized as a vector product Wp), the bias b, and the transfer function f. The array of inputs,
vector p, is not included in or called a layer.

As with the Simple Neuron, there are three operations that take place in the layer: the weight function (matrix
multiplication, or dot product, in this case), the net input function (summation, in this case), and the transfer function.

Each time this abbreviated network notation is used, the sizes of the matrices are shown just below their matrix
variable names. This notation will allow you to understand the architectures and follow the matrix mathematics
associated with them.

As discussed in Transfer Functions, when a specific transfer function is to be used in a figure, the symbol for that
transfer function replaces the f shown above. Here are some examples.

You can experiment with a two-element neuron by running the example program nnd2n2.

===============================================================
Two or more of the neurons shown earlier can be combined in a layer, and a particular network could contain one or
more such layers. First consider a single layer of neurons.

One Layer of Neurons


A one-layer network with R input elements and S neurons follows.
In this network, each element of the input vector p is connected to each neuron input through the weight matrix W.
The ith neuron has a summer that gathers its weighted inputs and bias to form its own scalar output n(i). The
various n(i) taken together form an S-element net input vector n. Finally, the neuron layer outputs form a column
vector a. The expression for a is shown at the bottom of the figure.

Note that it is common for the number of inputs to a layer to be different from the number of neurons (i.e., R is not
necessarily equal to S). A layer is not constrained to have the number of its inputs equal to the number of its neurons.

You can create a single (composite) layer of neurons having different transfer functions simply by putting two of the
networks shown earlier in parallel. Both networks would have the same inputs, and each network would create some
of the outputs.

The input vector elements enter the network through the weight matrix W.

W = [ w1,1  w1,2  …  w1,R
      w2,1  w2,2  …  w2,R
       ⋮      ⋮           ⋮
      wS,1  wS,2  …  wS,R ]
Note that the row indices on the elements of matrix W indicate the destination neuron of the weight, and the column
indices indicate which source is the input for that weight. Thus, the indices in w1,2 say that the strength of the
signal from the second input element to the first neuron is w1,2.

The S neuron R-input one-layer network also can be drawn in abbreviated notation.

Here p is an R-length input vector, W is an S × R matrix, a and b are S-length vectors. As defined previously, the
neuron layer includes the weight matrix, the multiplication operations, the bias vector b, the summer, and the transfer
function blocks.
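The layer computation a = f(Wp + b) with an S × R weight matrix can be sketched in a few lines. This is a NumPy illustration with made-up values, not toolbox code; hardlim is chosen purely as an example transfer function:

```python
import numpy as np

# 3 neurons (S = 3), 2-element input (R = 2)
W = np.array([[ 1.0, -1.0],
              [ 0.5,  0.5],
              [-1.0,  2.0]])              # S-by-R weight matrix
b = np.array([[0.0], [-2.0], [-1.0]])     # S-element bias vector
p = np.array([[2.0], [1.0]])              # R-element input vector

hardlim = lambda n: (n >= 0).astype(float)
a = hardlim(W @ p + b)                    # S-element output vector
```

Each row of W holds the weights of one neuron, so the single matrix product Wp gathers all S weighted sums at once.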

Inputs and Layers

To describe networks having multiple layers, the notation must be extended. Specifically, it needs to make a
distinction between weight matrices that are connected to inputs and weight matrices that are connected between
layers. It also needs to identify the source and destination for the weight matrices.

We will call weight matrices connected to inputs input weights; we will call weight matrices connected to layer
outputs layer weights. Further, superscripts are used to identify the source (second index) and the destination (first
index) for the various weights and other elements of the network. To illustrate, the one-layer multiple input network
shown earlier is redrawn in abbreviated form here.
As you can see, the weight matrix connected to the input vector p is labeled as an input weight matrix (IW1,1) having a
source 1 (second index) and a destination 1 (first index). Elements of layer 1, such as its bias, net input, and output
have a superscript 1 to say that they are associated with the first layer.

Multiple Layers of Neurons uses layer weight (LW) matrices as well as input weight (IW) matrices.

Multiple Layers of Neurons


A network can have several layers. Each layer has a weight matrix W, a bias vector b, and an output vector a. To
distinguish between the weight matrices, output vectors, etc., for each of these layers in the figures, the number of
the layer is appended as a superscript to the variable of interest. You can see the use of this layer notation in the
three-layer network shown next, and in the equations at the bottom of the figure.

The network shown above has R1 inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. It is
common for different layers to have different numbers of neurons. A constant input 1 is fed to the bias for each
neuron.

Note that the outputs of each intermediate layer are the inputs to the following layer. Thus layer 2 can be analyzed as
a one-layer network with S1 inputs, S2 neurons, and an S2 × S1 weight matrix W2. The input to layer 2 is a1; the output
is a2. Now that all the vectors and matrices of layer 2 have been identified, it can be treated as a single-layer network
on its own. This approach can be taken with any layer of the network.
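Because each layer's output feeds the next, a forward pass is just a loop over layers. The following is a NumPy sketch with arbitrary layer sizes and example transfer functions (sigmoid hidden layers, linear output), not toolbox code:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))   # sigmoid transfer function
purelin = lambda n: n                          # linear transfer function

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]                           # an R - S1 - S2 - S3 architecture
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
biases  = [rng.standard_normal((sizes[i + 1], 1)) for i in range(3)]
transfer = [logsig, logsig, purelin]

a = rng.standard_normal((sizes[0], 1))         # input vector p
for W, b, f in zip(weights, biases, transfer):
    a = f(W @ a + b)                           # each layer's output feeds the next

y = a                                          # network output
```

Any layer of the loop, taken alone, is exactly the one-layer network analyzed above.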

The layers of a multilayer network play different roles. A layer that produces the network output is called an output
layer. All other layers are called hidden layers. The three-layer network shown earlier has one output layer (layer 3)
and two hidden layers (layer 1 and layer 2). Some authors refer to the inputs as a fourth layer. This toolbox does not
use that designation.

The architecture of a multilayer network with a single input vector can be specified with the notation R − S1 − S2 −...
− SM, where the number of elements of the input vector and the number of neurons in each layer are specified.

The same three-layer network can also be drawn using abbreviated notation.

Multiple-layer networks are quite powerful. For instance, a network of two layers, where the first layer is sigmoid and
the second layer is linear, can be trained to approximate any function (with a finite number of discontinuities)
arbitrarily well. This kind of two-layer network is used extensively in Multilayer Neural Networks and Backpropagation
Training.

Here it is assumed that the output of the third layer, a3, is the network output of interest, and this output is labeled
as y. This notation is used to specify the output of multilayer networks.

Input and Output Processing Functions


Network inputs might have associated processing functions. Processing functions transform user input data to a form
that is easier or more efficient for a network.

For instance, mapminmax transforms input data so that all values fall into the interval [−1, 1]. This can speed up
learning for many networks. removeconstantrows removes the rows of the input vector that correspond to input
elements that always have the same value, because these input elements are not providing any useful information to
the network. The third common processing function is fixunknowns, which recodes unknown data (represented in
the user's data with NaN values) into a numerical form for the network. fixunknowns preserves information about
which values are known and which are unknown.
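The idea behind mapminmax can be sketched as a simple row-wise linear rescaling. This is a NumPy re-implementation of the [−1, 1] mapping idea, not the toolbox function itself:

```python
import numpy as np

def map_minmax(x, ymin=-1.0, ymax=1.0):
    """Rescale each row of x linearly so its values span [ymin, ymax]."""
    xmin = x.min(axis=1, keepdims=True)
    xmax = x.max(axis=1, keepdims=True)
    return (ymax - ymin) * (x - xmin) / (xmax - xmin) + ymin

x = np.array([[0.0, 5.0, 10.0],
              [2.0, 4.0,  6.0]])
xn = map_minmax(x)        # every row now runs from -1 to 1
```

Each input element (row) is scaled independently, so elements with very different ranges end up on a comparable footing before training.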

Similarly, network outputs can also have associated processing functions. Output processing functions are used to
transform user-provided target vectors for network use. Then, network outputs are reverse-processed using the same
functions to produce output data with the same characteristics as the original user-provided targets.

Both mapminmax and removeconstantrows are often associated with network outputs.
However, fixunknowns is not. Unknown values in targets (represented by NaN values) do not need to be altered
for network use.

Processing functions are described in more detail in Choose Neural Network Input-Output Processing Functions.

========================================================================================

Rosenblatt [Rose61] created many variations of the perceptron. One of the simplest was a single-layer network
whose weights and biases could be trained to produce a correct target vector when presented with the corresponding
input vector. The training technique used is called the perceptron learning rule. The perceptron generated great
interest due to its ability to generalize from its training vectors and learn from initially randomly distributed
connections. Perceptrons are especially suited for simple problems in pattern classification. They are fast and reliable
networks for the problems they can solve. In addition, an understanding of the operations of the perceptron provides
a good basis for understanding more complex networks.

The discussion of perceptrons in this section is necessarily brief. For a more thorough discussion, see Chapter 4,
"Perceptron Learning Rule," of [HDB1996], which discusses the use of multiple layers of perceptrons to solve more
difficult problems beyond the capability of one layer.

Neuron Model
A perceptron neuron, which uses the hard-limit transfer function hardlim, is shown below.

Each external input is weighted with an appropriate weight w1j, and the sum of the weighted inputs is sent to the hard-
limit transfer function, which also has an input of 1 transmitted to it through the bias. The hard-limit transfer function,
which returns a 0 or a 1, is shown below.

The perceptron neuron produces a 1 if the net input into the transfer function is equal to or greater than 0; otherwise it
produces a 0.

The hard-limit transfer function gives a perceptron the ability to classify input vectors by dividing the input space into
two regions. Specifically, outputs will be 0 if the net input n is less than 0, or 1 if the net input n is 0 or greater. The
following figure shows the input space of a two-input hard-limit neuron with the weights w1,1 = −1, w1,2 = 1 and a bias b =
1.
Two classification regions are formed by the decision boundary line L at
Wp + b = 0. This line is perpendicular to the weight matrix W and shifted according to the bias b. Input vectors above
and to the left of the line L will result in a net input greater than 0 and, therefore, cause the hard-limit neuron to output
a 1. Input vectors below and to the right of the line L cause the neuron to output 0. You can pick weight and bias
values to orient and move the dividing line so as to classify the input space as desired.

Hard-limit neurons without a bias will always have a classification line going through the origin. Adding a bias allows
the neuron to solve problems where the two sets of input vectors are not located on different sides of the origin. The
bias allows the decision boundary to be shifted away from the origin, as shown in the plot above.
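With the figure's values w1,1 = −1, w1,2 = 1, and b = 1, the boundary L is −p1 + p2 + 1 = 0. A quick NumPy check of the two regions (illustrative, not toolbox code):

```python
import numpy as np

W = np.array([[-1.0, 1.0]])
b = 1.0
hardlim = lambda n: (n >= 0).astype(int)

def classify(p):
    """Hard-limit neuron output for a 2-element column input."""
    return int(hardlim(W @ p + b)[0, 0])

# Above/left of L: -p1 + p2 + 1 >= 0, so the neuron outputs 1
above = classify(np.array([[0.0], [1.0]]))   # net input n = 2
below = classify(np.array([[2.0], [0.0]]))   # net input n = -1
```

The point above the line is classified as 1 and the point below it as 0, matching the two regions in the plot.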

You might want to run the example program nnd4db. With it you can move a decision boundary around, pick new
inputs to classify, and see how the repeated application of the learning rule yields a network that does classify the
input vectors properly.

Perceptron Architecture
The perceptron network consists of a single layer of S perceptron neurons connected to R inputs through a set of
weights wi,j, as shown below in two forms. As before, the network indices i and j indicate that wi,j is the strength of the
connection from the jth input to the ith neuron.
The perceptron learning rule described shortly is capable of training only a single layer. Thus only one-layer networks
are considered here. This restriction places limitations on the computation a perceptron can perform. The types of
problems that perceptrons are capable of solving are discussed in Limitations and Cautions.

Create a Perceptron
You can create a perceptron with the following:

net = perceptron;

net = configure(net,P,T);

where input arguments are as follows:

 P is an R-by-Q matrix of Q input vectors of R elements each.


 T is an S-by-Q matrix of Q target vectors of S elements each.
Commonly, the hardlim function is used in perceptrons, so it is the default.

The following commands create a perceptron network with a single one-element input vector with the values 0 and 2,
and one neuron with outputs that can be either 0 or 1:

P = [0 2];

T = [0 1];

net = perceptron;

net = configure(net,P,T);

You can see what network has been created by executing the following command:

inputweights = net.inputweights{1,1}

which yields

inputweights =

delays: 0

initFcn: 'initzero'

learn: true

learnFcn: 'learnp'

learnParam: (none)

size: [1 1]

weightFcn: 'dotprod'

weightParam: (none)

userdata: (your custom info)

The default learning function is learnp, which is discussed in Perceptron Learning Rule (learnp). The net input to
the hardlim transfer function is dotprod, which generates the product of the input vector and weight matrix and
adds the bias to compute the net input.

The default initialization function initzero is used to set the initial values of the weights to zero.
Similarly,

biases = net.biases{1}

gives

biases =

initFcn: 'initzero'

learn: 1

learnFcn: 'learnp'

learnParam: []

size: 1

userdata: [1x1 struct]

You can see that the default initialization for the bias is also 0.

Perceptron Learning Rule (learnp)


Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input,
output pairs

{p1, t1}, {p2, t2}, …, {pQ, tQ}

where p is an input to the network and t is the corresponding correct (target) output. The objective is to reduce the
error e, which is the difference t − a between the neuron response a and the target vector t. The perceptron learning
rule learnp calculates desired changes to the perceptron's weights and biases, given an input vector p and the
associated error e. The target vector t must contain values of either 0 or 1, because perceptrons
(with hardlim transfer functions) can only output these values.

Each time learnp is executed, the perceptron has a better chance of producing the correct outputs. The perceptron
rule is proven to converge on a solution in a finite number of iterations if a solution exists.

If a bias is not used, learnp works to find a solution by altering only the weight vector w to point toward input
vectors to be classified as 1 and away from vectors to be classified as 0. This results in a decision boundary that is
perpendicular to w and that properly classifies the input vectors.

There are three conditions that can occur for a single neuron once an input vector p is presented and the network's
response a is calculated:

CASE 1. If an input vector is presented and the output of the neuron is correct (a = t and e = t – a = 0), then the
weight vector w is not altered.

CASE 2. If the neuron output is 0 and should have been 1 (a = 0 and t = 1, and e = t – a = 1), the input vector p is
added to the weight vector w. This makes the weight vector point closer to the input vector, increasing the chance
that the input vector will be classified as a 1 in the future.

CASE 3. If the neuron output is 1 and should have been 0 (a = 1 and t = 0, and e = t – a = –1), the input vector p is
subtracted from the weight vector w. This makes the weight vector point farther away from the input vector,
increasing the chance that the input vector will be classified as a 0 in the future.

The perceptron learning rule can be written more succinctly in terms of the error e = t – a and the change to be made
to the weight vector Δw:
CASE 1. If e = 0, then make a change Δw equal to 0.

CASE 2. If e = 1, then make a change Δw equal to pT.

CASE 3. If e = –1, then make a change Δw equal to –pT.

All three cases can then be written with a single expression:


Δw = (t − a)pT = epT

You can get the expression for changes in a neuron's bias by noting that the bias is simply a weight that always has
an input of 1:

Δb = (t − a)(1) = e

For the case of a layer of neurons you have

ΔW = (t − a)pT = epT

and

Δb = (t − a) = e

The perceptron learning rule can be summarized as follows:

Wnew = Wold + epT

and

bnew = bold + e
where e = t – a.
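The three cases collapse into the single update Δw = epT, Δb = e. A minimal NumPy version of this update rule (a sketch of the rule that learnp implements, not the toolbox function itself):

```python
import numpy as np

hardlim = lambda n: (n >= 0).astype(float)

def perceptron_step(W, b, p, t):
    """One perceptron-rule update: W <- W + e p', b <- b + e, where e = t - a."""
    a = hardlim(W @ p + b)      # network response
    e = t - a                   # error
    return W + e @ p.T, b + e, e

W = np.array([[1.0, -0.8]])
b = np.array([[0.0]])
p = np.array([[1.0], [2.0]])
W, b, e = perceptron_step(W, b, p, t=np.array([[1.0]]))
```

With these values the net input is 1 − 1.6 = −0.6, so a = 0, e = 1, and the input vector is added to the weights, giving W = [2 1.2], the same result as the learnp example that follows.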

Now try a simple example. Start with a single neuron having an input vector with just two elements.

net = perceptron;

net = configure(net,[0;0],0);

To simplify matters, set the bias equal to 0 and the weights to 1 and -0.8:

net.b{1} = [0];

w = [1 -0.8];

net.IW{1,1} = w;

The input target pair is given by

p = [1; 2];

t = [1];

You can compute the output and error with

a = net(p)

a =

0
e = t-a

e =

1
and use the function learnp to find the change in the weights.

dw = learnp(w,p,[],[],[],[],e,[],[],[],[],[])

dw =

1 2

The new weights, then, are obtained as

w = w + dw

w =

2.0000 1.2000

The process of finding new weights (and biases) can be repeated until there are no errors. Recall that the perceptron
learning rule is guaranteed to converge in a finite number of steps for all problems that can be solved by a
perceptron. These include all classification problems that are linearly separable. The objects to be classified in such
cases can be separated by a single line.

You might want to try the example nnd4pr. It allows you to pick new input vectors and apply the learning rule to
classify them.

Training (train)
If sim and learnp are used repeatedly to present inputs to a perceptron, and to change the perceptron weights
and biases according to the error, the perceptron will eventually find weight and bias values that solve the problem,
given that the perceptron can solve it. Each traversal through all the training input and target vectors is called a pass.

The function train carries out such a loop of calculation. In each pass the function train proceeds through the
specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the
sequence as the inputs are presented.

Note that train does not guarantee that the resulting network does its job. You must check the new values
of W and b by computing the network output for each input vector to see if all targets are reached. If a network does
not perform successfully you can train it further by calling train again with the new weights and biases for more
training passes, or you can analyze the problem to see if it is a suitable problem for the perceptron. Problems that
cannot be solved by the perceptron network are discussed in Limitations and Cautions.

To illustrate the training procedure, work through a simple problem. Consider a one-neuron perceptron with a single
vector input having two elements:
This network, and the problem you are about to consider, are simple enough that you can follow through what is done
with hand calculations if you want. The problem discussed below follows that found in [HDB1996].

Suppose you have the following classification problem and would like to solve it with a single vector input, two-
element perceptron network.

{p1 = [2; 2], t1 = 0}   {p2 = [1; −2], t2 = 1}   {p3 = [−2; 2], t3 = 0}   {p4 = [−1; 1], t4 = 1}
Use the initial weights and bias. Denote the variables at each step of this calculation by using a number in
parentheses after the variable. Thus, above, the initial values are W(0) and b(0).

W(0) = [0 0]    b(0) = 0
Start by calculating the perceptron's output a for the first input vector p1, using the initial weights and bias.

a = hardlim(W(0)p1 + b(0)) = hardlim([0 0][2; 2] + 0) = hardlim(0) = 1
The output a does not equal the target value t1, so use the perceptron rule to find the incremental changes to the
weights and biases based on the error.

e = t1 − a = 0 − 1 = −1
ΔW = ep1T = (−1)[2 2] = [−2 −2]
Δb = e = −1

You can calculate the new weights and bias using the perceptron update rules.
Wnew = Wold + epT = [0 0] + [−2 −2] = [−2 −2] = W(1)
bnew = bold + e = 0 + (−1) = −1 = b(1)
Now present the next input vector, p2. The output is calculated below.

a = hardlim(W(1)p2 + b(1)) = hardlim([−2 −2][1; −2] − 1) = hardlim(1) = 1
On this occasion, the target is 1, so the error is zero. Thus there are no changes in weights or bias, so W(2) = W(1) =
[−2 −2] and b(2) = b(1) = −1.

You can continue in this fashion, presenting p3 next, calculating an output and the error, and making changes in the
weights and bias, etc. After making one pass through all of the four inputs, you get the values W(4) = [−3 −1] and b(4)
= 0. To determine whether a satisfactory solution is obtained, make one pass through all input vectors to see if they
all produce the desired target values. This is not true for the fourth input, but the algorithm does converge on the sixth
presentation of an input. The final values are

W(6) = [−2 −3] and b(6) = 1.
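The entire hand calculation can be replayed in a short loop. This is an illustrative NumPy re-implementation of the perceptron rule applied pass by pass, not the toolbox's train function:

```python
import numpy as np

# The four input vectors as columns of P, with targets T, from the text above
P = np.array([[2.0,  1.0, -2.0, -1.0],
              [2.0, -2.0,  2.0,  1.0]])
T = [0.0, 1.0, 0.0, 1.0]
W = np.zeros((1, 2))               # W(0)
b = 0.0                            # b(0)

for _ in range(10):                # at most 10 passes through the data
    errors = 0
    for q in range(4):
        p = P[:, q:q + 1]
        n = (W @ p)[0, 0] + b      # net input
        a = 1.0 if n >= 0 else 0.0 # hardlim
        e = T[q] - a
        if e != 0:
            W = W + e * p.T        # perceptron rule: W <- W + e p'
            b = b + e              #                  b <- b + e
            errors += 1
    if errors == 0:                # an error-free pass means convergence
        break
```

The loop reproduces the hand calculation exactly: the last nonzero update occurs at the sixth presentation, leaving W = [−2 −3] and b = 1.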

This concludes the hand calculation. Now, how can you do this using the train function?
The following code defines a perceptron.

net = perceptron;

Consider the application of a single input

p = [2; 2];

having the target

t = [0];

Set epochs to 1, so that train goes through the input vectors (only one here) just one time.

net.trainParam.epochs = 1;

net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}

w =

-2 -2

b =

-1

Thus, the initial weights and bias are 0, and after training on only the first vector, they have the values [−2 −2] and −1,
just as you hand calculated.

Now apply the second input vector p2. The output is 1, as it will be until the weights and bias are changed, but now
the target is 1, the error will be 0, and the change will be zero. You could proceed in this way, starting from the
previous result and applying a new input vector time after time. But you can do this job automatically with train.

Apply train for one epoch, a single pass through the sequence of all four input vectors. Start with the network
definition.

net = perceptron;

net.trainParam.epochs = 1;

The input vectors and targets are

p = [[2;2] [1;-2] [-2;2] [-1;1]]

t = [0 1 0 1]

Now train the network with

net = train(net,p,t);

The new weights and bias are

w = net.iw{1,1}, b = net.b{1}

w =

-3 -1
b =

0
This is the same result as you got previously by hand.

Finally, simulate the trained network for each of the inputs.

a = net(p)

a =

0 0 1 1

The outputs do not yet equal the targets, so you need to train the network for more than one pass. Try more epochs.
This run gives a mean absolute error performance of 0 after two epochs:

net.trainParam.epochs = 1000;

net = train(net,p,t);

Thus, the network was trained by the time the inputs were presented on the third epoch. (As you know from hand
calculation, the network converges on the presentation of the sixth input vector. This occurs in the middle of the
second epoch, but it takes the third epoch to detect the network convergence.) The final weights and bias are

w = net.iw{1,1}, b = net.b{1}

w =

-2 -3

b =

1
The simulated output and errors for the various inputs are

a = net(p)

a =

0 1 0 1

error = a-t

error =

0 0 0 0

You confirm that the training procedure is successful. The network converges and produces the correct target outputs
for the four input vectors.

The default training function for networks created with perceptron is trainc. (You can find this by
executing net.trainFcn.) This training function applies the perceptron learning rule in its pure form, in that
individual input vectors are applied individually, in sequence, and corrections to the weights and bias are made after
each presentation of an input vector. Thus, perceptron training with train will converge in a finite number of steps
unless the problem presented cannot be solved with a simple perceptron.

The function train can be used in various ways by other networks as well. Type help train to read more about
this basic function.
You might want to try various example programs. For instance, demop1 illustrates classification and training of a
simple perceptron.

Limitations and Cautions


Perceptron networks should be trained with adapt, which presents the input vectors to the network one at a time
and makes corrections to the network based on the results of each presentation. Use of adapt in this way
guarantees that any linearly separable problem is solved in a finite number of training presentations.

As noted in the previous pages, perceptrons can also be trained with the function train. Commonly
when train is used for perceptrons, it presents the inputs to the network in batches, and makes corrections to the
network based on the sum of all the individual corrections. Unfortunately, there is no proof that such a training
algorithm converges for perceptrons. On that account the use of train for perceptrons is not recommended.

Perceptron networks have several limitations. First, the output values of a perceptron can take on only one of two
values (0 or 1) because of the hard-limit transfer function. Second, perceptrons can only classify linearly separable
sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the
input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where
all vectors are classified properly. However, it has been proven that if the vectors are linearly separable, perceptrons
trained adaptively will always find a solution in finite time. You might want to try demop6. It shows the difficulty of
trying to classify input vectors that are not linearly separable.

It is only fair, however, to point out that networks with more than one perceptron can be used to solve more difficult
problems. For instance, suppose that you have a set of four vectors that you would like to classify into distinct groups,
and that two lines can be drawn to separate them. A two-neuron network can be found such that its two decision
boundaries classify the inputs into four categories. For additional discussion about perceptrons and to examine more
complex perceptron problems, see [HDB1996].

Outliers and the Normalized Perceptron Rule

Long training times can be caused by the presence of an outlier input vector whose length is much larger or smaller
than the other input vectors. Applying the perceptron learning rule involves adding and subtracting input vectors from
the current weights and biases in response to error. Thus, an input vector with large elements can lead to changes in
the weights and biases that take a long time for a much smaller input vector to overcome. You might want to
try demop4 to see how an outlier affects the training.

By changing the perceptron learning rule slightly, you can make training times insensitive to extremely large or small
outlier input vectors.

Here is the original rule for updating weights:


Δw = (t − a)pT = epT
As shown above, the larger an input vector p, the larger its effect on the weight vector w. Thus, if an input vector is
much larger than other input vectors, the smaller input vectors must be presented many times to have an effect.

The solution is to normalize the rule so that the effect of each input vector on the weights is of the same magnitude:
Δw = (t − a)pT/‖p‖ = epT/‖p‖

The normalized perceptron rule is implemented with the function learnpn, which is called exactly like learnp.
The normalized perceptron rule function learnpn takes slightly more time to execute, but reduces the number of
epochs considerably if there are outlier input vectors. You might try demop5 to see how this normalized training rule
works.
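The normalized rule divides each update by ‖p‖, so an outlier vector moves the weights no further than any other input direction. A NumPy sketch of the idea behind learnpn (not the toolbox function itself):

```python
import numpy as np

def learnpn_step(W, b, p, t):
    """Normalized perceptron update: dW = e * p'/||p||, db = e."""
    n = (W @ p)[0, 0] + b
    a = 1.0 if n >= 0 else 0.0          # hardlim
    e = t - a
    return W + e * p.T / np.linalg.norm(p), b + e

W = np.zeros((1, 2))
b = 0.0
p = np.array([[3.0], [4.0]])            # ||p|| = 5
W, b = learnpn_step(W, b, p, t=0.0)     # a = hardlim(0) = 1, so e = -1
```

Here the update subtracts the unit vector p/‖p‖ = [0.6; 0.8] rather than p itself, so the step size is the same whether ‖p‖ is 5 or 5000.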

You might also like