Week 4
The Self-Organizing Map (SOM) was developed by the Finnish professor Teuvo Kohonen in the 1980s. This network is also known as a Topology Preserving Map. The name is used because the positions of the nodes vary at the start of the training procedure, and once the network has learned the given input patterns, the topology (the locations of the neural nodes) becomes fixed.
Each node is provided with a weight vector, which is simply the position of that node in the input space (the map). The job of training is to adjust these weight vectors so that the distance in the map is reduced; each weight vector moves towards the input. Thus, the map reduces a higher-dimensional input space to two dimensions; this is the dimensionality reduction process. After training, the SOM can classify an input by selecting the nearest node, i.e. the node whose weight vector has the smallest distance to the input vector.
This transformation is performed in an orderly manner. SOM uses only a two-dimensional discretized input space, known as the map, for its operation. Instead of error-correction learning, SOM uses competitive (winner-takes-all) learning.
Step 1: Initialize the Weights Wij. Initialize the learning rate and topological neighbourhood
parameters
Step 2: While stop condition is false Do steps 3 to 9
Step 3: For each input Vector x, do steps 4 to 6
Step 4: For each unit j calculate D(j) = Σi (Wij – xi)²
Step 5: Find the index J for which D(J) is minimum
Step 6: For all units j within a specified neighbourhood of J, and for all i:
Wij(new) = Wij(old) + α [xi – Wij(old)]
Step 7: Update Learning Rate
Step 8: Reduce the radius of Topological Neighbourhood at specific time periods
Step 9: Test for stop condition
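The steps above can be summarized in a short sketch. The following Python/NumPy fragment is a minimal illustration only; the grid size, decay factors and stopping rule are assumptions made for the example, not values prescribed by the algorithm.

import numpy as np

def train_som(inputs, grid_h=10, grid_w=10, epochs=100, alpha=0.5, radius=3.0):
    # Step 1: initialize the weights Wij, the learning rate and the neighbourhood radius
    n_features = inputs.shape[1]
    weights = np.random.rand(grid_h, grid_w, n_features)
    rows, cols = np.indices((grid_h, grid_w))
    for epoch in range(epochs):                # Steps 2 and 9: stop after a fixed number of epochs (assumed rule)
        for x in inputs:                       # Step 3: for each input vector x
            # Step 4: D(j) = sum_i (Wij - xi)^2 for every map unit j
            D = ((weights - x) ** 2).sum(axis=2)
            # Step 5: index of the unit with minimum D(j) (the winner)
            win_r, win_c = np.unravel_index(D.argmin(), D.shape)
            # Step 6: update every unit inside the winner's neighbourhood
            hood = (np.abs(rows - win_r) <= radius) & (np.abs(cols - win_c) <= radius)
            weights[hood] += alpha * (x - weights[hood])
        alpha *= 0.99                          # Step 7: update (reduce) the learning rate
        radius = max(radius * 0.95, 1.0)       # Step 8: shrink the topological neighbourhood
    return weights

After training, an input is classified by repeating Steps 4 and 5 alone: the node with the smallest distance (the closest weight vector) gives its cluster.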
• Initialization
The weight vectors, the learning rate and the topological neighbourhood parameters are set to their starting values, as in Step 1 of the algorithm above.
• Competition
For each given input pattern, the neurons calculate a discriminant function (here the Euclidean distance function is used). This discriminant function acts as the basis for the competition among the neurons. The neuron with the smallest distance value is selected as the winning neuron (winner-takes-all law).
• Cooperation
The winning neuron determines the spatial locations of the excited neurons in its neighbourhood (the topological map). Thus, cooperation between the neurons is established by the winning neuron in that rearranged neighbourhood.
• Adaptation
The winning neuron, by adjusting its weight values, tries to minimize the discriminant function (distance value) between itself and the inputs. When similar inputs are provided, the response of the winning neuron is further enhanced.
Merits
• Easy to interpret
• Dimensionality Reduction
• Capable of handling different types of classification problems
• Can cluster large, complex input sets
• SOM training time is short
• Simple algorithm
• Easy to implement
Demerits
It does not build a generative model for the data, i.e., the model does not learn how the data are generated.
It does not handle categorical data well, and mixed-type data even less so.
Preparing the model is slow, and it is hard to train against slowly evolving data.
Applications
• Character Recognition
• Speech Recognition
• Texture Recognition
• Image Clustering
• Data Clustering
• Classification problems
• Dimensionality reduction applications
• Seismic analysis
• Failure Analysis etc
2.2.1 Algorithm
Step 1: Initialize the weight vectors using ‘m’ training vectors, where ‘m’ is the number of different classes/clusters. Set the learning rate α to a small value (near zero)
Step 2: While the stop condition is false, do steps 3 to 6
Step 3: For each input training vector X, do steps 4 to 5
Step 4: Find J such that D(J) is minimum
Step 5: Update the weights of the Jth neural unit as given below
If T = CJ then
WJ(new) = WJ(old) + α [x – WJ(old)]  [move the weight vector W towards the input X]
If T ≠ CJ then
WJ(new) = WJ(old) – α [x – WJ(old)]  [move the weight vector W away from the input X]
Step 6: Reduce Learning Rate α
Step 7: Test for the stop condition (either a fixed number of iterations has been reached or the learning rate α has reached a very small value)
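A minimal sketch of this update rule is given below. The decay factor, epoch count and function names are assumptions made for illustration; the weight vectors and their class labels (Cj) are supplied by the caller as described in Step 1.

import numpy as np

def train_lvq(X, T, init_weights, weight_classes, alpha=0.1, epochs=50, alpha_min=1e-4):
    # Step 1: weights initialized to 'm' training vectors; weight_classes[j] is the class Cj of unit j
    W = init_weights.copy()
    for _ in range(epochs):                      # Step 2: repeat until the stop condition is met
        for x, t in zip(X, T):                   # Step 3: each training vector X with target class T
            # Step 4: J = index of the weight vector with minimum squared distance D(J)
            J = ((W - x) ** 2).sum(axis=1).argmin()
            # Step 5: move towards x if the target class matches CJ, away from x otherwise
            if t == weight_classes[J]:
                W[J] += alpha * (x - W[J])
            else:
                W[J] -= alpha * (x - W[J])
        alpha *= 0.95                            # Step 6: reduce the learning rate
        if alpha < alpha_min:                    # Step 7: stop when alpha is very small
            break
    return W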
2.2.2 Merits
Demerits
o All input feature vectors are connected to the middle (hidden) layer
o Hidden nodes are connected into groups, and each group denotes a particular class ‘K’
o Each node present in the hidden layer corresponds to a Gaussian function centred on its feature vector for that Kth class
o All of these Gaussian function outputs of a group/class are fed to the Kth Output unit
o Hence, we have only ‘K’ output units
o PNN is closely related to the Parzen-window PDF estimator (mixed Gaussian estimator)
o For any output node ‘K’, all Gaussian values (from the previous hidden layer) for that output class are summed up
o This summed value is scaled to a Probability Density Function (PDF)
o If class 1 contains ‘P’ feature vectors and class 2 contains ‘Q’ feature vectors, then P nodes are present in the hidden layer for class 1 and Q nodes for class 2
o The equation for the Gaussian function for any input is given below
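A standard form of the PNN Gaussian kernel, stated here as an assumption, is gi(x) = e^(–‖x – xi‖² ∕ 2σ²), where xi is the stored feature vector of hidden node i and σ is the spread (smoothing) parameter. A minimal sketch of the resulting classifier follows; the data layout and names are illustrative only.

import numpy as np

def pnn_classify(x, class_exemplars, sigma=1.0):
    # class_exemplars: list indexed by class K; each entry is an array of the stored
    # feature vectors, i.e. the hidden nodes belonging to that class
    scores = []
    for exemplars in class_exemplars:
        # one Gaussian per hidden node, centred on its stored feature vector
        d2 = ((exemplars - x) ** 2).sum(axis=1)
        g = np.exp(-d2 / (2.0 * sigma ** 2))
        # output node K: sum (and normalize) the Gaussian values of its class
        scores.append(g.sum() / len(exemplars))
    return int(np.argmax(scores))                # class with the largest PDF estimate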
2.3.1. Algorithm
Figure 3: General Probabilistic Neural Network Architecture Diagram
Cascade correlation addresses both issues, the slow rate of convergence and the need to fix the number of hidden nodes before training, by dynamically adding hidden units to the architecture, but only the minimum number necessary to achieve the specified error tolerance for the training set.
Furthermore, a two-step weight-training process ensures that only one layer of weights is being
trained at any time.
A cascade correlation net consists of input units, hidden units, and output units. Input
units are connected directly to output units with adjustable weighted connections.
Connections from inputs to a hidden unit are trained when the hidden unit is added to
the net and are then frozen. Connections from the hidden units to the output units are adjustable.
Cascade correlation starts with a minimal network, consisting only of the required input and
output units (and a bias input that is always equal to 1). This net is trained until no further
improvement is obtained; the error for each output unit is then computed (summed over all
training patterns).
Next, one hidden unit is added to the net in a two-step process. During the first step, a
candidate unit is connected to each of the input units, but is not connected to the output units.
The weights on the connections from the input units to the candidate unit are adjusted to
maximize the correlation between the candidate's output and the residual error at the output
units. The residual error is the difference between the target and the computed output,
multiplied by the derivative of the output unit's activation function, i.e., the quantity that would
be propagated back from the output units in the backpropagation algorithm. When this training
is completed, the weights are frozen and the candidate unit becomes a hidden unit in the net.
The second step in which the new unit is added to the net now commences. The new
hidden unit is connected to the output units, the weights on the connections being adjustable.
Now all connections to the output units are trained. (The connections from the input units are
trained again, and the new connections from the hidden unit are trained for the first time.)
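The quantity maximized in the first (candidate-training) step is the magnitude of the correlation between the candidate's output and the residual output errors. The small sketch below computes that quantity; the array shapes and function name are assumptions for illustration.

import numpy as np

def candidate_correlation(v, e):
    # v: candidate unit's output for every training pattern, shape (num_patterns,)
    # e: residual error at every output unit, shape (num_patterns, num_outputs)
    v_c = v - v.mean()               # centre the candidate's outputs
    e_c = e - e.mean(axis=0)         # centre each output unit's errors
    cov = v_c @ e_c                  # covariance with each output unit's error, shape (num_outputs,)
    return np.abs(cov).sum()         # value the candidate's input weights are trained to maximize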
Figure 4. Schematic Representation of Cascade Correlation Network
2.4.1. Merits
2.4.2. Applications
o The General Regression Neural Network (GRNN) was proposed by D. F. Specht in 1991
o GRNN is a Single pass learning Network
o GRNN uses a Gaussian activation function in its hidden (pattern) layer
o GRNN is based on Function Approximation or Function estimation procedures
o Output is estimated using weighted average of the outputs of training dataset, where the
weight is calculated using the Euclidean distance between the training data and test
data
o If the distance is large, the weight will be very small; if the distance is small, more weight is given to that output
o Contains 4 layers: (1) Input layer (2) Hidden (pattern) Layer (3) Summation Layer (4)
Output (division) Layer
o GRNN’s estimator is given by the equation
Ŷ(x) = [ Σi Y(xi) e^(-di² ∕ 2σ²) ] ∕ [ Σi e^(-di² ∕ 2σ²) ]
Where x = input (test) vector
xi = ith training sample
Y(xi) = output for sample i
di² = squared Euclidean distance between x and xi
e^(-di² ∕ 2σ²) = activation function value; this value is taken as the weight
σ = spread constant (the only unknown parameter)
Select the σ for which the MSE is minimum
To calculate the optimum value of σ, first divide the samples into two parts: one part is used to train the network and the other to test it. Apply the GRNN to the test data based on the training data and calculate the MSE for different values of σ. Select the minimum MSE and its corresponding σ. The architecture diagram of the GRNN is given in Figure 5.
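A minimal sketch of this estimator and of the hold-out selection of σ is given below; the candidate values of σ are assumptions chosen only for illustration.

import numpy as np

def grnn_predict(x, X_train, y_train, sigma):
    # weight for each training sample i: e^(-di^2 / 2 sigma^2), di^2 = squared Euclidean distance
    d2 = ((X_train - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    # estimator: weighted average of the training outputs Y(xi)
    return (w @ y_train) / w.sum()

def select_sigma(X_train, y_train, X_test, y_test, candidates=(0.1, 0.3, 0.5, 1.0, 2.0)):
    # apply GRNN to the test part for several sigma values and keep the one with minimum MSE
    best_sigma, best_mse = None, np.inf
    for s in candidates:
        preds = np.array([grnn_predict(x, X_train, y_train, s) for x in X_test])
        mse = ((preds - y_test) ** 2).mean()
        if mse < best_mse:
            best_sigma, best_mse = s, mse
    return best_sigma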
Figure 5. General Regression Neural Network
Consider the characters given in Figure 6. The objective is to recognise a particular alphabet, say ‘A’ in this example. Using image analysis models, the particular alphabet is segmented and converted into intensity (grey-scale pixel) values. The general workflow is shown in Figure 7. The first procedure is segmentation, the process of subdividing the image into sub-blocks. The alphabet “A” is therefore isolated using appropriate segmentation procedures such as thresholding, region growing, or edge-detector-based algorithms.
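A minimal sketch of the simplest of these procedures, threshold-based segmentation, is shown below; the threshold value is an assumption, and region growing or edge detection would require more elaborate routines.

import numpy as np

def threshold_segment(gray_image, threshold=128):
    # gray_image: 2-D array of grey-scale (intensity) values in the range 0-255
    # pixels darker than the threshold are treated as part of the character
    return (gray_image < threshold).astype(np.uint8)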
Figure 6. Input to Character recognition system
Figure 8a. Character Pattern values
Figure 8b. Character Pattern conversion into intensity
Figures 6, 7 and 8 are adapted from Praveen Kumar et al. (2012), “Character Recognition using Neural Network”, IJST, Vol. 3, Issue 2, pp. 978–981.
From Figure 8b, texture features, shape features and/or boundary features can be extracted. These feature values are known as exemplars, and they are the actual input to the neural network. Consider any neural network: its input is the feature table created as explained above, which is shown in Figure 9. This table is provided as input to the neural system.
Figure 9. Adapted from Yusuf Perwej et al. (2011), “Neural Networks for Handwritten English Alphabet Recognition”, International Journal of Computer Applications (0975–8887), Vol. 20, No. 7, April 2011.
Figure 10. ANN implementation of character recognition system
Figure 10: Adapted from Anita Pal et al. (2010), “Handwritten English Character Recognition Using Neural Network”, International Journal of Computer Science & Communication, Vol. 1, No. 2, July–December 2010, pp. 141–144.
If the current input features match the trained feature set, the output produces “1”, which denotes that the particular trained alphabet has been recognised; otherwise the output is “0”, meaning not recognised.
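A minimal sketch of this final matching step, using a nearest-exemplar comparison with a distance tolerance, is given below; the tolerance value and the stored exemplar array are assumptions made for illustration.

import numpy as np

def recognise(feature_vector, trained_exemplars, tolerance=0.5):
    # trained_exemplars: feature vectors stored during training for the target alphabet
    d2 = ((trained_exemplars - feature_vector) ** 2).sum(axis=1)
    # output 1 if the current features match a trained feature set, else 0
    return 1 if d2.min() <= tolerance else 0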
REFERENCE BOOKS