3. Least Mean Square (LMS) Algorithm
3.1 Spatial Filtering
Uses a single linear neuron and can be understood as adaptive filtering:
y = ∑k wkxk for k = 1 to p
error e = d − y where d = desired value
cost function = mean squared error: J = ½ e²
[Figure: single linear neuron — inputs x1 … xp with weights w1 … wp, a bias input of −1 with weight w0 = θ, summed to give output y]
3.2 Steepest descent
Setting ∂J/∂wk = 0 determines the optimum weights.
Adjust the weights iteratively, moving along the error surface towards the optimum value:
wk(n+1) = wk(n) − η (∂J(n)/∂wk)
i.e. the updated value is proportional to the negative of the gradient ∂J/∂w of the error surface.
[Figure: error surface J against a single weight, with minimum Jmin at the optimum weight w0]
Since ∂J/∂wk = −e xk,
∴ wk(n+1) = wk(n) + η e(n) xk(n)
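A minimal sketch of this update in Python/NumPy, assuming a training set of stimulus vectors `X` and desired responses `d`; the function and parameter names are illustrative, not from the original:

```python
import numpy as np

def lms(X, d, eta=0.01, epochs=10):
    """LMS: w_k(n+1) = w_k(n) + eta * e(n) * x_k(n)."""
    w = np.zeros(X.shape[1])          # initial weights
    for _ in range(epochs):
        for x, target in zip(X, d):
            e = target - w @ x        # error e = d - y, with y = sum_k w_k x_k
            w += eta * e * x          # step along -gradient of J = e^2 / 2
    return w
```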
Properties of LMS:
• a stochastic gradient algorithm in that the gradient vector is ‘random’ in
contrast to steepest descent
• on average improves in accuracy for increasing values of n.
• reduces storage requirement to information present in its current set of
weights, and can operate in a nonstationary environment.
3.2.1 Convergence (proof not given)
• in the mean: the weight vector → optimum value as n → ∞; requires
0 < η < 2/λmax, where λmax is the max eigenvalue of the autocorrelation matrix Rx = E[x xᵀ]
• in the mean square: the mean-square of the error signal → a constant as n → ∞; requires
0 < η < 2/tr[Rx], where tr[Rx] = ∑k λk ≥ λmax
• Faster convergence is usually obtained by making η a function of n, for example η(n) = c/n for some constant c
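A small sketch of checking these two bounds numerically, assuming a sample matrix `X` of stimulus vectors (an illustrative setup, not from the original):

```python
import numpy as np

X = np.random.randn(1000, 4)             # 1000 sample input vectors (illustrative)
Rx = (X.T @ X) / len(X)                  # estimate of autocorrelation E[x x^T]
lam_max = np.linalg.eigvalsh(Rx).max()
print("eta bound (mean):       ", 2 / lam_max)
print("eta bound (mean square):", 2 / np.trace(Rx))  # tighter, since tr >= lam_max
```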
4. Multilayer Feedforward Perceptron Training
4.1 Back-propagation Algorithm
[Figure: signal-flow graph of neuron j — outputs yi of the previous layer are weighted by wji and summed to give υj, passed through ϕ(·) to give yj, which is compared (via a −1 multiplier) with the desired response dj to give the error ej]
Let wji be the weight connecting neuron i to neuron j
error signal: ej(n) = dj(n) − yj(n)
net internal sum: υj(n) = ∑i wji(n) yi(n) for i = 0 to p
output: yj(n) = ϕj(υj(n))
Instantaneous sum of squared errors: E(n) = ½ ∑j ej²(n), summed over all j in the o/p layer
For N patterns, average squared error: Eav = (1/N) ∑n E(n) for n = 1 to N
• Learning goal is to minimise Eav by adjusting the weights, but instead of Eav the estimate E(n) is used on a pattern-by-pattern basis
From the chain rule:
∂E(n)/∂wji(n) = (∂E(n)/∂ej(n)) (∂ej(n)/∂yj(n)) (∂yj(n)/∂υj(n)) (∂υj(n)/∂wji(n))
∴ weight correction: ∆wji(n) = −η ∂E(n)/∂wji(n), i.e. steepest descent
= η δj(n) yi(n), where δj(n) = −∂E(n)/∂υj(n) is the local gradient
Case 1: output node, local gradient easily calculated: δj(n) = ej(n) ϕj′(υj(n))
Case 2: hidden node, more complex; need to consider neuron j feeding neuron k, where the inputs to neuron j are yi
δj(n) = −(∂E(n)/∂yj(n)) ϕj′(υj(n)) = −ϕj′(υj(n)) ∑k ek(n) (∂ek(n)/∂yj(n))
∴ δj(n) = −ϕj′(υj(n)) ∑k ek(n) (∂ek(n)/∂υk(n)) (∂υk(n)/∂yj(n)) = ϕj′(υj(n)) ∑k δk(n) wkj(n)
(the minus sign is absorbed, since ek(n) ∂ek(n)/∂υk(n) = −δk(n))
• Thus δj(n) is computed in terms of δk(n) which is closer to the output. After
calculating the network output in a forward pass, the error is computed and
recursively back-propagated through the network in a backward pass.
weight correction = (learning rate) × (local gradient) × (i/p signal of neuron)
∆wji(n) = η δj(n) yi(n)
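A compact sketch of one pattern-by-pattern back-propagation step for a single hidden layer of logistic units, in Python/NumPy; the names (`W1`, `W2`, etc.) are illustrative and bias terms are omitted for brevity:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, d, W1, W2, eta=0.1):
    """One forward/backward pass; returns updated weight matrices."""
    # forward pass
    y1 = sigmoid(W1 @ x)               # hidden-layer outputs
    y2 = sigmoid(W2 @ y1)              # network outputs
    e = d - y2                         # error signal e_j = d_j - y_j
    # backward pass: local gradients
    delta2 = e * y2 * (1 - y2)         # output layer: delta = e * phi'(v)
    delta1 = (W2.T @ delta2) * y1 * (1 - y1)   # hidden: phi' * sum_k delta_k w_kj
    # weight corrections: eta * (local gradient) * (input signal)
    W2 = W2 + eta * np.outer(delta2, y1)
    W1 = W1 + eta * np.outer(delta1, x)
    return W1, W2
```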
4.2 Back-propagation training
Activation function:
yj(n) = ϕj(υj(n)) = 1 / [1 + exp(−υj(n))]
∂yj(n)/∂υj(n) = ϕj′(υj(n)) = exp(−υj(n)) / [1 + exp(−υj(n))]² = yj(n) [1 − yj(n)]
Note that the max value of ϕj′(υj(n)) occurs at yj(n) = 0.5 and
the min value of 0 occurs at yj(n) = 0 or 1
Momentum term: add α ∆wji(n − 1) to the weight correction, with 0 ≤ |α| < 1
helps locate a more desirable local minimum in a complex error surface:
• no change in gradient sign ⇒ ∆wji(n) increases and descent is accelerated
• changes in gradient sign ⇒ ∆wji(n) decreases and oscillations are stabilised
• a large enough α can stop the process terminating in shallow local minima
• with momentum, η can be larger
[Figure: example error surface over a single weight, with shallow and deep local minima]
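A sketch of the momentum-augmented update as a stand-alone helper, using the quantities from the `backprop_step` sketch above; `alpha` and the function name are illustrative:

```python
import numpy as np

def momentum_update(W, dW_prev, delta, y_in, eta=0.1, alpha=0.9):
    """Generalised delta rule: dW(n) = alpha * dW(n-1) + eta * delta * y."""
    dW = alpha * dW_prev + eta * np.outer(delta, y_in)
    return W + dW, dW                  # keep dW for the next iteration
```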
4.3 Other perspectives for improving generalisation
4.3.1 Pattern vs Batch Mode
Choice depends on particular problem:
• randomly updating weights after each pattern requires very little storage and
leads to a stochastic search which is less likely to get stuck in local minima
• updating after presentation of all training samples (an epoch) provides a more accurate estimate of the gradient vector since it is based on the average squared error Eav
4.3.2 Stopping criteria
e.g. gradient vector threshold and/or change in average squared error per epoch
4.3.3 Initialisation
• default is uniform distribution inside a small range of values
• values that are too large can lead to premature saturation (neuron outputs close to their limits), which gives small weight adjustments even though the error is large
4.3.4 Training Set Size
worst-case formula N > W/ε where:
N = no. of examples, W = no. of synaptic weights,
ε = fraction of errors permitted on test
e.g. W = 100 weights with ε = 0.1 permitted requires more than 1000 examples
4.3.5 Cross-Validation
• measures generalisation on test set
• various parameters including no. of hidden nodes, learning rate and
training set size can be set based on cross-validation performance
4.3.6 Network Pruning by complexity regularisation
(two possibilities: network growing and network pruning)
goal is to find the weight vector that minimises R(w) = s(w) + λ c(w)
where s(w) is a standard error measure, e.g. mean squared error,
λ is the regularisation parameter, and
c(w) is a complexity penalty that depends on the network, e.g. ||w||²
• the regularisation term allows identification of weights having an insignificant effect
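A minimal sketch of this regularised cost with a squared-norm penalty, for a linear model in Python; `lam` and the function name are illustrative assumptions:

```python
import numpy as np

def regularised_cost(w, X, d, lam=0.01):
    """R(w) = mean squared error + lambda * ||w||^2 (weight-decay penalty)."""
    errors = d - X @ w                 # e = d - y for each pattern
    s = np.mean(errors ** 2)           # standard error measure s(w)
    c = np.sum(w ** 2)                 # complexity penalty c(w) = ||w||^2
    return s + lam * c
```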
4.3.7 Other ways of minimising cost function
• Back-propagation uses a relatively simple, quick approach to minimising the cost function by obtaining an instantaneous estimate of the gradient
• methods and techniques from nonlinear optimum filtering and nonlinear function optimisation have been used to provide more sophisticated approaches to minimising the cost function, e.g. Kalman filtering, the conjugate-gradient method
4.4 Universal Approximation Theorem
single hidden layer with suitable ϕ gets arbitrarily close to any continuous
function
• the logistic function satisfies the ϕ(⋅) definition
• single hidden layer sufficient, but no clue on synthesis
• single hidden layer is restrictive in that hierarchical features not supported
4.5 Example of learning XOR Problem
x1 x2 | target
 0  0 |   0
 0  1 |   1
 1  0 |   1
 1  1 |   0

[Figure: decision boundaries of hidden neurons a and b in the (x1, x2) plane — each line separates a region with out = 0 from a region with out = 1]

[Figure: two-layer network solving XOR — hidden neuron a (weights 1, 1 from x1, x2; threshold 1.5), hidden neuron b (weights 1, 1; threshold 0.5), output neuron c (weight −2 from a, weight 1 from b; threshold 0.5)]

x1 x2 | a b | target
 0  0 | 0 0 |   0
 0  1 | 0 1 |   1
 1  0 | 0 1 |   1
 1  1 | 1 1 |   0

[Figure: decision boundary of output neuron c in the (a, b) plane]
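A sketch verifying this network in Python with hard-limiting activations; the thresholds follow the figure above:

```python
step = lambda v: int(v > 0)            # hard-limiting activation

def xor_net(x1, x2):
    a = step(x1 + x2 - 1.5)            # neuron a: fires only for (1, 1)
    b = step(x1 + x2 - 0.5)            # neuron b: fires unless (0, 0)
    return step(-2 * a + b - 0.5)      # neuron c combines a and b

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # prints 0, 1, 1, 0
```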
4.6 Example: vehicle navigation
[Figure: network architecture — video input retina, fully connected to 9 hidden units, fully connected to 45 output units spanning steering angles from sharp left to sharp right]
network computes steering angle
training examples from human driver
obstacles detected by laser range finder
5. Associative Memories
5.1 Linear associative memory
stimulus ak = [ak1, ak2, …, akp]ᵀ → response bk = [bk1, bk2, …, bkp]ᵀ

[Figure: single-layer linear network — inputs ak1 … akp feed output nodes 1 … p, which produce bk1 … bkp]

        | w11(k)  w12(k)  …  w1p(k) |
W(k) =  | w21(k)  w22(k)  …  w2p(k) |
        |   ⋮       ⋮            ⋮   |
        | wp1(k)  wp2(k)  …  wpp(k) |

response: bk = W(k) ak
Design of the weight matrix for storing q pattern associations ak → bk:
estimate of weight matrix Ŵ = ∑k bk akᵀ for k = 1 to q
(Hebbian learning principle)
where bk akᵀ is the outer product of the column vector [bk1, bk2, …, bkp]ᵀ with the row vector [ak1, ak2, …, akp]
Pattern recall:
For recall of a stimulus pattern aj: b = Ŵ aj = ∑k (akᵀaj) bk
assuming the key patterns have been normalised, ajᵀaj = 1, so
b = bj + vj, where vj = ∑k (akᵀaj) bk for k = 1 to q, k ≠ j
i.e. vj results from interference from all the other stimulus patterns
∴ akᵀaj = 0 for j ≠ k → perfect recall (orthonormal patterns)
Main features:
• distributed memory
• auto- and hetero-associative
• content addressable and resistant to noise and damage
• interaction between stored patterns may lead to error on recall
The max. no. of patterns reliably stored is p, the dimension of input space
which is also the rank (no. of independent columns or rows) of W
For an auto-associative memory ideally W ak = ak showing that
stimulus patterns are eigenvectors of W with all unity eigenvalues
Example: a1 = [1 0 0 0]ᵀ, a2 = [0 1 0 0]ᵀ, a3 = [0 0 1 0]ᵀ
b1 = [5 1 0]ᵀ, b2 = [−2 1 6]ᵀ, b3 = [−2 4 3]ᵀ

                              | 5  −2  −2  0 |   giving perfect recall
memory weight matrix Ŵ =      | 1   1   4  0 |   since the stimulus patterns
                              | 0   6   3  0 |   are orthonormal

noisy stimulus e.g. [0.8 −0.15 0.15 −0.2]ᵀ gives [4 1.25 −0.45]ᵀ,
which is closer to b1 than to b2 or b3
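A sketch reproducing this example in Python: Ŵ is built from outer products and recall is tested with the noisy stimulus:

```python
import numpy as np

A = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], float)  # stimuli a1..a3
B = np.array([[5, 1, 0], [-2, 1, 6], [-2, 4, 3]], float)         # responses b1..b3

# Hebbian estimate: W = sum_k outer(b_k, a_k)
W = sum(np.outer(b, a) for a, b in zip(A, B))
print(W)                               # [[5 -2 -2 0], [1 1 4 0], [0 6 3 0]]

noisy = np.array([0.8, -0.15, 0.15, -0.2])
print(W @ noisy)                       # [4. 1.25 -0.45], closest to b1
```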
6. Radial Basis Functions
6.1 Separability of patterns
Separability theorem (Cover) states that if the mapping ϕ(x) is nonlinear and the hidden-unit space is of high dimension relative to the input space, then patterns are more likely to be linearly separable
[Figure: RBF network — inputs x1 … xp feed nonlinear hidden units ϕ1 … ϕp; a linear output neuron forms the weighted sum with weights w1 … wp and bias w0 from a fixed input of 1]
Example of an RBF is a Gaussian:
ϕ(x) = exp(−||x − t||²), t = centre of the Gaussian
the output neuron is a linear weighted sum
• ϕ(x) is nonlinear and the hidden-unit space [ϕ1(x), ϕ2(x), …, ϕp(x)] is usually of high dimension relative to the input space, so patterns are more likely to be separable
• a difficult nonlinear optimisation problem has been converted to a linear optimisation problem that can be solved by the LMS algorithm
• if a different RBF is centred on each training pattern, then the training set can be learned perfectly
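A sketch of this linear step in Python: one Gaussian is centred on each training pattern and the output weights follow from solving the resulting linear system (data and names are illustrative):

```python
import numpy as np

def rbf_design_matrix(X, centres):
    """Phi[i, j] = exp(-||x_i - t_j||^2), one Gaussian per centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

X = np.random.randn(20, 2)             # 20 training patterns (illustrative)
d = np.random.randn(20)                # desired outputs
Phi = rbf_design_matrix(X, X)          # a centre on every training pattern
w = np.linalg.solve(Phi, d)            # linear problem: Phi w = d
print(np.allclose(Phi @ w, d))         # training set learned (near) perfectly
```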
6.2 Example: XOR
use two hidden Gaussian functions:
ϕ1(x) = exp(−||x − t1||²), t1 = [1, 1]ᵀ
ϕ2(x) = exp(−||x − t2||²), t2 = [0, 0]ᵀ
[Figure: the four patterns in the (ϕ1, ϕ2) plane — (1,1) and (0,0) map to opposite corners, while (0,1) and (1,0) map onto a single point; a straight decision boundary separates the two classes]
x1 x2 | ϕ1(x)  ϕ2(x)
 0  0 |  e⁻²     1
 0  1 |  e⁻¹    e⁻¹
 1  0 |  e⁻¹    e⁻¹
 1  1 |   1     e⁻²
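A sketch computing this table in Python, confirming that (0,1) and (1,0) collapse onto the same point in ϕ-space:

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
phi = lambda x, t: np.exp(-np.sum((x - t) ** 2))   # Gaussian RBF

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    x = np.array(x, float)
    print(x, phi(x, t1), phi(x, t2))   # (0,1) and (1,0) give identical features
```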
6.3 Ill-posed Hypersurface Reconstruction
Inverse problem of finding unknown mapping F from domain X and range Y is
well-posed if:
1. for every x ∈ X there exists y ∈ Y (existence)
2. for every pair of inputs x, t ∈ X, F(x) = F(t) iff x = t (uniqueness)
3. mapping is continuous (continuity) X Y
x F(x)
Learning is ill-posed because of sparsity of information & noise in training set
Regularisation Theory for solving ill-posed problems (Tikhonov) uses a modified cost functional that includes a complexity term:
R(F) = s(F) + λ c(F)
where s(F) is the standard error term and c(F) is the regularising term
• one regularised solution is given by a linear superposition of multivariate Gaussian basis functions, with centres xi and widths σi:
F(x) = ∑i wi exp(−||x − xi||² / (2σi²)) for i = 1 to N
practical ways of regularising:
• reduce the number of RBFs
• change the σ of the RBFs
• choose the positions of the centres
6.4 RBF Networks vs. MLP
• MLP may have multiple hidden layers vs. RBF has a single hidden layer
• MLP nodes share a common computation model vs. RBF hidden & o/p layers are fundamentally different
• MLP layers are usually all nonlinear vs. RBF has a nonlinear hidden layer but a linear output
• MLP units compute the inner product of i/p vector & weight vector vs. RBF units compute the Euclidean norm between the i/p vector and the centre of the appropriate unit
• MLP forms a global approximation and is therefore good at extrapolation vs. RBF forms a local approximation with fast learning but poor extrapolation
6.5 Learning Strategies
variety of possibilities, since a nonlinear optimisation strategy for the hidden layer is combined with a linear optimisation strategy in the output layer. For the hidden layer the main choice involves how the centres are learned:
• Fixed centres selected at random, e.g. choose the Gaussian exp(−(M/d²) ||x − ti||²), where M = no. of centres and d = distance between them
• Self-organised selection of centres, e.g. k-NN or a self-organising NN
• Supervised selection of centres, e.g. error-correction learning with a suitable cost function using modified gradient descent
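A sketch of the fixed-centres strategy in Python, assuming training inputs `X`; taking d as the maximum distance between the chosen centres is an assumption here, and the names are illustrative:

```python
import numpy as np

def fixed_centres(X, M, rng=np.random.default_rng(0)):
    """Pick M random centres; width follows exp(-(M/d^2) * ||x - t||^2)."""
    centres = X[rng.choice(len(X), M, replace=False)]
    # d taken as the maximum distance between the chosen centres (assumption)
    d = max(np.linalg.norm(a - b) for a in centres for b in centres)
    phi = lambda x: np.exp(-(M / d**2) * ((x - centres) ** 2).sum(axis=1))
    return centres, phi
```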
6.6 Example: curve fitting
RBF for approximating (x − 2)(2x + 1)/(1 + x²) from 15 noise-free examples
15 Gaussian hidden units with the same σ
Three designs are generated, for σ = 0.5, σ = 1.0, σ = 1.5
output shown for 200 inputs uniformly sampled in the range [−8, 12]
[Figure: the 15 training points and the three fitted curves for σ = 0.5, σ = 1.0 and σ = 1.5]
best compromise is σ = 1.0
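A sketch reproducing this experiment in Python, under the assumptions above (15 noise-free samples, a shared σ, weights found by exact interpolation):

```python
import numpy as np

f = lambda x: (x - 2) * (2 * x + 1) / (1 + x ** 2)

x_train = np.linspace(-8, 12, 15)        # 15 noise-free examples
sigma = 1.0                              # try 0.5, 1.0, 1.5

# Gaussian design matrix with a centre on every training point
phi = lambda x: np.exp(-(x[:, None] - x_train[None, :]) ** 2 / (2 * sigma ** 2))
w = np.linalg.solve(phi(x_train), f(x_train))   # interpolate the 15 points

x_test = np.linspace(-8, 12, 200)
y_hat = phi(x_test) @ w                  # network output over the range
print(np.abs(y_hat - f(x_test)).max())   # approximation error depends on sigma
```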