Neural Networks And Their
Statistical Application
By Clint Hagen
Statistics Senior Seminar 2006
Outline
What Neural Networks are and why they
are desirable
How the process works and appropriate
statistical applications
Basic Architectures and algorithms
Applications
Drawbacks and limitations
Demonstration using “NeuroShell 2”
The original analyst
What are they?
Computer algorithms designed to mimic
human brain function
Set of simple computational units which
are highly interconnected
Human Brain Function
Neural Network Function
Some Similarities
Why Neural Networks are desirable
Human brain can generalize from abstract information
Recognize patterns in the presence of
noise
Recall memories
Make decisions for current problems
based on prior experience
Why Desirable in Statistics
Prediction of future events based on past
experience
Able to classify to nearest pattern in
memory, doesn’t have to be exact
Predict latent variables that are not easily
measured
Non-linear regression problems
What are Neural Networks?
The computational ability of a
digital computer combined with
the desirable functions of the
human brain.
How the Process Works
Terminology, when to use neural
networks and why they are used
in statistical applications.
Terminology
Input: Explanatory variables also referred
to as “predictors”.
Neuron: Individual units in the hidden
layer(s) of a neural network.
Output: Response variables also called
“predictions”.
Hidden Layers: Layers between input and
output that apply an activation function.
Terminology
Weights: Parameters found by minimizing an
objective function (usually sum of squared
errors) while training a network.
Backpropagation: Most popular training
method for neural networks.
Network training: To find values of network
parameters (weights) for performing a
particular task.
Terminology
Patterns: Set of predictors with their actual
output used in training the network
When to use neural networks
Use for huge data sets (e.g., 50 predictors
and 15,000 observations) with unknown
distributions
Smaller data sets with outliers as neural
networks are very resistant to outliers
Why Neural Networks in Statistics?
The methodology is seen as a new
paradigm for data analysis where models
are not explicitly stated but rather implicitly
defined by the network.
Advanced pattern recognition capabilities
Allows for analysis where traditional
methods might be extremely tedious or
nearly impossible to interpret.
Basic Architectures
Feed Forward
A feed-forward network trained using
backpropagation (a "backpropagation
network") is most often used in time series
prediction problems. It is the most
commonly used algorithm.
We will see this algorithm in more detail
soon
Adaline Network
Pattern recognition network
Essentially a single layer backpropagation
network
Only recognizes exact training patterns
Hopfield Model
The Hopfield model is used as an auto-
associative memory to store and recall a
set of bitmap images.
Associative recall of images: given an
incomplete or corrupted version of a
stored image, the network can recall the
original
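A minimal sketch of this idea, assuming bipolar (+1/−1) patterns and the classic outer-product (Hebbian) storage rule; the 8-unit pattern below is invented purely for illustration:

```python
import numpy as np

def train_hopfield(patterns):
    """Store bipolar (+1/-1) patterns with the Hebbian outer-product rule."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)            # no self-connections
    return W / n

def recall(W, state, steps=10):
    """Update all units until the state stops changing."""
    for _ in range(steps):
        new_state = np.where(W @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Store one toy 8-unit "image" and recall it from a corrupted copy.
stored = np.array([[1, 1, -1, -1, 1, -1, 1, -1]])
W = train_hopfield(stored)
corrupted = np.array([1, -1, -1, -1, 1, -1, 1, -1])   # one unit flipped
print(recall(W, corrupted))           # recovers the stored pattern
```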
Boltzmann Machine
The Boltzmann machine is a stochastic
version of the Hopfield model.
Used for optimization problems such as
the classic traveling salesman problem
Note
Those are only a few of the more
common network structures.
Advanced users can build
networks designed for a particular
problem in many software
packages readily available on the
market today.
Feed Forward Network Trained
Using Backpropagation
Structure
[Diagram: input layer → hidden layer(s) → output layer]
One-way only
Can have multiple hidden layers
Each layer can have an independent number
of neurons
Each layer fully connected to the next
layer.
Feed-Forward Design
Alternate Structure
[Diagram: Predictors 1-3 feed neurons i and j; weights Wik and Wjl connect them to neurons k and l, which feed output t]
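A minimal forward-pass sketch of this one-way structure, with layer sizes and random weights chosen purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, layers):
    """Propagate inputs one way through fully connected layers."""
    activation = inputs
    for W, mu in layers:               # weight matrix and thresholds per layer
        activation = sigmoid(W @ activation + mu)
    return activation

rng = np.random.default_rng(0)
# 3 predictors -> 4 hidden neurons -> 1 output (sizes chosen arbitrarily)
layers = [(rng.normal(size=(4, 3)), rng.normal(size=4)),
          (rng.normal(size=(1, 4)), rng.normal(size=1))]
print(forward(np.array([0.2, -1.0, 0.5]), layers))
```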
Weights
Each connection (arrow) in the previous
diagram has a weight, also called the
synaptic weight
These weights are adjusted during training
to reduce the error between desired output
and actual output
Weights
Weights are adjustable
Weight w_ij is interpreted as the strength of
the connection from the jth unit to the
ith unit
During training, weight updates are computed
in the opposite direction to the one the
network runs in (errors propagate backward)
netinput_i = ∑_j w_ij · output_j + µ_i
µ_i is a threshold for neuron i
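As a made-up numeric example, the net input is just a weighted sum of the connected units' outputs plus the threshold term:

```python
import numpy as np

w_i = np.array([0.4, -0.2, 0.7])        # weights w_ij into neuron i
outputs_j = np.array([1.0, 0.5, -1.0])  # outputs of the connected units j
mu_i = 0.1                              # threshold for neuron i

netinput_i = w_i @ outputs_j + mu_i     # sum_j w_ij * output_j + mu_i
print(netinput_i)                       # -0.3
```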
Threshold
Each neuron takes its net input and
applies an activation function to it
The output (activation value) of the ith
neuron is g(∑_j w_ij · x_j + µ_i), where g(·)
is the activation function and x_j is the
output of the jth unit connected to i
If the net input exceeds the threshold the
neuron will “fire”
Activation Function
For a network trained with backpropagation,
the only practical requirement for an
activation function is that it be
differentiable
Sigmoid function is commonly used
g(netinput) = 1 / (1 + e^(−netinput))
Or a simple binary threshold unit
θ(netinput) = 1 if netinput ≥ 0, and 0
otherwise
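Both activation choices as a minimal sketch:

```python
import numpy as np

def sigmoid(netinput):
    """Smooth and differentiable, so it suits backpropagation."""
    return 1.0 / (1.0 + np.exp(-netinput))

def binary_threshold(netinput):
    """Fires (returns 1) only when the net input reaches the threshold of 0."""
    return 1 if netinput >= 0 else 0

print(sigmoid(0.0), binary_threshold(-0.3))   # 0.5 0
```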
Backpropagation
The backpropagation algorithm is a
method to find weights for a multilayered
feed forward network.
It has been shown that a feed forward
network trained using backpropagation
with a sufficient number of hidden units can
approximate any continuous function to
any level of accuracy
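A minimal backpropagation sketch for a single hidden layer, assuming sigmoid activations and a sum-of-squares objective; the learning rate, layer size, and XOR example are illustrative choices, not prescriptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, y, n_hidden=4, lr=0.5, epochs=10000, seed=0):
    """Fit a one-hidden-layer feed-forward network by backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    mu1 = np.zeros(n_hidden)              # thresholds for hidden neurons
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    mu2 = np.zeros(1)                     # threshold for the output neuron
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(X @ W1 + mu1)
        out = sigmoid(h @ W2 + mu2)
        # Backward pass: propagate error derivatives toward the inputs
        d_out = (out - y) * out * (1 - out)     # chain rule with sigmoid
        d_hid = (d_out @ W2.T) * h * (1 - h)
        # Gradient-descent updates on all weights and thresholds
        W2 -= lr * h.T @ d_out; mu2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid; mu1 -= lr * d_hid.sum(axis=0)
    return W1, mu1, W2, mu2

# XOR: a pattern no single-layer network can learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, mu1, W2, mu2 = train_backprop(X, y)
# Predictions are typically close to 0, 1, 1, 0 after training.
print(sigmoid(sigmoid(X @ W1 + mu1) @ W2 + mu2).round(2))
```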
Training the Network
Neural networks must first be trained
before being used to analyze new data
Process entails running patterns through
the network until the network has “learned”
the model to apply to future data
Can take a long time for noisy data
Usually doesn’t converge exactly to the
desired output, but an acceptable value
close to it can be achieved
New Data
Once the network is trained new data can
be run through it
The network will classify new data based
on the previous data it trained with
If an exact match cannot be found, it will
match to the closest pattern in memory
Regression and Neural Networks
Objective of a regression problem is to find
coefficients that minimize the sum of
squared errors
To find coefficients we must have a
dataset that includes the independent
variable and associated values of the
dependent variable. (very similar to
training the network)
Equivalent to a single layer feed forward
network
Regression
Independent variables correspond to
predictors
Coefficients β correspond to weights
The activation function is the identity
function
To find weights in a neural network we use
backpropagation and a cost function
Difference in Neural Networks
The difference in the two approaches is
that multiple linear regression has a closed
form solution for the coefficients, while
neural networks use an iterative process.
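A sketch of that contrast on simulated data: the closed-form (normal-equations) solution and an iterative gradient-descent fit of the equivalent single-layer network arrive at nearly the same coefficients. All numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # predictors
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=200)

# Closed form: least squares gives the coefficients in one step.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Iterative: gradient descent on sum-of-squares error, as a network would.
w = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

print(beta_ols.round(3), w.round(3))          # nearly identical
```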
In regression models a functional form is
imposed on the data
In the case of multiple linear regression
this assumption is that the outcome is
related to a linear combination of the
independent variables.
If this assumption is not correct, it will lead
to error in the prediction
An alternate approach is to assume no
particular functional relationship between
the predictors and the response, and
instead let the data define the functional
form.
This is the basis of the power of neural
networks
This is very useful when you have no idea
of the functional relationship between the
dependent and independent variables
If you had an idea, you’d be better off
using a regression model
Drawbacks and Limitations
Neural Networks can be extremely hard to
use
The programs are filled with settings you
must specify, and a small mistake in any
one of them can introduce error into your
predictions
The results can be very hard to interpret
as well
Drawbacks and Limitations
Neural networks should not be used when
traditional methods are appropriate
Since they are data dependent,
performance will improve as sample size
increases
Regression performs better when theory
or experience indicates an underlying
relationship
A short demonstration using
“NeuroShell 2”