ML Unit 2
[Figure: a simple processing element. The inputs x1, x2, …, xm are summed, ∑ = x1 + x2 + … + xm = y, to produce the output y.]
How do ANNs work?
Not all inputs are equal
[Figure: each input x1, x2, …, xm is assigned a corresponding weight w1, w2, …, wm before contributing to the output y.]
[Figure: the weighted inputs w1·x1, …, wm·xm are summed by the processing element ∑; the sum vk is passed through a transfer function (activation function) f(vk) to produce the output y.]
The output is a function of the inputs, shaped by the weights and by the transfer (activation) function.
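To make this concrete, here is a minimal sketch in Python of a single unit computing y = f(∑ wi·xi); the function names (sigmoid, neuron_output) and the choice of a sigmoid as the transfer function are illustrative, not taken from the slides.

    import math

    def sigmoid(v):
        # One possible transfer (activation) function: smooth and differentiable
        return 1.0 / (1.0 + math.exp(-v))

    def neuron_output(weights, inputs):
        # Weighted sum of the inputs, followed by the transfer function
        v = sum(w * x for w, x in zip(weights, inputs))
        return sigmoid(v)

    # Example: three inputs, each with its own weight
    print(neuron_output([0.5, -0.3, 0.8], [1.0, 2.0, 0.5]))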
Artificial Neural Networks
An ANN can:
1. compute any computable function, by the appropriate selection of the network topology and weight values.
2. learn from experience!
Specifically, by trial‐and‐error
Learning by trial‐and‐error
Continuous process of:
Trial:
Processing an input to produce an output (in terms of an ANN: compute the output for a given input).
Evaluate:
Evaluating this output by comparing the actual output with the expected output.
Adjust:
Adjust the weights.
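A minimal sketch of this trial, evaluate, adjust cycle in Python; the update rule (a simple error-proportional correction) and the example data are illustrative assumptions.

    def train_step(weights, inputs, target, learning_rate=0.1):
        # Trial: compute the output for the given input
        output = sum(w * x for w, x in zip(weights, inputs))
        # Evaluate: compare the actual output with the expected (target) output
        error = target - output
        # Adjust: nudge each weight in the direction that reduces the error
        return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

    weights = [0.0, 0.0]
    for _ in range(100):
        weights = train_step(weights, inputs=[1.0, 2.0], target=1.0)
    print(weights)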
An example of ANN learning: ALVINN
A prototypical example of ANN learning is provided by Pomerleau's (1993) system
ALVINN, which uses a learned ANN to steer an autonomous vehicle driving at normal
speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a
forward-pointed camera mounted on the vehicle.
The network output is the direction in which the vehicle is steered.
The ANN is trained to mimic the observed steering commands of a human driving the
vehicle for approximately 5 minutes.
ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per
hour and for distances of 90 miles on public highways.
Figure 4.1: Neural network learning to steer an autonomous vehicle. The ALVINN system uses BACKPROPAGATION to learn to steer an autonomous vehicle (photo at top) driving at speeds up to 70 miles per hour.
The network is shown on the left side of the figure, with the input camera image depicted below it.
Each node (i.e., circle) in the network diagram corresponds to the output of a single network unit,
and the lines entering the node from below are its inputs.
There are four units that receive inputs directly from all of the 30 x 32 pixels in the image. These
are called "hidden" units because their output is available only within the network and is not
available as part of the global network output.
Each of these four hidden units computes a single real-valued output based on a weighted
combination of its 960 inputs.
These hidden unit outputs are then used as inputs to a second layer of 30 "output" units. Each output
unit corresponds to a particular steering direction, and the output values of these units determine
which steering direction is recommended most strongly.
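As a rough sketch of this architecture in Python (using NumPy): the random weights and input below are purely illustrative, not ALVINN's learned parameters.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    rng = np.random.default_rng(0)
    image = rng.random(30 * 32)                       # 960 pixel intensities
    w_hidden = rng.uniform(-0.05, 0.05, (4, 960))     # weights into the 4 hidden units
    w_output = rng.uniform(-0.05, 0.05, (30, 4))      # weights into the 30 output units

    hidden = sigmoid(w_hidden @ image)    # each hidden unit: weighted sum of 960 inputs
    outputs = sigmoid(w_output @ hidden)  # each output unit: one steering direction
    steering = int(np.argmax(outputs))    # the direction recommended most strongly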
The diagrams on the right side of the figure depict the learned weight values associated with one of
the four hidden units in this ANN.
The large matrix of black and white boxes on the lower right depicts the weights from the 30 x 32
pixel inputs into the hidden unit.
Here, a white box indicates a positive weight, a black box a negative weight, and the size of the box
indicates the weight magnitude.
The BACKPROPAGATION algorithm is the most commonly used ANN
learning technique. It is appropriate for problems with the following
characteristics:
1. Instances are represented by many attribute-value pairs.
2. The target function output may be discrete-valued, real-valued, or a vector of
several real- or discrete-valued attributes.
3. The training examples may contain errors or missing values.
4. Long training times are acceptable.
5. Fast evaluation of the learned target function may be required.
6. The ability of humans to understand the learned target function is not important.
The perceptron training rule updates each weight wi as
wi ← wi + Δwi, where Δwi = η (t − o) xi.
Here t is the target output for the current training example,
o is the output generated by the perceptron, and
η is a positive constant called the learning rate.
The role of the learning rate is to moderate the degree to which weights are changed at each step.
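A minimal sketch of this rule in Python; the threshold unit (outputting +1 or −1) and the example data are assumptions for illustration.

    def perceptron_output(weights, x):
        # Threshold unit: +1 if the weighted sum is positive, otherwise -1
        return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1

    def perceptron_update(weights, x, t, eta=0.1):
        # Perceptron training rule: wi <- wi + eta * (t - o) * xi
        o = perceptron_output(weights, x)
        return [w + eta * (t - o) * xi for w, xi in zip(weights, x)]

    weights = [0.0, 0.0, 0.0]
    weights = perceptron_update(weights, x=[1.0, 0.5, -1.0], t=1)
    print(weights)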
VISUALIZING THE HYPOTHESIS SPACE
(The training examples are held fixed, so each target output td is a constant; the error is a function of the weights alone.)
Single perceptrons can only express linear decision surfaces.
In contrast, the kind of multilayer networks learned by the Backpropagation algorithm are capable of expressing a
rich variety of nonlinear decision surfaces.
Multiple layers of cascaded linear units still produce only linear functions, and we prefer networks capable of
representing highly nonlinear functions.
What we need is a unit whose output is a nonlinear function of its inputs, but whose output is also a
differentiable function of its inputs.
One solution is the sigmoid unit, a unit very much like a perceptron, but based on a smoothed, differentiable threshold function.
The sigmoid unit is illustrated in Figure 4.6. Like the perceptron, the sigmoid unit first computes a linear
combination of its inputs, then applies a threshold to the result.
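For reference, a sketch of the sigmoid (logistic) function and its derivative in Python, assuming the standard form σ(v) = 1 / (1 + e^(−v)):

    import math

    def sigmoid(v):
        # Smoothed, differentiable threshold function
        return 1.0 / (1.0 + math.exp(-v))

    def sigmoid_derivative(v):
        # dσ/dv = σ(v) * (1 - σ(v)); this property keeps the gradient
        # computations in Backpropagation simple
        s = sigmoid(v)
        return s * (1.0 - s)

    print(sigmoid(0.0), sigmoid_derivative(0.0))   # 0.5, 0.25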
In practice, the weights are initialized to small random values (for example, in the range −1 to +1, or between 0 and 1), and training continues until the error falls below a chosen threshold (for example, less than 5%).
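A sketch of this initialization and stopping criterion in Python; the uniform range, the toy data, and the 5% threshold mirror the figures above and are not prescriptive.

    import random

    def init_weights(n, low=-1.0, high=1.0):
        # Start from small random weights, e.g. in the range [-1, +1]
        return [random.uniform(low, high) for _ in range(n)]

    def error_rate(weights, examples):
        # Fraction of (inputs, label) examples the current weights misclassify
        wrong = sum(1 for x, t in examples
                    if (sum(w * xi for w, xi in zip(weights, x)) > 0) != t)
        return wrong / len(examples)

    weights = init_weights(3)
    data = [([1.0, 0.5, -1.0], True), ([0.2, -0.3, 0.8], False)]
    print(error_rate(weights, data))   # training would continue until this is < 0.05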
Convergence and Local Minima – gradient descent over the network's error surface minimizes the error, but may converge to a local minimum rather than the global minimum.
Representational Power of Feedforward Networks – Boolean functions, continuous functions, arbitrary functions.
Hypothesis Space Search and Inductive Bias
Hidden Layer Representations
Generalization, Overfitting, and Stopping Criterion.
Task:
The learning task here involves classifying camera images of faces of various people in
various poses.
Images of 20 different people were collected, including approximately 32 images per
person, varying the person's expression (happy, sad, angry, neutral), the direction in which
they were looking (left, right, straight ahead, up), and whether or not they were wearing
sunglasses.
There is also variation in the background behind the person, the clothing worn by the
person, and the position of the person's face within the image.
In total, 624 greyscale images were collected, each with a resolution of 120 x 128, with each
image pixel described by a greyscale intensity value between 0 (black) and 255 (white).
Design Choices:
After training on a set of 260 images, classification accuracy over a separate test set is 90%.
In contrast, the default accuracy achieved by randomly guessing one of the four possible
face directions is 25%.
Input encoding:
Given that the ANN input is to be some representation of the image, one key
design choice is how to encode this image.
For example, we could preprocess the image to extract edges, regions of
uniform intensity, or other local image features, then input these features to the
network.
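As one hedged illustration of such an encoding (not necessarily the preprocessing used in the original study): coarsen the 120 x 128 greyscale image and scale the 0-255 intensities into 0-1.

    import numpy as np

    def encode_image(image, out_rows=30, out_cols=32):
        # image: 120 x 128 array of greyscale intensities in 0..255
        rows, cols = image.shape
        coarse = image.reshape(out_rows, rows // out_rows,
                               out_cols, cols // out_cols).mean(axis=(1, 3))
        return (coarse / 255.0).ravel()   # flat vector of values in [0, 1]

    pixels = encode_image(np.random.randint(0, 256, (120, 128)))
    print(pixels.shape)   # (960,)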
Output encoding:
The ANN must output one of four values indicating the direction in which the person is looking (left, right, up, or straight).
We could encode this four-way classification using a single output unit, assigning outputs of, say, 0.2, 0.4, 0.6, and 0.8 to encode these four possible values.
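A sketch in Python of the single-output encoding described above, plus the common 1-of-n alternative (one output unit per direction); the dictionary and function names are illustrative.

    # Single output unit: one code value per direction
    DIRECTION_CODES = {"left": 0.2, "right": 0.4, "up": 0.6, "straight": 0.8}

    def decode_single_output(y):
        # Choose the direction whose code is closest to the network output y
        return min(DIRECTION_CODES, key=lambda d: abs(DIRECTION_CODES[d] - y))

    # 1-of-n alternative: one output unit per direction; the prediction is
    # the unit with the highest output value.
    def decode_one_of_n(outputs, directions=("left", "right", "up", "straight")):
        return directions[max(range(len(outputs)), key=lambda i: outputs[i])]

    print(decode_single_output(0.55))              # "up"
    print(decode_one_of_n([0.1, 0.7, 0.3, 0.2]))   # "right"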
Bias: if S is the training set, error_S(h) is optimistically biased.
Bias in the estimate: The observed accuracy of the learned hypothesis over the training
examples is a poor estimator of its accuracy over future examples ==> we test the hypothesis
on a test set chosen independently of the training set and the hypothesis.
Variance in the estimate: Even with a separate test set, the measured accuracy can vary from
the true accuracy, depending on the makeup of the particular set of test examples. The
smaller the test set, the greater the expected variance.
When evaluating a learned hypothesis we are most often interested in estimating the
accuracy with which it will classify future instances.
At the same time, we would like to know the probable error in this accuracy estimate.
The target function f : X → {0, 1} classifies each person as Yes or No.
For example, from a total of 60 people, a sample of 30 is drawn; each outcome is binary, like a coin toss (heads/tails, i.e., 0/1).
The general process includes the following steps:
1. Identify the underlying population parameter p to be estimated, for example, error_D(h).
2. Define the estimator Y (e.g., error_S(h)). It is desirable to choose a minimum-variance, unbiased estimator.
3. Determine the probability distribution D_Y that governs the estimator Y, including its mean and variance.
4. Determine the N% confidence interval by finding thresholds L and U such that N% of the mass in the probability distribution D_Y falls between L and U.
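A sketch of steps 3 and 4 in Python for the common case where the estimator is the sample error measured over n independently drawn test examples; it assumes the usual normal approximation, with standard two-sided z-values such as 1.96 for a 95% interval.

    import math

    Z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}   # standard two-sided z-values

    def confidence_interval(sample_error, n, level=0.95):
        # Approximate N% confidence interval for the true error:
        # sample_error +/- z_N * sqrt(sample_error * (1 - sample_error) / n)
        sd = math.sqrt(sample_error * (1.0 - sample_error) / n)
        z = Z[level]
        return (sample_error - z * sd, sample_error + z * sd)

    # e.g. 12 misclassifications on 100 independent test examples
    print(confidence_interval(0.12, 100))   # roughly (0.056, 0.184)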
Test h1 on sample S1, test h2 on sample S2.
1. Pick the parameter to estimate.
2. Choose an estimator.
3. Determine the probability distribution that governs the estimator.
4. Find the interval (L, U) such that N% of the probability mass falls between the thresholds L and U.
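Applied to this case, the parameter to estimate is the difference in true errors, d = error_D(h1) − error_D(h2), estimated by the difference in sample errors on S1 and S2. A sketch of its approximate N% confidence interval, assuming independent test sets and the same normal approximation as above:

    import math

    def difference_interval(e1, n1, e2, n2, z=1.96):
        # d_hat estimates error_D(h1) - error_D(h2); for independent samples,
        # the variance of the difference is approximately the sum of the variances
        d_hat = e1 - e2
        sd = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
        return (d_hat - z * sd, d_hat + z * sd)

    print(difference_interval(0.30, 100, 0.20, 100))   # 95% interval for d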