Module - 3 AAI

A1

Uploaded by

Nagarjuna
Copyright
© All Rights Reserved
MODULE 3

DEPT. OF AIML
• Artificial intelligence (AI) is the simulation of human intelligence in
machines that are programmed to think and act like humans. Learning,
reasoning, problem-solving, perception, and language comprehension are
all examples of cognitive abilities.
ARTIFICIAL NEURAL NETWORKS

• ANN stands for Artificial Neural Network. It is a computational model
based on the structure and functions of biological neural networks.
ARTIFICIAL NEURAL NETWORKS
• An artificial neural network (ANN) is used as a methodology for information processing and the method got
its inspiration from biological nervous systems.
• These systems consist of innumerable highly interconnected neurons working together to solve different
problems.

Use of artificial neural networks

• 1. Adaptive Learning: Neural networks can perform tasks after learning from experience gained from
previous data.
• 2. Self-Organization: An ANN is capable of organizing and representing the information it receives from training
data.
• 3. Real-Time Operation: ANN computations can be carried out in parallel. Special hardware devices can be
designed and manufactured to take advantage of this capability.
• 4. Fault Tolerance: If a neuron fails, the network keeps working, but it gives less
accurate results.
EVOLUTION OF NEURAL NETWORKS
NEURONS, PART OF HUMAN BRAIN
• The human brain is composed of about 86 billion nerve
cells (neurons). Each is connected to thousands of other
cells by axons. Inputs from the sensory organs are
received by dendrites.
• These inputs create electric impulses that travel through
the neural network. To handle different tasks, a neuron sends a
message to another neuron.
BASICS OF ARTIFICIAL NEURAL NETWORKS

The models of ANN are specified by three basic entities:
• 1. Model’s synaptic connections.
• 2. Training/learning rules adopted for adjusting
weights.
• 3. Activation functions.
NETWORK ARCHITECTURE
(TYPES OF ARTIFICIAL NEURAL NETWORKS)
• Single Layer Feed-forward Network
• Multilayer Feed-forward Network
TYPES OF ARTIFICIAL NEURAL NETWORKS

• FeedBack ANN
RECURRENT NETWORK
• Recurrent network is a feedback network with a
closed loop.
• This type of network can be a single layer network or
multilayer network.
• In a single layer network with a feedback connection,
a processing element’s output can be directed back to
itself or to another processing element or to both.
• When the feedback is directed back to the hidden
layers it forms a multilayer recurrent network.
• In addition, a processing element output can be
directed back to itself and to other processing
elements in the same layer.
LEARNING

• Supervised Learning: In supervised learning, each input vector is associated
with a desired output. The input vector and the corresponding output vector
form a training pair. Here, the network knows what the output should be.
• During training, the input vector is given to the network to produce
an output. This is the actual output. The actual output is then
compared with the desired output.
• The block diagram of supervised learning algorithm is shown in Fig.
6.6.
• The difference between the actual and desired output is the error
signal, which is generated by the network.
• This error signal is used to adjust the weights of the network
layers so that, for all training pairs, the actual output matches the
desired output.
• Unsupervised Learning: In unsupervised learning, inputs of a similar
category are grouped together without the help of a teacher; no desired
outputs are supplied.
• During the training process, the network groups similar input
patterns into clusters.
• When a new input is applied, the network gives an
output response indicating the class to which it
belongs.
• If an input does not belong to any cluster, a new
cluster is formed. This is shown in Fig. 6.7.
• There is no feedback from the environment to decide whether the
output is correct. The network discovers patterns on its
own by changing its parameters. This is
termed self-organization.
• Reinforcement Learning: Reinforcement learning is similar to supervised
learning in that some feedback information is available.
• However, in reinforcement learning only critic information is
available: an evaluation of how good the output was, not the exact desired
output. The exact information has to be obtained from this critic information.
• The process of extracting the exact information from critic
information is termed reinforcement learning (Fig. 6.8).
ACTIVATION FUNCTION
• The Activation function is applied over the net input to calculate the output of
the ANN.
• The neuron, as a processing node, sums its weighted inputs (the scalar
product of the weight and input vectors) to obtain the net input, net.
• Subsequently, it performs the nonlinear operation f(net) through its activation
function.
• Typical activation functions used are bipolar activation functions and unipolar
activation functions.
TYPES:
• Bipolar Activation Functions
• Unipolar Activation Functions
• Identity Function
• Ramp Function
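The formulas for these functions did not survive on the slides. The following is a minimal sketch using common textbook definitions; the function names and the steepness parameter `lam` are assumptions made for illustration, and some texts use slightly different thresholds.

```python
import math

def unipolar_binary(net):
    """Unipolar step: output is 0 or 1."""
    return 1 if net > 0 else 0

def bipolar_binary(net):
    """Bipolar step (sign-like): output is -1 or +1."""
    return 1 if net > 0 else -1

def unipolar_sigmoid(net, lam=1.0):
    """Unipolar continuous function with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def bipolar_sigmoid(net, lam=1.0):
    """Bipolar continuous function with range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * net)) - 1.0

def identity(net):
    """Identity: the output equals the input."""
    return net

def ramp(net):
    """Ramp: 0 below 0, linear on [0, 1], saturated at 1 above 1."""
    return max(0.0, min(1.0, net))
```

The bipolar functions differ from the unipolar ones only in their output range, which is why the two families are usually presented together.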
BIPOLAR ACTIVATION FUNCTIONS
UNIPOLAR ACTIVATION FUNCTIONS
IDENTITY FUNCTION

The output here remains the same as the input. The input layer uses the
identity activation function.
RAMP FUNCTION
MCCULLOCH–PITTS NEURON MODEL
IMPLEMENT AND FUNCTION USING MCCULLOCH–PITTS NEURON.
In the McCulloch–Pitts model the weights and threshold are fixed by analysis
rather than learned by a training algorithm. Hence, let us
assume the weights w1 = 1 and w2 = 1. The network architecture is
shown in Fig. 6.15.
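As a minimal sketch of this analysis: with w1 = w2 = 1 the net input reaches 2 only for the input (1, 1), so a threshold of 2 makes the neuron fire for AND. The threshold value and the function names are assumptions for illustration; the slide itself only fixes the weights.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (1) when the net input reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

def mp_and(x1, x2):
    # Weights from the slide; threshold 2 is an assumed value found by analysis.
    return mp_neuron((x1, x2), (1, 1), threshold=2)

truth_table = [(x1, x2, mp_and(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
```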
SOLVE:
IMPLEMENT USING MCCULLOCH–PITTS NEURON (USE BINARY DATA
REPRESENTATION).

• Implement ANDNOT function
• Implement XOR function
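One possible solution to these two exercises, as a sketch under assumed weights and thresholds (none of the values below appear on the slides): ANDNOT uses an excitatory weight 1 and an inhibitory weight -1 with threshold 1, and XOR is built from two ANDNOT neurons feeding an OR neuron.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fires (1) when the net input reaches the threshold."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

def mp_andnot(x1, x2):
    # x1 AND (NOT x2): assumed weights w1 = 1, w2 = -1, threshold 1.
    return mp_neuron((x1, x2), (1, -1), threshold=1)

def mp_xor(x1, x2):
    # XOR is not linearly separable, so a hidden layer is needed:
    # z1 = x1 ANDNOT x2, z2 = x2 ANDNOT x1, y = z1 OR z2.
    z1 = mp_andnot(x1, x2)
    z2 = mp_andnot(x2, x1)
    return mp_neuron((z1, z2), (1, 1), threshold=1)
```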


PROBLEMS FOR NEURAL NETWORK LEARNING
ALVINN (AUTONOMOUS LAND VEHICLE IN A NEURAL NETWORK)

• Neural network learning is used to steer an autonomous vehicle.


• ALVINN (Autonomous Land Vehicle In a Neural Network) was a self-driving
car project that used a neural network to control a vehicle's steering.
• The ALVINN system uses BACKPROPAGATION to learn to steer an
autonomous vehicle.
• A forward-mounted camera is mapped to 960 neural network inputs, which are
fed forward to 4 hidden units, connected to 30 output units.
• The BACKPROPAGATION algorithm is the most commonly used ANN
learning technique.
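The 960-4-30 layout described above can be sketched as a plain feed-forward pass. This is only an illustration of the shapes involved: the random weights, the sigmoid units, the bias weights, and reading the largest output as the recommended steering direction are all assumptions, not details taken from the slides.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 960, 4, 30

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def init_layer(n_in, n_out, rng):
    """One row per unit: [bias, w1, ..., wn], small random values (assumed)."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer_forward(weights, inputs):
    return [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs)))
            for w in weights]

def forward(hidden_w, output_w, pixels):
    hidden = layer_forward(hidden_w, pixels)   # 960 inputs -> 4 hidden units
    return layer_forward(output_w, hidden)     # 4 hidden units -> 30 outputs

rng = random.Random(0)
hidden_w = init_layer(N_INPUTS, N_HIDDEN, rng)
output_w = init_layer(N_HIDDEN, N_OUTPUTS, rng)
image = [rng.random() for _ in range(N_INPUTS)]   # stand-in for camera pixels
outputs = forward(hidden_w, output_w, image)
steering_index = max(range(N_OUTPUTS), key=outputs.__getitem__)
```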
BIOLOGICALLY-INSPIRED NEURAL NETWORKS
FOR SELF-DRIVING CARS
ANN learning is appropriate for problems with the following characteristics:
•Many Features (Attribute-Value Pairs): ANNs work well when data points are described by many features, like
the pixels in an image. These features can be related to each other or completely independent.

•Flexible Output: ANNs can predict single or multiple outputs, which could be categories (e.g., yes/no), continuous
values (e.g., temperature), or a mix of both. For example, it can predict both steering directions and acceleration in a
self-driving car.

•Handling Errors in Data: ANNs can learn effectively even if the training data has mistakes or noise, making them
robust to imperfect data.

•Long Training Times: Training ANNs can take longer than simpler models, ranging from seconds to hours, but this
time is necessary for learning complex patterns in data.

•Fast Prediction: Once trained, ANNs can quickly make predictions on new data. For instance, in real-time
applications like autonomous driving, they can update decisions several times per second.

•Not Easily Interpreted: The internal workings of ANNs (weights and connections) are hard to understand for
humans, unlike simpler models that can be explained more clearly.
PERCEPTRON
• A perceptron is a single-layer neural network with four
main parts: input values (input nodes), weights and
bias, the net sum, and an activation function.
• The perceptron multiplies each input value by its
weight, then adds these products (and the bias) together to form
the weighted sum.
• This weighted sum is passed to the
activation function 'f' to obtain the output.
• For the perceptron, 'f' is a step function.
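A minimal sketch of the computation just described: a weighted sum followed by a step function. The threshold convention (`net > 0`) and the example weights are assumptions for illustration.

```python
def step(net):
    """Step activation: 1 when the weighted sum is positive, else 0."""
    return 1 if net > 0 else 0

def perceptron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, then the step function f.
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return step(net)

fires = perceptron_output([1, 0], [0.6, 0.6], -0.5)   # net = 0.1
silent = perceptron_output([0, 0], [0.6, 0.6], -0.5)  # net = -0.5
```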
WORKING OF PERCEPTRON
One type of ANN system is based on a unit called a perceptron.
REPRESENTATIONAL POWER OF PERCEPTRON
PERCEPTRON FOR BOOLEAN
NOT gate: NOT(x) is a 1-variable function,
so there is one input at a time (N = 1).
It is also a logical function, so both the
input and the output have only
two possible states: 0 and 1.
AND LOGICAL FUNCTION

• The AND logical function is a 2-variable function, AND(x1, x2), with
binary inputs and output.
• This graph is associated with the following
computation:
ŷ = ϴ(w1*x1 + w2*x2 + b)
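The slide's implementation and output were lost in extraction. As a sketch of the computation above, one choice of parameters that realises AND is w1 = w2 = 1 and b = -1.5; these values, and the convention ϴ(z) = 1 for z ≥ 0, are assumptions.

```python
def heaviside(z):
    """Step function: 1 for z >= 0, else 0 (assumed convention)."""
    return 1 if z >= 0 else 0

def and_perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # y_hat = heaviside(w1*x1 + w2*x2 + b)
    return heaviside(w1 * x1 + w2 * x2 + b)

and_table = [(x1, x2, and_perceptron(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
```

Only the input (1, 1) gives a non-negative sum (0.5), so only that row outputs 1.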
IMPLEMENTATION

• Output;
OR LOGICAL FUNCTION

• OR(x1, x2) is also a 2-variable function, and its output is 1-dimensional
(i.e., one number) and has two possible states (0 or 1).
w1 = 1, w2 = 1,
b = -0.5
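With the weights and bias given above, the OR perceptron can be checked directly. The step convention (1 for a non-negative sum) is an assumption; with these parameters either common convention gives the same table.

```python
def heaviside(z):
    """Step function: 1 for z >= 0, else 0 (assumed convention)."""
    return 1 if z >= 0 else 0

def or_perceptron(x1, x2, w1=1.0, w2=1.0, b=-0.5):
    return heaviside(w1 * x1 + w2 * x2 + b)

or_table = [(x1, x2, or_perceptron(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
```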

Output;
XOR — ALL (PERCEPTRONS) FOR ONE (LOGICAL FUNCTION)
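A single perceptron cannot represent XOR because the function is not linearly separable, but a small network of perceptrons can. The sketch below uses one possible decomposition, XOR(x1, x2) = AND(NOT(AND(x1, x2)), OR(x1, x2)); the decomposition and the NOT and AND parameters are assumptions chosen for illustration.

```python
def heaviside(z):
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    return heaviside(bias + sum(w * x for w, x in zip(weights, inputs)))

def not_gate(x):
    return perceptron((x,), (-1.0,), 0.5)        # assumed parameters

def and_gate(x1, x2):
    return perceptron((x1, x2), (1.0, 1.0), -1.5)  # assumed parameters

def or_gate(x1, x2):
    return perceptron((x1, x2), (1.0, 1.0), -0.5)

def xor_gate(x1, x2):
    # Three perceptrons in a first stage, one in a second stage.
    return and_gate(not_gate(and_gate(x1, x2)), or_gate(x1, x2))
```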
PERCEPTRON TRAINING ALGORITHM

• Understanding Weights (wi) : yes/no or 1/0


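• Updating Weights:
• wi ← wi + Δwi
Calculating the Weight Change (Δwi):
• The formula for the weight change is:
Δwi = η (t − o) xi
where t is the target output, o is the perceptron output, η is the learning
rate, and xi is the i-th input.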
Example
IDENTIFY THE ANIMAL USING
ANN

Initial Setup

Training Data
Cat example: Ear size = 3, Tail length = 5, Target output = 0 (representing "cat").
Dog example: Ear size = 8, Tail length = 12, Target output = 1 (representing "dog").
Initial Prediction

Checking the Error:


error = target − output

Example: 0 − 1 = −1 (the target is 0, "cat", but the perceptron predicted 1, "dog")
Updating the Weights
Repeat the Process
• After adjusting the weights, the perceptron tries again on new examples, and
over time, it makes fewer errors.
• For example, after training on many images of cats and dogs, the perceptron
learns to predict accurately whether the input is a cat or a dog based on ear size
and tail length.
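The whole loop can be sketched on the two training examples above using the perceptron rule wi ← wi + η (target − output) xi. The initial weights (all zero), the learning rate of 1, and the convention that a net input of 0 counts as "dog" are assumptions, since the slide's initial setup was lost; with these choices the first prediction for the cat is 1, giving the error 0 − 1 = −1 shown earlier.

```python
def predict(weights, bias, features):
    net = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if net >= 0 else 0  # 1 = dog, 0 = cat

def train_perceptron(examples, learning_rate=1, max_epochs=10000):
    weights = [0] * len(examples[0][0])  # assumed initial weights
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for features, target in examples:
            error = target - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every example classified correctly
            break
    return weights, bias

# (ear size, tail length) -> target
examples = [((3, 5), 0), ((8, 12), 1)]
weights, bias = train_perceptron(examples)
```

Because the two examples are linearly separable, the loop stops once both are classified correctly; the epoch cap is only a safety net.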
GRADIENT DESCENT AND DELTA RULE
VISUALIZING THE HYPOTHESIS SPACE
DERIVATION OF DELTA RULE
HOW TO CALCULATE THE DIRECTION OF STEEPEST DESCENT ALONG
THE ERROR SURFACE?
TRAINING RULE
Finally,
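The equations on these slides were lost in extraction. As a hedged reconstruction using the standard textbook form, the delta rule minimises the squared error E(w) = 1/2 Σd (td − od)² of a linear unit by moving the weights against the gradient, which gives Δwi = η Σd (td − od) xid. The sketch below applies that batch rule to a small made-up dataset; the data, learning rate, and epoch count are assumptions for illustration.

```python
def linear_output(weights, x):
    """Unthresholded linear unit: o = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

def squared_error(weights, data):
    """E(w) = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in data)

def gradient_descent(data, eta=0.1, epochs=2000):
    n = len(data[0][0])
    weights = [0.0] * (n + 1)
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for x, t in data:
            err = t - linear_output(weights, x)
            delta[0] += eta * err              # bias input is fixed at 1
            for i, xi in enumerate(x):
                delta[i + 1] += eta * err * xi
        # Batch update: apply the summed change after seeing all examples.
        weights = [w + d for w, d in zip(weights, delta)]
    return weights

# Targets generated from t = 0.5 + 2*x1 - 1*x2 (illustrative data).
data = [((0, 0), 0.5), ((0, 1), -0.5), ((1, 0), 2.5), ((1, 1), 1.5)]
weights = gradient_descent(data)
```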
MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

A multilayer network is a type of neural network where information moves through layers
of interconnected nodes (neurons). It consists of an input layer, one or more hidden layers,
and an output layer. These networks can solve complex problems because the hidden
layers learn to capture intricate patterns in data.
Multilayer perceptron (MLP) networks are used for supervised learning. The typical learning
algorithm for MLP networks is the backpropagation algorithm.
BACKPROPAGATION ALGORITHM
The backpropagation algorithm is a method used to train multilayer networks.
Here's how it works:
1. Data is fed into the input layer and passed forward through the network.
2. The network makes a prediction at the output layer.
3. If the prediction is wrong, the backpropagation algorithm calculates the error and
sends it backward through the network.
4. The network adjusts its weights (connections between neurons) to reduce the error.
Through repeated training, the network gets better at making accurate predictions.
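The four steps can be sketched for a tiny network with one hidden layer of sigmoid units and a single sigmoid output. This is an illustration, not the slides' own code: the network size, learning rate, weight initialisation, and squared-error loss are assumptions.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def make_network(n_in, n_hidden, rng):
    """Each unit stores [bias, w1, w2, ...] with small random values."""
    def unit(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]
    return {"hidden": [unit(n_in) for _ in range(n_hidden)], "output": unit(n_hidden)}

def forward(net, x):
    """Steps 1-2: pass the input forward and return (hidden values, output)."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in net["hidden"]]
    w = net["output"]
    out = sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], hidden)))
    return hidden, out

def backprop_step(net, x, target, eta=0.5):
    hidden, out = forward(net, x)
    # Step 3: error terms, computed at the output and sent backward.
    delta_out = out * (1 - out) * (target - out)
    delta_hidden = [h * (1 - h) * net["output"][j + 1] * delta_out
                    for j, h in enumerate(hidden)]
    # Step 4: adjust every weight in proportion to its error term and input.
    net["output"][0] += eta * delta_out
    for j, h in enumerate(hidden):
        net["output"][j + 1] += eta * delta_out * h
    for j, w in enumerate(net["hidden"]):
        w[0] += eta * delta_hidden[j]
        for i, xi in enumerate(x):
            w[i + 1] += eta * delta_hidden[j] * xi
    return 0.5 * (target - out) ** 2

def train(net, examples, epochs=1000, eta=0.5):
    total = 0.0
    for _ in range(epochs):
        total = sum(backprop_step(net, x, t, eta) for x, t in examples)
    return total  # summed squared error over the last epoch
```

Calling `train(make_network(2, 2, random.Random(0)), examples)` repeats the step over a dataset; depending on the initial weights, training may settle in a local minimum rather than the best solution.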
DIFFERENTIABLE THRESHOLD UNIT
• A differentiable threshold unit (such as a neuron with a sigmoid
activation function) helps the network learn by producing smooth outputs that
change gradually, rather than hard "yes/no" decisions. For instance, instead
of deciding "yes" or "no," it outputs a value between 0 and 1 (or between -1 and +1),
which indicates confidence in its prediction.
A DIFFERENTIABLE THRESHOLD UNIT (SIGMOID UNIT)

• Sigmoid unit: a unit very much like a perceptron, but based on a smoothed, differentiable threshold
function.
• The sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result.
Unlike the perceptron, the thresholded output is a continuous function of its input.
• More precisely, the sigmoid unit computes its output o as
o = σ(w · x), where σ(y) = 1 / (1 + e^(−y))
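A short sketch of the sigmoid unit described above, together with the derivative that makes it usable with gradient descent, σ'(y) = σ(y)(1 − σ(y)). The function names and the convention that `weights[0]` is the bias weight are assumptions.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    """d(sigmoid)/dy, expressed through the sigmoid's own output."""
    s = sigmoid(y)
    return s * (1.0 - s)

def sigmoid_unit(weights, inputs):
    # weights[0] is the bias weight; the rest pair up with the inputs.
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)
```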
THE BACKPROPAGATION
ALGORITHM
ADDING MOMENTUM
LEARNING IN ARBITRARY ACYCLIC NETWORKS
DERIVATION OF THE BACKPROPAGATION
RULE
CONTINUATION…
CONTINUATION…
REMARKS ON THE BACKPROPAGATION
ALGORITHM
• Convergence and Local Minima
• Representational Power of Feedforward Networks
• Hypothesis Space Search and Inductive Bias
• Hidden Layer Representations
• Generalization, Overfitting, and Stopping Criterion
Convergence and Local Minima

Backpropagation over multilayer networks is only guaranteed to converge
towards some local minimum in E and not necessarily to the global minimum
error.
REPRESENTATIONAL POWER OF FEEDFORWARD
NETWORKS
HYPOTHESIS SPACE SEARCH AND INDUCTIVE BIAS

• The hypothesis space is the n-dimensional Euclidean space of the n network
weights. Notice this hypothesis space is continuous, in contrast to the
hypothesis spaces of decision tree learning and other methods based on
discrete representations.
• It is difficult to characterize precisely the inductive bias of Backpropagation
learning, because it depends on the interplay between the gradient descent
search and the way in which the weight space spans the space of
representable functions.
HIDDEN LAYER REPRESENTATIONS
One intriguing property of Backpropagation is its ability to discover useful
intermediate representations at the hidden unit layers inside the network. Because
training examples constrain only the network inputs and outputs, the weight-
tuning procedure is free to set weights that define whatever hidden unit
representation is most effective at minimizing the squared error E.
GENERALIZATION

Generalization refers to the model’s ability to perform well on new, unseen data
after being trained on a specific dataset. The goal of training a neural network is
to create a model that accurately predicts outcomes for examples it has never
encountered. If a model has good generalization, it will make reliable predictions
not only on the training data but also on data from the real world.
OVERFITTING:
In backpropagation, overfitting happens during later stages of training when the
model starts to adapt to specific details in the training data that do not generalize
well to new data. The error on the training set continues to decrease, but the error
on a separate validation set starts to rise, signaling overfitting.
STOPPING CRITERION
• The stopping criterion is the rule used to determine when to stop the training
process to avoid overfitting. A common approach in backpropagation is to use
early stopping, where training is halted when the model’s performance on a
validation set starts to degrade.
• The network is trained until the error on the validation set reaches its
minimum, after which training is stopped to avoid fitting noise or unnecessary
details in the training data.
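Early stopping can be sketched as a scan over the validation error recorded after each epoch. The `patience` parameter (how many epochs without improvement to tolerate before stopping) and the sample error values are assumptions added for illustration; the slides only say to stop once the validation error starts to degrade.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return (best_epoch, best_error) seen before training would be halted."""
    best_epoch, best_error = 0, float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_error

# Validation error falls, then rises as the network starts to overfit.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
stop = early_stopping_epoch(errors)
```

In practice the weights from the best epoch are kept, so the network returned is the one with the lowest validation error rather than the last one trained.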
GENETIC ALGORITHM IN MACHINE
LEARNING
• A genetic algorithm is an adaptive heuristic search algorithm inspired by
Darwin's theory of evolution in nature.
• It was proposed by John Holland.
• It is used to solve optimization problems in machine learning.
• It is an important algorithm because it helps solve complex problems that
would otherwise take a long time to solve.
GENETIC ALGORITHM
• Genetic algorithms simulate the process of natural selection which means those species
that can adapt to changes in their environment can survive and reproduce and go to the
next generation.
• In simple words, they simulate “survival of the fittest” among individuals of
consecutive generations to solve a problem.
• Each generation consists of a population of individuals and each individual represents
a point in search space and possible solution.
• Each individual is represented as a string of character/integer/float/bits. This string is
analogous to the Chromosome.
GENETIC ALGORITHM
Genetic algorithms are based on an analogy with the
genetic structure and behavior of chromosomes in a
population. The foundation of GAs based
on this analogy is as follows:

• Individuals in the population compete for resources
and mates.
• The individuals who are most successful (fittest)
mate to create more offspring than others.
• Genes from the "fittest" parents propagate throughout
the generation; sometimes parents create
offspring that are better than either parent.
• Thus each successive generation is better suited to
its environment.
OPERATORS OF GENETIC ALGORITHMS
Once the initial generation is created, the algorithm evolves the generation using following operators –
1) Selection Operator: The idea is to give preference to the individuals with good fitness scores and allow
them to pass their genes to successive generations.
2) Crossover Operator: This represents mating between individuals. Two individuals are selected using
selection operator and crossover sites are chosen randomly. Then the genes at these crossover sites are
exchanged thus creating a completely new individual (offspring). For example –
3) Mutation Operator: The key idea is to insert random genes in offspring to
maintain the diversity in the population to avoid premature convergence. For
example –
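The examples for these operators were figures that did not survive extraction. The sketch below shows one common form of each operator on bit strings: fitness-proportionate (roulette wheel) selection, single-point crossover, and bit-flip mutation. These particular variants and all function names are assumptions for illustration.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    scores = [fitness(ind) for ind in population]
    return rng.choices(population, weights=scores, k=1)[0]

def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length strings after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in individual
    )

rng = random.Random(0)
population = ["01101", "11000", "01000", "10011"]
parent = roulette_select(population, lambda s: int(s, 2), rng)
children = single_point_crossover("11111", "00000", 2)
```

A low mutation rate is typical, since its role is only to keep some diversity in the population.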
SUMMARIZED ALGORITHM
PROBLEM

Phenotypes are:13,24,8,19
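The problem statement itself was lost, so the following is only a sketch under stated assumptions: each phenotype is encoded as a 5-bit string, and fitness is f(x) = x², a common textbook choice for this kind of exercise. Neither assumption comes from the slide.

```python
phenotypes = [13, 24, 8, 19]

def encode(x, bits=5):
    """Genotype: fixed-width binary string (assumed 5-bit encoding)."""
    return format(x, "0{}b".format(bits))

def fitness(x):
    """Assumed objective: maximise f(x) = x^2."""
    return x * x

genotypes = [encode(x) for x in phenotypes]
scores = [fitness(x) for x in phenotypes]
total = sum(scores)
# Selection probability of each individual under roulette wheel selection.
probabilities = [s / total for s in scores]
```

Under these assumptions the individual 24 receives almost half of the selection probability, so it is the most likely parent for the next generation.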
GENETIC PROGRAMMING
• Genetic programming (GP) is a form of evolutionary computation in which the individuals in the evolving
population are computer programs rather than bit strings.

• Koza (1992) describes the basic genetic programming approach and presents a broad range of simple
programs that can be successfully learned by GP.
Crossover operation applied to two parent program trees (top). Crossover points (nodes
shown in bold at top) are chosen at random. The subtrees rooted at these crossover
points are then exchanged to create children trees (bottom).
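Subtree crossover can be sketched with program trees stored as nested lists, where the first element of each list is the function name and the rest are its arguments. The tree encoding, the path representation (a tuple of child indices), and the example programs are assumptions for illustration.

```python
import random

def all_paths(tree, prefix=()):
    """List the path to every node; a path is a tuple of child indices."""
    paths = [prefix]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            paths.extend(all_paths(child, prefix + (i,)))
    return paths

def get_subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree

def replace_subtree(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    copy = list(tree)
    copy[path[0]] = replace_subtree(tree[path[0]], path[1:], new_subtree)
    return copy

def crossover(parent_a, path_a, parent_b, path_b):
    """Exchange the subtrees rooted at the two crossover points."""
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))

parent_a = ["+", ["sin", "x"], "y"]      # sin(x) + y
parent_b = ["*", "x", ["sqrt", "y"]]     # x * sqrt(y)
child_a, child_b = crossover(parent_a, (1,), parent_b, (2,))

# Crossover points are normally chosen at random:
rng = random.Random(0)
random_point = rng.choice(all_paths(parent_a))
```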
EXAMPLE

A block-stacking problem. The task for GP is to discover a program that can transform an arbitrary
initial configuration of blocks into a stack that spells the word "universal." A set of 166 such initial
configurations was provided to evaluate fitness of candidate programs (after Koza 1992).
On the table (from left to right):
On the stack (from bottom to top):
• Block with the letter v
• Block with the letter r
• Block with the letter u
• Block with the letter s
• Block with the letter l
• Block with the letter e
• Block with the letter a
• Block with the letter n
• Block with the letter i
Representation (How GP sees the problem):
• CS (Current Stack): This tells the system what block is currently at the top of the stack. If there's no
stack, it returns F (False).
• TB (Top Correct Block): This refers to the topmost block that is in the correct order on the stack. For
example, if you've correctly stacked the blocks up to "U", this will return "U".
• NN (Next Necessary Block): This refers to the next block that needs to be stacked to correctly spell the
word. For example, after "U", "N" is the next necessary block.

Available Actions (Primitive Functions):


The GP system can use several basic actions to manipulate the blocks:
•(MS x) (Move to Stack): If block x is on the table, it moves it to the top of the stack.
•(MT x) (Move to Table): If block x is at the top of the stack, it moves it to the table.
•(EQ x y) (Equal): Checks if two things are equal and returns True or False.
•(NOT x): Returns True if x is False, and False if x is True.
•(DU x y) (Do Until): Repeats action x until y becomes True.
The GP system starts by randomly creating 300 programs (which are combinations of these functions and terms). It then
tests these programs by seeing how well they can stack blocks to spell "universal" in different random configurations.

• The blocks need to be stacked so that the stack spells "universal" from bottom to top, i.e. top to bottom: l → a → s → r → e → v → i → n → u


• The GP program will evolve a solution that optimally moves the blocks into this configuration.
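To make the terminals and primitive functions concrete, the sketch below simulates the block world and runs a hand-written strategy built from them: move every block to the table, then repeatedly move the next necessary block onto the stack. The class and method names, the iteration cap inside `do_until`, and the starting configuration are assumptions for illustration.

```python
TARGET = list("universal")  # required order from bottom to top

class BlockWorld:
    def __init__(self, stack, table):
        self.stack = list(stack)   # bottom-to-top
        self.table = list(table)

    def _correct_height(self):
        n = 0
        while n < len(self.stack) and n < len(TARGET) and self.stack[n] == TARGET[n]:
            n += 1
        return n

    def cs(self):   # current stack: top block, or None (F) if empty
        return self.stack[-1] if self.stack else None

    def tb(self):   # top correct block, or None (F)
        n = self._correct_height()
        return self.stack[n - 1] if n else None

    def nn(self):   # next necessary block, or None (F) when finished
        n = self._correct_height()
        return TARGET[n] if n < len(TARGET) else None

    def ms(self, x):  # move to stack
        if x is not None and x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True
        return False

    def mt(self, x):  # move to table
        if x is not None and self.stack and self.stack[-1] == x:
            self.table.append(self.stack.pop())
            return True
        return False

def do_until(action, condition, max_steps=100):
    """DU: repeat `action` until `condition` holds (capped, as an assumption)."""
    for _ in range(max_steps):
        action()
        if condition():
            return True
    return False

def solve(world):
    do_until(lambda: world.mt(world.cs()), lambda: world.cs() is None)
    do_until(lambda: world.ms(world.nn()), lambda: world.nn() is None)
    return world

world = solve(BlockWorld(stack="vru", table="sleani"))
```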
REMARKS ON GENETIC PROGRAMMING (GP)
Extension of Genetic Algorithms:
Genetic programming (GP) extends genetic algorithms by evolving entire computer programs. GP
searches through an enormous hypothesis space, yet it has been applied successfully in
various fields.

Applications of GP:
While the earlier block-stacking example is simple, Koza et al. (1996) have applied GP to more
complex problems, such as:
• Designing electronic filter circuits
• Classifying segments of protein molecules
Electronic Filter Circuit Design:
•A challenging problem where GP evolves programs to transform a seed circuit into a final
design.
•Primitive functions in GP edit circuits by adding/deleting components and wiring.
•Circuit fitness is calculated using the SPICE circuit simulator by comparing the designed
circuit’s output with the desired output across 101 input frequencies.

Population and Selection Process:


•The GP system maintained a large population (640,000).
•Selection contributed to 10% of new individuals, crossover to 89%, and mutation to 1%.
•The system ran on a 64-node parallel processor.
Improvements Over Generations:
•Initially, 98% of the circuits couldn't even be simulated.
•This fraction dropped to 84.9% after one generation and 75% after two generations, and
stabilized at 9.6% in later generations.
•Fitness of the best circuit in the population improved from a score of 159 to 0.8 over
137 generations.
•The final circuit exhibited behavior very close to the desired output.

Importance of Representation and Fitness Function:


•The performance of GP is highly dependent on the chosen representation and fitness
function.
•Active research is focusing on automatically discovering and incorporating
subroutines to improve the set of primitive functions, which would allow the GP
system to adapt its primitives dynamically (Koza, 1994).
MODELS OF EVOLUTION AND LEARNING
Lamarckian Evolution Theory:
• The Lamarckian theory states that the characteristics individuals acquire during their lifetime are
passed on to their children.
• According to Lamarck's theory, learning is an important part of the evolution of species (or, for our
purposes, of the evolutionary algorithm).
BALDWIN EFFECT:

• Baldwin proposed that individual learning can explain evolutionary
phenomena that appear to require Lamarckian inheritance of acquired
characteristics.
• The ability of individuals to learn can guide the evolutionary process. In
effect, learning smooths the fitness landscape, thus facilitating evolution.
It focuses on
• Genotype: genetic code (DNA); works on the global search.
• Phenotype: individual characteristics (behaviour); works on the local
search.
• It measures the "cost of learning" in terms of time and energy.