
Soft Computing Unit-1

Contents

1 Unit - 1
  1.1 Mathematical Models of Neurons
    1.1.1 Perceptron Model (McCulloch–Pitts Neuron)
    1.1.2 Sigmoid Neuron
    1.1.3 ReLU Neuron (Rectified Linear Unit)
    1.1.4 Leaky ReLU, Tanh, Softmax
    1.1.5 Summary of Neuron Models
  1.2 Artificial Neural Network (ANN) Architecture
    1.2.1 Components of ANN Architecture
    1.2.2 Forward Propagation
    1.2.3 Backpropagation (Learning Phase)
    1.2.4 ANN Structure Example
    1.2.5 Activation Functions
    1.2.6 Summary Table
  1.3 Learning Rules in Artificial Neural Networks
  1.4 Gradient Descent Algorithm
  1.5 Learning Paradigms – Supervised, Unsupervised, and Reinforcement Learning
  1.6 ANN Training Algorithms
  1.7 Perceptrons
  1.8 Training Rules
  1.9 Delta Training Rule
  1.10 Backpropagation Algorithm
  1.11 Multilayer Perceptron Model
  1.12 Hopfield Networks
  1.13 Associative Memories
  1.14 Applications of Artificial Neural Networks

Chapter 1

Unit - 1

1.1 Mathematical Models of Neurons


1.1.1 Perceptron Model (McCulloch–Pitts Neuron)
The simplest model of a neuron, introduced by McCulloch and Pitts (1943), was later formalized as the perceptron.
Formula:

y = f( Σ_{i=1}^{n} wi xi + b )

• xi : Input features
• wi : Weights
• b: Bias (threshold)
• f : Activation function (step, sigmoid, ReLU, etc.)
• y: Output
Step Activation Function:
f(z) = 1 if z ≥ 0, 0 if z < 0

1.1.2 Sigmoid Neuron


Uses a sigmoid activation function to handle continuous outputs.
f(z) = 1 / (1 + e^{−z})
• Smooth and differentiable
• Output in the range (0, 1)
• Used in logistic regression and early neural nets

1.1.3 ReLU Neuron (Rectified Linear Unit)
Used in deep learning due to better convergence properties.

f (z) = max(0, z)

• Output is 0 for negative input, linear for positive

• Avoids vanishing gradient (unlike sigmoid)

1.1.4 Leaky ReLU, Tanh, Softmax


Other activation functions lead to variations in neuron models:

Leaky ReLU:

f(z) = z if z > 0, 0.01z otherwise

Tanh:

f(z) = tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z}),  range: (−1, 1)

Softmax (for multi-class classification):

f(z_i) = e^{z_i} / Σ_j e^{z_j}  for each class i
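As a quick illustration, the neuron models above can be written in a few lines of NumPy. This is a minimal sketch; the function names and the max-subtraction inside softmax (a common numerical-stability trick) are illustrative choices, not part of the original notes.

import numpy as np

def step(z):
    # Step: 1 for z >= 0, else 0 (perceptron activation)
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    # Sigmoid: squashes input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU: 0 for negative input, linear for positive
    return np.maximum(0, z)

def leaky_relu(z):
    # Leaky ReLU: small slope 0.01 for negative input
    return np.where(z > 0, z, 0.01 * z)

def softmax(z):
    # Softmax: exponentiate and normalize so outputs sum to 1;
    # subtracting max(z) avoids overflow without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()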

1.1.5 Summary of Neuron Models


Model            Activation   Output Range      Used In
Perceptron       Step         {0, 1}            Binary classification
Sigmoid Neuron   Sigmoid      (0, 1)            Logistic models
ReLU             ReLU         [0, ∞)            Deep networks
Tanh             Tanh         (−1, 1)           Deep learning
Softmax          Softmax      [0, 1], sum = 1   Multi-class output

1.2 Artificial Neural Network (ANN) Architecture


An Artificial Neural Network (ANN) is inspired by the structure of the biological
nervous system. It consists of layers of interconnected processing units (neurons), each
performing simple computations.

1.2.1 Components of ANN Architecture
• Input Layer: Receives raw data. Each neuron corresponds to an input variable. No
computation is performed here.

• Hidden Layer(s): One or more intermediate layers. Each neuron performs a weighted
sum and applies an activation function. Multiple hidden layers form a Deep Neural
Network (DNN).

• Output Layer: Produces the final output (e.g., class label, probability). The number
of neurons depends on the task.

1.2.2 Forward Propagation


The process of computing outputs from inputs:

z^{(l)} = W^{(l)} a^{(l−1)} + b^{(l)}

a^{(l)} = f(z^{(l)})

Where:

• W^{(l)}: Weight matrix at layer l

• b^{(l)}: Bias vector

• a^{(l)}: Activation output of layer l

• f: Activation function (e.g., ReLU, sigmoid)
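As a sketch, forward propagation for a single hidden layer follows directly from these equations; the layer sizes (3 inputs, 4 hidden units, 2 outputs), the random weights, and the choice of ReLU then sigmoid are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs -> 4 hidden neurons -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    a1 = np.maximum(0, W1 @ x + b1)               # hidden layer: z = Wx + b, then ReLU
    return 1.0 / (1.0 + np.exp(-(W2 @ a1 + b2)))  # output layer: sigmoid activation

print(forward(np.array([0.5, -1.0, 2.0])))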

1.2.3 Backpropagation (Learning Phase)


This phase adjusts weights based on the error between predicted and actual output. It uses:

• Chain rule of calculus to compute gradients

• Gradient descent (or its variants) to update weights

1.2.4 ANN Structure Example


Input Layer Hidden Layer(s) Output Layer
[x1] [ h1 ] [ y1 ]
[x2] ---> W + b -->[ h2 ] ---> W + b ---> [ y2 ]
[x3] [ h3 ] [ y3 ]

1.2.5 Activation Functions

Name      Formula                            Range
Sigmoid   1 / (1 + e^{−z})                   (0, 1)
Tanh      (e^z − e^{−z}) / (e^z + e^{−z})    (−1, 1)
ReLU      max(0, z)                          [0, ∞)
Softmax   e^{z_i} / Σ_j e^{z_j}              (0, 1), sum = 1

1.2.6 Summary Table


Layer          Role                              Example Size
Input Layer    Accepts raw data                  3 (e.g., x1, x2, x3)
Hidden Layer   Learns features/representations   4 neurons (configurable)
Output Layer   Produces prediction               1 (binary) or n (multi-class)

1.3 Learning Rules in Artificial Neural Networks


Learning rules are algorithms or mathematical formulas that update the weights of the
connections between neurons based on the input data and error. The aim is to minimize the
output error and improve the network’s performance over time.

Hebbian Learning Rule


Hebbian learning is based on the principle: “Neurons that fire together, wire together.”

∆wij = η · xi · yj
Where:

• ∆wij : Change in weight between neuron i and j

• η: Learning rate

• xi : Input from neuron i

• yj : Output of neuron j
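A single Hebbian update is just an outer product of input and output activities, as in this minimal sketch (the sizes, values, and learning rate are illustrative):

import numpy as np

eta = 0.1
x = np.array([1.0, 0.0, 1.0])   # pre-synaptic activities x_i
y = np.array([0.5, 1.0])        # post-synaptic activities y_j
W = np.zeros((3, 2))

W += eta * np.outer(x, y)       # delta w_ij = eta * x_i * y_j
print(W)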

Perceptron Learning Rule


Used in single-layer perceptrons for linearly separable data.

∆wi = η(t − o)xi


Where:

• t: Target output

• o: Actual output

• xi : Input

• η: Learning rate

Delta Rule (Widrow-Hoff Rule)


Used in gradient descent to minimize error.
∆wi = −η ∂E/∂wi
Where:

• E: Error function (typically Mean Squared Error)

• ∂E/∂wi : Partial derivative of error with respect to weight wi

Backpropagation Learning Rule


Extension of the delta rule for multi-layer networks. It uses the chain rule to propagate
errors backward from the output to hidden layers.
Weight update:
∆wij = −η ∂E/∂wij
This learning rule is essential in deep learning and modern ANNs.

Summary

Rule              Key Idea                               Use Case
Hebbian           Co-activation of neurons               Biological basis, unsupervised learning
Perceptron        Error correction in classification     Binary classification
Delta Rule        Gradient descent                       Linear regression, simple networks
Backpropagation   Error backpropagation through layers   Deep learning

1.4 Gradient Descent Algorithm


Gradient Descent is an optimization algorithm used to minimize the error or cost function
in machine learning models by iteratively adjusting the model parameters (weights).

Objective
Minimize the cost function:

J(θ) = (1/m) Σ_{i=1}^{m} L(y^{(i)}, ŷ^{(i)})
Where:

• J(θ): Cost or loss function

• L: Loss between true label y and prediction ŷ

• m: Number of training examples

• θ: Parameters (e.g., weights)

Update Rule
Each parameter θj is updated using the partial derivative of the cost function:

θj := θj − η ∂J(θ)/∂θj
Where:

• η: Learning rate (a small positive number)

• ∂J(θ)/∂θj : Gradient of the cost function with respect to parameter θj

Algorithm Steps
1. Initialize parameters θ randomly or to zero

2. Repeat until convergence:

• Compute gradient ∇J(θ)


• Update: θ := θ − η · ∇J(θ)

3. Return optimized parameters θ
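A minimal sketch of these steps on a least-squares problem (the synthetic data, learning rate, and iteration count are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + 1.0       # targets from a known linear map

Xb = np.hstack([X, np.ones((100, 1))])    # extra column so theta includes the bias
theta = np.zeros(3)                       # initialize parameters to zero
eta = 0.1

for _ in range(200):
    grad = (2 / len(y)) * Xb.T @ (Xb @ theta - y)  # gradient of the MSE cost
    theta -= eta * grad                            # update: theta := theta - eta * grad

print(theta)  # approaches [2, -3, 1]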

Types of Gradient Descent


• Batch Gradient Descent: Uses the entire dataset for each update.

• Stochastic Gradient Descent (SGD): Updates weights using one sample at a time.

• Mini-batch Gradient Descent: A compromise — uses a small batch of data.

Remarks
• The learning rate η affects convergence speed and stability.
• Too large an η may cause overshooting; too small an η slows learning.
• Advanced optimizers (e.g., Adam, RMSprop) build on gradient descent.

1.5 Learning Paradigms – Supervised, Unsupervised, and Reinforcement Learning

Machine learning is broadly categorized into three main learning paradigms based on the nature of the feedback signal and data availability: supervised learning, unsupervised learning, and reinforcement learning.

Supervised Learning
In supervised learning, the model is trained on a labeled dataset, which means each training
example includes both input data and the correct output.
• Input: Features x
• Output: Labels y
• Goal: Learn a mapping f : x → y
Examples:
• Classification (e.g., spam detection)
• Regression (e.g., predicting house prices)
Loss Function: Measures the difference between predicted and actual output (e.g.,
Mean Squared Error, Cross Entropy).

Unsupervised Learning
In unsupervised learning, the model works with input data only, without any labeled outputs.
The goal is to discover hidden patterns or structures in the data.
• Input: Features x
• Output: No labels
• Goal: Find patterns, groupings, or reduce dimensionality
Examples:
• Clustering (e.g., K-means, hierarchical clustering)
• Dimensionality Reduction (e.g., PCA, t-SNE)
Applications: Customer segmentation, anomaly detection, market basket analysis

Reinforcement Learning
Reinforcement learning (RL) involves training an agent to make a sequence of decisions by
interacting with an environment. The agent receives feedback in the form of rewards or
penalties.

• Agent: Learner or decision-maker

• Environment: What the agent interacts with

• State: Current situation of the agent

• Action: Decision made by the agent

• Reward: Feedback signal for performance

Goal: Learn a policy that maximizes cumulative reward over time.


Examples:

• Game playing (e.g., AlphaGo)

• Robotics

• Dynamic resource allocation

Learning Algorithms: Q-learning, Deep Q-Networks (DQN), Policy Gradient Methods

1.6 ANN Training Algorithms


Training algorithms in Artificial Neural Networks (ANNs) are designed to optimize the
weights and biases to minimize the error or cost function. These algorithms adjust the
model parameters to improve its performance on a given task.

Overview of Training in ANN


In general, training an ANN involves the following steps:

• Initialization: Randomly initialize weights and biases.

• Forward Propagation: Compute the output of the network based on the current
weights and inputs.

• Error Calculation: Calculate the error or loss between the predicted output and the
actual target output.

• Backpropagation: Adjust the weights and biases to minimize the error using opti-
mization techniques.

• Iterate: Repeat the process for multiple epochs until the network converges.

Gradient Descent Algorithm
Gradient Descent is the most widely used optimization technique for training neural net-
works. It updates weights by moving in the direction of the negative gradient of the error
function with respect to the weights.

Gradient Descent Update Rule


wi := wi − η ∂E/∂wi

Where:

• wi : Weight at index i

• η: Learning rate (a small positive number)

• ∂E/∂wi : Partial derivative of the error function with respect to weight wi

Types of Gradient Descent


• Batch Gradient Descent: Computes the gradient using the entire dataset and up-
dates weights after processing the whole dataset.
• Stochastic Gradient Descent (SGD): Computes the gradient using a single train-
ing example and updates the weights after each sample.
• Mini-batch Gradient Descent: Combines the advantages of both batch and stochas-
tic gradient descent by updating weights using small batches of data.

Backpropagation Algorithm
The Backpropagation algorithm is a widely used supervised learning technique to train multi-
layer neural networks. It involves two main phases: forward propagation and backward
propagation.

Forward Propagation
During forward propagation, the inputs are passed through the network, and the output is
calculated using the current weights. The output is then compared to the target to calculate
the error.
E = (1/2) Σ_{i=1}^{m} (ti − oi)²
Where:
• E: Total error
• ti : Target output
• oi : Predicted output

Backward Propagation
Backward propagation uses the error computed in the forward pass to update the weights
and biases by applying the chain rule of differentiation.
∆wij = −η ∂E/∂wij
Where:

• ∆wij : Weight change between neuron i and neuron j

• η: Learning rate

• ∂E/∂wij : Gradient of the error with respect to weight wij

Weight Update Rule


The weights are updated iteratively using the following rule:
wij := wij − η · ∂E/∂wij

Perceptron Learning Rule


The Perceptron learning rule is used to train single-layer perceptrons. The rule is based on
adjusting the weights whenever the predicted output does not match the target.

∆wi = η(t − o)xi


Where:

• t: Target output

• o: Actual output

• xi : Input feature

• η: Learning rate

The Perceptron learning rule updates the weights in the direction that reduces the error.

Hebbian Learning Rule


Hebbian learning is an unsupervised learning rule based on the principle that “neurons
that fire together, wire together.” It is typically used for associative memory tasks and
self-organizing networks.

∆wij = η · xi · yj
Where:

• ∆wij : Change in weight between neurons i and j

• η: Learning rate

• xi : Input from neuron i

• yj : Output from neuron j

This rule adjusts the weights based on the correlation between the activations of two
neurons.

Delta Rule (Widrow-Hoff Rule)


The Delta rule, also known as the Widrow-Hoff rule, is a gradient descent-based optimization
method used for training linear neural networks. It minimizes the Mean Squared Error (MSE)
between the target output and the actual output.
∆wi = −η ∂E/∂wi
Where:

• E: Error function (typically MSE)

• ∂E/∂wi : Gradient of the error function with respect to the weight wi

• η: Learning rate

The Delta rule helps the network to adjust its weights by minimizing the error at each
step.

Learning Rate and Convergence


The learning rate η controls the step size of weight updates. If the learning rate is too large,
the weights may overshoot the optimal values, and if it is too small, convergence may be
very slow. An optimal learning rate ensures faster convergence while avoiding overshooting.

Summary of Training Algorithms


Algorithm          Main Concept
Perceptron         Binary classification based on error correction
Gradient Descent   Optimization of weights using negative gradients
Backpropagation    Multi-layer training with error propagation
Hebbian Learning   Weight adjustment based on neuron correlation
Delta Rule         Gradient descent for linear networks

1.7 Perceptrons
The perceptron is the most basic unit of a neural network and was first introduced by
Frank Rosenblatt in 1958. It is a supervised learning algorithm used for binary classification
tasks. The perceptron models a single neuron and makes decisions by weighing input signals,
applying an activation function, and producing an output.

Structure of a Perceptron
A perceptron consists of:

• A set of input features x1 , x2 , . . . , xn

• A corresponding set of weights w1 , w2 , . . . , wn

• A bias term b

• An activation function (typically a step function)

Mathematical Model
The output of the perceptron is given by:
y = f( Σ_{i=1}^{n} wi xi + b )

Where:

• xi : Input feature

• wi : Weight associated with xi

• b: Bias

• f : Activation function (e.g., step function)

The activation function is usually defined as:

f(z) = 1 if z ≥ 0, 0 otherwise

Perceptron Learning Rule


The perceptron learning rule is an iterative method for updating the weights based on the
prediction error:

wi := wi + η(t − y)xi

b := b + η(t − y)
Where:

• η: Learning rate (a small positive constant)

• t: Target output

• y: Actual output

• xi : Input feature

Training Algorithm
The training process for a perceptron involves the following steps:

1. Initialize weights and bias to small random values.

2. For each training sample:

(a) Compute the output using the current weights and bias.
(b) Update the weights and bias using the learning rule.

3. Repeat until convergence or maximum number of iterations is reached.
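A minimal sketch of this loop on the AND function, a linearly separable task (the data, zero initialization, learning rate, and epoch count are illustrative assumptions):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])                 # AND targets

w, b, eta = np.zeros(2), 0.0, 0.1          # zeros instead of small random values, for simplicity

for epoch in range(20):
    for xi, ti in zip(X, t):
        y = 1 if xi @ w + b >= 0 else 0    # step activation
        w += eta * (ti - y) * xi           # perceptron learning rule
        b += eta * (ti - y)

print(w, b)  # a separating line for AND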

Limitations of Perceptrons
• Perceptrons can only solve linearly separable problems.

• They cannot model more complex relationships (e.g., XOR problem).

Applications
Despite their simplicity, perceptrons are foundational and have applications in:

• Binary classification tasks

• As building blocks in multilayer networks (e.g., MLPs)

• Pattern recognition and image classification (in early days)

1.8 Training Rules


Training rules in artificial neural networks define how the connection weights between neurons
are updated during the learning process. The objective is to adjust these weights so that
the network performs better on the given task, typically by minimizing a cost function.

Overview of Training Rules
Training rules are based on different learning paradigms (supervised, unsupervised, rein-
forcement). They dictate how weights should be modified based on the inputs, outputs, and
possibly the errors. Some of the most commonly used training rules include:

• Perceptron Learning Rule

• Delta Rule (Widrow-Hoff Rule)

• Hebbian Learning Rule

• Competitive Learning Rule

• Error-Correction Learning Rule

Perceptron Learning Rule


This rule is used in supervised learning for binary classification problems. It updates the
weights based on the error between the actual and target outputs.

∆wi = η(t − o)xi


Where:

• ∆wi : Change in weight

• η: Learning rate

• t: Target output

• o: Actual output

• xi : Input value

Delta Rule (Widrow-Hoff Rule)


This rule uses gradient descent to minimize the mean squared error between the actual and
target outputs. It is suitable for training linear units.

∆wi = η(t − o)xi


Unlike the perceptron rule, the delta rule works with continuous activation functions and
can be used to train multilayer networks when combined with backpropagation.

Hebbian Learning Rule
This rule is based on the biological principle: “Cells that fire together, wire together.” It is
typically used in unsupervised learning and associative memory.

∆wij = ηxi yj
Where:

• xi : Input from neuron i

• yj : Output from neuron j

Competitive Learning Rule


In competitive learning, neurons compete to become active. The winning neuron updates
its weights to be more like the input. This is useful in clustering and self-organizing maps.

∆wi = η(xi − wi )
Only the winning neuron (and possibly its neighbors) gets updated.
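A winner-takes-all sketch under illustrative assumptions (two synthetic clusters, two prototype vectors, and a fixed learning rate):

import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),   # cluster near (0, 0)
                  rng.normal(1.0, 0.1, (50, 2))])  # cluster near (1, 1)
W = rng.normal(0.5, 0.1, (2, 2))                   # two prototype vectors
eta = 0.05

for x in data:
    winner = np.argmin(np.linalg.norm(W - x, axis=1))  # closest prototype wins
    W[winner] += eta * (x - W[winner])                 # delta w = eta * (x - w), winner only

print(W)  # prototypes drift toward the two cluster centers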

Error-Correction Learning Rule


This is a general framework for rules that use the error signal (difference between desired
and actual output) to adjust the weights.
∆wi = −η ∂E/∂wi
Where E is the error function. This rule underpins most supervised learning algorithms,
including the delta rule and backpropagation.

Comparison of Training Rules


Rule               Type           Key Concept
Perceptron         Supervised     Binary classification with threshold activation
Delta              Supervised     Gradient descent for continuous outputs
Hebbian            Unsupervised   Correlation-based weight strengthening
Competitive        Unsupervised   Winner-takes-all adaptation
Error-Correction   Supervised     General framework using error gradients

Summary
Training rules are essential for enabling neural networks to learn from data. The choice of
rule depends on the type of task (classification, clustering, regression), the architecture of
the network, and the learning paradigm. Advanced training algorithms like backpropagation
build upon these foundational rules to enable deep learning in multilayer networks.

1.9 Delta Training Rule
The Delta Rule, also known as the Widrow-Hoff rule, is a gradient descent-based learning rule
used for training artificial neural networks, particularly for linear units. It is an improvement
over the Perceptron Learning Rule and allows learning in cases where the decision boundary
is not linearly separable. The Delta Rule is mainly used in supervised learning.

Basic Idea
The goal of the Delta Rule is to minimize the error between the desired output and the
actual output of the neuron by adjusting the weights. It does this by moving the weight
vector in the direction of the negative gradient of the error function.

Mathematical Formulation
For a neuron with inputs x1 , x2 , . . . , xn , weights w1 , w2 , . . . , wn , and bias b, the net input is
given by:
net = Σ_{i=1}^{n} wi xi + b
The actual output o is computed using an activation function f (net), typically a linear
or sigmoid function.
The error for a single training sample is:
E = (1/2)(t − o)²
Where:
• t: Target output
• o: Actual output
The weight update rule derived using gradient descent is:

∆wi = η(t − o)xi


And the weights are updated as:

wi := wi + ∆wi
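A short sketch of this update on a single linear unit (the synthetic targets, learning rate, and epoch count are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.5, -2.0, 0.5])    # targets from a known linear map

w, eta = np.zeros(3), 0.05

for epoch in range(50):
    for xi, ti in zip(X, t):
        o = xi @ w                    # linear activation
        w += eta * (ti - o) * xi      # delta rule update

print(w)  # approaches [1.5, -2.0, 0.5]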

Key Characteristics
• It uses a continuous activation function.
• Suitable for training linear units.
• Can be extended to multilayer networks via backpropagation.
• Converges even when the data is not linearly separable.

Delta Rule vs. Perceptron Rule
• The Perceptron Rule uses a step activation function and updates weights only for
misclassified inputs.

• The Delta Rule uses a differentiable activation function and applies continuous updates
based on error magnitude.

Applications
• Linear regression models

• Single-layer neural networks

• Foundation for the Backpropagation algorithm in multilayer networks

Conclusion
The Delta Rule is a fundamental building block in neural network training, allowing for
gradual, data-driven updates to model parameters. Its continuous and differentiable nature
enables its integration into more complex training algorithms for deep learning.

1.10 Backpropagation Algorithm


The Backpropagation Algorithm is a supervised learning technique used for training multi-
layer feedforward neural networks. It is an extension of the delta rule to networks with one
or more hidden layers. The algorithm uses the method of gradient descent to minimize the
error between the actual and target outputs by propagating the error backward through the
network.

Overview
Backpropagation involves two main phases:
1. Forward Pass: Inputs are passed through the network to compute the output.

2. Backward Pass: The error is propagated backward from the output layer to the input
layer, and weights are adjusted using gradient descent.

Mathematical Representation
Given:
• Inputs: x1 , x2 , . . . , xn

• Weights: wij between layer i and layer j

• Activation Function: Sigmoid f(x) = 1 / (1 + e^{−x})
• Output: ok , Target: tk

The error function (Mean Squared Error for a single output neuron) is:
E = (1/2) Σ_k (tk − ok)²

Weight Update Rule


For output layer neurons:

δk = (tk − ok) f′(netk)

∆wjk = η · δk · oj

For hidden layer neurons:

δj = f′(netj) Σ_k δk wjk

∆wij = η · δj · oi
Where:

• δk : Error term for output neuron k

• δj : Error term for hidden neuron j

• η: Learning rate

• oj , oi : Outputs from neurons in the previous layer

Algorithm Steps
1. Initialize all weights and biases with small random values.

2. For each training example:

(a) Perform a forward pass to compute the outputs.


(b) Compute the output error using the target value.
(c) Propagate the error backward and compute the delta terms.
(d) Update the weights and biases using the computed deltas.

3. Repeat for all training samples until the error is minimized or a stopping criterion is
met.
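These steps condense into a short NumPy sketch for one hidden layer with sigmoid activations; XOR is used as an illustrative task, and the hidden size, learning rate, and epoch count are assumptions.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5

for epoch in range(5000):
    h = sig(X @ W1 + b1)                          # forward pass
    o = sig(h @ W2 + b2)
    delta_o = (o - t) * o * (1 - o)               # output-layer error term
    delta_h = (delta_o @ W2.T) * h * (1 - h)      # hidden-layer error term (chain rule)
    W2 -= eta * h.T @ delta_o; b2 -= eta * delta_o.sum(axis=0)
    W1 -= eta * X.T @ delta_h; b1 -= eta * delta_h.sum(axis=0)

print(o.round(2))  # typically close to [[0], [1], [1], [0]]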

Advantages
• Allows training of multilayer networks.

• Enables the learning of non-linear mappings.

• Can be combined with techniques like momentum and adaptive learning rate.

Limitations
• Can get stuck in local minima.

• Requires differentiable activation functions.

• Slow convergence for deep networks without optimization techniques.

Applications
• Handwritten digit recognition

• Speech and image recognition

• Forecasting and time series prediction

• Function approximation

Conclusion
Backpropagation is a foundational algorithm in neural network training. It has enabled the
development of deep learning architectures by allowing networks to automatically adjust
weights in response to errors, leading to improved learning and performance.

1.11 Multilayer Perceptron Model


The Multilayer Perceptron (MLP) is a type of feedforward artificial neural network that
consists of multiple layers of neurons: an input layer, one or more hidden layers, and an
output layer. MLPs are capable of learning complex mappings between inputs and outputs
and are widely used in classification and regression tasks.

Architecture
An MLP typically consists of the following components:

• Input Layer: Receives input features. Each neuron represents one feature of the
input vector.

• Hidden Layer(s): Intermediate layers that learn internal representations of the data.
Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result
through a non-linear activation function.
• Output Layer: Produces the final output of the network, typically using an appro-
priate activation function depending on the task (e.g., softmax for classification).

Mathematical Model
Let x = [x1 , x2 , . . . , xn ] be the input vector, and W (l) be the weight matrix for layer l. The
forward propagation through the layers is defined as:

z^{(l)} = W^{(l)} a^{(l−1)} + b^{(l)}

a^{(l)} = f(z^{(l)})

Where:

• z^{(l)}: Net input to layer l

• a^{(l)}: Output (activation) of layer l

• f: Activation function (e.g., sigmoid, ReLU)

• b^{(l)}: Bias vector for layer l

Activation Functions
Common activation functions used in MLPs include:

• Sigmoid: f(x) = 1 / (1 + e^{−x})

• ReLU: f(x) = max(0, x)

• Tanh: f(x) = tanh(x)

Training the MLP


MLPs are trained using the Backpropagation algorithm combined with gradient descent. The
goal is to minimize a loss function (e.g., mean squared error or cross-entropy) by adjusting
the weights and biases.

1. Initialize weights and biases randomly.


2. Perform a forward pass to compute outputs.
3. Compute the error between predicted and actual outputs.
4. Backpropagate the error and update weights using gradient descent.
5. Repeat for a number of epochs or until convergence.
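In practice this loop is available off the shelf; a minimal sketch using scikit-learn's MLPClassifier on XOR, a task a single perceptron cannot solve (the hidden size, activation, iteration cap, and seed are illustrative hyperparameters):

from sklearn.neural_network import MLPClassifier
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])   # XOR labels

clf = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    max_iter=5000, random_state=0)
clf.fit(X, y)                # runs forward passes and backpropagation internally
print(clf.predict(X))        # typically [0, 1, 1, 0]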

Advantages
• Capable of modeling non-linear relationships.

• Applicable to a wide variety of tasks (classification, regression, etc.).

• Supports multiple input and output variables.

Limitations
• Requires careful tuning of hyperparameters (e.g., learning rate, number of layers).

• Prone to overfitting with small datasets.

• May suffer from vanishing gradient problem with deep networks.

Applications
• Handwriting and speech recognition

• Financial forecasting

• Image and pattern recognition

• Medical diagnosis and classification

Conclusion
The Multilayer Perceptron is a powerful and flexible neural network architecture. Its ability
to learn complex patterns makes it a core component of many modern machine learning
applications.

1.12 Hopfield Networks


Hopfield Networks are a type of recurrent neural network that serve as associative memory
systems. They are used primarily for pattern recognition and optimization problems. Intro-
duced by John Hopfield in 1982, these networks are characterized by symmetric connections
and binary threshold nodes.

Network Architecture
A Hopfield Network consists of a single layer of neurons where each neuron is connected to
every other neuron except itself (i.e., no self-connections). The weights between neurons are
symmetric:

wij = wji , wii = 0


Each neuron in the network has a binary state, typically represented as +1 or −1.

Energy Function
The Hopfield Network is governed by an energy function, which is used to determine the
stability of the network:
E = −(1/2) Σ_i Σ_j wij si sj + Σ_i θi si

Where:

• si : State of neuron i

• wij : Weight between neurons i and j

• θi : Threshold of neuron i

The network updates the states of the neurons asynchronously to reduce the energy,
eventually converging to a stable state, which corresponds to a stored pattern.

Learning Rule (Hebbian Learning)


The weights in a Hopfield Network are usually set using the Hebbian learning rule:
wij = (1/N) Σ_{µ=1}^{P} ξi^µ ξj^µ ,   wii = 0

Where:

• ξ^µ : The µ-th training pattern

• P : Number of patterns to store

• N : Number of neurons

Operation Phases
1. Training Phase: Store patterns by setting weights using Hebbian learning.

2. Recall Phase: Present a partial or noisy pattern; the network will update its states
to converge to the closest stored pattern (auto-associative memory).
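Both phases fit in a short NumPy sketch; the two stored bipolar patterns and the single-bit corruption are illustrative.

import numpy as np

patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1,  1, -1, -1, -1]])   # two bipolar patterns to store
N = patterns.shape[1]

# Training phase: Hebbian rule, symmetric weights, zero diagonal
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0)

# Recall phase: corrupt one bit of the first pattern, then update asynchronously
s = patterns[0].copy()
s[0] = -s[0]
for sweep in range(5):                            # a few sweeps over all neurons
    for i in range(N):
        s[i] = 1 if W[i] @ s >= 0 else -1         # threshold update (theta_i = 0 here)

print(s)  # converges back to the first stored pattern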

Convergence and Stability


Hopfield Networks always converge to a stable state, but this could be a local minimum rather
than a desired pattern. The number of patterns that can be stored reliably is approximately
0.138 × N , where N is the number of neurons.

Applications
• Pattern and image recognition
• Associative memory
• Solving optimization problems (e.g., Traveling Salesman Problem)

Limitations
• Limited storage capacity
• Possibility of spurious states (unlearned stable patterns)
• Sensitive to noise and overlapping patterns

Conclusion
Hopfield Networks offer a biologically inspired approach to memory and pattern recognition.
Although they have limitations in capacity and accuracy, they laid the foundation for energy-
based learning models and continue to influence modern neural architectures.

1.13 Associative Memories


Associative memories are neural network models designed to store patterns and recall them
when presented with partial or noisy versions. Also known as content-addressable memory,
they differ from traditional memory systems that use addresses to retrieve data; instead,
associative memories retrieve data based on similarity to the input pattern.

Types of Associative Memory


• Auto-associative Memory: Recalls the original pattern from a noisy or incomplete
version of the same pattern.
• Hetero-associative Memory: Recalls a different, related output pattern when given
an input pattern.

Auto-associative Memory
In auto-associative memory, the input and output patterns are the same. When a noisy
version of a stored pattern is presented, the network converges to the original, clean version.
Hopfield Networks are a classic example of auto-associative memory.
Let the stored patterns be ξ^µ = [ξ1^µ, ξ2^µ, . . . , ξN^µ] for µ = 1, 2, . . . , P. The Hebbian learning rule for storing these patterns is:

wij = (1/N) Σ_{µ=1}^{P} ξi^µ ξj^µ ,   wii = 0

Hetero-associative Memory
In hetero-associative memory, different input and output patterns are stored. The network
learns a mapping from input pattern x to output pattern y. The weight matrix is calculated
as:
wij = Σ_{µ=1}^{P} xi^µ yj^µ

This enables the network to recall a specific output pattern even when the input is noisy.
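A minimal sketch of hetero-associative storage and recall (the two bipolar pattern pairs and the single flipped bit are illustrative choices):

import numpy as np

x1, y1 = np.array([1, -1, 1, -1]), np.array([1, 1, -1])
x2, y2 = np.array([-1, -1, 1, 1]), np.array([-1, 1, 1])

# Store both pairs: W is the sum of outer products x^mu (y^mu)^T
W = np.outer(x1, y1) + np.outer(x2, y2)

probe = x1.copy()
probe[1] = -probe[1]                           # present a noisy version of x1
recalled = np.where(probe @ W >= 0, 1, -1)     # threshold the weighted sum
print(recalled)                                # recovers y1 = [1, 1, -1]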

Characteristics of Associative Memory


• Can retrieve complete patterns from partial or noisy data.

• Can store multiple input-output pairs.

• Convergence to stored patterns is guaranteed under specific conditions.

Storage Capacity
The number of patterns that can be reliably stored depends on the network architecture.
For instance, in Hopfield Networks, the theoretical limit is approximately 0.138 × N for N
neurons, beyond which retrieval accuracy drops.

Applications
• Pattern recognition

• Data reconstruction and denoising

• Error correction

• Associative search engines

Limitations
• Limited storage capacity

• Potential for spurious patterns (incorrect stable states)

• Degradation in performance when stored patterns are not orthogonal

Conclusion
Associative memories provide a powerful model for pattern storage and retrieval. They
emulate how the human brain recalls information based on association and are foundational
in neural computing and cognitive modeling.

1.14 Applications of Artificial Neural Networks
Artificial Neural Networks (ANNs) have become a fundamental tool in many real-world
applications due to their ability to learn complex, non-linear relationships from data. Their
flexibility and adaptability allow them to be applied across a wide range of domains, from
pattern recognition to control systems and natural language processing.

1. Pattern Recognition
ANNs are widely used in recognizing patterns in data, such as images, speech, and hand-
writing. They can classify input patterns even when noise or distortion is present.
• Handwritten digit recognition (e.g., MNIST dataset)
• Optical Character Recognition (OCR)
• Facial recognition systems

2. Image Processing and Computer Vision


Neural networks can process visual data for object detection, classification, and segmentation
tasks.
• Image classification and tagging
• Object detection in autonomous vehicles
• Medical image analysis (e.g., tumor detection in MRIs)

3. Natural Language Processing (NLP)


ANNs, especially Recurrent Neural Networks (RNNs) and Transformers, have revolutionized
the way machines understand and generate human language.
• Machine translation (e.g., English to French)
• Sentiment analysis
• Chatbots and conversational agents

4. Speech Recognition and Generation


ANNs are capable of converting spoken language into text and generating synthetic speech
from text.
• Voice assistants (e.g., Siri, Alexa)
• Real-time speech-to-text transcription
• Text-to-speech (TTS) synthesis

5. Forecasting and Time Series Prediction
ANNs are effective in modeling and forecasting time-dependent data such as financial mar-
kets, weather, and sales trends.
• Stock market prediction
• Weather forecasting
• Energy demand prediction

6. Robotics and Control Systems


Neural networks are used in robotics for control, navigation, and learning tasks.
• Autonomous navigation and obstacle avoidance
• Adaptive control systems
• Robotic arm manipulation and coordination

7. Medical Diagnosis
ANNs assist healthcare professionals in diagnosing diseases and recommending treatments
based on medical data.
• Diagnosis of heart disease, cancer, and neurological disorders
• Prediction of disease progression
• Personalized treatment planning

8. Games and Reinforcement Learning


Neural networks are used in training intelligent agents capable of playing games and making
decisions through reinforcement learning.
• AlphaGo and AlphaZero by DeepMind
• Game-playing AI in strategy games and simulations
• Decision-making in real-time environments

9. Anomaly Detection and Security


ANNs are applied in cybersecurity and monitoring systems to detect unusual behavior or
anomalies.
• Fraud detection in banking and finance
• Network intrusion detection
• Industrial fault diagnosis

10. Recommendation Systems
Neural networks power recommendation engines used in e-commerce, entertainment, and
social media.

• Product recommendations (e.g., Amazon)

• Movie and music recommendations (e.g., Netflix, Spotify)

• Personalized content feeds (e.g., YouTube, Instagram)

Conclusion
Artificial Neural Networks have proven to be versatile and powerful tools across numerous
fields. With the advancement of deep learning, their capabilities continue to expand, making
them a central component in modern artificial intelligence applications.

