unit1
Contents

1 Unit - 1
  1.1 Mathematical Models of Neurons
    1.1.1 Perceptron Model (McCulloch–Pitts Neuron)
    1.1.2 Sigmoid Neuron
    1.1.3 ReLU Neuron (Rectified Linear Unit)
    1.1.4 Leaky ReLU, Tanh, Softmax
    1.1.5 Summary of Neuron Models
  1.2 Artificial Neural Network (ANN) Architecture
    1.2.1 Components of ANN Architecture
    1.2.2 Forward Propagation
    1.2.3 Backpropagation (Learning Phase)
    1.2.4 ANN Structure Example
    1.2.5 Activation Functions
    1.2.6 Summary Table
  1.3 Learning Rules in Artificial Neural Networks
  1.4 Gradient Descent Algorithm
  1.5 Learning Paradigms – Supervised, Unsupervised, and Reinforcement Learning
  1.6 ANN Training Algorithms
  1.7 Perceptrons
  1.8 Training Rules
  1.9 Delta Training Rule
  1.10 Backpropagation Algorithm
  1.11 Multilayer Perceptron Model
  1.12 Hopfield Networks
  1.13 Associative Memories
  1.14 Applications of Artificial Neural Networks
Chapter 1
Unit - 1
1.1 Mathematical Models of Neurons

1.1.1 Perceptron Model (McCulloch–Pitts Neuron)

The perceptron computes a weighted sum of its inputs, adds a bias, and applies an activation
function: y = f (Σ_{i=1}^{n} wi xi + b), where:
• xi : Input features
• wi : Weights
• b: Bias (threshold)
• f : Activation function (step, sigmoid, ReLU, etc.)
• y: Output
Step Activation Function:

f (z) = 1 if z ≥ 0, and f (z) = 0 if z < 0
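To make the model concrete, here is a minimal Python/NumPy sketch of a single step-activated
neuron; the AND weights in the example are hand-picked for illustration, not part of the model
definition.

import numpy as np

def step(z):
    # Step activation: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def perceptron_output(x, w, b):
    z = np.dot(w, x) + b          # weighted sum of inputs plus bias
    return step(z)

# Example: weights hand-picked so the neuron computes logical AND
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, b))   # 0, 0, 0, 1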
1.1.3 ReLU Neuron (Rectified Linear Unit)
Widely used in deep learning because it is cheap to compute and mitigates vanishing gradients,
giving better convergence.
f (z) = max(0, z)
1.1.4 Leaky ReLU, Tanh, Softmax

Leaky ReLU:

f (z) = z if z > 0, and f (z) = 0.01z otherwise
Tanh:

f (z) = tanh(z) = (e^z − e^−z)/(e^z + e^−z), range: (−1, 1)
1.2 Artificial Neural Network (ANN) Architecture

1.2.1 Components of ANN Architecture
• Input Layer: Receives raw data. Each neuron corresponds to an input variable. No
computation is performed here.
• Hidden Layer(s): One or more intermediate layers. Each neuron performs a weighted
sum and applies an activation function. Multiple hidden layers form a Deep Neural
Network (DNN).
• Output Layer: Produces the final output (e.g., class label, probability). The number
of neurons depends on the task.
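To make the data flow through these layers concrete, the following minimal Python/NumPy
sketch propagates an input vector through one hidden layer and an output layer; the layer
sizes, random weights, and sigmoid activation are illustrative assumptions, not prescribed
by the text.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer parameters

def forward(x):
    h = sigmoid(W1 @ x + b1)     # hidden layer: weighted sum + activation
    return sigmoid(W2 @ h + b2)  # output layer

print(forward(np.array([0.5, -1.0, 2.0])))      # one value per output neuron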
1.2.5 Activation Functions

Name      Formula                                Range
Sigmoid   f (z) = 1/(1 + e^−z)                   (0, 1)
Tanh      f (z) = (e^z − e^−z)/(e^z + e^−z)      (−1, 1)
ReLU      f (z) = max(0, z)                      [0, ∞)
Softmax   f (zi ) = e^zi / Σj e^zj               (0, 1), sum = 1
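These functions are straightforward to implement; the sketch below (Python/NumPy) follows the
table, with one practical assumption added: softmax subtracts max(z) before exponentiating, a
standard trick to avoid overflow that does not change the result.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))    # shift by max(z) to avoid overflow
    return e / e.sum()

z = np.array([1.0, 2.0, -0.5])
print(sigmoid(z), np.tanh(z), relu(z))
print(softmax(z), softmax(z).sum())   # components in (0, 1), summing to 1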
1.3 Learning Rules in Artificial Neural Networks

The Hebbian learning rule strengthens the connection between two neurons that are active
together:

∆wij = η · xi · yj
Where:
• η: Learning rate
• yj : Output of neuron j
The Perceptron learning rule updates weights in proportion to the classification error:

∆wi = η(t − o)xi

Where:
• t: Target output
• o: Actual output
• xi : Input
• η: Learning rate

The Delta rule performs gradient descent on the error function:

∆wi = −η ∂E/∂wi

Where:
• ∂E/∂wi : Partial derivative of error with respect to weight
Summary
1.4 Gradient Descent Algorithm

Objective
Minimize the cost function:

J(θ) = (1/m) Σ_{i=1}^{m} L(y^(i) , ŷ^(i) )

Where:
• m: Number of training samples
• L(y^(i) , ŷ^(i) ): Loss for the i-th sample

Update Rule
Each parameter θj is updated using the partial derivative of the cost function:

θj := θj − η ∂J(θ)/∂θj

Where:
• η: Learning rate
• ∂J(θ)/∂θj : Gradient of the cost function with respect to parameter θj
Algorithm Steps
1. Initialize parameters θ randomly or to zero.
2. Compute the gradient of the cost function with respect to each parameter.
3. Update the parameters using the update rule above.
4. Repeat until the cost converges or a maximum number of iterations is reached.

Common variants differ in how much data is used per update:
• Batch Gradient Descent: Updates weights using the entire training set at each step.
• Stochastic Gradient Descent (SGD): Updates weights using one sample at a time.
• Mini-batch Gradient Descent: Updates weights using small random batches of samples.
Remarks
• The learning rate η affects convergence speed and stability.
• A learning rate that is too large may cause overshooting; one that is too small slows learning.
• Advanced optimizers (e.g., Adam, RMSprop) build on gradient descent.
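As a concrete illustration, here is a minimal sketch of batch gradient descent fitting a
linear model under MSE loss in Python/NumPy; the synthetic data, learning rate, and iteration
count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                  # m = 100 samples, 2 features
theta_true = np.array([3.0, -2.0])
y = X @ theta_true + 0.1 * rng.normal(size=100)

theta = np.zeros(2)                            # step 1: initialize parameters
eta = 0.1                                      # learning rate
for _ in range(200):
    grad = X.T @ (X @ theta - y) / len(y)      # step 2: gradient of the MSE cost
    theta -= eta * grad                        # step 3: theta := theta - eta * grad
print(theta)                                   # approaches [3.0, -2.0]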
1.5 Learning Paradigms – Supervised, Unsupervised, and Reinforcement Learning

Supervised Learning
In supervised learning, the model is trained on a labeled dataset, which means each training
example includes both input data and the correct output.
• Input: Features x
• Output: Labels y
• Goal: Learn a mapping f : x → y
Examples:
• Classification (e.g., spam detection)
• Regression (e.g., predicting house prices)
Loss Function: Measures the difference between predicted and actual output (e.g.,
Mean Squared Error, Cross Entropy).
Unsupervised Learning
In unsupervised learning, the model works with input data only, without any labeled outputs.
The goal is to discover hidden patterns or structures in the data.
• Input: Features x
• Output: No labels
• Goal: Find patterns, groupings, or reduce dimensionality
Examples:
• Clustering (e.g., K-means, hierarchical clustering)
• Dimensionality Reduction (e.g., PCA, t-SNE)
Applications: Customer segmentation, anomaly detection, market basket analysis
Reinforcement Learning
Reinforcement learning (RL) involves training an agent to make a sequence of decisions by
interacting with an environment. The agent receives feedback in the form of rewards or
penalties.
Typical applications include:
• Robotics
• Game playing (e.g., agents trained through self-play)
• Autonomous vehicles and control systems
1.6 ANN Training Algorithms

Training a neural network repeats the following steps (sketched in code after this list):
• Forward Propagation: Compute the output of the network based on the current
weights and inputs.
• Error Calculation: Calculate the error or loss between the predicted output and the
actual target output.
• Backpropagation: Adjust the weights and biases to minimize the error using opti-
mization techniques.
• Iterate: Repeat the process for multiple epochs until the network converges.
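A schematic outline of this loop is shown below; network and its methods (forward, loss,
backward, update_weights) are hypothetical placeholders standing in for the four steps above,
not a fixed API.

# Schematic training loop; `network` and its methods are hypothetical
# placeholders for the four steps described above.
def train(network, data, epochs, eta, tolerance=1e-3):
    for epoch in range(epochs):
        total_error = 0.0
        for x, target in data:
            output = network.forward(x)                   # forward propagation
            total_error += network.loss(output, target)   # error calculation
            network.backward(target)                      # backpropagation
            network.update_weights(eta)                   # weight adjustment
        if total_error < tolerance:                       # convergence check
            break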
Gradient Descent Algorithm
Gradient Descent is the most widely used optimization technique for training neural net-
works. It updates weights by moving in the direction of the negative gradient of the error
function with respect to the weights.
Backpropagation Algorithm
The Backpropagation algorithm is a widely used supervised learning technique to train multi-
layer neural networks. It involves two main phases: forward propagation and backward
propagation.
Forward Propagation
During forward propagation, the inputs are passed through the network, and the output is
calculated using the current weights. The output is then compared to the target to calculate
the error.
E = (1/2) Σ_{i=1}^{m} (ti − oi )²
Where:
• E: Total error
• ti : Target output
• oi : Predicted output
Backward Propagation
Backward propagation uses the error computed in the forward pass to update the weights
and biases by applying the chain rule of differentiation.
∆wij = −η ∂E/∂wij
Where:
• η: Learning rate
• ∂E/∂wij : Gradient of the error with respect to weight wij
The Perceptron learning rule updates the weights in the direction that reduces the error:

∆wi = η(t − o)xi

Where:
• t: Target output
• o: Actual output
• xi : Input feature
• η: Learning rate

The Hebbian learning rule, in contrast, is driven by correlated activity:

∆wij = η · xi · yj
Where:
• ∆wij : Change in weight between neurons i and j
• η: Learning rate
This rule adjusts the weights based on the correlation between the activations of two
neurons.
• ∂E/∂wi : Gradient of the error function with respect to the weight wi
• η: Learning rate
The Delta rule helps the network to adjust its weights by minimizing the error at each
step.
1.7 Perceptrons
The perceptron is the most basic unit of a neural network and was first introduced by
Frank Rosenblatt in 1958. It is a supervised learning algorithm used for binary classification
tasks. The perceptron models a single neuron and makes decisions by weighing input signals,
applying an activation function, and producing an output.
Structure of a Perceptron
A perceptron consists of:
• Input values x1 , x2 , . . . , xn
• Weights w1 , w2 , . . . , wn
• A bias term b
• An activation function f (typically a step function)
Mathematical Model
The output of the perceptron is given by:
y = f (Σ_{i=1}^{n} wi xi + b)
Where:
• xi : Input feature
• wi : Weight
• b: Bias
• f : Activation function
The weights and bias are updated with the perceptron learning rule:

wi := wi + η(t − y)xi
b := b + η(t − y)
Where:
• t: Target output
• y: Actual output
• xi : Input feature
Training Algorithm
The training process for a perceptron involves the following steps (see the sketch after
this list):
1. Initialize the weights and bias (e.g., to zero or small random values).
2. For each training sample:
(a) Compute the output using the current weights and bias.
(b) Update the weights and bias using the learning rule.
3. Repeat over the training set until all samples are classified correctly or a maximum
number of epochs is reached.
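A minimal sketch of this procedure in Python/NumPy, training on the logical OR function; the
dataset, learning rate, and epoch limit are illustrative assumptions.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
T = np.array([0, 1, 1, 1])                       # targets (logical OR)

w, b, eta = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for x, t in zip(X, T):
        y = 1 if np.dot(w, x) + b >= 0 else 0    # step activation
        w = w + eta * (t - y) * x                # w_i := w_i + eta*(t - y)*x_i
        b = b + eta * (t - y)                    # b := b + eta*(t - y)
print(w, b)                                      # a separating boundary for OR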
Limitations of Perceptrons
• Perceptrons can only solve linearly separable problems.
Applications
Despite their simplicity, perceptrons are foundational and find use in simple, linearly
separable binary classification tasks (e.g., basic spam filtering).
1.8 Training Rules

Overview of Training Rules
Training rules are based on different learning paradigms (supervised, unsupervised, rein-
forcement). They dictate how weights should be modified based on the inputs, outputs, and
possibly the errors. Some of the most commonly used training rules are the Perceptron rule,
the Hebbian rule, the Delta rule, and the Competitive (Winner-Take-All) rule.

Perceptron Learning Rule
Weights are updated in proportion to the classification error:

∆wi = η(t − o)xi

Where:
• η: Learning rate
• t: Target output
• o: Actual output
• xi : Input value
Hebbian Learning Rule
This rule is based on the biological principle: “Cells that fire together, wire together.” It is
typically used in unsupervised learning and associative memory.
∆wij = ηxi yj
Where xi is the input to the connection and yj is the output of neuron j.

Competitive (Winner-Take-All) Learning Rule
Each unit moves its weight vector toward the inputs for which it wins:

∆wi = η(xi − wi )

Only the winning neuron (and possibly its neighbors) gets updated, as in the sketch below.
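A minimal sketch of both updates in Python/NumPy; the activations, weight vectors, and
learning rate are illustrative assumptions.

import numpy as np

eta = 0.1
x = np.array([1.0, 0.0, 1.0])                # input (pre-synaptic activations)

# Hebbian update for one post-synaptic neuron with output y_j
y_j = 0.8
w = np.array([0.2, 0.5, -0.1])
w += eta * x * y_j                           # delta_w_ij = eta * x_i * y_j

# Competitive (winner-take-all) update: only the closest unit learns
W = np.array([[0.1, 0.2, 0.9],
              [0.9, 0.1, 0.1]])              # one weight vector per unit
winner = np.argmin(np.linalg.norm(W - x, axis=1))
W[winner] += eta * (x - W[winner])           # delta_w = eta * (x - w)
print(w, winner)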
Summary
Training rules are essential for enabling neural networks to learn from data. The choice of
rule depends on the type of task (classification, clustering, regression), the architecture of
the network, and the learning paradigm. Advanced training algorithms like backpropagation
build upon these foundational rules to enable deep learning in multilayer networks.
1.9 Delta Training Rule
The Delta Rule, also known as the Widrow-Hoff rule, is a gradient descent-based learning rule
used for training artificial neural networks, particularly for linear units. It is an improvement
over the Perceptron Learning Rule and allows learning in cases where the decision boundary
is not linearly separable. The Delta Rule is mainly used in supervised learning.
Basic Idea
The goal of the Delta Rule is to minimize the error between the desired output and the
actual output of the neuron by adjusting the weights. It does this by moving the weight
vector in the direction of the negative gradient of the error function.
Mathematical Formulation
For a neuron with inputs x1 , x2 , . . . , xn , weights w1 , w2 , . . . , wn , and bias b, the net input is
given by:
net = Σ_{i=1}^{n} wi xi + b
The actual output o is computed using an activation function f (net), typically a linear
or sigmoid function.
The error for a single training sample is:
E = (1/2)(t − o)²
Where:
• t: Target output
• o: Actual output
The weight update rule derived using gradient descent is:

wi := wi + ∆wi , where ∆wi = η(t − o)xi
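As an illustration, here is a minimal sketch of the Delta rule training a single linear unit
in Python/NumPy; the data, learning rate, and epoch count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))                 # 50 training samples, 3 inputs
t = X @ np.array([1.0, -1.0, 0.5])           # targets from a known linear map

w, eta = np.zeros(3), 0.05
for epoch in range(100):
    for x_i, t_i in zip(X, t):
        o = np.dot(w, x_i)                   # linear unit: o = net
        w = w + eta * (t_i - o) * x_i        # delta_w_i = eta*(t - o)*x_i
print(w)                                     # approaches [1.0, -1.0, 0.5]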
Key Characteristics
• It uses a continuous activation function.
• Suitable for training linear units.
• Can be extended to multilayer networks via backpropagation.
• Converges even when the data is not linearly separable.
Delta Rule vs. Perceptron Rule
• The Perceptron Rule uses a step activation function and updates weights only for
misclassified inputs.
• The Delta Rule uses a differentiable activation function and applies continuous updates
based on error magnitude.
Applications
• Linear regression models
Conclusion
The Delta Rule is a fundamental building block in neural network training, allowing for
gradual, data-driven updates to model parameters. Its continuous and differentiable nature
enables its integration into more complex training algorithms for deep learning.
1.10 Backpropagation Algorithm

Overview
Backpropagation involves two main phases:
1. Forward Pass: Inputs are passed through the network to compute the output.
2. Backward Pass: The error is propagated backward from the output layer to the input
layer, and weights are adjusted using gradient descent.
Mathematical Representation
Given:
• Inputs: x1 , x2 , . . . , xn
• Output: ok , Target: tk
The error function (Mean Squared Error for a single output neuron) is:

E = (1/2) Σk (tk − ok )²

For output layer neurons, the error term and weight update are:

δk = (tk − ok )f ′ (netk )
∆wjk = η · δk · oj
For hidden layer neurons:
δj = f ′ (netj ) Σk δk wjk
∆wij = η · δj · oi
Where:
• η: Learning rate
Algorithm Steps
1. Initialize all weights and biases with small random values.
2. For each training sample, run a forward pass, compute the error, and propagate it back
to update the weights and biases.
3. Repeat for all training samples until the error is minimized or a stopping criterion is
met. (A worked sketch follows.)
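A worked sketch in Python/NumPy: a 2-3-1 sigmoid network trained on XOR using the update
formulas above; the architecture, random seed, learning rate, and epoch count are illustrative
assumptions.

import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))          # sigmoid activation

def fp(a):
    return a * (1.0 - a)                      # f'(net) written via the activation a

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])            # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer (3 neurons)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer (1 neuron)
eta = 0.5

for epoch in range(10000):
    for x, t in zip(X, T):
        h = f(W1 @ x + b1)                    # forward pass: hidden layer
        o = f(W2 @ h + b2)                    # forward pass: output layer
        delta_o = (t - o) * fp(o)             # delta_k = (t_k - o_k) f'(net_k)
        delta_h = fp(h) * (W2.T @ delta_o)    # delta_j = f'(net_j) sum_k delta_k w_jk
        W2 += eta * np.outer(delta_o, h)      # delta_w_jk = eta * delta_k * o_j
        b2 += eta * delta_o
        W1 += eta * np.outer(delta_h, x)      # delta_w_ij = eta * delta_j * o_i
        b1 += eta * delta_h

for x in X:
    print(x, f(W2 @ f(W1 @ x + b1) + b2))     # outputs typically approach 0, 1, 1, 0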
Advantages
• Allows training of multilayer networks.
• Can be combined with techniques like momentum and adaptive learning rate.
Limitations
• Can get stuck in local minima.
Applications
• Handwritten digit recognition
• Function approximation
Conclusion
Backpropagation is a foundational algorithm in neural network training. It has enabled the
development of deep learning architectures by allowing networks to automatically adjust
weights in response to errors, leading to improved learning and performance.
1.11 Multilayer Perceptron Model

Architecture
An MLP typically consists of the following components:
• Input Layer: Receives input features. Each neuron represents one feature of the
input vector.
• Hidden Layer(s): Intermediate layers that learn internal representations of the data.
Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result
through a non-linear activation function.
• Output Layer: Produces the final output of the network, typically using an appro-
priate activation function depending on the task (e.g., softmax for classification).
Mathematical Model
Let x = [x1 , x2 , . . . , xn ] be the input vector, and W (l) be the weight matrix for layer l. The
forward propagation through the layers is defined as:

a(l) = f (W (l) a(l−1) + b(l) ), with a(0) = x,

where b(l) is the bias vector and f is the activation function of layer l.
Activation Functions
Common activation functions used in MLPs include:
• Sigmoid: f (x) = 1/(1 + e^−x )
• Tanh: f (x) = tanh(x)
• ReLU: f (x) = max(0, x)
Advantages
• Capable of modeling non-linear relationships.
Limitations
• Requires careful tuning of hyperparameters (e.g., learning rate, number of layers).
Applications
• Handwriting and speech recognition
• Financial forecasting
Conclusion
The Multilayer Perceptron is a powerful and flexible neural network architecture. Its ability
to learn complex patterns makes it a core component of many modern machine learning
applications.
1.12 Hopfield Networks

Network Architecture
A Hopfield Network consists of a single layer of neurons where each neuron is connected to
every other neuron except itself (i.e., no self-connections). The weights between neurons are
symmetric: wij = wji .
Energy Function
The Hopfield Network is governed by an energy function, which is used to determine the
stability of the network:
E = −(1/2) Σi Σj wij si sj + Σi θi si
Where:
• si : State of neuron i
• θi : Threshold of neuron i
The network updates the states of the neurons asynchronously to reduce the energy,
eventually converging to a stable state, which corresponds to a stored pattern.
Patterns are stored using the Hebbian rule wij = (1/N ) Σ_{µ=1}^{P} ξi^µ ξj^µ , where:
• N : Number of neurons
• P : Number of stored patterns
Operation Phases
1. Training Phase: Store patterns by setting weights using Hebbian learning.
2. Recall Phase: Present a partial or noisy pattern; the network will update its states
to converge to the closest stored pattern (auto-associative memory).
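A minimal sketch of both phases in Python/NumPy, assuming bipolar (±1) states, zero
thresholds, and two illustrative stored patterns.

import numpy as np

patterns = np.array([[ 1, -1,  1, -1,  1, -1,  1, -1],
                     [ 1,  1,  1,  1, -1, -1, -1, -1]])
N = patterns.shape[1]

# Training phase: Hebbian storage, w_ij = (1/N) sum_mu xi_i xi_j, w_ii = 0
W = sum(np.outer(p, p) for p in patterns) / N
np.fill_diagonal(W, 0)

# Recall phase: start from a noisy probe and update asynchronously
s = patterns[0].copy()
s[0] = -s[0]                                 # corrupt one bit
rng = np.random.default_rng(0)
for _ in range(5):                           # a few sweeps over all neurons
    for i in rng.permutation(N):
        s[i] = 1 if W[i] @ s >= 0 else -1    # threshold update (theta_i = 0)

print(np.array_equal(s, patterns[0]))        # True: converged to the stored pattern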
Applications
• Pattern and image recognition
• Associative memory
• Solving optimization problems (e.g., Traveling Salesman Problem)
Limitations
• Limited storage capacity
• Possibility of spurious states (unlearned stable patterns)
• Sensitive to noise and overlapping patterns
Conclusion
Hopfield Networks offer a biologically inspired approach to memory and pattern recognition.
Although they have limitations in capacity and accuracy, they laid the foundation for energy-
based learning models and continue to influence modern neural architectures.
1.13 Associative Memories

Auto-associative Memory
In auto-associative memory, the input and output patterns are the same. When a noisy
version of a stored pattern is presented, the network converges to the original, clean version.
Hopfield Networks are a classic example of auto-associative memory.
Let the stored patterns be ξ^µ = [ξ1^µ , ξ2^µ , . . . , ξN^µ ] for µ = 1, 2, . . . , P . The Hebbian learning
rule for storing these patterns is:

wij = (1/N ) Σ_{µ=1}^{P} ξi^µ ξj^µ , with wii = 0
Hetero-associative Memory
In hetero-associative memory, different input and output patterns are stored. The network
learns a mapping from input pattern x to output pattern y. The weight matrix is calculated
as:
wij = Σ_{µ=1}^{P} xi^µ yj^µ
This enables the network to recall a specific output pattern even when the input is noisy.
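A minimal sketch in Python/NumPy; the bipolar patterns and sign-thresholded recall are
illustrative assumptions.

import numpy as np

x_patterns = np.array([[ 1, -1,  1, -1],
                       [ 1,  1, -1, -1]])    # input patterns x^mu
y_patterns = np.array([[ 1, -1],
                       [-1,  1]])            # associated output patterns y^mu

# w_ij = sum_mu x_i^mu y_j^mu
W = sum(np.outer(x, y) for x, y in zip(x_patterns, y_patterns))

probe = np.array([1, -1, 1, 1])              # noisy version of the first input
recall = np.sign(probe @ W)                  # threshold the weighted sums
print(recall)                                # [ 1 -1 ]: the stored output pattern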
Storage Capacity
The number of patterns that can be reliably stored depends on the network architecture.
For instance, in Hopfield Networks, the theoretical limit is approximately 0.138 × N for N
neurons, beyond which retrieval accuracy drops.
Applications
• Pattern recognition
• Error correction
Limitations
• Limited storage capacity
Conclusion
Associative memories provide a powerful model for pattern storage and retrieval. They
emulate how the human brain recalls information based on association and are foundational
in neural computing and cognitive modeling.
1.14 Applications of Artificial Neural Networks
Artificial Neural Networks (ANNs) have become a fundamental tool in many real-world
applications due to their ability to learn complex, non-linear relationships from data. Their
flexibility and adaptability allow them to be applied across a wide range of domains, from
pattern recognition to control systems and natural language processing.
1. Pattern Recognition
ANNs are widely used in recognizing patterns in data, such as images, speech, and hand-
writing. They can classify input patterns even when noise or distortion is present.
• Handwritten digit recognition (e.g., MNIST dataset)
• Optical Character Recognition (OCR)
• Facial recognition systems
5. Forecasting and Time Series Prediction
ANNs are effective in modeling and forecasting time-dependent data such as financial mar-
kets, weather, and sales trends.
• Stock market prediction
• Weather forecasting
• Energy demand prediction
7. Medical Diagnosis
ANNs assist healthcare professionals in diagnosing diseases and recommending treatments
based on medical data.
• Diagnosis of heart disease, cancer, and neurological disorders
• Prediction of disease progression
• Personalized treatment planning
10. Recommendation Systems
Neural networks power recommendation engines used in e-commerce, entertainment, and
social media.
Conclusion
Artificial Neural Networks have proven to be versatile and powerful tools across numerous
fields. With the advancement of deep learning, their capabilities continue to expand, making
them a central component in modern artificial intelligence applications.