Introduction to Neural Network
[Link]
PROFESSOR
CSE - AIML
SRM IST, Ramapuram
[Link] Prof/CSE - AIML 1
Unit-1 Introduction to Neural Network
Biological neuron, Motivation from the biological neuron, McCulloch-Pitts Neuron,
Perceptron, Perceptron learning algorithm, Representation power of a network of
perceptrons, Activation functions: Sigmoid, tanh, ReLU, leaky ReLU, Sigmoid neuron,
Gradient descent learning algorithm, Representation power of a multilayer network of
sigmoid neurons, Representation power of functions: complex functions in real-world
examples, Feedforward neural networks, Learning parameters, output and loss
functions of feedforward networks, Backpropagation learning algorithm, Applying the chain rule
across a neural network, Computing partial derivatives with respect to a weight
Biological Neuron
⦿ Neurons are the basic functional units of the nervous system. They generate
electrical signals called action potentials, which allow them to transmit
information quickly over long distances. Almost all neurons have three basic
functions essential for normal functioning.
⦿ These are to:
1. Receive signals (or information) from outside.
2. Process the incoming signals and determine whether or not the information should be passed along.
3. Communicate signals to target cells, which might be other neurons, muscles, or glands.
Main parts of biological neuron
⦿ Dendrite
Dendrites are responsible for receiving incoming signals from outside the neuron.
Incoming signals can be either excitatory, meaning they tend to make the neuron fire
(generate an electrical impulse), or inhibitory, meaning they tend to keep the neuron from firing.
⦿ Soma
The soma is the cell body, responsible for processing the input signals and deciding
whether the neuron should fire an output signal.
⦿ Axon
The axon is responsible for carrying the processed signal from the neuron to the relevant cells.
⦿ Synapse
A synapse is the connection between an axon and the dendrites of another neuron.
Artificial neuron
• An artificial neuron, also known as a perceptron, is the basic unit of a
neural network. In simple terms, it is a mathematical function based on a
model of biological neurons.
• It can also be seen as a simple logic gate with binary outputs.
Main Functions of Artificial neuron
• Takes inputs from the input layer
• Weighs them separately and sums them up
• Passes this sum through a nonlinear function to produce an output
Biological Neuron vs Artificial Neuron
McCulloch-Pitts Neuron Model
Binary neuron model (1943):
o Takes binary inputs (0 or 1).
o Applies weighted sum and threshold.
o Output is 1 if sum ≥ threshold, else 0.
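The rule above can be sketched in a few lines of Python (excitatory binary inputs only; the AND/OR thresholds below are illustrative, not from the slides):

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: binary inputs, plain sum, hard threshold."""
    total = sum(inputs)            # inputs are 0 or 1
    return 1 if total >= threshold else 0

# Over two inputs, threshold 2 behaves like AND, threshold 1 like OR
print(mp_neuron([1, 1], 2))  # 1
print(mp_neuron([1, 0], 2))  # 0
print(mp_neuron([1, 0], 1))  # 1
```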
Perceptron
Parts of Perceptron
⦿ Input layer
⦿ Weights and bias
⦿ Activation function
⦿ Output layer
Comparison between MP Neuron
Model and Perceptron Model
• Both the MP Neuron Model and the Perceptron Model work on linearly separable data.
• The MP Neuron Model accepts only Boolean inputs, whereas the Perceptron Model can
process any real-valued input.
• Inputs are not weighted in the MP Neuron Model, which makes it less flexible. The
Perceptron Model, on the other hand, assigns a weight to each input.
• In both models, the threshold can be adjusted to make the model fit the dataset.
Perceptron Learning Algorithm
1. First, multiply each input value by its corresponding weight and add the products to
determine the weighted sum: ∑wᵢxᵢ = w₁x₁ + w₂x₂ + … + wₙxₙ. Add another essential term,
the bias b, to the weighted sum to improve model performance: ∑wᵢxᵢ + b.
2. Next, an activation function is applied to this weighted sum, producing a binary or
continuous-valued output: Y = f(∑wᵢxᵢ + b).
3. Next, the difference between this output and the actual target value is computed to get the
error term E, generally as a squared error: E = (Y − Y_actual)². The steps up to this point
form the forward-propagation part of the algorithm.
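Steps 1–3 (the forward pass) can be sketched as follows; the input, weight, and bias values are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 0.0])    # inputs (illustrative)
w = np.array([0.6, 0.6])    # weights (illustrative)
b = -0.5                    # bias (illustrative)

z = np.dot(w, x) + b        # step 1: weighted sum plus bias
y = 1 if z >= 0 else 0      # step 2: step activation -> binary output
target = 1
E = (y - target) ** 2       # step 3: squared error
print(z, y, E)
```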
Perceptron Learning Algorithm
4. Finally, optimize this error (the loss function) using an optimization algorithm. Generally,
some form of gradient descent is used to find the optimal values of the parameters (weights and
bias), with the learning rate as a user-chosen hyperparameter. This step forms the
backward-propagation part of the algorithm.
Importance of Weight and Bias
• Weight increases the steepness of the activation function: the weights decide how fast the
activation function triggers, whereas the bias is used to delay the triggering of the
activation function.
• The weight indicates the effectiveness of a particular input. The greater an input's weight,
the more impact it has on the network.
• Bias, on the other hand, is like the intercept added in a linear equation. It is an additional
parameter in the neural network used to adjust the output along with the weighted sum of the
inputs to the neuron.
• Bias is therefore a constant that helps the model fit the given data as well as possible.
Importance of Weight and Bias
• y = mx + c, where m = weight and c = bias
• If c were absent, the line could only pass through the origin, as shown in the figure.
• Without a bias, the model could only learn lines passing through the origin, which does not
match real-world scenarios.
• Introducing a bias makes the model more flexible.
Importance of Weight and Bias - Example
Change in weight
• Weight W1 changed from 1.0 to 4.0
• Weight W2 changed from -0.5 to 1.5
• As the weight increases, the steepness of the activation function increases.
• It can therefore be inferred that the larger the weight, the earlier the activation function triggers.
Importance of Weight and Bias - Example
Bias changed from -1.0 to -5.0
• The change in bias increases the input value at which the activation function triggers.
• From the graph it can be inferred that the bias helps control the value at which the
activation function triggers.
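The two effects illustrated above, weight controlling steepness and bias controlling the trigger point, can be checked numerically with a sigmoid neuron (the input and parameter values below are illustrative):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def neuron(x, w, b):
    return sigmoid(w * x + b)

# Larger weight -> steeper curve, so the same input gives a more extreme output
print(neuron(0.5, 1.0, 0.0))   # moderate output
print(neuron(0.5, 4.0, 0.0))   # closer to 1

# More negative bias -> a larger input is needed before the neuron "fires"
print(neuron(1.0, 1.0, -1.0))
print(neuron(1.0, 1.0, -5.0))  # far smaller output
```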
Example
output = sum(weights * inputs) + bias
y = f(∑xᵢwᵢ + b)
Activation Function
• An activation function is a function added to an artificial neural network in order to
help the network learn complex patterns in the data.
• Compared with the neuron-based model in our brains, the activation function is what
finally decides what is to be fired to the next neuron.
• That is exactly what an activation function does in an ANN as well: it takes the output
signal from the previous cell and converts it into some form that can be taken as input to
the next cell.
1. Sigmoid Function
2. Softmax
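The activation functions named in this unit (sigmoid, tanh, ReLU, leaky ReLU, softmax) can be sketched as follows; the leaky-ReLU slope of 0.01 is a common default, not a value from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                     # squashes to (-1, 1)

def relu(z):
    return np.maximum(0, z)               # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))             # shift by the max for numerical stability
    return e / e.sum()                    # outputs are positive and sum to 1

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), relu(z), softmax(z))
```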
Sigmoid Neuron
• Similar to perceptron but with sigmoid activation.
• Continuous output between 0 and 1.
• Useful for probabilistic interpretation.
Softmax Vs Sigmoid
Softmax
Single Layer Feed Forward
(Figure: input neurons X1, X2, X3 connected through weights w11 … w34 to output neurons
Y1 … Y4; each output neuron computes yj_in = ∑ wij·xi and applies an activation to give yj_out.)
Multi Layer Feed Forward
Simple Classification Problem
XOR Problem
• Most real-life classification problems are not linearly separable.
• A perceptron cannot learn to compute even a 2-bit XOR, as it is not linearly separable.
• There is no single straight line that separates the patterns producing 1s
{(0,1), (1,0)} from the patterns producing 0s {(0,0), (1,1)}.
• How to overcome this limitation?
1. Draw a curved decision surface. But a perceptron cannot model any curved surface.
2. Employ two decision lines (a multi-layered perceptron).
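Option 2 above, two decision lines combined by a second layer, can be sketched with hand-set weights (these weights are illustrative, not learned):

```python
import numpy as np

def step(z):
    # Hard-threshold activation: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def mlp_xor(x1, x2):
    """Two hidden perceptrons draw the two decision lines; the output combines them."""
    x = np.array([x1, x2])
    h_or = step(np.dot([1, 1], x) - 0.5)    # fires when x1 OR x2
    h_and = step(np.dot([1, 1], x) - 1.5)   # fires when x1 AND x2
    return step(h_or - h_and - 0.5)         # OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, mlp_xor(a, b))  # outputs 0, 1, 1, 0
```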
Perceptron Error
• In the Perceptron Learning Rule, the predicted output is compared with the known
output. If they do not match, the error is propagated backward to allow weight
adjustment to take place.
NEURAL NETWORK IMPLEMENTATION FROM SCRATCH
WHAT IS LOGICAL OR GATE?
• When at least one of the inputs is 1, the output of the OR gate is 1. This means
the output is 0 only when both inputs are 0.
TRUTH-TABLE FOR OR GATE:
x1 | x2 | output
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 1
PERCEPTRON FOR THE OR GATE:
ERROR CALCULATION:
WHAT IS GRADIENT DESCENT?
• Gradient Descent is an optimization algorithm used in machine
learning models to find the minimum value of a cost function.
• It does this by taking small steps in the direction that is opposite
to the gradient of the cost function until it reaches a local
minimum.
• The learning rate determines the size of each step and can be
adjusted to balance convergence speed and accuracy.
WHAT IS GRADIENT DESCENT?
• For updating the weight values, we are going to use a gradient descent algorithm.
• Gradient descent is a machine learning algorithm that operates iteratively to find
the optimal values of its parameters. It takes into account a user-defined learning
rate and the initial parameter values.
GRADIENT DESCENT WORKING
Working (iterative):
1. Start with initial values.
2. Calculate the cost.
3. Update the values using the update function.
4. Repeat until the cost function is minimized.
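The four steps above, applied to a simple one-parameter cost f(x) = (x − 3)², look like this (the cost function and starting point are illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step opposite the gradient until near a minimum."""
    x = x0                       # 1. start with an initial value
    for _ in range(steps):
        x = x - lr * grad(x)     # 3. update using the update rule
    return x                     # 4. value (approximately) minimizing the cost

# f(x) = (x - 3)^2, so f'(x) = 2(x - 3); the minimum is at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))  # 3.0
```

Shrinking the learning rate slows convergence; making it too large can overshoot the minimum and diverge.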
WHY DO WE NEED IT?
• Usually, we look for a closed-form formula that gives us the optimal values for our
parameters. Gradient descent, however, finds those values by itself, iteratively.
Formula for the gradient descent algorithm
Learning Rate
DERIVATION OF THE FORMULA USED IN A NEURAL NETWORK
• What we want to find is how a particular weight value affects the error. To find
that, we are going to apply the chain rule.
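For a sigmoid neuron with squared error E = (out − target)², out = σ(net), and net = ∑wᵢxᵢ + b, the chain rule gives ∂E/∂wᵢ = (∂E/∂out)·(∂out/∂net)·(∂net/∂wᵢ). A sketch (the example values are assumptions for illustration):

```python
def weight_gradient(x_i, out, target):
    """Chain rule for one weight of a sigmoid neuron with squared error."""
    dE_dout = 2 * (out - target)    # derivative of (out - target)^2
    dout_dnet = out * (1 - out)     # derivative of the sigmoid, in terms of its output
    dnet_dw = x_i                   # net is linear in each weight
    return dE_dout * dout_dnet * dnet_dw

# With out = 0.5, target = 1, x_i = 1: 2(-0.5) * 0.25 * 1 = -0.25
print(weight_gradient(1.0, 0.5, 1.0))  # -0.25
```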
CALCULATING DERIVATIVES:
• In our case:
• Output = 0.68997, Target = 1
FINDING THE SECOND PART OF THE DERIVATIVE:
FINDING THE THIRD PART OF THE DERIVATIVE
Putting it all together:
• Putting it in our main equation:
w2 = 0.3 − (0.05) × (−0.06631)
w2 = 0.3033
Notice that the value of the weight has increased here. We could calculate all the
values in this way, but as we can see, it would be a lengthy process. So we are now
going to implement all the steps in Python.
SUMMARY OF THE MANUAL IMPLEMENTATION OF A NEURAL NETWORK:
a. Input for the perceptron:
b. Applying the sigmoid function for the predicted output:
c. Calculate the error:
d. Changing the weight value based on the gradient descent formula:
e. Calculating the derivative:
f. Individual derivatives:
g. Then we run the same code with the updated weight values.
IMPLEMENTATION OF A NEURAL NETWORK IN PYTHON:
10.1 Import the required libraries:
10.2 Assign input values:
10.3 Target output:
10.4 Assign the weights:
10.5 Adding bias values and assigning a learning rate:
10.6 Applying the sigmoid function:
10.7 Derivative of the sigmoid function:
10.8 The main logic for predicting the output and updating the weight values:
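Putting the steps above together, a minimal version of the full training loop might look like this (the initial weights, bias, learning rate, and epoch count are assumptions for illustration, not the slides' exact values):

```python
import numpy as np

# Input values: the four rows of the OR-gate truth table
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
# Target output for OR
target = np.array([[0], [1], [1], [1]], dtype=float)

# Weights, bias, and learning rate (illustrative initial values)
weights = np.array([[0.1], [0.2]])
bias = 0.3
lr = 0.05

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(out):
    # Derivative of the sigmoid, expressed in terms of its output
    return out * (1 - out)

# Main loop: predict, compute the error, update weights via the chain rule
for epoch in range(10000):
    net = np.dot(inputs, weights) + bias      # weighted sum plus bias
    out = sigmoid(net)                        # predicted output
    error = out - target                      # dE/dout (up to a factor of 2)
    delta = error * sigmoid_derivative(out)   # dE/dnet
    weights -= lr * np.dot(inputs.T, delta)   # dE/dw summed over the examples
    bias -= lr * delta.sum()

print(np.round(out.ravel()))  # predictions round to 0, 1, 1, 1
```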