Unit 2
g: aggregation of inputs.
An inhibitory input will not cause the neuron to fire on its own, but it will combine with all the other inputs. You have seen which inputs could cause the neuron to fire and how, with the thresholding parameter θ = 1. Let us verify that: g(x), i.e., x_1 + x_2, would be ≥ 1 in only 3 cases:
Case 1: when x_1 is 1 and x_2 is 0
Case 2: when x_1 is 1 and x_2 is 1
Case 3: when x_1 is 0 and x_2 is 1
And we also know that x_1 AND !x_2 would output 1 only for Case 1 (above), so our thresholding parameter holds good for the given function.
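As a quick check, this x_1 AND !x_2 unit can be sketched in Python (the function name is illustrative; the inhibitory handling of x_2 and θ = 1 follow the example above):

def mcculloch_pitts(x1, x2, theta=1):
    # x2 is the inhibitory input: if it is on, the neuron cannot fire at all.
    if x2 == 1:
        return 0
    # g aggregates the inputs; the neuron fires when g(x) >= theta.
    g = x1 + x2
    return 1 if g >= theta else 0

# Output is 1 only for Case 1 (x1 = 1, x2 = 0), matching x_1 AND !x_2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts(x1, x2))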
Perceptron
The perceptron is a building block of an artificial neural network. Frank Rosenblatt invented the perceptron for performing certain calculations to detect capabilities or features in the input data.
Step 1: First, multiply all input values by their corresponding weight values and add the products to determine the weighted sum. Then add a special term called the bias b to this weighted sum to improve the model's performance. Mathematically, the weighted sum is:
∑ w_i * x_i + b
Step 2: An activation function f is applied to the weighted sum from Step 1, which gives an output that is either binary or a continuous value, as follows:
Y = f(∑ w_i * x_i + b)
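A minimal sketch of these two steps in Python (the helper names are illustrative, and the step function's >= 0 convention is one common choice):

def weighted_sum(x, w, b):
    # Step 1: multiply each input by its weight, sum, then add the bias b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step(z):
    # Step 2: a binary activation function applied to the weighted sum.
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    return step(weighted_sum(x, w, b))  # Y = f(sum(w_i * x_i) + b)

print(perceptron([1, 0], w=[1.0, 1.0], b=-1.0))  # -> 1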
Based on the layers, perceptron models are divided into two types: the single-layer perceptron and the multi-layer perceptron.
[Figure: single-layer perceptron. Inputs X1 … Xn with weights W1 … Wn feed a summation unit ∑ (together with a bias term), followed by an activation function Ψ(·) that produces the output.]
The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages as follows:
Forward stage: activations are computed starting from the input layer and terminating at the output layer.
Backward stage: weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the desired output is propagated backward, starting at the output layer and ending at the input layer.
Hence, a multi-layer perceptron model is an artificial neural network with multiple layers in which the activation function does not remain linear, unlike in the single-layer perceptron model. Instead of a linear function, non-linear activation functions such as the sigmoid, tanh, and ReLU can be used.
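For reference, a sketch of these three non-linear activations in Python:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # squashes any input to (0, 1)

def tanh(z):
    return math.tanh(z)                # squashes any input to (-1, 1)

def relu(z):
    return max(0.0, z)                 # zero for negatives, identity otherwise

print(sigmoid(0.0), tanh(0.0), relu(-2.0))  # 0.5 0.0 0.0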
A multi-layer perceptron model has greater processing power and can process linear and
non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND,
NOT, XNOR, NOR.
It achieves comparable accuracy with large as well as small datasets.
In a multi-layer perceptron, it is difficult to predict how much each independent variable affects the dependent variable.
Initially, the weights are multiplied with the input features, and a decision is made whether the neuron fires or not.
The activation function applies a step rule to check whether the weighted sum is greater than zero.
A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes, +1 and -1.
If the sum of all weighted input values exceeds the threshold value, the neuron emits an output signal; otherwise, no output is shown.
The perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it cannot classify them properly; the sketch below trains a perceptron on the (linearly separable) AND gate.
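The following sketch uses the classic perceptron learning rule on the AND gate; the learning rate and epoch count are arbitrary illustrative choices:

# Perceptron learning rule on the AND gate (linearly separable).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for epoch in range(20):
    for x, target in data:
        y = 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0
        error = target - y
        # Update weights and bias only when the prediction is wrong.
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        b += lr * error

print(w, b)  # converges because AND is linearly separable; XOR would not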
MLP networks are used in a supervised learning setting. A typical learning algorithm for MLP networks is the backpropagation algorithm.
A multilayer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of inputs. An MLP is characterized by several layers of nodes connected as a directed graph between the input and output layers. MLP uses backpropagation for training the network. MLP is a deep learning method.
[Figure: a multi-layer network with an input layer, a hidden layer, and an output layer. The red edges show a weight value of -1, the blue edges a weight value of 1, and the bias is -2.]
XOR Function:
Let w_0 be the bias (threshold) of the output neuron, and let h_1 … h_4 be the outputs of four hidden perceptrons, where h_i fires for exactly one of the four input patterns:

x_1  x_2  XOR  h_1  h_2  h_3  h_4
0    0    0    1    0    0    0
0    1    1    0    1    0    0
1    0    1    0    0    1    0
1    1    0    0    0    0    1

For the output neuron ∑ w_i h_i to reproduce the XOR column, the weights must satisfy:
w_1 < w_0 (input (0,0): output 0)
w_2 ≥ w_0 (input (0,1): output 1)
w_3 ≥ w_0 (input (1,0): output 1)
w_4 < w_0 (input (1,1): output 0)
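A hedged sketch of this construction in Python: four hidden perceptrons, each firing for exactly one input pattern, and one valid choice of output weights satisfying the four conditions above (w_1 = 0, w_2 = 1, w_3 = 1, w_4 = 0 with threshold w_0 = 1; other choices work too):

def fires(x, w, theta):
    # A perceptron unit: fires when the weighted sum reaches the threshold.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def xor(x1, x2):
    x = (x1, x2)
    # Each hidden unit fires for exactly one of the four input patterns.
    h = [fires(x, (-1, -1), 0),  # h_1: fires only for (0, 0)
         fires(x, (-1,  1), 1),  # h_2: fires only for (0, 1)
         fires(x, ( 1, -1), 1),  # h_3: fires only for (1, 0)
         fires(x, ( 1,  1), 2)]  # h_4: fires only for (1, 1)
    # Output weights 0, 1, 1, 0 with threshold w_0 = 1 satisfy
    # w_1 < w_0, w_2 >= w_0, w_3 >= w_0, w_4 < w_0.
    return fires(h, (0, 1, 1, 0), 1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))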
Sigmoid Neuron:
The building block of deep neural networks is called the sigmoid neuron. Sigmoid neurons are similar to perceptrons, but they are slightly modified so that the output from the sigmoid neuron is much smoother than the step-function output from the perceptron.
The perceptron model takes several real-valued inputs and gives a single binary output. From the mathematical representation, we might say that the thresholding logic used by the perceptron is very harsh.
For example, consider deciding whether we will like or dislike a movie based on only one input, the critics' rating. If we set the threshold at 0.5, what would the decision be for a movie with a critics' rating of 0.51 (like) versus 0.49 (dislike)?
It seems harsh that we would like a movie rated 0.51 but not one rated 0.49.
This motivates the sigmoid neuron, whose output function is much smoother than the step function. In the sigmoid neuron, a small change in the input causes only a small change in the output, as opposed to the stepped output. There are many functions with the characteristic "S"-shaped curve, known as sigmoid functions. The most commonly used is the logistic function.
Y = 1 / (1 + e^(-(w^T x + b))), where w^T x = ∑ w_i x_i.
We no longer see a sharp transition at the threshold b. The output from the sigmoid neuron is not 0 or 1; instead, it is a real value between 0 and 1, which can be interpreted as a probability.
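A minimal sketch of a sigmoid neuron in Python, applied to the movie-rating example (the weight and bias values are illustrative):

import math

def sigmoid_neuron(x, w, b):
    # Y = 1 / (1 + e^-(w.x + b)): a smooth, real-valued output in (0, 1).
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Ratings 0.49 and 0.51 now give nearly the same output, not a hard flip.
w, b = [1.0], -0.5
print(sigmoid_neuron([0.49], w, b))  # ~0.4975
print(sigmoid_neuron([0.51], w, b))  # ~0.5025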
Gradient Descent
Gradient descent is one of the most commonly used iterative optimization algorithms in deep learning for training models. It helps in finding a local minimum of a function.
The main objective of using a gradient descent algorithm is to minimize the cost function iteratively. To achieve this goal, it performs two steps repeatedly, as the sketch below shows:
Calculate the first-order derivative of the function to compute the gradient (slope) at the current point.
Move in the direction opposite to the gradient, away from the current point by alpha times the gradient, where alpha is the learning rate: a tuning parameter in the optimization process that helps decide the length of the steps.
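These two steps can be sketched in Python for a one-dimensional function; f(w) = w^2 and its derivative 2w are purely illustrative:

def gradient_descent(grad, w, alpha=0.1, steps=100):
    for _ in range(steps):
        # Step 1: the gradient at the current point; Step 2: move against
        # it, scaled by the learning rate alpha.
        w = w - alpha * grad(w)
    return w

# Minimize f(w) = w**2, whose first-order derivative is 2w.
print(gradient_descent(lambda w: 2 * w, w=5.0))  # approaches the minimum at 0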
More formally, for the parameters of our model:
Vector of parameters, randomly initialized: θ = [w, b]
Changes in w and b: ∆θ = [∆w, ∆b]
Start with a random guess; then make a change to w and b so that we land in a better situation, i.e., the error is less.
Add a small change, which is also a vector: θ_new = θ + ∆θ, moving in the direction of ∆θ.
Let us be a bit conservative and move only by a small amount η: θ_new = θ + η·∆θ.
What should ∆θ be? ∆θ should point in the direction opposite to the gradient.
Let ∆θ = u. From the Taylor series, L(θ + ηu) ≈ L(θ) + η·u^T ∇L(θ), which decreases fastest when u = -∇L(θ), i.e., when we move against the gradient.
What is a cost function?
The cost function is defined as the measurement of the difference, or error, between the actual values and the predicted values at the current position, expressed as a single real number.
Before examining the working principle of gradient descent, we should know some basic concepts, such as finding the slope of a line in linear regression. The equation of simple linear regression is:
Y = mX + c
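To make the cost function concrete for this line, here is a hedged sketch using mean squared error (one common choice of cost; the sample points are made up):

def cost(m, c, points):
    # Difference between actual y values and the line's predictions m*x + c,
    # summarized as a single real number (mean squared error).
    return sum((y - (m * x + c)) ** 2 for x, y in points) / len(points)

points = [(1, 2), (2, 4), (3, 6)]
print(cost(2, 0, points))  # 0.0: the line Y = 2X + 0 fits these points exactly
print(cost(1, 0, points))  # ~4.67: a worse line gives a larger cost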
The slope is steeper at the starting (arbitrary) point; as new parameters are generated, the steepness gradually reduces, and at the lowest point the iterates approach the point of convergence.
The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual values. To minimize the cost function, two things are required: a direction (opposite to the gradient) and a learning rate.
Learning rate:
The learning rate is the step size taken to reach the minimum (lowest point). It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum; a low learning rate takes small steps, which compromises overall efficiency but gives the advantage of more precision (compare the illustration below).
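A quick illustration of this trade-off, again on the toy function f(w) = w^2 (the three rates are arbitrary):

def descend(alpha, w=5.0, steps=20):
    for _ in range(steps):
        w -= alpha * 2 * w  # the gradient of w**2 is 2w
    return w

print(descend(alpha=0.01))  # small steps: precise but slow to converge
print(descend(alpha=0.9))   # large steps: fast here, but oscillates around 0
print(descend(alpha=1.1))   # too large: the iterates overshoot and diverge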
[Setup: a single neuron with parameters w and b and training points (x_1, y_1), (x_2, y_2).] We assume there is only one point to fit, (x, y).
Prof. U. A. S. Gani
(Subject Teacher)