
UNIT – 1

INTRODUCTION TO NEURAL NETWORK

Final Year BTECH
Subject : Deep Learning (PE4)

Unit I : Contents

Introduction To Neural Network

Introduction, The architecture of an artificial neural network, Types of ANN architecture, Advantages and disadvantages of ANN, Perceptron, Sigmoid Neurons, Activation Functions, Loss Function.

Introduction, The Architecture of an ANN, Types of ANN Architecture, Advantages and Disadvantages of ANN

Sources:

• https://www.javatpoint.com/artificial-neural-network
• Jacek M. Zurada, Introduction to Artificial Neural Systems
• B. Yegnanarayana, Artificial Neural Networks, PHI Learning

What is Machine Learning?

⮚ Artificial Intelligence (AI) systems learn by extracting patterns from input and output data.
⮚ Machine Learning (ML) relies on learning patterns from sample data. Programs learn from labeled data (supervised learning), unlabeled data (unsupervised learning), or a combination of both (semi-supervised learning).
⮚ Artificial Intelligence (AI) emerged in the mid-1900s, when scientists first tried to envision intelligent machines. Machine Learning evolved in the late 1900s and allowed scientists to train machines for AI.
⮚ In the early 2000s, breakthroughs in multi-layered neural networks facilitated the advent of Deep Learning.
What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from Biological neural


networks that develop the structure of a human brain. Similar to the human
brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in
various layers of the networks. These neurons are known as nodes.

The figure below illustrates a typical biological neural network.

https://www.javatpoint.com/artificial-neural-network
What is Artificial Neural Network?

⮚ An ANN is a system in the field of AI that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner.
⮚ It is designed by programming computers to behave simply like interconnected brain cells.
⮚ There are around 1000 billion neurons in the human brain. Each neuron has between 1,000 and 100,000 association points.
⮚ In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary.
⮚ The human brain is made up of incredibly powerful parallel processors.
What is Artificial Neural Network?

⮚ ANN example:
Consider a digital logic gate that takes an input and gives an output, e.g., an "OR" gate with two inputs. If one or both of the inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not perform tasks this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
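As a sketch (the weights and threshold below are hand-picked for illustration, not from the slides), the fixed input-output behavior of an OR gate can be written as a single threshold unit:

```python
# OR gate as a fixed threshold neuron: fires if the weighted sum exceeds 0.5.
# The weights and threshold are hand-picked assumptions for illustration.
def or_gate(x1, x2, w1=1.0, w2=1.0, threshold=0.5):
    return 1 if (w1 * x1 + w2 * x2) > threshold else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", or_gate(x1, x2))  # output is 0 only for (0, 0)
```

Unlike the brain, nothing in this unit changes with experience: the weights are fixed, so the input-output relationship never adapts.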
Relationship between biological neural network and artificial neural network

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

Table 1: Neural network (biological and artificial)


Typical Artificial Neural Network

The typical Artificial Neural Network looks something like the figure below.

Fig 1. Neuron function

Dendrites from the biological neural network represent inputs in artificial neural networks, the cell nucleus represents nodes, synapses represent weights, and the axon represents the output.
The architecture of an artificial neural network

To understand the architecture of an artificial neural network, we first have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.

Fig 2: Different neural network architectures


Layers of Artificial Neural Network

Input Layer:
◻ As the name suggests, it accepts inputs in several different formats provided by the programmer.

Hidden Layer:
◻ The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.

Output Layer:
◻ The input goes through a series of transformations in the hidden layers, finally resulting in the output that is conveyed through this layer.
Need of Bias

Layers of Artificial Neural Network

⮚ The artificial neural network takes the inputs, computes their weighted sum, and includes a bias. This computation is represented in the form of a transfer function.
⮚ The weighted total is then passed as an input to an activation function to produce the output. Activation functions decide whether a node should fire or not. Only those nodes which fire make it to the output layer.
⮚ There are distinctive activation functions available that can be applied depending on the sort of task being performed, as sketched below.
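A minimal sketch of this computation, assuming a step activation and illustrative numbers:

```python
import numpy as np

def step(z):
    # The node fires (outputs 1) only when the weighted total is positive.
    return 1 if z > 0 else 0

def node_output(x, w, b, activation=step):
    z = np.dot(w, x) + b     # weighted sum of the inputs plus a bias
    return activation(z)     # the activation function decides whether to fire

# Illustrative inputs, weights, and bias (assumed values).
print(node_output(np.array([0.5, 1.0]), np.array([0.4, -0.2]), b=0.1))  # -> 1
```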
Types/Models of ANN Architecture

Feedforward Network

Feedback Network

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 37)
Types/Models of ANN Architecture
Feedforward Network

Figure 2.8(b) shows the block diagram of the feedforward network. As can be seen, the generic feedforward network is characterized by the lack of feedback. This type of network can be connected in cascade to create a multilayer network. In such a network, the output of a layer is the input to the following layer. Even though the feedforward network has no explicit feedback connection when x(t) is mapped into o(t), the output values are often compared with the "teacher's" information, which provides the desired output value, and the resulting error signal can be employed for adapting the network's weights.

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 37)
Types/Models of ANN Architecture
Feedback Network

A feedback network can be obtained from the feedforward network shown in Figure 2.8(a) by connecting the neurons' outputs to their inputs. The result is depicted in Figure 2.10(a).

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 42)
Advantages of Artificial Neural Network (ANN)

⮚ Parallel processing capability:
Artificial neural networks can perform more than one task simultaneously.

⮚ Storing data on the entire network:
Unlike traditional programming, where data is stored in a database, the data here is stored on the whole network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.

⮚ Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of performance here depends on the significance of the missing data.

⮚ Having a memory distribution:
For an ANN to be able to adapt, it is important to determine suitable examples and to train the network according to the desired output by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances, and if the event cannot be shown to the network in all its aspects, the network can produce false output.

⮚ Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network (ANN)

⮚ Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is accomplished through experience and trial and error.

⮚ Unrecognized behavior of the network:
This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight into why and how, which decreases trust in the network.

⮚ Hardware dependence:
Artificial neural networks need processors with parallel processing power, as per their structure. The realization of the network is therefore dependent on suitable hardware.

⮚ Difficulty of showing the issue to the network:
ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here directly impacts the performance of the network, and it relies on the user's abilities.

⮚ The duration of the network is unknown:
Training is stopped when the error is reduced to a specific value, but this value does not guarantee optimum results.
1.1 Neural Computation (With Example)

Let us try to inspect the performance of a simple classifier.

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 3-8)
1.1 Neural Computation (With Example)

Assume that a set of eight points, P0, P1, ..., P7, in three-dimensional space is available. The set consists of all vertices of a three-dimensional cube, with each point Pi(x1, x2, x3) having as coordinates the binary digits of its index i, from P0 = (0, 0, 0) through P7 = (1, 1, 1).

Elements of this set need to be classified into two categories. The first category is defined as containing the points with two or more positive ones (coordinates equal to 1); the second category contains all the remaining points that do not belong to the first category. Accordingly, points P3, P5, P6, and P7 belong to the first category, and the remaining points to the second category.

Classification of points P3, P5, P6, and P7 can be based on the summation of the coordinate values of each point evaluated for category membership. Notice that for each point Pi(x1, x2, x3), where i = 0, ..., 7, the membership in the first category can be established by the following calculation:

o = sgn(x1 + x2 + x3 - 1.5)

This describes the decision function of the classifier, designed by inspection of the set that needs to be partitioned, as sketched below.

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 3-8)
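A sketch of this classifier, assuming unit-cube coordinates with each Pi's coordinates taken as the binary digits of i:

```python
# Vertices of the unit cube: P_i's coordinates (x1, x2, x3) are the bits of i.
points = {i: [(i >> k) & 1 for k in (2, 1, 0)] for i in range(8)}

def classify(p):
    # Decision by inspection: +1 if two or more coordinates are 1, else -1,
    # implemented as sgn(x1 + x2 + x3 - 1.5).
    x1, x2, x3 = p
    return 1 if (x1 + x2 + x3 - 1.5) > 0 else -1

for i, p in points.items():
    print(f"P{i} {p} -> category {classify(p)}")  # P3, P5, P6, P7 -> +1
```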
1.1 Neural Computation (With Example)

The unit from Figure 1.1(a) maps the entire three-dimensional space into just two points, 1 and -1. A question arises as to whether a unit with a "squashed" sgn function rather than a regular sgn function could prove more advantageous. Assuming that the "squashed" sgn function has the shape shown in Figure 1.2, notice that the outputs now take values in the range (-1, 1) and are generally more discernible than in the previous case. Using units with continuous characteristics offers tremendous opportunities for new tasks that can be performed by neural networks. Specifically, the fine granularity of the output provides more information than the binary ±1 output of the thresholding element.

Jacek M. Zurada, Introduction to Artificial Neural Systems; B. Yegnanarayana, Artificial Neural Networks, PHI Learning. (Page no. 3-8)
Perceptron, Sigmoid Neurons

Sources:

1. Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015 (Module I - Perceptron, Sigmoid Neurons)
2. https://www.simplilearn.com/what-is-perceptron-tutorial
Perceptron

⮚ A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect features or business intelligence in the input data.
⮚ The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original MCP (McCulloch-Pitts) neuron.
⮚ A perceptron is an algorithm for supervised learning of binary classifiers. This algorithm enables neurons to learn and process elements in the training set one at a time.

Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015 (Module I - Perceptron, Sigmoid Neurons)
How do perceptrons work?

• A perceptron takes several binary inputs, x1, x2, ..., and produces a single binary output.
• In the example shown, the perceptron has three inputs, x1, x2, x3. In general it could have more or fewer inputs.
• Rosenblatt proposed a simple rule to compute the output. He introduced weights, w1, w2, ..., real numbers expressing the importance of the respective inputs to the output.
• The neuron's output, 0 or 1, is determined by whether the weighted sum ∑j wj xj is less than or greater than some threshold value.
• Just like the weights, the threshold is a real number which is a parameter of the neuron. To put it in more precise algebraic terms:

output = 0 if ∑j wj xj ≤ threshold
output = 1 if ∑j wj xj > threshold
Perceptron

⮚ The first column of perceptrons (the first layer of perceptrons) makes three very simple decisions by weighing the input evidence.
⮚ Each perceptron in the second layer makes a decision by weighing up the results from the first layer of decision-making.
⮚ A perceptron in the second layer can therefore make a decision at a more complex and more abstract level than perceptrons in the first layer.
⮚ Even more complex decisions can be made by the perceptrons in the third layer.
⮚ In this way, a many-layer network of perceptrons can engage in sophisticated decision making.
Perceptron

⮚ There are two types of perceptrons: single layer and multilayer.
⮚ Single layer perceptrons can learn only linearly separable patterns.
⮚ Multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
⮚ The perceptron algorithm learns the weights for the input signals in order to draw a linear decision boundary.
⮚ This enables you to distinguish between the two linearly separable classes +1 and -1.
⮚ Supervised learning is a type of machine learning used to learn models from labeled training data. It enables output prediction for future or unseen data.

https://www.simplilearn.com/what-is-perceptron-tutorial
Perceptron - Single Layer

◻ It is a feed-forward network that depends on a threshold transfer function in its model.
◻ It is the simplest type of ANN and is able to analyze only linearly separable objects with binary outcomes (targets), i.e., 1 and 0.
Perceptron - Single Layer

⮚ In the single-layered perceptron model, the algorithm has no prior information.
⮚ Initially, weights are allocated randomly; then the algorithm adds up all the weighted inputs, and if the sum is more than some pre-determined value (the threshold), the single-layered perceptron is said to be activated and delivers an output of +1.
⮚ Multiple input values are fed to the perceptron model, the model executes with the input values, and if the estimated value is the same as the required output, the model's performance is considered satisfactory, so the weights demand no changes. If the model doesn't meet the required result, a few changes are made to the weights to minimize the error.
Perceptron - Multi Layer

⮚ It has a structure similar to the single-layered perceptron model but with a greater number of hidden layers.
⮚ It is also termed a backpropagation algorithm. It executes in two stages: the forward stage and the backward stage.
Perceptron - Multilayer

⮚ In the forward stage, activations propagate from the input layer to the output layer.
⮚ In the backward stage, the error between the actual observed value and the desired value is propagated backward from the output layer to modify the weights and bias values.
⮚ In simple terms, the multi-layered perceptron can be treated as a network of numerous artificial neurons arranged over varied layers; the activation function is no longer linear; instead, non-linear activation functions such as the Sigmoid, TanH, and ReLU activation functions are deployed for execution.
Perceptron Learning Rule

⮚ The Perceptron Learning Rule states that the algorithm automatically learns the optimal weight coefficients. The input features are then multiplied with these weights to determine whether a neuron fires or not, as in the sketch below.
⮚ The perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it either outputs a signal or does not return an output. In the context of supervised learning and classification, this can then be used to predict the class of a sample.
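A minimal sketch of the rule (the learning rate, epoch count, and OR-function training data are illustrative assumptions):

```python
import numpy as np

# Training data for the (linearly separable) OR function, targets in {0, 1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)   # weight coefficients, learned automatically
b = 0.0           # bias (threshold brought to the left-hand side)
lr = 0.1          # learning rate (assumed value)

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        error = target - pred       # zero when the prediction matches
        w += lr * error * xi        # adjust weights toward the target
        b += lr * error

print(w, b)  # weights drawing a linear decision boundary for OR
```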
Perceptron Function

⮚ A perceptron is a function that maps its input "x," which is multiplied with the learned weight coefficient, to an output value "f(x)":

f(x) = 1 if w · x + b > 0 (where w · x = ∑ᵢ wᵢxᵢ over the m inputs), and 0 otherwise

⮚ In the equation given above:
"w" = vector of real-valued weights
"b" = bias (an element that adjusts the boundary away from the origin without any dependence on the input value)
"x" = vector of input x values
"m" = number of inputs to the perceptron

The output can be represented as "1" or "0." It can also be represented as "1" or "-1," depending on which activation function is used.
Inputs of a Perceptron

⮚ A perceptron accepts inputs, moderates them with certain weight values, then applies the transformation function to output the final result. The figure below shows a perceptron with a Boolean output.
⮚ A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It has only two values: Yes and No, or True and False. The summation function "∑" multiplies all inputs "x" by their weights "w" and then adds them up as follows:

∑ wᵢxᵢ = w1 x1 + w2 x2 + ... + wn xn
Activation Functions of Perceptron

⮚ The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether the output of the weighting function is greater than zero or not.
Activation Functions of Perceptron

⮚ E.g.:
If ∑ wᵢxᵢ > 0, then the final output "o" = 1 (issue bank loan);
else, the final output "o" = -1 (deny bank loan).

⮚ The step function gets triggered above a certain value of the neuron output; otherwise it outputs zero. The sign function outputs +1 or -1 depending on whether the neuron output is greater than zero or not. The sigmoid is the S-curve and outputs a value between 0 and 1, as in the sketch below.
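A sketch of the three rules applied to a neuron output z = ∑ wᵢxᵢ (the value of z and the loan interpretation are illustrative):

```python
import math

def step(z):          # triggered above 0, else outputs 0
    return 1 if z > 0 else 0

def sign(z):          # outputs +1 (issue loan) or -1 (deny loan)
    return 1 if z > 0 else -1

def sigmoid(z):       # S-curve, value between 0 and 1
    return 1 / (1 + math.exp(-z))

z = 0.7  # an assumed weighted sum of the applicant's features
print(step(z), sign(z), round(sigmoid(z), 3))  # 1, 1, 0.668
```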
Output of Perceptron

⮚ Perceptron with a Boolean output:
⮚ Inputs: x1 ... xn
⮚ Output: o(x1 ... xn)
⮚ Weights: wi => contribution of input xi to the perceptron output
⮚ w0 => bias or threshold
⮚ If ∑ w · x > 0, the output is +1, else -1. The neuron gets triggered only when the weighted input reaches a certain threshold value.
Output of Perceptron

⮚ An output of +1 specifies that the neuron is triggered. An output of -1 specifies that the neuron did not get triggered.
⮚ "sgn" stands for the sign function, with output +1 or -1.
Error in Perceptron

◻ In the Perceptron Learning Rule, the predicted output is compared with the known output. If they do not match, the error is propagated backward to allow weight adjustment to happen.
Perceptron: Decision Function

◻ A decision function φ(z) of the perceptron is defined to take a linear combination of the x and w vectors.
◻ The value z in the decision function is given by:

z = w1 x1 + w2 x2 + ... + wm xm = wᵀx

◻ The decision function is +1 if z is greater than a threshold θ, and -1 otherwise:

φ(z) = +1 if z ≥ θ, and -1 otherwise
Perceptron: Decision Function

◻ Bias Unit
◻ For simplicity, the threshold θ can be brought to the left-hand side and represented as w0 x0, where w0 = -θ and x0 = 1.
◻ The value w0 is called the bias unit.
◻ The decision function then becomes (see the sketch below):

φ(z) = +1 if z ≥ 0, and -1 otherwise, where z = w0 x0 + w1 x1 + ... + wm xm = wᵀx
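A small sketch of this reformulation (the numbers are illustrative assumptions): prepending x0 = 1 with weight w0 = -θ gives the same decision while comparing wᵀx against zero.

```python
import numpy as np

theta = 0.5                  # original threshold
w = np.array([0.3, -0.2])    # learned weights (assumed)
x = np.array([2.0, 1.0])     # input sample (assumed)

# Decision with an explicit threshold: +1 if z >= theta, else -1.
z = np.dot(w, x)
print(1 if z >= theta else -1)

# Same decision with a bias unit: w0 = -theta, x0 = 1, compared against 0.
w_aug = np.concatenate(([-theta], w))
x_aug = np.concatenate(([1.0], x))
print(1 if np.dot(w_aug, x_aug) >= 0 else -1)  # identical result
```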
Perceptron: Decision Function

⮚ Output
⮚ The figure shows how the decision function squashes wᵀx to either +1 or -1, and how it can be used to discriminate between two linearly separable classes.
Perceptron

⮚ The perceptron has the following characteristics:
⮚ The perceptron is an algorithm for supervised learning of a single-layer binary linear classifier.
⮚ Optimal weight coefficients are automatically learned.
⮚ Weights are multiplied with the input features, and a decision is made as to whether the neuron fires or not.
⮚ The activation function applies a step rule to check whether the output of the weighting function is greater than zero.
⮚ A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
⮚ If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise, there is no output.
⮚ Types of activation functions include the sign, step, and sigmoid functions.
Summary Perceptron

⮚ The activation function to be used is a subjective decision based on the problem statement and the form of the desired results.
⮚ If the learning process is slow or has vanishing or exploding gradients, change the activation function to see if these problems can be resolved.
⮚ An artificial neuron is a mathematical function conceived as a model of biological neurons, that is, a neural network.
⮚ A perceptron is a neural network unit that does certain computations to detect features or business intelligence in the input data. It is a function that maps its input "x," which is multiplied by the learned weight coefficient, to an output value "f(x)."
⮚ The Perceptron Learning Rule states that the algorithm automatically learns the optimal weight coefficients.
⮚ Single layer perceptrons can learn only linearly separable patterns.
⮚ Multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power and can process non-linear patterns as well.
⮚ Perceptrons can implement logic gates like AND and OR; XOR, however, is not linearly separable and therefore requires a multilayer perceptron rather than a single perceptron.
Sigmoid Neuron

⮚ Sigmoid neurons are the building blocks of deep neural networks.
⮚ Sigmoid neurons are similar to perceptrons, but they are slightly modified so that the output from the sigmoid neuron is much smoother than the step-function output from the perceptron.

https://towardsdatascience.com/sigmoid-neuron-deep-neural-networks-a4cd35b629d7
Why Sigmoid Neuron

⮚ The perceptron model takes several real-valued inputs and gives a single binary output.
⮚ In the perceptron model, every input xi has a weight wi associated with it.
⮚ The weights indicate the importance of the input in the decision-making process.
⮚ The model output is decided by a threshold w₀: if the weighted sum of the inputs is greater than the threshold w₀, the output will be 1, else the output will be 0.
⮚ In other words, the model will fire if the weighted sum is greater than the threshold.
Why Sigmoid Neuron

⮚ From the mathematical representation, we might say that the thresholding logic used by the perceptron is very harsh.
Why Sigmoid Neuron

⮚ Consider the decision-making process of a person deciding whether to purchase a car or not, based on only one input, X₁ (salary), with the threshold b (w₀) = -10 and the weight w₁ = 0.2.
⮚ The output from the perceptron model will look like the figure shown below.
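A sketch of this example with the stated parameters w₁ = 0.2 and b = -10 (the salary is assumed to be in thousands): the output flips abruptly from 0 to 1 once the salary crosses 50, illustrating how harsh the thresholding is.

```python
def perceptron_buys_car(salary, w1=0.2, b=-10):
    # Fires only when w1*salary + b crosses 0, i.e. salary > 50 (thousands).
    return 1 if w1 * salary + b > 0 else 0

for salary in (49, 50, 51):
    print(salary, "->", perceptron_buys_car(salary))  # 0, 0, 1
```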
Sigmoid Neuron

⮚ In sigmoid neurons, the output function is much smoother than the step function.
⮚ In the sigmoid neuron, a small change in the input causes only a small change in the output, as opposed to the stepped output.
⮚ There are many functions with the characteristic of an "S"-shaped curve, known as sigmoid functions. The most commonly used is the logistic function, σ(z) = 1 / (1 + e⁻ᶻ).
Sigmoid Neuron

⮚ The inputs to the sigmoid neuron can be real numbers, unlike the Boolean inputs of the MP (McCulloch-Pitts) neuron, and the output will also be a real number between 0 and 1.
⮚ In the sigmoid neuron, we are trying to regress the relationship between X and Y in terms of probability.
⮚ Even though the output is between 0 and 1, we can still use the sigmoid function for binary classification tasks by choosing some threshold.
Sigmoid Neuron

⮚ Learning Algorithm
⮚ The parameters w and b of the sigmoid neuron model are learned using the gradient descent algorithm.
⮚ The objective of the learning algorithm is to determine the best possible values for the parameters, such that the overall loss (squared error loss) of the model is minimized as much as possible. The learning algorithm goes as follows.
Sigmoid Neuron

⮚ Initialize w and b randomly, then iterate over all the observations in the data; for each observation, find the corresponding predicted outcome using the sigmoid function and compute the squared error loss.
⮚ Based on the loss value, update the weights such that the overall loss of the model at the new parameters is less than the current loss of the model.
⮚ Loss optimization: a minimal sketch of this loop is given below.
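A minimal sketch of this learning loop (the toy one-dimensional data, initial parameters, and learning rate are assumptions in the spirit of the cited article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = [0.5, 2.5]            # toy 1-D inputs (assumed)
Y = [0.2, 0.9]            # targets between 0 and 1 (assumed)
w, b, lr = -2.0, -2.0, 1.0  # initial parameters and learning rate (assumed)

for epoch in range(1000):
    dw, db = 0.0, 0.0
    for x, y in zip(X, Y):
        y_pred = sigmoid(w * x + b)
        # Gradient of the squared error loss (with a 1/2 factor) for the
        # sigmoid neuron: (y_pred - y) * y_pred * (1 - y_pred) * input.
        dw += (y_pred - y) * y_pred * (1 - y_pred) * x
        db += (y_pred - y) * y_pred * (1 - y_pred)
    w -= lr * dw          # step so the new loss is lower than the current one
    b -= lr * db

print(w, b)
```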
Activation Functions, Loss Function

Sources:

1. https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
2. https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8
3. https://medium.com/@zeeshanmulla/cost-activation-loss-function-neural-network-deep-learning-what-are-these-91167825a4de
4. https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8
A Simple Neural Network

A Neuron [2]:
• (x1, x2, ..., xn) - input signal vector
• (w1, w2, ..., wn) - weights
• accumulation (i.e., summation + addition of bias b)
• an activation function f is applied to this sum

https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8
Activation Function

⮚ An activation function is a very important feature of an artificial neural network; it enables the network to learn and understand complex patterns
⮚ It is a mathematical equation that determines the output of a node
⮚ It helps to normalize the output of each neuron to a range between 0 and 1 or between -1 and 1
⮚ It is also known as a Transfer Function
Activation Function

⮚ The function
⮚ is attached to each neuron in the network,
⮚ determines whether the neuron should be activated ("fired") or not,
⮚ based on whether each neuron's input is relevant for the model's prediction
⮚ It must be computationally efficient (it is calculated across thousands/millions of neurons for each data sample)
⮚ The need for speed has led to the development of new functions such as ReLU
Activation Function (Types)

⮚ The activation functions [1] can basically be divided into 2 types:
⮚ Linear Activation Function
⮚ Non-linear Activation Functions

https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8
Activation Function (Linear)

⮚ Linear Activation Function:
⮚ Equation: f(x) = x
⮚ Range: (-infinity to infinity)
⮚ It doesn't help with complex data
Activation Function (Non-Linear)

⮚ Non-linear Activation Functions:
⮚ The model generalizes or adapts to a variety of data (images, video, audio, and data with high dimensionality)
⮚ They allow backpropagation (having a derivative function that is related to the inputs)
⮚ They allow the creation of deep neural networks ("stacking" multiple layers of neurons)
Activation Function (Non-Linear)

◻ The non-linear activation functions are mainly divided on the basis of their range or curves
◻ Sigmoid or Logistic Activation Function
🞑 An S-shaped curve
🞑 Predicts probability
🞑 Range (0, 1)
🞑 The network can get stuck during training
Activation Function (Non-Linear)

⮚ Tanh or Hyperbolic Tangent Activation Function
⮚ tanh is also sigmoidal (S-shaped), with range -1 to 1
⮚ negative inputs are mapped to negative outputs
⮚ Mostly, tanh and the logistic sigmoid are used in feed-forward networks
Activation Function (Non-Linear)

⮚ ReLU (Rectified Linear Unit) Activation Function
⮚ Widely used function (DL and CNN)
⮚ ReLU is half rectified (from the bottom)
⮚ f(z) is zero when z is less than zero
⮚ f(z) is equal to z when z is above or equal to zero, i.e., f(z) = max(0, z)
Activation Function (Non-Linear)

⮚ ReLU Advantages
⮚ Computationally efficient: allows the network to converge very quickly
⮚ Non-linear: although it looks like a linear function, ReLU has a derivative function and allows for backpropagation

⮚ Disadvantages
⮚ The dying ReLU problem: when inputs approach zero, or are negative, the gradient of the function becomes zero; the network cannot perform backpropagation and cannot learn
Activation Function (Non-Linear)

⮚ Leaky ReLU
⮚ An attempt to solve the dying ReLU problem (a small positive slope in the negative area enables backpropagation)
⮚ The leak helps to increase the range of the ReLU function
⮚ f(x) = ax for x < 0 and f(x) = x for x ≥ 0
⮚ Range: (-infinity to infinity)
⮚ When a is not 0.01, it is called Randomized ReLU. A sketch of these functions follows.
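As a sketch, the activation functions discussed above can be written with NumPy as follows (the leak coefficient a = 0.01 matches the slide; the sample inputs are illustrative):

```python
import numpy as np

def linear(x):
    return x                              # f(x) = x, range (-inf, inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # S-curve, range (0, 1)

def tanh(x):
    return np.tanh(x)                     # S-curve, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)             # zero for x < 0, x otherwise

def leaky_relu(x, a=0.01):
    return np.where(x < 0, a * x, x)      # small slope a in the negative area

z = np.array([-2.0, 0.0, 2.0])
print(relu(z), leaky_relu(z))             # [0. 0. 2.] [-0.02  0.    2.  ]
```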
Activation Function (Non-Linear)

Src: Sze, Vivienne, Chen, Yu-Hsin, Yang, Tien-Ju, & Emer, Joel (2017). Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proceedings of the IEEE, 105.
Activation Function

⮚ Heuristics for choosing an activation function:
⮚ Sigmoid functions and their combinations generally work better in the case of classification problems
⮚ Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem
⮚ Tanh is avoided much of the time because saturated units effectively stop learning
⮚ The ReLU activation function is widely used, as it yields better results
⮚ In case of dead neurons in the network, the leaky ReLU function is the best choice
⮚ The ReLU function should only be used in the hidden layers
Summary of Activation Functions

⮚ Various activation functions that can be used with the perceptron are shown here.

https://www.simplilearn.com/what-is-perceptron-tutorial
Loss Function

⮚ In a supervised deep learning context, the loss function measures the quality of a particular set of parameters based on how well the output of the network agrees with the ground-truth labels in the training data
⮚ A loss function is a method of evaluating "how well the algorithm models the dataset"
Nomenclature

loss function = cost function = objective function = error function
Loss function

How good is the network with the training data? [Figure: a deep network maps an input to an output; the output is compared with the ground-truth labels, and the resulting error is used to adapt the parameters (weights, biases).]
Loss function

⮚ If the predictions are totally off, the loss function will output a higher number
⮚ If the predictions are pretty good, the loss function will output a lower number
⮚ As you tune the algorithm to try and improve the model, the loss function will tell you whether it is improving or not
⮚ The "loss" helps us understand how much the predicted value differs from the actual value

https://medium.com/@zeeshanmulla/cost-activation-loss-function-neural-network-deep-learning-what-are-these-91167825a4de
Types of Loss Function

⮚ Regression Loss Function:
⮚ Regression models deal with predicting a continuous value, e.g., predicting the price of a room given its floor area, number of rooms, and size of rooms
⮚ The loss function used in a regression problem is called a "Regression Loss Function"
Types of Loss Function

⮚ Binary Classification Loss Functions:
⮚ Binary classification is a prediction problem where the output can be either 0 or 1
⮚ The output of binary classification algorithms is (mostly) a prediction score
⮚ The classification happens based on a threshold value (the default value is 0.5):
⮚ if the prediction score > threshold, then 1, else 0
Types of Loss Function

⮚ Multi-class Classification Loss Functions:
⮚ Multi-class classification covers those predictive modeling problems where there are more than two target classes
⮚ It is simply the extension of the binary classification problem
Loss Function

⮚ Regression: predicting a numerical value
⮚ E.g., predicting the price of a product
⮚ The final layer of the neural network will have one neuron, and the value it returns is a continuous numerical value
⮚ Compare the true value with the predicted value

https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8
Loss Function

⮚ Mean Squared Error (MSE)
⮚ The average squared difference between the predicted value and the true value:

MSE = (1/n) ∑ᵢ (yᵢ - ŷᵢ)²
Loss Function

⮚ Root Mean Square Error (RMSE)
⮚ Root Mean Square Error is the extension of MSE
⮚ It is the square root of the average of the squared differences between predictions and actual observations:

RMSE = √( (1/n) ∑ᵢ (yᵢ - ŷᵢ)² ) = √MSE
Loss Function

⮚ Binary Cross-Entropy Loss Function
⮚ Binary cross-entropy measures how far away from the true value (which is either 0 or 1) the prediction is for each of the classes
⮚ It averages these class-wise errors to obtain the final loss:

BCE = -(1/n) ∑ᵢ [ yᵢ log(ŷᵢ) + (1 - yᵢ) log(1 - ŷᵢ) ]

⮚ Cross-entropy is the difference between two probability distributions p and q, where p is the true output and q is our estimate of this true output
⮚ This difference is applied to neural networks
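A sketch of the three losses described above with NumPy (the sample arrays are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    # Average squared difference between predicted and true values.
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Square root of the MSE.
    return np.sqrt(mse(y_true, y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true in {0, 1}; y_pred is the predicted probability of class 1
    # (thresholding y_pred at 0.5 would give the hard class label).
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) +
                    (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mse(y_true, y_pred), rmse(y_true, y_pred),
      binary_cross_entropy(y_true, y_pred))
```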
References

• https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
• https://medium.com/@abhigoku10/activation-functions-and-its-types-in-artifical-neural-network-14511f3080a8
• https://medium.com/@zeeshanmulla/cost-activation-loss-function-neural-network-deep-learning-what-are-these-91167825a4de
• https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8
• https://www.simplilearn.com/what-is-perceptron-tutorial
• https://towardsdatascience.com/sigmoid-neuron-deep-neural-networks-a4cd35b629d7
• https://www.javatpoint.com/artificial-neural-network
