DL Activation Functions Question Bank
To train the model, we give the network an input (departure location, arrival location, and departure date, in the case of train price prediction) and let it predict the output using activation functions. We then compare the predicted output with the actual output and compute the error between the two values. This error is computed using a loss/cost function. The same process is repeated over the entire training dataset, giving the average loss/error. The objective is to minimize this loss to make the model accurate. Each connection between two neurons carries a weight. Initially, the weights are randomly initialized, and the goal is to update them on every iteration so as to reach the minimum value of the loss/cost function. We could change the weights randomly, but that is not an efficient method. This is where optimizers come in: they update the weights automatically.
23. What are the different loss functions and their use cases?
The loss function is chosen based on the problem:
a. Regression problems
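As a minimal sketch of the common choices, the snippet below implements mean squared error and mean absolute error (typical for regression) and binary cross-entropy (typical for binary classification). The function names and the `eps` guard are illustrative, not from the original notes.

```python
import math

def mse(y_true, y_pred):
    # Mean squared error: typical choice for regression problems.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error: regression loss less sensitive to outliers.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy: typical choice for binary classification.
    # eps guards against log(0).
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 5.0], [2.5, 5.5]))                 # 0.25
print(mae([3.0, 5.0], [2.5, 5.5]))                 # 0.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))    # ≈ 0.1643
```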
Once the loss for one iteration is computed, an optimizer is used to update the weights. Instead of changing the weights manually, optimizers update them automatically in small increments, helping to find the minimum value of the loss/cost function. The magic of DL! Finding the minimum of the cost function requires iterating through the dataset many times and therefore demands large computational power. The most common technique used to update the weights is gradient descent.
Gradient descent finds the minimum of the loss function by repeatedly updating the weights in the direction opposite to the gradient. There are 3 variants, differing in how much data is used per update: batch gradient descent (the whole dataset), stochastic gradient descent (one sample at a time), and mini-batch gradient descent (a small subset).
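The core update rule shared by all variants can be sketched in one dimension; here we minimize a hypothetical quadratic loss L(w) = (w - 3)^2, whose gradient is 2(w - 3). The function name and learning rate are illustrative assumptions.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    # Generic update rule: w <- w - lr * dL/dw
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Minimize L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_star, 4))  # 3.0
```

The three variants differ only in how `grad` is estimated: from the full dataset, a single random sample, or a mini-batch.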
Activation functions add computation during forward and backward propagation, but that computation is worth it. Here is why: suppose we had a neural network without activation functions. In that case, every neuron would only perform a linear transformation on the inputs using the weights and biases, so no matter how many layers we stacked, the network would still be equivalent to a single linear layer and could not learn non-linear patterns.
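The point about linear transformations can be checked numerically: without an activation between them, two stacked linear layers collapse into one. A minimal sketch assuming NumPy, with arbitrary random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with no activation function in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
two_layer = W2 @ (W1 @ x + b1) + b2

# The same map expressed as a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```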
Sigmoid/Logistic Activation Function
Mathematically it can be represented as:
f(x) = 1 / (1 + e^(-x))
Here’s why the sigmoid/logistic activation function is one of the most widely used functions:
- It is commonly used for models where we have to predict a probability as the output. Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice because of its range.
- The function is differentiable and provides a smooth gradient, preventing jumps in output values. This is reflected in the S-shape of the sigmoid curve.
The limitations of the sigmoid function are discussed below:
- The derivative of the function is f'(x) = sigmoid(x) * (1 - sigmoid(x)). Its maximum value is only 0.25 (at x = 0), and it approaches zero for large |x|, so gradients flowing backward through many sigmoid layers shrink toward zero: the vanishing gradient problem.
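A minimal sketch of the sigmoid and its derivative, illustrating why gradients vanish away from zero:

```python
import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); output always lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = sigmoid(x) * (1 - sigmoid(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0))              # 0.5
print(sigmoid_derivative(0))   # 0.25
print(sigmoid_derivative(10))  # ~4.5e-05: gradient vanishes for large |x|
```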
Probability
The output of the sigmoid function is in the range of 0 to 1, which can be thought of as a probability. But this function faces certain problems. Suppose we have five output values of 0.8, 0.9, 0.7, 0.8, and 0.6, respectively. How can we move forward with them? The answer is: we can't. The above values don't make sense as class probabilities, because the sum of all class probabilities should equal 1. The softmax function is described as a combination of multiple sigmoids; it calculates the relative probabilities. Similar to the sigmoid/logistic activation function, the softmax function returns the probability of each class. It is most commonly used as the activation function for the last layer of a neural network in the case of multi-class classification.
Softmax Function
Mathematically it can be represented as:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
Let's go over a simple example together.
Assume that you have three classes, meaning there would be three neurons in the output layer. Now suppose that the output from the neurons is [1.8, 0.9, 0.68]. Applying the softmax function over these values to give a probabilistic view results in the following outcome: [0.58, 0.23, 0.19]. To make the final prediction, the index with the largest probability is treated as 1 and the other indices as 0, i.e., the class with the highest probability is chosen.
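The example above can be reproduced with a short sketch (the max-subtraction trick is a standard stability detail, not something the notes mention):

```python
import math

def softmax(z):
    # Subtracting max(z) improves numerical stability without
    # changing the result, since it cancels in the ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.8, 0.9, 0.68])
print([round(p, 2) for p in probs])  # [0.58, 0.23, 0.19]
print(round(sum(probs), 6))          # 1.0 (probabilities sum to one)
```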
Reinforcement learning provides a means for robots to learn complex behaviour from interaction, on the basis of generalizable behavioural primitives. From negative human feedback, the robot learns from its own misconduct.
*********************