Deep Learning Tutorial 9
[Figure: Machine Learning vs. Deep Learning]
What is Deep Learning?
Deep learning is a machine learning technique that learns features and tasks directly from
the data, where the data may be images, text, or sound.
What is a Neural Network and How Does It Work?
Artificial Neural Network: the inputs are first normalized/standardized, each neuron then computes the weighted sum $\sum_{i=1}^{m} w_i x_i$, and an activation function is applied to the result.
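Below is a minimal sketch (Python with NumPy) of the computation described above: a single artificial neuron forms the weighted sum of its normalized inputs and passes it through an activation function. The input values, the weights, and the choice of a sigmoid activation are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of the inputs followed by an activation."""
    z = np.dot(w, x) + b              # sum_{i=1..m} w_i * x_i, plus a bias term
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation chosen as an example

x = np.array([0.5, -1.2, 3.0])        # normalized/standardized inputs
w = np.array([0.4, 0.1, -0.7])        # weights
print(neuron(x, w, b=0.1))            # activation of this neuron
```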
What is an Activation Function?
Activation functions are an extremely important feature of artificial neural networks.
They basically decide whether a neuron should be activated or not, i.e. whether the
information the neuron is receiving is relevant for the task at hand or should be ignored.
The activation function is the non-linear transformation that we apply to the input
signal. This transformed output is then passed to the next layer of neurons as input.
For the leaky ReLU, what we have done is simply replace the horizontal line (the zero output
for negative inputs) with a non-zero, non-horizontal line of slope a, where a is a small value
such as 0.01.
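A minimal NumPy sketch of the plain ReLU and the leaky ReLU described above; the sample values of z are made up for illustration.

```python
import numpy as np

def relu(z):
    # Plain ReLU: negative inputs are clipped to zero (a horizontal line for z < 0)
    return np.maximum(0.0, z)

def leaky_relu(z, a=0.01):
    # Leaky ReLU: the flat zero line is replaced by a small non-zero slope a
    return np.where(z > 0, z, a * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))         # [0.    0.    0.    1.5]
print(leaky_relu(z))   # [-0.02  -0.005  0.    1.5]
```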
Softmax Function (for Multi-class Classification)
The softmax function calculates the probability distribution of an event over 'n' different events. In other words,
it calculates the probability of each target class over all possible target classes; these probabilities are then
used to determine the target class for the given inputs.
The main advantage of softmax is the output range: each probability lies between 0 and 1, and the sum of all
the probabilities equals one. When softmax is used in a multi-class classification model, it returns the
probability of each class, and the target class is the one with the highest probability.
The formula computes the exponential (e-power) of each input value and the sum of the exponentials of
all the input values. The ratio of each exponential to that sum is the output of the softmax function.
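In formula form, softmax(x_i) = e^{x_i} / Σ_j e^{x_j}. Below is a minimal NumPy sketch of that computation; the example scores are made up, and subtracting the maximum is a standard numerical-stability trick that does not change the result.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # the ratio is unchanged because the shift cancels out.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw outputs (logits) for 3 classes
probs = softmax(scores)
print(probs)        # roughly [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0 -- the probabilities sum to one
```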
Activation Function Example
How Neural Networks Work and Back-Propagation in Deep Learning
How Neural Networks Work with Many Neurons
Back-Propagation in Deep Learning
Back-propagation is the essence of neural net training. It is the method of
fine-tuning the weights of a neural net based on the error rate obtained in the
previous epoch (i.e., iteration). Proper tuning of the weights allows you to reduce
error rates and to make the model reliable by increasing its generalization.
z = W*x + b (the weighted input computed by each neuron before its activation is applied)
Batch Gradient Descent (BGD)
Gradient Descent is an optimization technique that is used to
improve deep learning and neural network-based models by
minimizing the cost function.
Gradient descent takes place in the back-propagation phase: the goal is to repeatedly
compute the gradient of the cost function J(w) with respect to the weights w and move
the weights in the opposite direction, updating consistently until we reach the minimum
of J(w).
More precisely, gradient descent is an algorithm that iterates through different
combinations of weights in an informed, efficient way to find the combination of weights
that has the minimum error.
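As a concrete illustration, here is a minimal sketch of gradient descent on a one-dimensional loss J(w) = (w - 3)^2, whose minimum is at w = 3; the loss function, starting point, and learning rate are made-up values.

```python
# Gradient descent on J(w) = (w - 3)^2, whose gradient is dJ/dw = 2 * (w - 3).
w = 0.0                 # initial weight
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)              # gradient of the loss at the current w
    w = w - learning_rate * grad    # move in the direction opposite to the gradient
print(w)                            # close to 3.0, the weight with minimum error
```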
Brute force algorithm
Curse of dimensionality
Brute-force algorithms refer to a programming style that does not include any shortcuts to
improve performance, but instead relies on sheer computing power to try all possibilities until the
solution to a problem is found. A classic example is the traveling salesman problem (TSP).
What is Gradient Descent
Stochastic Gradient Descent (SGD)
The word 'stochastic' means a system or a process that is linked with random
probability. Hence, in Stochastic Gradient Descent, a few samples are selected
randomly for each iteration instead of the whole dataset. In gradient descent,
the term "batch" denotes the total number of samples from a dataset that is used
for calculating the gradient in each iteration. In typical gradient descent
optimization, like Batch Gradient Descent, the batch is taken to be the whole dataset.
Using the whole dataset is useful for reaching the minima in a less noisy and less
random manner, but the problem arises when our datasets get really huge.
Stochastic gradient descent (often abbreviated SGD) is an iterative method for
optimizing an objective function with suitable smoothness properties (e.g. differentiable
or subdifferentiable), such as a convex loss function.
Mini-Batch Gradient Descent
Mini-batch gradient descent is a variation of the gradient
descent algorithm that splits the training dataset into small
batches that are used to calculate model error and update
model coefficients.
Implementations may choose to sum the gradient over the
mini-batch which further reduces the variance of the
gradient.
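A minimal NumPy sketch of the idea on a made-up linear-regression problem: the training data is shuffled each epoch, split into small batches, and the model coefficients are updated from the average gradient of each batch. All names, sizes, and hyper-parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                    # synthetic training inputs
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)     # synthetic targets

w = np.zeros(3)
learning_rate, batch_size = 0.1, 32

for epoch in range(20):
    idx = rng.permutation(len(X))                 # shuffle the dataset each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]     # one small batch of samples
        err = X[batch] @ w - y[batch]
        grad = X[batch].T @ err / len(batch)      # average gradient over the batch
        w -= learning_rate * grad                 # update the model coefficients
print(w)                                          # close to [1.0, -2.0, 0.5]
```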
Backpropagation is a training algorithm
consisting of 2 steps:
•Feedforward the values.
•Calculate the error and propagate it back to
the earlier layers.
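A minimal sketch of these two steps for a single sigmoid neuron trained with a squared-error loss on one example; the inputs, target, and learning rate are made-up values.

```python
import numpy as np

x, target = np.array([0.5, -1.0]), 1.0
w, b, lr = np.array([0.1, 0.2]), 0.0, 0.5

for epoch in range(100):
    # 1) Feedforward the values
    z = w @ x + b
    y = 1 / (1 + np.exp(-z))
    # 2) Calculate the error and propagate it back to the earlier layer
    error = y - target
    dz = error * y * (1 - y)      # chain rule through the sigmoid
    w -= lr * dz * x              # gradient of the squared error w.r.t. the weights
    b -= lr * dz
print(y)                          # prediction moves toward the target of 1.0
```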
Radial Basis Function (RBF) Network
A radial basis function (RBF) is a function that assigns a real value to each input from its domain (it is a real-valued
function), and the value produced by an RBF is always non-negative; it is a measure of distance and cannot
be negative.
f(x) = f(||x||)
Euclidean distance, the straight-line distance between two points in Euclidean space, is typically used.
Radial basis functions are used to approximate functions, much as neural networks act as function approximators.
An RBF network is a neural network in which radial basis functions act as the activation functions.
The approximant f(x) is differentiable with respect to the weights W, which are learned using the iterative
update methods common among neural networks.
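A minimal NumPy sketch of a Gaussian radial basis function; the center c and the width parameter gamma are illustrative assumptions, and the output depends only on the distance ||x - c||.

```python
import numpy as np

def gaussian_rbf(x, c, gamma=1.0):
    # The value depends only on the Euclidean distance between x and the center c.
    return np.exp(-gamma * np.linalg.norm(x - c) ** 2)

x = np.array([1.0, 2.0])
c = np.array([0.0, 0.0])
print(gaussian_rbf(x, c))   # small value: x is far from the center
print(gaussian_rbf(c, c))   # 1.0: the maximum, at zero distance
```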
Radial basis function neural networks are extensively applied in power restoration systems. In recent
decades, power systems have become larger and more complex.
This increases the risk of blackout. This neural network is used in power restoration systems to restore power in
the least amount of time.
Convolutional Neural Network
Convolutional Neural Networks (CNNs) are one of the variants of neural
networks used heavily in the field of Computer Vision. The name derives
from the type of hidden layers they consist of. The hidden layers of a CNN
typically consist of convolutional layers, pooling layers, fully connected
layers, and normalization layers. This simply means that, instead of relying
only on the fully connected layers and activation functions defined above,
convolution and pooling operations are used in the hidden layers.
Recurrent Neural Network
Recurrent Neural Networks (RNNs) are a type of neural network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the inputs
and outputs are independent of each other, but in cases such as predicting the next word of a
sentence, the previous words are required, and hence there is a need to remember them. Thus
RNNs came into existence, solving this issue with the help of a hidden layer. The main and most
important feature of an RNN is its hidden state, which remembers some information about a
sequence.
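A minimal NumPy sketch of this idea: at each step the hidden state from the previous step is combined with the current input, so the hidden state carries information about earlier elements of the sequence. The weight shapes and the random input sequence are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(4, 3))   # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(4, 4))   # hidden-to-hidden (recurrent) weights
b = np.zeros(4)

h = np.zeros(4)                           # initial hidden state
sequence = [rng.normal(size=3) for _ in range(5)]
for x_t in sequence:
    # The previous hidden state h is fed back in together with the current input x_t.
    h = np.tanh(Wx @ x_t + Wh @ h + b)
print(h)                                  # hidden state after reading the sequence
```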
RNN Working Principle
Step by Step
STEP 1: Randomly initialize the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-propagation: from left to right, the neurons are activated in a way that the impact of each
neuron's activation is limited by the weights. Propagate the activations until getting the predicted result y.
STEP 4: Compare the predicted result to the actual result and measure the generated error.
STEP 5: Back-propagation: from right to left, the error is propagated back and the weights are updated
according to how much they are responsible for the error; the learning rate decides by how much the weights are updated.
STEP 6: Repeat Steps 1 to 5 and update the weights after each observation (stochastic/online learning),
or repeat Steps 1 to 5 but update the weights only after a batch of observations (batch learning).
Keras
Keras is a high-level neural networks API, written in Python and capable of
running on top of TensorFlow, CNTK, or Theano. It was developed with a focus
on enabling fast experimentation. Being able to go from idea to result with the
least possible delay is key to doing good research.
Use Keras if you need a deep learning library that:
Allows for easy and fast prototyping (through user friendliness, modularity, and
extensibility).
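As an illustration, here is a minimal Keras sketch of a small fully connected classifier. The layer sizes, optimizer, and loss are illustrative choices, and the example assumes the TensorFlow backend (imported via tensorflow.keras).

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small fully connected network for 10-class classification of 784-dim inputs.
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # probabilities over 10 classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5, batch_size=32)  # given suitable training data
```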
Parameter Tuning in ANN
Parameters VS Hyper-parameters
Model parameters are something that a model learns on its own. For example,
1) Weights or coefficients of independent variables in a linear regression model.
2) Weights or coefficients of independent variables in an SVM.
3) Split points in a decision tree.
Hyper-parameters, in contrast, are set by the practitioner before training; examples include the
learning rate, the momentum term, the batch size, the number of epochs, and the number of hidden
layers and neurons.
Role of an optimizer
Optimizers update the weight parameters to minimize the loss function. The loss function acts as
a guide to the terrain, telling the optimizer whether it is moving in the right direction to reach
the bottom of the valley, the global minimum.
What is an Optimizer, and What Are Its Different Types?
Types of Gradient Descent optimizers: the variants described below are Momentum, AdaGrad, and Adam (which combines momentum with RMSprop).
Gradient Descent with Momentum considers the past gradients to smooth out the update. It computes an exponentially
weighted average of your gradients and then uses that average to update your weights instead.
Gradient Descent with Momentum
During backward propagation, we use dW and db to update our parameters W
and b as follows:
W = W – learning rate * dW
b = b – learning rate * db
In momentum, instead of using dW and db independently for each epoch, we
take the exponentially weighted averages of dW and db.
VdW = β x VdW + (1 – β) x dW
Vdb = β x Vdb + (1 – β) x db
Where beta ‘β’ is another hyperparameter called momentum and ranges from 0
to 1. It sets the weight between the average of previous values and the current
value to calculate the new weighted average.
After calculating the exponentially weighted averages, we update our parameters:
W = W – learning rate * VdW
b = b – learning rate * Vdb
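A minimal NumPy sketch of this momentum update for a single weight array; the gradient dW is a placeholder (in practice it comes from back-propagation), and β = 0.9 and the learning rate are illustrative values.

```python
import numpy as np

beta, learning_rate = 0.9, 0.01
W = np.zeros((2, 2))
VdW = np.zeros_like(W)                      # exponentially weighted average of dW

for epoch in range(10):
    dW = np.ones_like(W)                    # placeholder gradient for the sketch
    VdW = beta * VdW + (1 - beta) * dW      # smooth the gradient with momentum
    W = W - learning_rate * VdW             # update using the smoothed gradient
```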
AdaGrad, or adaptive gradient, allows the learning rate to adapt per parameter. It performs larger updates for
infrequent parameters and smaller updates for frequent ones; because of this it is well suited to sparse data (e.g. NLP or
image recognition). Another advantage is that it largely eliminates the need to tune the learning rate: each parameter has
its own learning rate, and due to the peculiarities of the algorithm the learning rate is monotonically decreasing. This
causes its biggest problem: at some point the learning rate becomes so small that the system stops learning.
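A minimal NumPy sketch of the AdaGrad update: each parameter accumulates the sum of its squared gradients, so frequently updated parameters receive smaller steps. The gradient values and hyper-parameters are placeholders.

```python
import numpy as np

learning_rate, eps = 0.01, 1e-8
W = np.zeros(3)
G = np.zeros(3)                                   # running sum of squared gradients

for epoch in range(10):
    dW = np.array([1.0, 0.1, 0.0])                # placeholder gradient for the sketch
    G += dW ** 2                                  # accumulate per-parameter history
    W -= learning_rate * dW / (np.sqrt(G) + eps)  # per-parameter effective learning rate
```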
Adam combines two ideas:
(i) the gradient component, by using V, the exponential moving average of gradients (as in
momentum), and
(ii) the learning rate component, by dividing the learning rate α by the square root of S, the
exponential moving average of squared gradients (as in RMSprop).
The corresponding update rules are:
VdW = β1 x VdW + (1 – β1) x dW
SdW = β2 x SdW + (1 – β2) x dW²
W = W – learning rate * VdW / (√SdW + ε)
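A minimal NumPy sketch of such an update, using the standard Adam formulation with bias correction; the gradient values and the hyper-parameters β1, β2, α, and ε are placeholders.

```python
import numpy as np

beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-8
W = np.zeros(3)
V = np.zeros(3)                               # moving average of gradients
S = np.zeros(3)                               # moving average of squared gradients

for t in range(1, 11):
    dW = np.array([1.0, -0.5, 0.2])           # placeholder gradient for the sketch
    V = beta1 * V + (1 - beta1) * dW
    S = beta2 * S + (1 - beta2) * dW ** 2
    V_hat = V / (1 - beta1 ** t)              # bias-corrected estimates
    S_hat = S / (1 - beta2 ** t)
    W -= lr * V_hat / (np.sqrt(S_hat) + eps)  # combine momentum with the adaptive rate
```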
Convolutional Neural Network (CNN)
But before that, let us first understand two core fields of A.I.:
• Image Processing
• Computer vision
Image processing is a method of performing operations on an image in order to obtain an
enhanced image or to extract useful information from it. It is a type of signal processing in
which the input is an image and the output may be an image or the characteristics/features associated
with that image. Nowadays, image processing is among the most rapidly growing technologies, and it
forms a core research area within the engineering and computer science disciplines.
Computer Vision, often abbreviated as CV, is defined as a field of study that seeks to develop
techniques to help computers “see” and understand the content of digital images such as
photographs and videos.
Convolutional Neural Network (CNN)
So, what is a CNN?
A convolutional neural network (CNN) is a specific type of artificial neural network that uses perceptron-like
units for supervised learning to analyze data. CNNs are applied to image processing, natural language
processing, and other kinds of cognitive tasks.
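As an illustration, here is a minimal Keras sketch of a CNN for 28x28 grayscale images: convolution and pooling layers followed by a fully connected softmax classifier, as described above. The architecture, input shape, and hyper-parameters are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential([
    layers.Conv2D(16, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),          # pooling reduces spatial resolution
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # fully connected classifier over 10 classes
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
cnn.summary()
```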