optim
THIS BOOK
This book is written (typed) by
Ari, who hails from the South
and has keen interests in
Computer Science, Biology,
and Tamil Literature.
Occasionally, he updates his
website, where you can reach
out to him.
https://arihara-sudhan.github.io
PREREQUISITES
To get the most out of this
book, you should be familiar
with some basics, which you
can learn from the following
book.
FITTING A LINE TO DATA
Assume we have a data distribution as shown below. To fit a line to it, we pick parameters θ (a slope and an intercept) and minimize a loss function J(θ), such as the mean squared error, with Gradient Descent. The update rule is:
θ=θ−α⋅∇θJ(θ)
∇θJ(θ) is the gradient of the loss function with respect to the parameters, and α is the learning rate.
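As a minimal sketch of this update (assuming NumPy, a mean-squared-error loss, and made-up toy data), fitting a line y = w·x + b looks like the following:

import numpy as np

# Toy data: points scattered around the line y = 3x + 1 (made up for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # parameters θ = (w, b)
alpha = 0.1       # learning rate α

for epoch in range(200):
    error = (w * x + b) - y              # predictions minus targets
    grad_w = 2.0 * np.mean(error * x)    # ∂J/∂w for J(θ) = mean(error²)
    grad_b = 2.0 * np.mean(error)        # ∂J/∂b
    w -= alpha * grad_w                  # θ = θ − α⋅∇θJ(θ)
    b -= alpha * grad_b

print(w, b)  # approaches 3.0 and 1.0

Each iteration uses the entire dataset to compute the gradient; this is Batch Gradient Descent.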
STOCHASTIC GD
Gradient Descent (GD) has several
drawbacks, including the risk of
getting stuck in local minima or
saddle points, especially in non-
convex functions, which can lead
to suboptimal solutions. It is sensitive to the learning rate, requiring careful tuning to avoid slow convergence or instability. It also computes the gradient over the entire training set for every single update, which becomes expensive for large datasets.
Stochastic Gradient Descent is a
variant of Gradient Descent where
the parameters are updated using
only a single training example at a
time, rather than the entire
dataset. SGD makes updates far more frequently and each step is much cheaper to compute, but the updates are noisier and less stable. Over time,
this noise can help the algorithm
escape local minima and explore
more of the solution space.
θ=θ−α⋅∇θJ(θ;x⁽ⁱ⁾,y⁽ⁱ⁾)
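A sketch of this per-example update, repeating the same toy setup as above so the snippet runs on its own:

import numpy as np

# Same toy data as before: points scattered around y = 3x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)
w, b, alpha = 0.0, 0.0, 0.1

for epoch in range(50):
    for i in rng.permutation(len(x)):        # visit examples in a shuffled order
        error = (w * x[i] + b) - y[i]
        grad_w = 2.0 * error * x[i]          # gradient on the single example
        grad_b = 2.0 * error
        w -= alpha * grad_w                  # θ = θ − α⋅∇θJ(θ; x⁽ⁱ⁾, y⁽ⁱ⁾)
        b -= alpha * grad_b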
MINI BATCH GD
Mini-Batch Gradient Descent
is a compromise between
Batch Gradient Descent and
Stochastic Gradient Descent.
In this approach, instead of
using the entire dataset or just
a single example, we compute
the gradient using a small
batch of training examples.
We split the training data into
small batches (e.g., 32 or 64
samples per batch), and then
we calculate the gradient for
each batch and update the
parameters accordingly. This
reduces the computational
cost compared to Batch
Gradient Descent, while providing more stable updates than SGD. The update rule is as follows:
θ=θ−α⋅∇θJ(θ;Xbatch,Ybatch)
Where Xbatch and Ybatch are the
features and labels of the
mini-batch.
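A sketch of the mini-batch loop, again repeating the toy setup so it runs on its own (the batch size of 32 is an arbitrary choice):

import numpy as np

# Same toy data as before
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)
w, b, alpha = 0.0, 0.0, 0.1

batch_size = 32                               # e.g. 32 samples per batch
for epoch in range(100):
    order = rng.permutation(len(x))           # shuffle once per epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        x_batch, y_batch = x[idx], y[idx]
        error = (w * x_batch + b) - y_batch
        grad_w = 2.0 * np.mean(error * x_batch)   # gradient on the mini-batch
        grad_b = 2.0 * np.mean(error)
        w -= alpha * grad_w                   # θ = θ − α⋅∇θJ(θ; Xbatch, Ybatch)
        b -= alpha * grad_b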
LET’S SHED LIGHT ON IT
Let’s define the network first.
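A minimal sketch, assuming PyTorch; the architecture, layer sizes, and cross-entropy loss below are arbitrary placeholders:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 128),   # input size chosen arbitrarily
            nn.ReLU(),
            nn.Linear(128, 10),    # 10 output classes, also arbitrary
        )

    def forward(self, x):
        return self.layers(x)

model = Net()
loss_fn = nn.CrossEntropyLoss()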
For RMSProp,
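a sketch that plugs the model above into torch.optim.RMSprop (the learning rate, smoothing constant, and dummy batch are placeholders) is:

optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001, alpha=0.99)

inputs = torch.randn(32, 784)            # dummy batch, for illustration only
targets = torch.randint(0, 10, (32,))

optimizer.zero_grad()                    # clear old gradients
loss = loss_fn(model(inputs), targets)
loss.backward()                          # compute gradients
optimizer.step()                         # RMSProp parameter update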
And this is for Adam...
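Only the optimizer construction changes; the training step stays the same (again a sketch reusing the model, loss_fn, and dummy data pattern from above):

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

for step in range(100):
    inputs = torch.randn(32, 784)        # dummy batch
    targets = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()                     # Adam parameter update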
MERCI