This document provides a guide on linear regression using gradient descent, explaining how gradient descent is a key optimization strategy in machine learning. It describes the intuition and mathematical principles behind gradient descent, along with different types such as batch, stochastic, and mini-batch gradient descent. Furthermore, it outlines the process for fitting a line to data points using linear regression, including error calculation and parameter updates.
Gradient Descent
Gradient Descent is the most popular optimization
strategy used in machine learning and deep learning
right now.
It can be combined with almost every algorithm,
yet it is easy to understand. So, anyone planning
to go on the journey of machine learning should
understand it.
Intuition
It is simply used to find the values of the
parameters at which the given cost function
reaches its nearest minimum.
"A gradient isthe ratio which relates the input and output of a
function. How small changes drives changes in the output of the
function."
Suppose we have a function f(x) = x². Then the derivative of the
function, f'(x), is 2x. This means that if x changes by 1 unit, f(x)
changes by approximately 2x; at x = 1, for example, that is a change
of about 2 × 1 = 2.
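As a quick sanity check, here is a minimal plain-Python sketch that approximates the derivative numerically (the function names and test points are ours, for illustration only):

def f(x):
    return x ** 2

def numerical_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_derivative(f, 1.0))  # ~2.0, matching f'(1) = 2*1
print(numerical_derivative(f, 3.0))  # ~6.0, matching f'(3) = 2*3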
1. A blindfolded person starts at the top of a hill.
2. Checks for the steepest downward direction
at that point.
3. Takes a step in that direction.
4. Checks again for the steepest downward
direction.
5. Repeats until the slope/gradient is
acceptably small or the ground is flat.
The math behind it
The equation below shows how it's done:

x(next) = x(current) − γ ∇f(x(current))

'x(next)' is the new position of the person, 'x(current)' is the
current position, the subtraction means we move against the gradient,
'γ' (gamma) is the step size, and '∇f(x)' (nabla f) is the gradient
showing the steepest direction.
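To make the rule concrete, here is a minimal sketch that applies it to the earlier function f(x) = x² (the starting point, step size, and iteration count are illustrative choices, not values from the slides):

def grad_f(x):
    return 2 * x  # ∇f(x) for f(x) = x²

x_current = 5.0  # the "person's" starting position on the hill
gamma = 0.1      # step size

for _ in range(50):
    x_next = x_current - gamma * grad_f(x_current)  # step against the gradient
    x_current = x_next

print(x_current)  # very close to 0.0, the minimum of f(x) = x²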
Let's take another example: the linear
regression technique in machine learning.
We have to find the optimal 'w' and 'b'
for the cost function J(w, b), i.e., the
values at which J is minimum. Below is an
illustration of a convex cost function:
w and b are represented on the horizontal
axes, while J(w, b) is represented on the
vertical axis.
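One common choice for J(w, b) in linear regression is the mean squared error; the slides do not name a specific cost, so here is a minimal sketch under that assumption (the data points are invented for illustration):

def J(w, b, xs, ys):
    # Mean squared error of the line y = w*x + b over the data
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]        # points exactly on y = 2x + 1
print(J(2.0, 1.0, xs, ys))  # 0.0 at the optimal (w, b)
print(J(0.0, 0.0, xs, ys))  # larger cost away from the optimum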
Learning rate
The size of the steps taken to reach the optimal point decides the
rate of gradient descent. This step size is referred to as the
'learning rate'.
➔ Too big
the steps bounce back and forth across the convex function and
may never reach the local minimum.
➔ Too small
gradient descent will eventually reach the local minimum,
but it will take too much time.
➔ Just right
gradient descent reaches the local minimum in a reasonable
number of steps (see the sketch below).
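Here is a minimal sketch comparing the three cases on f(x) = x² (the specific step sizes are illustrative choices):

def run(gamma, steps=20, x=5.0):
    for _ in range(steps):
        x = x - gamma * 2 * x  # gradient step, since f'(x) = 2x
    return x

print(run(1.1))    # too big: the iterates bounce and grow, never settling
print(run(0.001))  # too small: still far from the minimum after 20 steps
print(run(0.1))    # just right: close to the minimum at x = 0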
Gradient Descent types
● Batch Gradient Descent
A.k.a. vanilla gradient descent. Calculates the error for each
example, but the model is updated only after a full epoch
(one pass over all examples).
● Stochastic Gradient Descent
SGD, unlike vanilla, updates the model for each example as it
iterates. These frequent updates can be computationally more
expensive.
● Mini-Batch Gradient Descent
A combination of the concepts of both SGD and batch gradient
descent (see the sketch after this list).
○ Splits the data into batches and then performs an update
per batch, balancing the efficiency of batch gradient
descent against the robustness of SGD.
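Here is a minimal plain-Python sketch of the mini-batch variant (the batch size, learning rate, epoch count, and data are illustrative choices). Batch and stochastic gradient descent fall out as the special cases batch_size = len(data) and batch_size = 1:

import random

def minibatch_gd(xs, ys, lr=0.05, batch_size=2, epochs=500):
    m, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            n = len(batch)
            # Gradients of mean squared error over this batch only
            grad_m = sum(2 * (m * x + b - y) * x for x, y in batch) / n
            grad_b = sum(2 * (m * x + b - y) for x, y in batch) / n
            m -= lr * grad_m
            b -= lr * grad_b
    return m, b

print(minibatch_gd([1, 2, 3, 4], [3, 5, 7, 9]))  # converges toward m = 2, b = 1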
Linear Regression.
Just give me the code: GradientDescentDemo
Y = mX + b
1. Our goal is to fit the best line to the given
points.
2. Start with random m and b.
3. Calculate the error between the predicted Y
and the true Y.
4. Adjust m and b with gradient descent.
5. Repeat until a satisfactory result is achieved
(a sketch of the full loop follows below).
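The slides reference a GradientDescentDemo without including the code itself, so here is a minimal sketch of what such a demo might look like, following steps 1-5 above (the data, learning rate, and iteration count are illustrative choices):

def fit_line(xs, ys, lr=0.05, iterations=1000):
    m, b = 0.0, 0.0  # step 2: start with arbitrary m and b
    n = len(xs)
    for _ in range(iterations):
        # step 3: error between predicted Y and true Y
        errors = [(m * x + b) - y for x, y in zip(xs, ys)]
        # step 4: adjust m and b along the negative gradient of the MSE
        grad_m = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b  # step 5: stop after a fixed number of iterations

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.1, 6.9, 9.1, 10.9]  # noisy points near y = 2x + 1
m, b = fit_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")  # roughly m ≈ 2, b ≈ 1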