Linear Regression with Gradient Descent
A guide by landofai.com
Code: Linear regression
Gradient Descent
Gradient descent is the most popular optimization strategy used in machine learning and deep learning today. It can be combined with almost every learning algorithm, yet it is easy to understand. So, everyone planning to go on the journey of machine learning should understand it.
Intuition
It is used to find the values of the parameters at which a given cost function reaches its nearest (local) minimum.
Intuition
"A gradient is the ratio that relates the input and output of a function: it describes how small changes in the input drive changes in the output."
Suppose we have a function f(x) = x². Then the derivative of the function, f'(x), is 2x. This means that if x changes by a small amount, f(x) changes by about 2x times that amount; near x = 1, a small change in x changes f(x) at a rate of 2·1 = 2.
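To check this numerically, here is a quick Python sketch (not from the slides); the step size h and the point x = 1 are illustrative choices.

def f(x):
    return x ** 2

h = 1e-6          # small finite-difference step (assumed)
x = 1.0
# (f(x + h) - f(x)) / h approximates the derivative f'(x) = 2x
print((f(x + h) - f(x)) / h)   # prints ~2.000001, matching 2 * x at x = 1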
1. A blindfolded person starts at the top of a hill.
2. They check for the steepest downward direction at that point.
3. They take a step in that direction.
4. They again check for the steepest downward direction.
5. Repeat until the slope/gradient is acceptably small or flat.
The math behind it
The update rule below shows how it's done:

x(next) = x(current) − γ · ∇f(x(current))

Here 'x(next)' is the new position of the person, 'x(current)' is the current position, the subtraction means we move against the gradient, 'γ' (gamma) is the step size, and '∇f(x)' (nabla) is the gradient pointing in the steepest direction.
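To make the update rule concrete, here is a minimal Python sketch (not part of the original slides) that applies it to f(x) = x² from the previous slide, whose gradient is 2x. The starting point, step size gamma, and stopping tolerance are illustrative assumptions.

def grad_f(x):
    # Gradient of f(x) = x ** 2
    return 2 * x

def gradient_descent(x_start, gamma=0.1, steps=100, tol=1e-8):
    """Repeatedly apply x(next) = x(current) - gamma * grad_f(x(current))."""
    x = x_start
    for _ in range(steps):
        g = grad_f(x)
        if abs(g) < tol:        # gradient is (nearly) flat: stop
            break
        x = x - gamma * g       # step against the gradient
    return x

print(gradient_descent(5.0))    # approaches the minimum at x = 0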
Let's take another example: the linear regression technique in machine learning. We have to find the optimal 'w' and 'b' for the cost function J(w, b) such that J is at its minimum. Below is an illustration of a convex function: w and b are represented on the horizontal axes, while J(w, b) is represented on the vertical axis.
Learning rate
The size of the steps taken to reach the optimal point decides the rate of gradient descent. It is often referred to as the 'learning rate' (i.e., the size of the steps).
➔ Too big
gradient descent bounces around the convex function and may never reach the local minimum.
➔ Too small
gradient descent will eventually reach the local minimum, but it will take too much time.
➔ Just right
gradient descent reaches the local minimum in a reasonable number of steps (all three regimes are sketched in code below).
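Reusing the 1-D example from above (the three learning rates here are illustrative assumptions, not values from the slides), all three regimes are easy to reproduce:

def descend(x, gamma, steps=20):
    # 20 update steps on f(x) = x ** 2, whose gradient is 2x
    for _ in range(steps):
        x = x - gamma * 2 * x
    return x

print(descend(5.0, gamma=1.1))    # too big: |x| grows, bouncing past the minimum
print(descend(5.0, gamma=0.001))  # too small: still ~4.8 after 20 steps
print(descend(5.0, gamma=0.1))    # just right: ~0.06, close to the minimum at 0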
Gradient Descent types
● Batch Gradient Descent
A.k.a. vanilla gradient descent. It calculates the error for every example, but the model is updated only once per epoch.
● Stochastic Gradient Descent
SGD, unlike vanilla, updates the model after each individual example. These frequent updates can be computationally more expensive.
● Mini-Batch Gradient Descent
A combination of the concepts of both SGD and batch gradient descent; see the sketch after this list.
○ Splits the data into batches, then performs an update per batch, balancing the efficiency of batch gradient descent against the robustness of SGD.
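As a rough sketch of the difference (the grad_step helper, toy data, and hyperparameters are assumptions for illustration, not from the slides), the three variants differ only in how much data each parameter update sees:

import random

def grad_step(theta, batch, lr):
    """One MSE gradient-descent update of (m, b) for the model y = m*x + b."""
    m, b = theta
    n = len(batch)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in batch) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in batch) / n
    return m - lr * dm, b - lr * db

def batch_gd(theta, data, lr):
    return grad_step(theta, data, lr)                 # one update per epoch

def sgd(theta, data, lr):
    for example in data:                              # one update per example
        theta = grad_step(theta, [example], lr)
    return theta

def mini_batch_gd(theta, data, lr, batch_size=2):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):         # one update per batch
        theta = grad_step(theta, data[i:i + batch_size], lr)
    return theta

data = [(x, 3 * x + 1) for x in range(10)]            # toy points on y = 3x + 1
theta = (0.0, 0.0)
for _ in range(300):                                  # a few hundred epochs
    theta = mini_batch_gd(theta, data, lr=0.005)
print(theta)                                          # approaches (3, 1)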
Linear Regression
Just give me the code: GradientDescentDemo
Y = mX + b
1. Our goal is to fit the best line to the given points.
2. Start with random m and b.
3. Calculate the error between the predicted Y and the true Y.
4. Adjust m and b with gradient descent.
5. Repeat until a satisfactory result is achieved (a full demo sketch follows the update rule below).
Error
Here, the error is the Mean Squared Error (MSE):
MSE = (1/N) · Σᵢ (yᵢ − (m·xᵢ + b))²
How to update m and b
Updated value = old value − learning rate × gradient
For the MSE above, the gradients are:
∂MSE/∂m = −(2/N) · Σᵢ xᵢ · (yᵢ − (m·xᵢ + b))
∂MSE/∂b = −(2/N) · Σᵢ (yᵢ − (m·xᵢ + b))
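The slides point to a GradientDescentDemo for the code; since it isn't reproduced here, below is a minimal self-contained Python sketch of the same procedure. The data points, learning rate, and epoch count are illustrative assumptions.

# Fit y = m*x + b by gradient descent on the MSE (illustrative sketch,
# not the original GradientDescentDemo).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]          # generated from y = 2x + 1
m, b = 0.0, 0.0                   # step 2: start with arbitrary m and b
lr = 0.02                         # learning rate (assumed)
n = len(xs)

for epoch in range(2000):
    preds = [m * x + b for x in xs]                            # predicted Y
    dm = -2 / n * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
    db = -2 / n * sum(y - p for y, p in zip(ys, preds))
    m -= lr * dm                  # updated value = old - learning rate * gradient
    b -= lr * db

print(m, b)                       # approaches m = 2, b = 1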
Good luck!
Thank you for your interest.
AI Adventures
