Assignment 4

Machine Learning SPPU BE COMPUTER Lab Manual Practical 4


Assignment No. 4

• Aim :- Implement the Gradient Descent algorithm to find the local minima
of a function. For example, find the local minima of the function
y = (x+5)² starting from the point x = 3.

• Title :- Implement the Gradient Descent algorithm to find the local minima
of a function.

• Objective :-

• Prerequisite :-

• Theory :-
Gradient Descent is one of the most commonly used optimization
algorithms for training machine learning models; it works by minimizing the
error between actual and predicted results. Gradient descent is also widely
used to train neural networks.

Gradient Descent is an iterative optimization algorithm used to train machine
learning and deep learning models. It helps in finding a local minimum of a
differentiable function.
Its behaviour near a local minimum or local maximum can be summarized as
follows:
• If we move in the direction of the negative gradient of the function at the
current point (i.e. away from the gradient), we approach a local minimum of
that function.

• If we move in the direction of the positive gradient of the function at the
current point (i.e. towards the gradient), we approach a local maximum of
that function.
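Both bullets describe the same update rule with the sign of the step flipped. A minimal sketch of one descent step, using the function y = (x+5)² from the example later in this assignment (the step size eta is an illustrative assumption):

```python
# One gradient-descent update for f(x) = (x + 5)**2.
# eta (the step size) is an illustrative assumption.

def grad(x):
    """Derivative of f(x) = (x + 5)**2."""
    return 2 * (x + 5)

x = 3.0
eta = 0.1

x_new = x - eta * grad(x)   # step against the gradient, towards the minimum
print(x_new)                # closer to -5 (the minimum) than x was
```

Repeating this step is all gradient descent does; everything else is choosing when to stop and how big eta should be.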

 How does Gradient Descent work?

Before describing the working principle of gradient descent, we need one
basic concept from linear regression: the slope of a line. The equation for
simple linear regression is:
Y = mX + c
where 'm' represents the slope of the line and 'c' represents the intercept
on the y-axis.
The starting point (shown in the figure above) is just an arbitrary point used
to evaluate the performance. At this starting point we compute the first
derivative (slope) and use a tangent line to measure the steepness of the
slope. This slope then informs the updates to the parameters (weights and
bias).

The slope is steepest at the starting (arbitrary) point; as new parameters are
generated, the steepness gradually reduces until, at the lowest point of the
curve, it approaches zero. This lowest point is called the point of
convergence.

The main objective of gradient descent is to minimize the cost function, i.e.
the error between the predicted and actual values. To minimize the cost
function, two pieces of information are required:

o Direction & Learning Rate

These two factors determine the partial-derivative calculation of each future
iteration and allow the algorithm to reach the point of convergence, i.e. a
local or global minimum.

Learning Rate:
The learning rate is the step size taken towards the minimum (lowest point).
It is typically a small value that is evaluated and updated based on the
behaviour of the cost function. A high learning rate results in larger steps
but risks overshooting the minimum; a low learning rate takes small steps,
which compromises efficiency but gives the advantage of more precision.
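This trade-off can be seen numerically on y = (x+5)², whose minimum is at x = -5. The learning-rate values below are illustrative assumptions:

```python
# Sketch: effect of the learning rate on gradient descent for y = (x + 5)**2.
# The three rates and the step count are illustrative assumptions.

def descend(lr, steps=50, x=3.0):
    for _ in range(steps):
        x = x - lr * 2 * (x + 5)    # gradient step with dy/dx = 2*(x + 5)
    return x

print(descend(0.01))  # small steps: steady but slow progress towards -5
print(descend(0.4))   # larger steps: converges quickly here
print(descend(1.1))   # too large: overshoots and diverges away from -5
```

With lr = 1.1 each step multiplies the distance to the minimum by |1 - 2*1.1| = 1.2, so the iterates move further from -5 every iteration.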
Types-
• Batch Gradient Descent
• Mini-Batch Gradient Descent
• Stochastic Gradient Descent

1. Batch Gradient Descent :-

Batch gradient descent (BGD) computes the error for every point in the
training set and updates the model only after evaluating all training
examples. It sums the gradient over all examples for each update.

2. Stochastic Gradient Descent :-

Stochastic gradient descent (SGD) is a variant of gradient descent that
processes one training example per iteration. In other words, within each
epoch it updates the model parameters once for every example in the dataset.

3. Mini-batch Gradient Descent :-

Mini-batch gradient descent is a combination of batch gradient descent and
stochastic gradient descent. It divides the training dataset into small
batches and performs a parameter update on each batch separately.
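The three variants differ only in how many examples feed each parameter update. A rough sketch on simple linear regression (y = m*x + c with squared-error loss); the tiny dataset, learning rate, and epoch count are illustrative assumptions:

```python
# Contrasting the three gradient-descent variants on linear regression.
# Dataset, learning rate, and epoch count are illustrative assumptions.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]            # generated from y = 2x + 1

def grad_mc(m, c, batch):
    """Average gradient of the squared error over a batch of indices."""
    gm = gc = 0.0
    for i in batch:
        err = (m * xs[i] + c) - ys[i]
        gm += 2 * err * xs[i]
        gc += 2 * err
    return gm / len(batch), gc / len(batch)

def train(batches, lr=0.03, epochs=2000):
    m = c = 0.0
    for _ in range(epochs):
        for batch in batches:        # one parameter update per batch
            gm, gc = grad_mc(m, c, batch)
            m, c = m - lr * gm, c - lr * gc
    return m, c

full = [[0, 1, 2, 3]]                # batch GD: one update per epoch
single = [[0], [1], [2], [3]]        # SGD: one update per training example
mini = [[0, 1], [2, 3]]              # mini-batch: one update per small batch

for batches in (full, single, mini):
    print(train(batches))            # each approaches m = 2, c = 1
```

The only change between variants is the batch list: more examples per batch gives a smoother but less frequent update.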

Example :-
Find the local minima of the function y = (x+5)² starting from the point
x = 3.
Solution : We know the answer just by looking at the graph. y = (x+5)²
reaches its minimum value when x = -5 (i.e. when x = -5, y = 0). Hence x = -5
is the local and global minimum of the function.

Now, let's see how to obtain the same result numerically using gradient
descent.

Step 1 : Initialize x = 3. Then, find the gradient of the function,
dy/dx = 2*(x+5).

Step 2 : Move in the direction of the negative of the gradient. (Why? Because
the negative gradient points downhill, towards the minimum.) But how far
should we move? For that, we require a learning rate. Let us assume the
learning rate = 0.01.

Step 3 : Let's perform 2 iterations of gradient descent:
x₁ = 3 − 0.01 × 2 × (3 + 5) = 3 − 0.16 = 2.84
x₂ = 2.84 − 0.01 × 2 × (2.84 + 5) = 2.84 − 0.1568 = 2.6832


Step 4 : We can observe that the x value is slowly decreasing and should
converge to -5 (the local minimum). However, how many iterations should we
perform?

Let us set a precision variable in our algorithm that calculates the
difference between two consecutive x values. If the difference between the
x values of two consecutive iterations is less than the precision we set, we
stop the algorithm.
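The steps above can be put together into a small program. This is a sketch of the assignment's example (start x = 3, learning rate 0.01, precision-based stopping); the precision value and iteration cap are illustrative assumptions:

```python
# Gradient descent for y = (x + 5)**2 starting at x = 3, as in the example
# above. The precision threshold and max_iters cap are illustrative.

def gradient_descent(start=3.0, lr=0.01, precision=1e-6, max_iters=100_000):
    x = start
    for i in range(max_iters):
        x_new = x - lr * 2 * (x + 5)    # dy/dx = 2*(x + 5)
        if abs(x_new - x) < precision:  # stop when consecutive x's agree
            return x_new, i + 1
        x = x_new
    return x, max_iters

x_min, iters = gradient_descent()
print(x_min)    # very close to -5, the local (and global) minimum
print(iters)    # number of iterations until convergence
```

The first two iterates produced by this loop are 2.84 and 2.6832, matching the hand calculation in Step 3.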

Challenges with Gradient Descent

Although gradient descent is one of the most popular methods for
optimization problems, it still has some challenges, as follows:
1. Local Minima and Saddle Points
2. Vanishing and Exploding Gradients

Standard Deviation :-
The standard deviation (σ) is a measure of how dispersed the data are in
relation to the mean. A low standard deviation means the data are clustered
around the mean; a high standard deviation indicates the data are more
spread out.

It is represented using sigma (σ).

Formula :-
σ = √( Σ (xᵢ − μ)² / N )
where μ is the mean and N is the number of values.

Mean :-
The mean of a dataset is the sum of all values divided by the total number
of values. It is the most commonly used measure of central tendency and is
often referred to as the "average."

Formula :-
μ = ( Σ xᵢ ) / N

Mode :-
The mode is the value that appears most frequently in a data set.
Median :-
The median is the middle number in a sorted (ascending or descending) list
of numbers, and it can be more descriptive of the data set than the average.

Formula :-
For an odd number of values n, the median is the ((n+1)/2)-th value of the
sorted list; for an even n, it is the average of the (n/2)-th and
(n/2 + 1)-th values.
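All four measures are available in Python's standard statistics module. A short sketch on a small illustrative dataset:

```python
# Mean, median, mode, and standard deviation via the standard library.
# The dataset is an illustrative assumption.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # sum / count
print(statistics.median(data))  # average of the two middle sorted values
print(statistics.mode(data))    # the most frequent value
print(statistics.pstdev(data))  # population standard deviation (sigma)
```

Note that pstdev divides by N (population formula above), while statistics.stdev divides by N − 1 (sample standard deviation).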
