Gradient Descent in AI Optimization
Learning Objectives
● Gain a clear understanding of optimization and its significance in AI and machine learning.
● Comprehend the concept of gradient descent, its mathematical foundation, and how it helps in
minimizing functions.
● Analyze how gradient descent is applied in training AI models and optimizing machine learning
algorithms.
● Develop basic Python skills to implement the gradient descent algorithm and apply it to solve
AI-related problems.
Case Study: Google’s RankBrain
One specific application of Gradient Descent is Google’s RankBrain algorithm, an AI system Google uses to understand complex search queries. Gradient Descent plays a vital role in training RankBrain to analyze and
interpret patterns in vast amounts of search data, gradually improving the model’s predictive
accuracy. Google employs Mini-Batch Gradient Descent, where small subsets of data (rather than
entire datasets) help the model update its parameters more frequently and efficiently, balancing
computational load and precision.
Through this approach, Google has significantly enhanced the user experience, making search results
faster and more relevant. This case study exemplifies how Gradient Descent, applied in a high-stakes, real-time environment like search optimization, can support continuous improvements in AI, refining the accuracy and relevancy of predictions and ultimately meeting user expectations more effectively.
Discussion Questions
● Evaluate the effectiveness of Gradient Descent in handling search ranking challenges such
as relevancy and accuracy. How could other optimization methods compare in terms of
performance and computational efficiency?
● Analyze how Google’s choice of Mini-Batch Gradient Descent might influence the speed and
accuracy of its search ranking model. What potential trade-offs could arise from using mini-
batches instead of the full dataset?
Optimization
Optimization is the process of finding the best solution from a set of possible choices by maximizing
or minimizing a specific objective function. In simpler terms, it involves selecting the most efficient or
effective option based on desired outcomes, often within certain constraints.
In mathematical terms, optimization aims to find values for variables that maximize or minimize an
objective function, subject to given conditions. This process is widely used in fields like engineering,
economics, and machine learning to improve decision-making and performance.
Optimization is essentially about making things as good as possible. In everyday life, it’s like planning
the quickest route home, trying to spend the least money, or finding the best way to solve a problem.
In these cases, you’re making choices to achieve a particular goal—whether that’s saving time,
cutting costs, or reaching the best outcome.
In machine learning and AI, optimization is a crucial part of training models. We use it to find the best
settings or parameters that help the model perform as accurately as possible. The better the
optimization, the better the model becomes at making accurate predictions or decisions. The process
of optimization involves adjusting the model’s parameters (like weights and biases) to reduce the error
or cost as much as possible. In machine learning, we commonly use algorithms like Gradient Descent
to perform this optimization. Gradient Descent is a technique that helps us navigate towards the
minimum value of the objective function by:
● Calculating the gradient (slope) of the objective function with respect to each parameter.
● Moving the parameters in the direction that reduces the error (down the slope) until we reach the minimum.
The ultimate goal of optimization is to find the values of the parameters that minimize the error in our model’s predictions. This way, the model becomes more accurate and reliable in making predictions.
1. Objective Function f(x):
○ This is the function we want to either maximize (get the highest possible value) or minimize (get the lowest possible value).
○ For example, in AI, f(x) could be the error rate of a model. In this case, we’d aim to minimize it to make the model more accurate.
2. Variables x:
○ These are the parameters or inputs we can adjust to achieve the best outcome.
○ For instance, in machine learning, x could be the weights of a neural network, which
are adjusted to minimize error during training.
3. Constraints:
○ These are conditions or limits that the variables must satisfy. Mathematically, they are expressed as conditions on x that must be met (e.g., x ≥ 0).
In a machine learning context, we aim to minimize a function that represents the error of the model
(known as the loss function) or maximize a function that represents accuracy or performance. During
training, optimization algorithms adjust the model’s parameters to find the “best” values—those that
make the error as small as possible or the accuracy as high as possible.
2. Stochastic Optimization: In machine learning, we often deal with large datasets, so instead
of computing exact changes, we use random samples (like in stochastic gradient descent) to
approximate the best direction to update parameters.
In mathematics and machine learning, optimization is all about finding the best possible outcome or
parameters for a given problem. Think of it like a quest to improve or perfect a model's performance.
Usually, this means adjusting certain parameters or values to either maximize or minimize a specific
outcome, known as the objective function.
● Unconstrained Optimization: This involves optimizing a function with no restrictions on the variables.
Example: A simple math function like f(x) = x² can be optimized without any constraints. We can freely adjust x to get the lowest value of f(x) (in this case, it’s zero when x = 0).
● Constrained Optimization: This involves optimizing a function, but with specific conditions or
boundaries on the variables. Imagine you’re climbing a hill again, but this time there’s a fence
around part of it, so you can’t explore beyond that fence. Constraints could be things like
limits on resources, budgets, or physical boundaries.
Minimizing production costs (the function to optimize) while staying within a specific budget
constraint. Or, maximizing profits but ensuring the investment doesn’t exceed a certain risk level.
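The unconstrained example above (minimizing f(x) = x²) can be sketched in a few lines of Python. This is a minimal illustration, not part of the lesson's own code; the starting point and learning rate are arbitrary choices:

```python
# Gradient descent on f(x) = x^2, whose derivative is f'(x) = 2x
x = 5.0              # arbitrary starting point
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * x               # slope of f at the current x
    x -= learning_rate * gradient  # take a small step downhill

print(x)  # x ends up very close to 0, the minimizer of f
```

Each iteration shrinks x by a constant factor (1 − 2 × learning rate), so the value glides toward the minimum at x = 0.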
● Single-Objective Optimization: Here, we optimize for a single goal.
Example:
When training a machine learning model, we may want to minimize the error rate, focusing only on getting the lowest error. This is single-objective optimization because our sole concern is model accuracy.
● Multi-Objective Optimization: Here, we aim to balance multiple objectives that often conflict
with each other. Think of it as trying to achieve the best trade-off among several goals. This
situation often arises when there are competing demands or when trying to satisfy different
requirements simultaneously.
Example :
In designing a neural network, we might want both high accuracy and low computational cost.
Optimizing for both can be tricky since increasing accuracy might mean using a more complex
model, which could also increase computation time. So, we need to find a balance that meets both
needs reasonably well.
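One common way to strike such a balance is scalarization: combining the objectives into a single weighted score and optimizing that. The sketch below uses hypothetical accuracy and cost numbers (not from the lesson) and an arbitrary weight:

```python
# Hypothetical candidate models: (accuracy, computational cost)
candidates = {
    "small":  (0.85, 1.0),
    "medium": (0.90, 3.0),
    "large":  (0.93, 9.0),
}

# Weighted objective: reward accuracy, penalize cost
# (the 0.02 weight is an arbitrary illustrative choice)
def score(accuracy, cost):
    return accuracy - 0.02 * cost

best = max(candidates, key=lambda name: score(*candidates[name]))
print(best)  # the model with the best accuracy/cost trade-off
```

With these numbers, the largest model loses despite its higher accuracy, because its extra cost outweighs the accuracy gain under this weighting.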
1. Model Training
During model training, the model learns from data by adjusting internal parameters to minimize errors
and make accurate predictions. This process involves an optimization algorithm that adjusts these
parameters to minimize a loss function, which essentially measures the difference between predicted
and actual results.
For instance, gradient descent is one of the most common optimization algorithms used here. In
gradient descent, the algorithm updates parameters iteratively, moving them toward values that
minimize the loss function. This is done step-by-step, like finding the quickest path downhill by taking
small, calculated steps toward the lowest point, which corresponds to the least error.
2. Hyperparameter Tuning
Hyperparameters are like settings that define how the training process should be conducted, including
factors like the learning rate (how big each step in gradient descent should be) and batch size (how
much data the model processes at a time). By optimizing these hyperparameters, we can help the
model learn better and avoid issues like overfitting (where the model performs well on training data
but poorly on new data).
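Hyperparameter tuning can be as simple as trying several candidate values and keeping the best. The sketch below (illustrative values, not from the lesson) searches over learning rates for minimizing f(x) = (x − 3)²:

```python
# Minimize f(x) = (x - 3)^2 with different learning rates
def run_gd(learning_rate, steps=50):
    x = 0.0
    for _ in range(steps):
        x -= learning_rate * 2 * (x - 3)  # gradient of (x-3)^2 is 2(x-3)
    return (x - 3) ** 2                   # final error after training

rates = [0.001, 0.01, 0.1, 0.5]
errors = {lr: run_gd(lr) for lr in rates}
best_lr = min(errors, key=errors.get)
print(best_lr, errors[best_lr])
```

Very small rates leave a large residual error after 50 steps, while well-chosen rates drive the error essentially to zero; this is exactly the kind of trade-off hyperparameter tuning navigates.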
1. Weight Optimization: Neural networks adjust weights through optimization algorithms like
gradient descent, ensuring the model accurately captures patterns in the data. This process is
repeated across each layer of neurons, creating a network that improves with each step.
1. The Main Road might have fewer turns but often has more traffic.
2. The Side Roads could have fewer cars but more stop signs or slower speed limits.
3. The Shortcut Through the Park may be shorter but only works if it’s a dry day.
Here, the optimization problem is to find the route that minimizes the time it takes to get to school.
So, you might consider factors like distance, traffic, weather, and time of day. If it’s a rainy morning,
the shortcut might not work because it gets muddy, while the main road could be more reliable, albeit
longer. Without realizing it, you’re using optimization to weigh all these factors and pick the best
option.
In the end, you choose the route that has the best balance of distance, time, and ease. By doing this,
you’re solving an optimization problem, aiming to minimize time while still getting to school.
You might start by listing the items you need, say paint, brushes, and paper, and then check prices
online or visit a few stores. You see that Store A has the cheapest brushes, Store B offers a discount
on paint, and Store C has lower prices on paper.
Now, you have an optimization problem: find the combination of items from different stores that will give you everything you need for the lowest cost.
The result? You’ve spent the least amount possible while getting all the necessary supplies. This is a
classic example of cost minimization, where you’re finding the combination that lets you save the most
while meeting your needs.
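That shopping decision can be written as a tiny exhaustive search over every item-to-store assignment. The store names and prices below are made up for illustration:

```python
from itertools import product

# Hypothetical prices per item at each store
prices = {
    "paint":   {"A": 12.0, "B": 9.0, "C": 11.0},
    "brushes": {"A": 4.0,  "B": 6.0, "C": 5.0},
    "paper":   {"A": 3.0,  "B": 3.5, "C": 2.5},
}
items = list(prices)           # ["paint", "brushes", "paper"]
stores = ["A", "B", "C"]

def plan_cost(combo):
    # combo assigns one store to each item, in order
    return sum(prices[item][store] for item, store in zip(items, combo))

# Try every assignment and keep the cheapest
best_plan = min(product(stores, repeat=len(items)), key=plan_cost)
best_cost = plan_cost(best_plan)
print(best_plan, best_cost)
```

Because each item's price is independent of the others, the optimum here is simply the cheapest store per item; exhaustive search becomes interesting when choices interact (e.g., delivery fees per store).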
In machine learning, we usually start with a model that isn’t very accurate, and we need to "train" it.
Training involves adjusting the model’s parameters (like weights) to reduce the error between the
model’s predictions and actual outcomes. Gradient Descent is an algorithm that helps us make these
adjustments efficiently, so the model improves step by step.
Imagine standing at the top of a mountain and wanting to reach the lowest point. You can't see the
bottom directly, so you look around to find the steepest downhill direction. You take a small step in
that direction, then reassess and take another step, repeating this process until you reach the
lowest point.
In Gradient Descent, the mountain represents the "error function" (a measure of how far off our
predictions are), and the bottom of the mountain is the minimum error we aim to achieve. Each step
downhill represents an adjustment to the model's parameters, guided by the steepness (or
"gradient") of the slope.
Gradient Descent is widely used because it’s efficient and works well in a variety of machine learning
models, especially where functions are too complex to solve directly. Gradient Descent is an
optimization algorithm, meaning it helps find the best (or "optimal") solution to a problem. In machine
learning, this often means finding the best model parameters that minimize error. The algorithm
doesn’t reach the best solution in one go. Instead, it iteratively adjusts the parameters, moving closer to the minimum with each step. Two key ideas drive this process:
● Gradient: A mathematical term that represents the slope or rate of change of a function. In
machine learning, it shows how much the error changes with each parameter.
● Learning Rate: This is the size of each step we take downhill. A large learning rate means
bigger steps (faster, but riskier), while a small learning rate means smaller steps (slower, but
safer).
● Initialize Parameters: Start with some initial values for the model’s parameters (like weights
in a neural network). These values can be random or set based on prior knowledge.
● Calculate Gradient: At each step, calculate the gradient (essentially, the slope) of the error
function. The gradient tells you the direction and rate at which the error is increasing or
decreasing. It’s like checking the steepness of the mountain to decide which way to step.
● Update Parameters: Using the gradient, adjust the parameters slightly in the opposite
direction of the gradient (downhill) to reduce the error. This step is like taking a small step
down the mountain.
● Repeat: Continue this process of calculating the gradient and updating the parameters until
the error can’t be reduced any further, or it reaches an acceptable low level. This is like
reaching the bottom of the mountain, where there’s no more slope to go down.
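These four steps map directly onto code. Here is a minimal sketch (toy one-feature data, made up for illustration) that fits y ≈ w·x by repeating the gradient-and-update loop:

```python
import numpy as np

# Toy data generated from y = 2x, so the ideal weight is 2
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * X

w = 0.0               # 1. Initialize the parameter
learning_rate = 0.05

for _ in range(200):
    predictions = w * X
    # 2. Calculate the gradient of the mean squared error w.r.t. w
    gradient = 2 * np.mean((predictions - y) * X)
    # 3. Update the parameter in the opposite direction of the gradient
    w -= learning_rate * gradient
# 4. Repeat until the error stops shrinking; w should now be close to 2
print(w)
```

Printing the error inside the loop would show it falling steadily toward zero, mirroring the walk down the mountain described above.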
● Benefits: Since it uses the entire dataset, the gradient is calculated very accurately in each
step, making it a highly reliable approach for finding the global minimum. Batch Gradient
Descent is also stable, producing consistent updates that lead the model towards better
accuracy.
● Downsides: The main downside is speed. For large datasets, computing gradients for every
data point at each iteration can be very slow and computationally expensive, requiring
significant memory and processing power. This can make it challenging to use Batch Gradient
Descent in real-world applications with large datasets, especially without specialized
hardware like GPUs.
Batch Gradient Descent: Uses the entire dataset for each update, providing accuracy and stability
but often slow for large datasets.
● Downsides: The downside of SGD is that it introduces noise. The path toward the minimum
is less stable, with the model parameters fluctuating around the ideal solution instead of
following a smooth path. This fluctuation can lead to a longer time to converge, though often
methods like learning rate decay or momentum are applied to counterbalance the instability.
Stochastic Gradient Descent (SGD): Updates after each data point, fast and efficient for large
datasets, but introduces noise and less stability.
● Benefits: Mini-Batch Gradient Descent combines the strengths of both approaches. It’s faster
than Batch Gradient Descent because it doesn’t need the entire dataset to update. It’s also
more stable than SGD, as it reduces the noise that comes with using just one data point,
providing smoother convergence. Many modern machine learning frameworks and neural
network training use Mini-Batch Gradient Descent by default for its efficiency and reliability.
● Downsides: Choosing the right batch size can be a challenge. If the batch size is too large,
the algorithm starts behaving more like Batch Gradient Descent, slowing down the training
process. If the batch size is too small, it might resemble SGD, reintroducing instability.
Additionally, it still requires more memory than pure SGD, though it’s far more manageable
than Batch Gradient Descent.
Mini-Batch Gradient Descent: Balances speed and stability by using small batches, making it well-
suited for practical use cases.
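The three variants differ only in how much data feeds each parameter update. The sketch below (toy linear-regression data, illustrative only) makes that concrete: one training function where the batch size alone selects the variant.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3 * X + rng.normal(scale=0.1, size=100)  # true weight is 3, plus noise

def train(batch_size, epochs=50, lr=0.05):
    # batch_size = len(X): Batch GD; 1: SGD; anything between: Mini-Batch
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        indices = rng.permutation(n)  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            grad = 2 * np.mean((w * X[batch] - y[batch]) * X[batch])
            w -= lr * grad
    return w

print(train(batch_size=len(X)))  # Batch gradient descent
print(train(batch_size=1))       # Stochastic gradient descent
print(train(batch_size=16))      # Mini-batch gradient descent
```

All three recover a weight near 3, but batch GD makes one smooth update per epoch, SGD makes a hundred noisy ones, and mini-batch sits in between, which is exactly the trade-off summarized above.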
Imagine you’re looking at the possible values that a random variable X can take. For example, if X represents the number of hours it takes to finish a project, the CDF tells us, for any given number x, the likelihood (or probability) that X is less than or equal to that number x.
1. Always Increasing: The CDF is a function that starts at 0 for the lowest possible value of X and gradually increases to 1 as X approaches its maximum possible value. It’s always moving upward (or staying flat), but it never decreases. This makes sense because as you consider larger and larger values of x, you’re capturing more of the possible outcomes for X, meaning the probability accumulates.
2. Range Between 0 and 1: The CDF gives probabilities, so it always stays between 0 and 1. When x is very small (for instance, less than the smallest possible value of X), the CDF will be 0, meaning the probability that X is less than or equal to x is 0. As x grows larger, it will eventually reach 1, indicating that there’s a 100% probability that X is less than or equal to x.
3. Complete Description of the Distribution: The CDF provides a full picture of the distribution of the random variable. By knowing the CDF, you can answer any question about the probability of X falling within any range.
Let’s say we have a random variable X that represents a test score, which can range from 0 to 100.
Now, imagine we want to know the probability that someone scores 70 or below on the test.
● The CDF at 70, or F(70), gives us this probability. If F(70)=0.85, it means there’s an 85%
chance that someone will score 70 or less.
● As we increase the score x, say to 90, the CDF value will increase because the probability
that someone scores 90 or less is higher than the probability of scoring 70 or less. If
F(90)=0.95, it means there's a 95% chance of scoring 90 or less.
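These properties are easy to verify numerically with an empirical CDF. The simulated test scores below are hypothetical, just to mirror the example above:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated test scores, clipped to the valid range 0-100
scores = np.clip(rng.normal(loc=60, scale=15, size=1000), 0, 100)

def empirical_cdf(x):
    """Fraction of observed scores less than or equal to x."""
    return np.mean(scores <= x)

print(empirical_cdf(70))   # probability of scoring 70 or below
print(empirical_cdf(90))   # never smaller than the value at 70
print(empirical_cdf(-1))   # 0: below the smallest possible score
print(empirical_cdf(100))  # 1: at the largest possible score
```

The printed values demonstrate all three properties: the function never decreases, stays between 0 and 1, and hits those bounds below and above the support of X.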
Linear Regression: In linear regression, gradient descent is used to minimize the mean squared error (MSE). The algorithm adjusts the weights iteratively until it finds the values that minimize the error between the predicted and actual target values.
Neural Networks: In deep learning, gradient descent is essential for training neural networks. During
the backpropagation phase, the gradients of the loss function are calculated with respect to each
weight and bias in the network. These gradients are used to adjust the weights in the direction that
minimizes the loss. Optimization algorithms like Stochastic Gradient Descent (SGD) and Adam are
used to enhance the basic gradient descent algorithm by introducing momentum or adaptive learning
rates.
Real-World Example: In image recognition, a convolutional neural network (CNN) uses gradient
descent to update its weights during training. The CNN processes images in layers, where each layer
learns to identify features (e.g., edges, textures, and objects). Gradient descent ensures the weights
in these layers are adjusted to minimize the difference between the predicted output and the actual
label, improving the model’s accuracy over time.
NLP: In training models for sentiment analysis or language translation, gradient descent is used to
optimize the weights of recurrent neural networks (RNNs) or transformers. This allows models to
adjust their parameters so they can correctly predict the sentiment of a sentence or translate phrases
between languages.
Computer Vision: Gradient descent helps train models for image classification or object detection.
For example, training a deep CNN for facial recognition involves updating weights in response to the
gradients calculated during backpropagation, which helps the model learn to differentiate between
various facial features.
Recommendation Systems: In systems like Netflix or Amazon, gradient descent can be used in
collaborative filtering algorithms to optimize the parameters of the recommendation function,
improving the accuracy of user-item recommendations.
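A minimal sketch of that idea is matrix factorization trained with gradient descent: learn a latent vector per user and per item so their dot products reproduce the observed ratings. The tiny ratings matrix below is made up for illustration:

```python
import numpy as np

# Tiny user-item rating matrix; 0 marks an unobserved rating
R = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 1.0, 5.0],
])
observed = R > 0

rng = np.random.default_rng(0)
k = 2                                            # latent dimensions
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors
lr = 0.05

for _ in range(5000):
    # Error only on observed ratings; unobserved cells don't contribute
    error = (U @ V.T - R) * observed
    U_grad = error @ V       # gradient of squared error w.r.t. U
    V_grad = error.T @ U     # gradient of squared error w.r.t. V
    U -= lr * U_grad
    V -= lr * V_grad

predictions = U @ V.T  # filled-in matrix, including the unobserved cells
print(np.round(predictions, 1))
```

The observed entries are reproduced closely, and the previously empty cells now hold predicted ratings, which is the basis of collaborative-filtering recommendations (production systems add regularization and bias terms).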
Real-World Example: A company like Google uses gradient descent to train language models for
Google Translate. The model learns to translate between languages by minimizing the loss function
that measures the difference between the predicted translation and the actual translation.
Logistic Regression: In logistic regression, gradient descent minimizes the cross-entropy loss for binary classification. The sigmoid function maps the input to a value between 0 and 1, which can be interpreted as a probability.
Real-World Example: In medical diagnosis, logistic regression can be used to predict the probability
of a patient having a disease based on input features like age, blood pressure, and cholesterol levels.
Gradient descent adjusts the model parameters to find the best fit that minimizes the prediction error,
making the model more accurate in classifying patients as having or not having the disease.
Example 1: Implementing gradient descent from scratch for a simple logistic regression model
Python code:
import numpy as np

# Sigmoid function: maps any real value to (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Training data: features and binary labels
X = np.array([[1, 2], [1, 3], [2, 3], [3, 4], [4, 5]])
y = np.array([0, 0, 1, 1, 1])

# Initialize parameters
w = np.zeros(X.shape[1])  # weights
b = 0                     # bias
learning_rate = 0.01
iterations = 1000
m = len(X)                # number of training examples

for i in range(iterations):
    # Forward pass: compute predicted probabilities
    linear_model = np.dot(X, w) + b
    predictions = sigmoid(linear_model)
    # Gradients of the cross-entropy loss
    dw = (1/m) * np.dot(X.T, predictions - y)
    db = (1/m) * np.sum(predictions - y)
    # Update parameters in the opposite direction of the gradient
    w -= learning_rate * dw
    b -= learning_rate * db
    if i % 100 == 0:
        loss = -np.mean(y * np.log(predictions) + (1 - y) * np.log(1 - predictions))
        print(f"Iteration {i}: loss = {loss:.4f}")

print("Learned weights:", w, "bias:", b)
Python libraries like Scikit-learn simplify gradient descent implementations. Here’s an example using Scikit-learn for linear regression:
Python code:
import numpy as np
from sklearn.linear_model import SGDRegressor

# Training data: features and continuous targets (y = 2x)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# SGDRegressor fits a linear model using stochastic gradient descent
model = SGDRegressor(max_iter=1000, learning_rate="constant", eta0=0.01)
model.fit(X, y)

print("Coefficient:", model.coef_)
print("Intercept:", model.intercept_)
print("Prediction for x=6:", model.predict([[6]]))
Theory to Practice
● A company wants to optimize its delivery routes to reduce fuel costs and delivery times.
Discuss how concepts from optimization and gradient descent could be adapted to solve
this problem. What challenges might arise in implementing such a solution?
● Imagine you’re picking a movie to watch, and you try different types until you find one you
love. How can this process of trial and error relate to the way models are trained to make
better predictions?
Summary
● The lesson starts with an introduction to optimization, explaining its definition, types of
problems, and applications in AI and Machine Learning. It progresses to real-world
examples to show how optimization techniques are used across industries.
● Optimization is the process of finding the best solution by minimizing or maximizing an
objective function within constraints.
● Types of optimization problems include linear, non-linear, convex, and combinatorial
optimization.
● Applications of optimization in AI and ML include model training, hyperparameter tuning,
and resource allocation.
● Real-world examples of optimization include route optimization in logistics and portfolio
optimization in finance.
● Gradient Descent is an iterative optimization algorithm that minimizes loss functions by
updating parameters in the direction of the steepest descent.
● Types of Gradient Descent include Batch, Stochastic, and Mini-batch Gradient Descent,
each with unique advantages for efficiency and stability.
● Gradient Descent is used in machine learning algorithms like linear regression, logistic
regression, and neural networks.
● Applications of Gradient Descent in AI include deep learning tasks such as image
classification and language translation.
● Logistic Regression uses Gradient Descent to minimize the cross-entropy loss function for
binary classification.
● Python enables Gradient Descent implementation from scratch using NumPy or through
advanced tools like TensorFlow, PyTorch, and Scikit-learn.
MCQs
3. Identify: Which variant of gradient descent processes the entire training dataset in each iteration?
a) Batch gradient descent
b) Stochastic gradient descent
c) Mini-batch gradient descent
d) Adaptive gradient descent
4. Analyse: What is the main advantage of stochastic gradient descent over batch gradient descent?
a) It converges faster
b) It uses less computational resources
c) It avoids local minima
d) It provides more precise parameter updates
1. What is the role of the learning rate in the gradient descent algorithm, and how does it
impact the optimization process?
2. Explain the concept of Lagrange's theorem and its application in optimization problems.
How does it help handle constraints in gradient descent optimization?
3. Compare and contrast the different variants of gradient descent, namely batch gradient
descent, stochastic gradient descent, and mini-batch gradient descent. What are the
advantages and limitations of each approach?
5. Reflect on the challenges and future advancements in gradient descent for Artificial
Intelligence. How can researchers and practitioners overcome the limitations and further
enhance the effectiveness of gradient descent in optimising AI systems?
Answers
MCQs
1. Correct answer: b) To optimise parameters and minimise a cost function. Explanation:
Gradient descent is primarily used in AI to optimise parameters by minimising a cost
function. It iteratively adjusts the parameters based on the computed gradients to find the
optimal values that minimise the cost.
3. Correct answer: a) Batch gradient descent. Explanation: Batch gradient descent processes
the entire training dataset in each iteration. It computes the gradient using all the training
examples, which can be computationally expensive but provides accurate updates.