5.1 Loss Function, Optimization, Gradient Descent
AY 2024-2025 SEM-V
MIT School of Computing
Department of Information Technology
Unit-V Syllabus
Role of Loss Function and Optimization
• In machine learning, loss functions and
optimization work together to improve a
model's performance by finding the best
parameters for a given data set
Loss function
• The loss function measures the distance between the model's prediction and the correct answer (the label); it answers the question of how well the model is doing.
• The loss function is what we need to minimize in order to get the best model parameters (it is an optimization problem).
• The loss function's output is higher when predictions are off and lower when they are good.
• The choice of loss function is an important design decision.
• There are many loss functions, and which one to use depends on the problem being solved.
Loss function
Regression:
• MSE (Mean Squared Error)
• MAE (Mean Absolute Error)
• RMSE (Root Mean Squared Error)
Classification:
• Hinge loss
• Log-likelihood (e.g., negative log-likelihood / cross-entropy)
(A short NumPy sketch of the regression losses follows.)
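To make these concrete, here is a minimal NumPy sketch of the regression losses above; the arrays y_true and y_pred are illustrative placeholders for the labels and the model's predictions.

import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared residuals
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute residuals
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    # Root Mean Squared Error: square root of the MSE
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))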
Linear Regression
• A simple regression model of life satisfaction:
  life_satisfaction = θ0 + θ1 × GDP_per_capita
• a linear model makes a prediction by simply
computing a weighted sum of the input
features, plus a constant called the bias term
(also called the intercept term)
• Linear Regression model prediction:
  ŷ = θ0 + θ1·x1 + θ2·x2 + … + θn·xn
• Vectorized form:
  ŷ = hθ(x) = θ · x
  where θ is the model's parameter vector (θ0 is the bias term) and x is the instance's feature vector with x0 = 1.
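A minimal NumPy sketch of the vectorized prediction; the feature values and parameter values below are illustrative assumptions, and a column of ones is added so the bias term θ0 is handled by the same dot product.

import numpy as np

X = np.array([[1.0], [2.0], [3.0]])          # e.g., GDP_per_capita for 3 instances
X_b = np.c_[np.ones((X.shape[0], 1)), X]     # prepend x0 = 1 for the bias term
theta = np.array([0.5, 2.0])                 # [theta0 (bias), theta1]

y_pred = X_b @ theta                         # vectorized prediction: y_hat = theta . x for every row
print(y_pred)                                # [2.5 4.5 6.5]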
Note: When using Gradient Descent, you should ensure that all features have a similar scale (e.g., using Scikit-Learn's StandardScaler class), or else it will take much longer to converge.
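As an illustration of the scaling advice in the note, a short sketch using Scikit-Learn's StandardScaler; the feature matrix X is a made-up example with features on very different scales.

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 2000.0],
              [2.0, 50000.0],
              [3.0, 120000.0]])              # two features on very different scales

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)           # each column now has mean 0 and unit variance
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))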
Optimization Algorithms
• Gradient Descent: the gradient is the derivative. We compute the derivative of the loss function with respect to the weights, dL/dw, and it gives us the direction toward the minimum. Besides the direction, we also need a step size.
• The step size is determined by the learning rate, one of the most important hyperparameters we tune during training.
• If the learning rate is too large, training diverges and we never reach the minimum; if it is very small, learning is too slow. The learning rate is therefore a hyperparameter that must be tuned.
• We put a minus sign in the update rule, w ← w − η · dL/dw, because the derivative points in the uphill direction and we need the descent direction.
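To make the update rule concrete, here is a minimal batch gradient descent sketch for linear regression with the MSE loss; the synthetic data, learning rate eta, and iteration count are illustrative assumptions, not fixed recommendations.

import numpy as np

# synthetic data: y is roughly 4 + 3x plus noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 100)

X_b = np.c_[np.ones((100, 1)), X]            # add the bias column x0 = 1
theta = np.zeros(2)                          # initial parameters [theta0, theta1]
eta = 0.1                                    # learning rate (step size)
m = len(y)

for _ in range(1000):
    # gradient of the MSE loss w.r.t. theta: (2/m) * X_b^T (X_b theta - y)
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients          # step in the negative-gradient (descent) direction

print(theta)                                 # should end up close to [4, 3]

With eta too large, theta blows up instead of settling; with eta very small, far more than 1000 iterations are needed, which is the trade-off described above.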
Variants of Gradient Descent
• Batch Gradient Descent uses the entire training set to compute the gradients at every step, so it is computationally expensive.
• Hyperparameter optimization means tuning settings such as the learning rate so that the machine learning model can solve the problem it was designed to solve as efficiently and effectively as possible.
• The aim is to achieve maximum accuracy and efficiency, and minimum error.
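As a simple illustration of hyperparameter tuning, here is a sketch that sweeps a few candidate learning rates for the batch gradient descent loop above and keeps the one with the lowest final MSE; the candidate values and iteration count are assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 100)
X_b = np.c_[np.ones((100, 1)), X]
m = len(y)

def train(eta, n_iters=500):
    # batch gradient descent on the MSE loss for a given learning rate
    theta = np.zeros(2)
    for _ in range(n_iters):
        gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
        theta = theta - eta * gradients
    return np.mean((X_b @ theta - y) ** 2)   # final training MSE

best_eta, best_mse = None, float("inf")
for eta in [0.001, 0.01, 0.1, 0.3]:
    final_mse = train(eta)
    if final_mse < best_mse:
        best_eta, best_mse = eta, final_mse

print("best learning rate:", best_eta, "final MSE:", best_mse)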