UE21CS352A - Machine Intelligence
Dr. Uma D
Professor
Department of Computer Science & Engineering
Teaching Assistant: K S Ramalakshmi (Semester VII)
What is Regression in Machine Learning?
Regression:
• Regression is a statistical method for understanding the relationship between
independent variables (X), or features, and a dependent variable (Y), or outcome.
Once the relationship between the independent and dependent variables has been
estimated, outcomes can be predicted.
• It forecasts the value of a dependent variable (Y) from the values of independent
variables (X1, X2, …).
• It analyses the specific relationships between two or more variables.
• This is done to gain information about one variable by knowing the values of the
others.
Examples of Regression
Some examples of regression:
• Predicting rainfall using temperature and other factors.
• Predicting road accidents due to rash driving.
• Forecasting continuous outcomes such as house prices, stock prices, or sales.
• Predicting the success of future retail sales or marketing campaigns to ensure resources are
used effectively.
• Analyzing datasets to establish the relationships between variables and the output.
• Creating time-series visualizations.
Types of Regression
Scatter Plots
Choosing the type of regression using a scatter plot:
• A scatter plot is a useful visualization tool that can help
you decide which type of regression to use when
analyzing the relationship between two variables.
• Scatter plots show individual data points as dots on a
two-dimensional plane, with one variable on the x-axis
and the other variable on the y-axis.
• The pattern of the data points in a scatter plot can
provide insights into the nature of the relationship
between the variables, which can guide you in
choosing the appropriate type of regression analysis.
Linear Regression
Linear Regression:
• Linear regression is a statistical method used for predictive analysis
that models a linear relationship between a dependent variable (y) and
one or more independent variables (x).
• It is one of the simplest and easiest algorithms.
• It is used when you have continuous numerical data.
• Linear regression is best used when the relationship between the
variables can be approximated by a straight line.
Types of Linear Regression
Types of Linear Regression:
Simple Linear Regression:
If there is only one input variable (x), then such linear regression is called simple linear
regression.
Multiple Linear Regression:
If there is more than one input variable, then such linear regression is called multiple
linear regression.
NOTE: In this course, we will be focusing on Simple Linear Regression only.
Simple Linear Regression
Simple Linear Regression:
Simple Linear Regression is a regression algorithm that models
the relationship between two continuous variables: a dependent variable
and a single independent variable.
The equation of a simple linear regression model is represented as:
y = mx + b
where:
y is the dependent variable.
x is the independent variable.
m is the slope of the line (coefficient), representing how much y changes
for a unit change in x.
b is the y-intercept, which is the value of y when x is 0.
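As a concrete illustration, the slope m and intercept b of a simple linear model can be obtained in closed form from the data (m = cov(x, y)/var(x), b = ȳ − m·x̄). A minimal sketch with hypothetical data (hours studied vs. exam score, chosen here only for illustration):

```python
import numpy as np

# Hypothetical sample data: hours studied (x) vs. exam score (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 59.0, 61.0, 66.0])

# Closed-form least-squares estimates for simple linear regression:
# m = cov(x, y) / var(x),  b = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

print(f"fitted line: y = {m:.2f}x + {b:.2f}")
print(f"prediction at x = 6: {m * 6 + b:.2f}")
```

The later slides find the same m and b iteratively with gradient descent; the closed form above exists only for this simple setting.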
Best Fitted Line
Best Fitted Line:
• Regression finds a line or curve through the data points on the
target-predictor graph such that the vertical distance between the
data points and the regression line is minimized.
• The distance between the data points and the line tells whether the model
has captured a strong relationship or not.
• The goal is to find the best-fitted line, which requires finding the values
of m and b for the given dataset.
Gradient Descent Algorithm
GRADIENT DESCENT ALGORITHM:
The parameters m and b can be found using
gradient descent.
Gradient descent is an iterative optimization
algorithm used to update the values of m and b
so as to minimize a loss function, which
measures the difference between the predicted
and actual y values.
Note: In the figure, theta 0 and theta 1
correspond to the y-intercept (b) and the
slope (m), respectively.
Gradient Descent Algorithm
GRADIENT DESCENT ALGORITHM:
Fig.(a) and Fig.(b) illustrate gradient descent on the loss surface.
Note: Fig.(a) shows the cost function J over the parameters b and m as a 3D surface,
with gradient descent starting from an initial random guess for the parameters.
Fig.(b) shows gradient descent approaching the minimum of the cost/loss function
after several iterations.
Gradient Descent Algorithm
Visualization of the Cost Function in 2D:
This figure shows the cost function plotted against one parameter (a 2D graph)
and how the gradient descent algorithm takes steps to reach the minimum of the
cost function with learning rate alpha.
Learning Rate Alpha: a positive constant that determines how large the steps are.
Note:
If alpha is too large, the algorithm may overshoot the minimum of the cost function (J) and
might diverge.
If alpha is small, smaller steps are taken to reach the minimum of the cost function.
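The overshoot-vs.-slow-convergence trade-off can be seen on a toy one-parameter cost. The quadratic J(theta) = theta² below is an assumed stand-in for the cost (not the regression cost from the slides), chosen because its gradient 2·theta is trivial:

```python
def gradient_descent_1d(alpha, theta=5.0, steps=20):
    """Minimize J(theta) = theta**2, whose gradient is dJ/dtheta = 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # standard update: theta <- theta - alpha * gradient
    return theta

small = gradient_descent_1d(alpha=0.1)   # small alpha: steady steps toward the minimum at 0
large = gradient_descent_1d(alpha=1.1)   # large alpha: each step overshoots and the iterate diverges

print("alpha = 0.1 ->", small)
print("alpha = 1.1 ->", large)
```

With alpha = 0.1 the iterate shrinks by a factor of 0.8 each step; with alpha = 1.1 it is multiplied by −1.2 each step, so its magnitude grows without bound.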
Gradient Descent Algorithm – Working
• Gradient descent works by moving downward toward the
pits or valleys in the graph to find the minimum value.
• This is achieved by taking the derivative of the loss function,
as illustrated in the coming slides.
• During each iteration, gradient descent steps down the cost
function in the direction of the steepest descent.
• By adjusting the parameters in this direction, it seeks to
reach the minimum of the loss function and find the
best-fit values for the parameters.
• The size of each step is determined by the parameter α,
known as the Learning Rate.
Gradient Descent
Steps to Find Optimum Values of m and b:
1. Initialize m and b with random values.
2. Define the loss function: The loss function measures how far off the predictions
are from the actual values. A common loss function for linear regression is the
Mean Squared Error (MSE):

J(m, b) = (1/n) Σi (yi − (m·xi + b))²

where n is the number of data points, yi is the actual y value for the i-th data point,
and xi is the corresponding x value.
3. Calculate the gradient: Calculate the partial derivatives of the loss function with
respect to m and b. These tell us how much the loss will change if we make small
adjustments to m and b:

∂J/∂m = −(2/n) Σi xi (yi − (m·xi + b))
∂J/∂b = −(2/n) Σi (yi − (m·xi + b))
Gradient Descent Algorithm
Steps to Find Optimum Values of m and b:
4. Update parameters: Update m and b using the gradients and a learning rate (α).
The learning rate determines the step size in each iteration and should be chosen
carefully:

m = m − α · ∂J/∂m
b = b − α · ∂J/∂b

5. Iterate: Repeat steps 3 and 4 for a certain number of iterations or until the
parameters converge.
Note: To get rid of the 2 in the partial derivatives of the MSE (which is our cost
function J in linear regression) w.r.t. m and b, we can also take the MSE as

J(m, b) = (1/(2n)) Σi (yi − (m·xi + b))²

Note: In the above MSE formula, the predicted line (m·xi + b) has only two
parameters. The same idea can be extended to any number of parameters; in
that case, the error space has a higher dimension and cannot be visualized.
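The five steps above can be sketched as a short function. The toy data here (points lying exactly on y = 3x + 2) and the starting values of zero instead of random values are assumptions for illustration:

```python
import numpy as np

def fit_linear_gd(x, y, alpha, iterations):
    """Fit y = m*x + b by gradient descent on MSE = (1/n) * sum((y - (m*x + b))**2)."""
    m, b = 0.0, 0.0                           # step 1: initialize (zeros here for reproducibility)
    n = len(x)
    for _ in range(iterations):               # step 5: iterate
        error = y - (m * x + b)               # step 2: residuals used by the MSE loss
        dm = -(2.0 / n) * np.sum(x * error)   # step 3: dJ/dm
        db = -(2.0 / n) * np.sum(error)       #         dJ/db
        m -= alpha * dm                       # step 4: update with learning rate alpha
        b -= alpha * db
    return m, b

# Toy data lying exactly on y = 3x + 2 (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x + 2
m, b = fit_linear_gd(x, y, alpha=0.05, iterations=5000)
print(m, b)  # approaches m = 3, b = 2
```

With enough iterations and a suitably small alpha, the estimates converge to the line that generated the data.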
Numerical
Using y = mx + b, assume m = 10, b = 300, and learning rate α = 0.0001. Perform five
iterations of gradient descent on this linear regression model to find the new
parameters and observe the reduction in error through each iteration.
First iteration:
y = 10x + 300
MSE = (1/7) Σi (yi − ŷi)²
= (1/7) [(800−600)² + (950−670)² + (600−550)² + (1050−730)² + (1200−800)² + (740−590)² + (1100−760)²]
= 74485.7143
Numerical
Doing a partial derivative w.r.t. m of the MSE:
∂J/∂m = −(2/7) Σi xi (yi − ŷi) = −20388.5714
m ← m − α · (∂J/∂m) = 10 − 0.0001 × (−20388.5714) ≈ 12.0388
Doing a partial derivative w.r.t. b of the MSE:
∂J/∂b = −(2/7) Σi (yi − ŷi) = −497.1429
b ← b − α · (∂J/∂b) = 300 − 0.0001 × (−497.1429) ≈ 300.0497
Numerical
Now, with the updated parameters:
y = 12.0388x + 300.0497
MSE = (1/7) [(800−660.94)² + (950−745.1597)² + (600−600.7997)² + (1050−817.3397)² + (1200−901.5497)² + (740−648.9197)² + (1100−853.4297)²]
= 38957.23
Note that, in comparison to the start, the error value after one
iteration has been reduced.
Let's now predict new y values.
Numerical
Second iteration:
y = 12.0388x + 300.0497
MSE = (1/7) [(800 − 660.9)² + (950 − 745.15)² + (600 − 600.79)² + (1050 − 817.34)² + (1200 − 901.5)² + (740 − 648.9)² + (1100 − 853.4)²]
= 38957.23
Numerical
Doing a partial derivative w.r.t. m of the MSE:
∂J/∂m = −14443.2337
m ← 12.0388 − 0.0001 × (−14443.2337) = 13.4831
Doing a partial derivative w.r.t. b of the MSE:
∂J/∂b = −345.59
b ← 300.0497 − 0.0001 × (−345.59) = 300.0843
Therefore, after two iterations, the value of m is 13.4831 and the
value of b is 300.0843.
Numerical
Table for iteration 1: initial m = 10, b = 300
Table for iteration 2: new m = 12.0388, b = 300.0497
Numerical
Table for iteration 3: new m = 13.4831, b = 300.0843
Table for iteration 4: new m = 14.5063, b = 300.1081
Numerical
Table for iteration 5: new m = 15.231, b = 300.1243
Note that, in comparison to the start,
the error value has been reduced
significantly.
Measuring Model Performance
Measuring Model Performance:
• Goodness of fit determines how well the regression line fits the set of
observations. The process of finding the best model out of various models is
called optimization.
R-squared method:
• R-squared is a statistical measure that determines the goodness of fit.
• It measures the strength of the relationship between the dependent and
independent variables on a scale of 0-100%.
Measuring Model Performance
• A high value of R-squared indicates a small difference between the predicted
values and the actual values, and hence represents a good model.
• It is also called the coefficient of determination, or the coefficient of multiple
determination for multiple regression.
• It can be calculated from the formula:

R² = 1 − SSres / SStot = 1 − Σi (yi − ŷi)² / Σi (yi − ȳ)²
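The ratio of the residual sum of squares to the total sum of squares is straightforward to compute directly. A minimal sketch with hypothetical actual values and model predictions:

```python
import numpy as np

def r_squared(y_actual, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_actual - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])        # hypothetical actual values
y_pred = np.array([2.8, 5.2, 7.1, 8.9])   # hypothetical model predictions
print(r_squared(y, y_pred))               # close to 1, i.e. a good fit
```

A value near 1 (100%) means the model explains almost all of the variance in y; a value near 0 means it explains almost none.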
Outlier – Graphical Representation
Effect of Outliers on Model Prediction:
An outlier can be higher or lower than expected,
or displaced further to the right or left than
expected. Outliers can affect regression
lines, making the regression lines less accurate in
predicting other data.
Advantages vs Disadvantages of Linear Regression
Advantages:
• Simple to implement and easy to interpret the output coefficients.
• When you know the relationship between the independent and dependent
variables is linear, this algorithm is the best to use because of its lower
complexity compared to other algorithms.
Disadvantages:
• Outliers can have huge effects on the regression, and boundaries are linear in
this technique.
• Linear regression assumes a linear relationship between the dependent and
independent variables, i.e. a straight-line relationship between them. It also
assumes independence between attributes.
Advantages vs Disadvantages of Linear Regression
Advantages:
• Linear regression is susceptible to over-fitting, but this can be avoided using
dimensionality reduction techniques, regularization (L1 and L2), and
cross-validation.
Disadvantages:
• It looks at the relationship between the mean of the dependent variable and
the independent variables. Just as the mean is not a complete description of a
single variable, linear regression is not a complete description of the
relationships among variables.
THANK YOU
Dr. Uma D
Professor
Department of Computer Science & Engineering
[email protected]