LINEAR REGRESSION
➢ PRESENTED BY :
➢ SURAJ RATHOD (23BEC045)
➢ RIDHAM GUPTA (23BEC046)
➢ RISHABH VERMA (23BEC047)
➢ INTRODUCTION
➢What is Regression?
› Regression is a statistical method used to analyze the relationship
between one dependent variable and one or more independent
variables. The goal of regression is to model this relationship in such a
way that the dependent variable can be predicted or explained based
on the values of the independent variables, typically using the least
squares method.
› In simple terms, regression tries to fit a line (or curve) to data points
in a way that minimizes the difference between the observed values
and the values predicted by the model.
› Linear Regression: Used when the relationship between the
dependent and independent variables is assumed to be linear.
➢ Why use linear regression?
› Understanding Relationships: Regression allows us to model the
relationship between a dependent variable (e.g., house price, sales)
and one or more independent variables (e.g., square footage,
marketing spend). By understanding this relationship, we can make
informed predictions.
› Example: Predicting the price of a house based on its square footage.
› Prediction: The main goal of linear regression is prediction. Once we
have a linear model, we can use it to predict the dependent variable
for new, unseen data.
› Example: Given the size of a new house (square footage), predict its
potential price.
➢ APPLICATIONS IN REAL LIFE PROBLEMS
› Real Estate Market: It is used to predict housing prices based on
features like square footage, location, number of rooms, etc.
› Sales & Marketing: Businesses use linear regression to forecast sales
performance based on advertising spend, marketing campaigns, and
other related factors.
› Healthcare & Medicine: In healthcare, linear regression helps in
analyzing the relationship between patient characteristics (age,
weight, etc.) and health outcomes (recovery time, disease
progression).
› Technology & Artificial Intelligence: Linear regression is widely used
in predictive analytics, machine learning, and data science to identify
trends in large datasets and optimize algorithms.
➢ PROS & CONS
PROS
› The algorithm is quick to train, especially compared to more complex
models like neural networks, making it ideal for situations where time
or resources are limited.
› Performs well with smaller datasets, making it useful in situations
where data availability is limited.
› Well-suited for linear relationships.
CONS
› Highly sensitive to outliers (data points that are far from the trend);
outliers can skew the results, leading to inaccurate predictions.
› Struggles to capture complex interactions between multiple
variables, especially when the relationship is more intricate or
involves higher-order terms.
› Can oversimplify complex data relationships.
➢ MATHEMATICAL FORMULATION
➢ Least Square method
› The least squares method is a mathematical approach used to find
the best-fitting line (or model) by minimizing the sum of the squared
differences (residuals) between the observed data points and the
predicted values.
› In linear regression, it aims to minimize the sum of the squared
vertical distances between the data points and the regression line,
ensuring the best approximation of the data in a way that reduces
errors.
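To make the idea of "minimizing squared vertical distances" concrete, here is a small illustration (a Python sketch with invented data, not from the slides): the least-squares line achieves a smaller sum of squared residuals than an arbitrary alternative line.

```python
# Sketch (assumed data): what "least squares" means in practice.
def sse(a0, a1, xs, ys):
    """Sum of squared vertical distances from the points to y = a0 + a1*x."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x, with small noise

# For this data the exact least-squares line works out to y = 0 + 2.03x.
best_fit = sse(0.0, 2.03, xs, ys)
other    = sse(1.0, 1.50, xs, ys)  # an arbitrary competing line
print(best_fit < other)            # the least-squares line has lower error
```

Any line other than the least-squares line will give a larger `sse` value, which is exactly the sense in which the fit is "best."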
› EQUATION OF STRAIGHT LINE:
› y = a₀ + a₁x, where:
› a₀ = intercept
› a₁ = slope of the line
› x = independent variable
› y = dependent variable
› n = number of observations
FORMULAS USED
› a₀ (intercept) = (Σyᵢ − a₁Σxᵢ) / n
› a₁ (slope) = (nΣxᵢyᵢ − ΣxᵢΣyᵢ) / (nΣxᵢ² − (Σxᵢ)²)
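The two formulas above translate directly into code. A minimal Python sketch (the slides' own code is MATLAB, but the arithmetic is identical):

```python
# Sketch: the least-squares formulas for slope and intercept, as given above.
def fit_line(xs, ys):
    """Return (a0, a1) for y = a0 + a1*x using the least-squares formulas."""
    n = len(xs)
    sx  = sum(xs)                              # Σx
    sy  = sum(ys)                              # Σy
    sxy = sum(x * y for x, y in zip(xs, ys))   # Σxy
    sx2 = sum(x * x for x in xs)               # Σx²
    a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope
    a0 = (sy - a1 * sx) / n                         # intercept
    return a0, a1

# Example: points lying exactly on y = 1 + 2x.
print(fit_line([0, 1, 2], [1, 3, 5]))  # -> (1.0, 2.0)
```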
➢ Question
x (independent)    y (dependent)
1 55
2 58
3 65
4 70
5 75
6 80
7 85
8 88
➢ SOLUTION
› The values in the table below need to be calculated:
› x    y    x²    xy
1 55 1 55
2 58 4 116
3 65 9 195
4 70 16 280
5 75 25 375
6 80 36 480
7 85 49 595
8 88 64 704
› Σx = 36, Σy = 576
› Σx² = 204, Σxy = 2800, n = 8, x̄ = 36/8 = 4.5, ȳ = 576/8 = 72
› Now using a₁ (slope) = (nΣxᵢyᵢ − ΣxᵢΣyᵢ) / (nΣxᵢ² − (Σxᵢ)²),
› substituting the values gives a₁ = (8·2800 − 36·576) / (8·204 − 36²) = 1664/336 ≈ 4.95.
› Now using a₀ (intercept) = (Σyᵢ − a₁Σxᵢ) / n,
› substituting the values gives a₀ = (576 − 4.95·36) / 8 ≈ 49.71.
› The line of regression is y= 4.95x+49.71
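As a sanity check, the worked example can be verified numerically. A Python sketch using the values from the table:

```python
# Verify the worked example: data taken from the table above.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [55, 58, 65, 70, 75, 80, 85, 88]

n   = len(x)                             # 8
sx  = sum(x)                             # Σx  = 36
sy  = sum(y)                             # Σy  = 576
sxy = sum(a * b for a, b in zip(x, y))   # Σxy = 2800
sx2 = sum(a * a for a in x)              # Σx² = 204

a1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # slope
a0 = (sy - a1 * sx) / n                         # intercept
print(round(a1, 2), round(a0, 2))               # -> 4.95 49.71
```

This reproduces the regression line y = 4.95x + 49.71 stated above.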
➢ MATLAB CODE
› Algorithm
› Step 1: Initialize data: define the arrays X and Y with the given values.
› Step 2: Compute the means of X and Y using their formulas.
› Step 3: Compute the numerator and denominator of the slope formula.
› Step 4: Calculate the slope (m) as their ratio.
› Step 5: Calculate the intercept (c).
› Step 6: Compute the predicted values (Y_pred).
› Step 7: Display the results.
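The steps above can be sketched as follows (a Python stand-in for the MATLAB listing, which is not reproduced in these slides; the data is assumed to be the same as in the worked example, and the plotting step is omitted):

```python
# Step 1: initialize data (hours studied vs. exam score).
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [55, 58, 65, 70, 75, 80, 85, 88]

# Step 2: means of X and Y.
mean_X = sum(X) / len(X)
mean_Y = sum(Y) / len(Y)

# Step 3: numerator Σ(x - x̄)(y - ȳ) and denominator Σ(x - x̄)².
numerator   = sum((x - mean_X) * (y - mean_Y) for x, y in zip(X, Y))
denominator = sum((x - mean_X) ** 2 for x in X)

# Step 4: slope m as the ratio of numerator to denominator.
m = numerator / denominator

# Step 5: intercept c = ȳ - m·x̄.
c = mean_Y - m * mean_X

# Step 6: predicted values along the regression line.
Y_pred = [m * x + c for x in X]

# Step 7: display results.
print(f"y = {m:.2f}x + {c:.2f}")  # prints "y = 4.95x + 49.71"
```

This mean-deviation form of the slope formula is algebraically equivalent to the summation formula used earlier, so both give the same line.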
➢ CODE EXPLAINED
› Suppose X is the hours studied by a student and Y is the exam score
obtained by studying X hours.
› First create two vectors X, Y and store their values.
› Then calculate the mean of X,Y and store their values in
respective variables mean_X and mean_Y.
› Then calculate the sum of (x − x̄)(y − ȳ) and store it in the variable
numerator.
› Store the sum of (x − x̄)² in the variable denominator.
› Take the ratio of numerator and denominator and store it in the
slope(m).
› Calculate the intercept and store it in c.
› Calculate the predicted values of y, i.e., the line of regression.
› Plot and label the graph for x vs y.
➢ Graphical Representation