
Simple Linear Regression

Introduction

Least-squares linear regression is a statistical method for modeling a dependent variable with continuous values, while the independent variables can be either continuous or categorical. In other words, linear regression is a method to predict a dependent variable (Y) from the values of independent variables (X).

Prerequisites

To start with linear regression, you should be familiar with a few basic concepts of statistics:

• Correlation (r) – describes the strength and direction of the relationship between two variables; possible values range from −1 to +1.
• Variance (σ²) – a measure of spread in your data.
• Standard deviation (σ) – a measure of spread in your data (the square root of the variance).
• Normal distribution
• Residual (error term) – actual value − predicted value.

Assumptions

• The dependent variable is continuous.
• There is a linear relationship between the dependent variable and the independent variables.
• No multicollinearity (no strong relationships among the independent variables).
• Residuals follow a normal distribution.
• Residuals have constant variance (homoscedasticity).
• Residuals are independently distributed (no autocorrelation).

To check the relationship between the dependent and independent variables:

1. Perform bivariate analysis (e.g., scatter plots and correlations).
2. Calculate the Variance Inflation Factor (VIF), which flags multicollinearity among the independent variables: a value close to 1 is ideal, and values up to about 4 are generally acceptable (see the sketch after this list).
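A minimal sketch of a VIF check, assuming Python with pandas and statsmodels (the document itself only names Excel/R/SAS); the two-column DataFrame X below is hypothetical stand-in data:

    # A minimal sketch of a VIF check using statsmodels; the data in X
    # is hypothetical and stands in for your independent variables.
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools.tools import add_constant

    X = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]})
    X_const = add_constant(X)  # VIF is usually computed with an intercept term

    for i, col in enumerate(X_const.columns):
        if col == "const":
            continue  # the intercept's VIF is not meaningful
        vif = variance_inflation_factor(X_const.values, i)
        print(f"VIF({col}) = {vif:.2f}")  # near 1 means little multicollinearity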

To find out whether the residuals are normally distributed:

1. Plot a histogram or boxplot of the residuals.
2. Perform the Kolmogorov–Smirnov (K–S) test (see the sketch below).
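A minimal sketch of the K–S test, assuming scipy; the residuals below are hypothetical and are standardized first, since the test compares against a fully specified distribution:

    # A minimal sketch of a Kolmogorov-Smirnov normality check on residuals.
    import numpy as np
    from scipy import stats

    residuals = np.array([0.2, -0.5, 0.1, 0.4, -0.3, 0.1, -0.2, 0.2])  # hypothetical
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)  # standardize
    stat, p_value = stats.kstest(z, "norm")
    print(f"K-S statistic = {stat:.3f}, p-value = {p_value:.3f}")
    # A large p-value means we cannot reject normality of the residuals.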

To check homoscedasticity:

1. Plot residuals vs. predicted values; there should be no pattern.
2. Perform a non-constant variance test (a sketch of both checks follows).
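A minimal sketch of both checks, reusing the x/y data from the worked example later in this document. The non-constant variance test named above corresponds to R's car::ncvTest; as an assumption on my part, a comparable Python check is the Breusch–Pagan test from statsmodels:

    # A minimal sketch of two homoscedasticity checks on a fitted OLS model.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan
    import matplotlib.pyplot as plt

    x = np.arange(1, 11)
    y = np.array([2, 1, 3, 6, 9, 11, 13, 15, 17, 20])
    X = sm.add_constant(x)
    model = sm.OLS(y, X).fit()

    # 1. Residuals vs. predicted values: there should be no visible pattern.
    plt.scatter(model.fittedvalues, model.resid)
    plt.axhline(0, linestyle="--", color="gray")
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")
    plt.show()

    # 2. Breusch-Pagan test: a small p-value suggests non-constant variance.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X)
    print(f"Breusch-Pagan p-value = {lm_pvalue:.3f}")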

Linear Regression Line

In linear regression, the objective is to fit a line through the data that is as close as possible to most of the points, thereby minimizing the distance (error term) between the data points and the fitted line.

[Figure: a scatter plot of the data (left) and an approximate fitted line (right)]

For example, in the figure above, the dots (left) represent various data points, and the line (right) represents an approximate line that can explain the relationship between the 'x' and 'y' axes. Through linear regression, we try to find such a line. For example, if we have one dependent variable 'Y' and one independent variable 'X', the relationship between 'X' and 'Y' can be represented by the following equation:

Y = β0 + β1X

where
Y = dependent variable
X = independent variable
β0 = constant term, a.k.a. the intercept
β1 = coefficient of the relationship between 'X' and 'Y' (the slope)

A few properties of the linear regression line

• The regression line always passes through the mean of the independent variable (x̄) and the mean of the dependent variable (ȳ).
• The regression line minimizes the sum of squared residuals, which is why the method is known as Ordinary Least Squares (OLS).
• β1 describes the change in Y for a one-unit change in X: if we increase the value of 'X' by one unit, β1 is the resulting change in the value of Y. (These properties are verified numerically in the sketch below.)
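As a quick numerical check of these properties, here is a sketch assuming Python with numpy; the five data points are hypothetical:

    # A minimal sketch verifying two properties of the OLS line on
    # hypothetical data: it passes through (mean of x, mean of y), and
    # the slope B1 is the change in the prediction per one-unit change in x.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    b1, b0 = np.polyfit(x, y, deg=1)  # least-squares slope and intercept

    # Property 1: the fitted line passes through (x_bar, y_bar).
    print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # True

    # Property 3: increasing x by one unit changes the prediction by B1.
    print(np.isclose((b0 + b1 * 3.0) - (b0 + b1 * 2.0), b1))  # True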

Finding a Linear Regression Line

Using a statistical tool (e.g., Excel, R, or SAS) you will directly obtain the constants (β0 and β1) as the result of the linear regression function. Conceptually, as discussed, these packages work on the OLS principle: they compute the constants that minimize the sum of squared errors.

For example, let's say we want to predict 'y' from the 'x' values given in the following table, and let's assume that our regression equation will look like y = B0 + B1*x.

x     y     Predicted 'y'
1     2     B0 + B1*1
2     1     B0 + B1*2
3     3     B0 + B1*3
4     6     B0 + B1*4
5     9     B0 + B1*5
6     11    B0 + B1*6
7     13    B0 + B1*7
8     15    B0 + B1*8
9     17    B0 + B1*9
10    20    B0 + B1*10

Table 1 summarizes this data:

Table 1:
Std. dev. of x               3.02765
Std. dev. of y               6.617317
Mean of x                    5.5
Mean of y                    9.7
Correlation between x & y    0.989938

If we differentiate the Residual Sum of Squares (RSS) with respect to B0 and B1 and set the results to zero, we get the following equations:

B1 = Correlation * (Std. dev. of y / Std. dev. of x)
B0 = Mean(y) − B1 * Mean(x)
Putting the values from Table 1 into the above equations:

B1 = 0.989938 * (6.617317 / 3.02765) ≈ 2.16
B0 = 9.7 − 2.16 * 5.5 ≈ −2.2

Hence, the least-squares regression equation becomes:

Y = −2.2 + 2.16x
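The following sketch, assuming Python with numpy, reproduces this calculation from the raw data and cross-checks it against a library least-squares fit:

    # A minimal sketch reproducing the worked example: compute B1 and B0
    # from the correlation and standard deviations, then cross-check with
    # numpy's least-squares fit.
    import numpy as np

    x = np.arange(1, 11)
    y = np.array([2, 1, 3, 6, 9, 11, 13, 15, 17, 20])

    r = np.corrcoef(x, y)[0, 1]               # ~0.989938
    b1 = r * (y.std(ddof=1) / x.std(ddof=1))  # ~2.16
    b0 = y.mean() - b1 * x.mean()             # ~-2.2
    print(f"B1 = {b1:.2f}, B0 = {b0:.2f}")

    slope, intercept = np.polyfit(x, y, deg=1)  # should match B1 and B0
    print(f"polyfit: slope = {slope:.2f}, intercept = {intercept:.2f}")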

Model Performance

Once you build the model, the next logical question is whether the model is good enough to predict future observations, i.e., whether the relationship you have established between the dependent and independent variables is strong enough. For this purpose, there are various metrics we look at:

i. R-Square (R²)

The formula for calculating R² is:

R² = (TSS − RSS) / TSS = 1 − (RSS / TSS)

• Total Sum of Squares (TSS): TSS = Σ(yᵢ − ȳ)² measures the total variance in the response/dependent variable Y, and can be thought of as the amount of variability inherent in the response before the regression is performed.
• Residual Sum of Squares (RSS): RSS = Σ(yᵢ − ŷᵢ)² measures the amount of variability that is left unexplained after performing the regression.
• (TSS − RSS) measures the amount of variability in the response that is explained (or removed) by performing the regression.

For simple linear regression, R² is also the square of the correlation between x and y:

R² = [Σ(xᵢ − x̄)(yᵢ − ȳ) / ((N − 1) σx σy)]²

where N is the number of observations used to fit the model, σx is the standard deviation of x, and σy is the standard deviation of y.
• R² ranges from 0 to 1.
• An R² of 0 means that the dependent variable cannot be predicted from the independent variable.
• An R² of 1 means the dependent variable can be predicted without error from the independent variable.
• An R² between 0 and 1 indicates the extent to which the dependent variable is predictable. An R² of 0.20 means that 20 percent of the variance in Y is predictable from X; an R² of 0.40 means that 40 percent is predictable; and so on.
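As a sketch (again assuming Python with numpy), R² for the worked example can be computed from TSS and RSS and compared with the squared correlation:

    # A minimal sketch computing R-square for the worked example, both
    # from TSS/RSS and as the squared correlation between x and y.
    import numpy as np

    x = np.arange(1, 11)
    y = np.array([2, 1, 3, 6, 9, 11, 13, 15, 17, 20])

    b1, b0 = np.polyfit(x, y, deg=1)
    y_pred = b0 + b1 * x

    tss = np.sum((y - y.mean()) ** 2)   # total variability in y
    rss = np.sum((y - y_pred) ** 2)     # variability left unexplained
    r_squared = (tss - rss) / tss
    print(f"R^2 = {r_squared:.4f}")     # ~0.98

    r = np.corrcoef(x, y)[0, 1]
    print(f"r^2 = {r ** 2:.4f}")        # matches R^2 for simple regression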

ii. Root Mean Square Error (RMSE)

RMSE measures the dispersion of the predicted values from the actual values. The formula for calculating RMSE is:

RMSE = √( Σ(yᵢ − ŷᵢ)² / N )

where N is the total number of observations.


Though RMSE is a good measure of error, it is sensitive to the range of your dependent variable: if the dependent variable has a narrow range, your RMSE will be low, and if it has a wide range, RMSE will be high. Hence, RMSE is a good metric for comparing different iterations of the same model, rather than for comparing across models.
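A minimal sketch, assuming Python with numpy, computing RMSE for the worked example's fitted line:

    # A minimal sketch computing RMSE for the worked example.
    import numpy as np

    x = np.arange(1, 11)
    y = np.array([2, 1, 3, 6, 9, 11, 13, 15, 17, 20])

    b1, b0 = np.polyfit(x, y, deg=1)
    y_pred = b0 + b1 * x

    rmse = np.sqrt(np.mean((y - y_pred) ** 2))  # sqrt of mean squared error
    print(f"RMSE = {rmse:.3f}")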

iii. Mean Absolute Percentage Error (MAPE)

To overcome this limitation of RMSE, analysts often prefer MAPE, which expresses the error as a percentage and is therefore comparable across models. The formula for calculating MAPE can be written as:

MAPE = (100 / N) × Σ( |yᵢ − ŷᵢ| / |yᵢ| )

where N is the total number of observations.
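A minimal sketch, assuming Python with numpy, computing MAPE for the worked example's fitted line (note the actual values must be nonzero for MAPE to be defined):

    # A minimal sketch computing MAPE for the worked example.
    import numpy as np

    x = np.arange(1, 11)
    y = np.array([2, 1, 3, 6, 9, 11, 13, 15, 17, 20])

    b1, b0 = np.polyfit(x, y, deg=1)
    y_pred = b0 + b1 * x

    mape = 100 * np.mean(np.abs(y - y_pred) / np.abs(y))  # error in percent
    print(f"MAPE = {mape:.1f}%")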
