Unit 5
BUSINESS ANALYTICS
Prof. Aditya Suresh Kasar
Linear Regression
Y = β0 + β1X1 + β2X1X2 + β3X2²
Regression Model Development
Simple Linear Regression Model Building
A simple linear regression model is developed to understand how the value of a KPI is associated with
changes in the values of an independent variable.
Some examples are as follows:
1. A hospital may be interested in finding how the total treatment cost of a patient varies with the body
weight of the patient.
2. E-commerce companies such as Amazon, Bigbasket and Flipkart would like to understand the
relationship between the number of customer visits to their portal and the revenue.
3. Retailers such as Walmart, Target, Reliance Retail, Hyper City, etc. would be interested in
understanding the impact of price cut promotions on the revenue of their private labels (store brands
or house brands).
4. Original equipment manufacturers (OEMs) would like to know the impact of duration of warranty on
the profit.
Framework for SLR model development
Estimation of Parameters using Ordinary Least Squares
Given a set of dependent variable values (Yi) and the corresponding independent variable
values (Xi), each subject to a random error (εi), one has to find the best equation to represent
the relationship between the dependent and independent variables.
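As an illustration, the least-squares estimates can be computed directly from data. The minimal Python sketch below uses made-up body-weight/treatment-cost figures; the closed-form expressions b1 = Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)² and b0 = Ȳ − b1·X̄ are the standard OLS results.

```python
import numpy as np

# Made-up illustrative data: X = body weight (kg), Y = treatment cost (INR)
X = np.array([45.0, 52.0, 60.0, 68.0, 75.0, 83.0])
Y = np.array([150000.0, 165000.0, 210000.0, 240000.0, 255000.0, 290000.0])

# OLS estimates for Y = b0 + b1*X
x_bar, y_bar = X.mean(), Y.mean()
b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

print(f"Estimated model: Y_hat = {b0:.2f} + {b1:.2f} * X")
```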
Assumptions
The method of least squares gives the best equation under the assumptions stated below
(Harter 1974, 1975):
o In case of time series data, the residuals are uncorrelated, that is, Cov(εi, εj) = 0 for all i ≠ j.
o The variance of the residuals, Var(εi|Xi), is constant for all values of Xi. When the variance
of the residuals is constant for different values of Xi, it is called homoscedasticity. A non-constant
variance of residuals is called heteroscedasticity. (A sketch for checking these two assumptions on sample data is given below.)
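A rough, illustrative check of these two assumptions on the made-up data from the sketch above; what counts as "close to zero" or "similar variance" is a judgment call and not from the slides.

```python
import numpy as np

# Made-up data and residuals from the OLS sketch above
X = np.array([45.0, 52.0, 60.0, 68.0, 75.0, 83.0])
Y = np.array([150000.0, 165000.0, 210000.0, 240000.0, 255000.0, 290000.0])
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

# Check of Cov(e_i, e_j) = 0 via the lag-1 autocorrelation of residuals
# (mainly relevant for time-series data; values near 0 are reassuring).
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# Check of homoscedasticity: residual variance for the low-X half versus
# the high-X half of the sample (similar values are reassuring).
order = np.argsort(X)
half = len(X) // 2
var_low = resid[order][:half].var(ddof=1)
var_high = resid[order][half:].var(ddof=1)

print(f"Lag-1 autocorrelation of residuals: {lag1:.3f}")
print(f"Residual variance (low X): {var_low:.1f}, (high X): {var_high:.1f}")
```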
Example
Ŷi = 61555.3553 + 3076.1774 Xi
• The interpretation will depend on the functional form of the relationship between
the response and the explanatory variables.
• Interpretation of β0 and β1 in Y = β0 + β1X
o Analysis of Variance (ANOVA) for overall model validity (relevant more for multiple linear
regression).
o Outlier analysis.
The above measures and tests are essential, but not exhaustive.
Validation of the SLR
Coefficient of Determination (R-Square or R²)
o The coefficient of determination (or R-square or R²) measures the percentage of variation in Y
explained by the model (β0 + β1X).
o The simple linear regression model can be broken into explained variation and unexplained variation as
shown below:
Yi = (β0 + β1 Xi) + εi
Variation in Y = Variation in Y explained by the model + Variation in Y not explained by the model
It can be proved mathematically that sum of squares of total variation is equal to sum of
squares of explained variation plus sum of squares of unexplained variation
Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Ŷi − Ȳ)² + Σ_{i=1}^{n} (Yi − Ŷi)²
        SST                     SSR                     SSE
where SST is the sum of squares of total variation, SSR is the sum of squares of variation
explained by the regression model and SSE is the sum of squares of errors or unexplained
variation.
Validation of the SLR
Coefficient of Determination (R-Square or R²)
Coefficient of determination R² = Explained variation / Total variation = SSR / SST = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²
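Continuing the made-up data from the earlier sketch, the decomposition SST = SSR + SSE and the resulting R-square can be verified numerically:

```python
import numpy as np

# Made-up data and fitted values from the OLS sketch above
X = np.array([45.0, 52.0, 60.0, 68.0, 75.0, 83.0])
Y = np.array([150000.0, 165000.0, 210000.0, 240000.0, 255000.0, 290000.0])
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SST = np.sum((Y - Y.mean()) ** 2)       # total variation
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # variation explained by the model
SSE = np.sum((Y - Y_hat) ** 2)          # unexplained variation

print(f"SST = {SST:.1f}, SSR + SSE = {SSR + SSE:.1f}")   # the two should match
print(f"R-square = SSR / SST = {SSR / SST:.4f}")
```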
Spurious Regression
Number of Facebook users and the number of people who died of helium poisoning in UK
             df    SS        MS        F         Significance F
Regression   1     2803.94   2803.94   978.4229  8.82E-09
Residual     7     20.06042  2.865775
Total        8     2824
Coefficients Standard Error t-stat P-value Lower 95% Upper 95%
Intercept 1.9967 0.76169 2.62143 0.034338 0.195607 3.79783
o The regression coefficient (β1) captures the existence of a linear relationship between the response
variable and the explanatory variable.
o If β1 = 0, we can conclude that there is no statistically significant linear relationship between the two
variables.
β1 = 0 would imply that there is no linear relationship between the response variable Y and the
explanatory variable X. Thus, the null and alternative hypotheses for the SLR model can be stated as follows:
H0: β1 = 0
HA: β1 ≠ 0
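An illustrative by-hand version of this t-test on the made-up data used earlier; the standard-error formula se(b1) = sqrt(MSE / Σ(Xi − X̄)²) is the standard SLR result rather than something stated on the slides.

```python
import numpy as np
from scipy import stats

# Made-up data from the earlier sketches
X = np.array([45.0, 52.0, 60.0, 68.0, 75.0, 83.0])
Y = np.array([150000.0, 165000.0, 210000.0, 240000.0, 255000.0, 290000.0])
n = len(X)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

# t-test of H0: beta1 = 0 against HA: beta1 != 0
mse = np.sum(resid ** 2) / (n - 2)
se_b1 = np.sqrt(mse / np.sum((X - X.mean()) ** 2))
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# Reject H0 at alpha = 0.05 when the p-value is below 0.05.
```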
Validation of the SLR
Test for Overall Model: Analysis of Variance (F-test)
H0: There is no statistically significant relationship between Y and any of the explanatory
variables (i.e., all regression coefficients are zero).
• HA: At least one regression coefficient is not zero.
• The F-statistic is given by F = MSR / MSE = (SSR / 1) / (SSE / (n − 2))
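A matching sketch of the F-test on the same made-up data; for simple linear regression the F-test is equivalent to the two-sided t-test of β1 (F = t²).

```python
import numpy as np
from scipy import stats

# Made-up data from the earlier sketches
X = np.array([45.0, 52.0, 60.0, 68.0, 75.0, 83.0])
Y = np.array([150000.0, 165000.0, 210000.0, 240000.0, 255000.0, 290000.0])
n = len(X)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
MSR = SSR / 1              # one explanatory variable
MSE = SSE / (n - 2)
F = MSR / MSE
p_value = stats.f.sf(F, dfn=1, dfd=n - 2)

print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```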
Validation of the SLR
Residual Analysis
Residual (error) analysis is important to check whether the assumptions of regression models
have been satisfied. It is performed to check the following:
• The residuals (Yi − Ŷi) are normally distributed.
• The easiest technique to check whether the residuals follow normal distribution is to use the P-P plot
(Probability-Probability plot).
• The P-P plot compares the cumulative distribution function of two probability distributions against each
other.
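A minimal matplotlib/SciPy sketch of a P-P plot; simulated values stand in for the model's standardized residuals, and the plotting positions used for the empirical CDF are one common convention.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated stand-in for the standardized residuals of a fitted model
rng = np.random.default_rng(0)
resid = rng.normal(size=100)
z = np.sort((resid - resid.mean()) / resid.std(ddof=1))

# Observed cumulative probabilities versus expected normal probabilities
n = len(z)
observed_cdf = (np.arange(1, n + 1) - 0.5) / n   # plotting positions
expected_cdf = stats.norm.cdf(z)

plt.scatter(expected_cdf, observed_cdf, s=10)
plt.plot([0, 1], [0, 1], color="black")          # 45-degree reference line
plt.xlabel("Expected cumulative probability (normal)")
plt.ylabel("Observed cumulative probability")
plt.title("P-P plot of standardized residuals")
plt.show()
```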
Validation of the SLR
Residual Analysis
Test of Homoscedasticity
An important assumption of the regression model is that the residuals have constant variance
(homoscedasticity) across different values of the explanatory variable (X).
That is, the variance of residuals is assumed to be independent of the variable X. Failure to meet this
assumption makes the hypothesis tests unreliable.
Any pattern in the residual plot would indicate incorrect specification (misspecification) of the model.
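An illustrative residual plot of the kind described above, with simulated fitted values and residuals standing in for model output; a roughly even band around zero supports homoscedasticity, while a funnel or curve suggests a problem.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated stand-ins for fitted values and residuals of a model
rng = np.random.default_rng(1)
y_hat = np.linspace(100, 300, 100)
resid = rng.normal(scale=10.0, size=100)

# Standardize both axes, as in the SPSS-style plot described above
z_pred = (y_hat - y_hat.mean()) / y_hat.std(ddof=1)
z_resid = (resid - resid.mean()) / resid.std(ddof=1)

plt.scatter(z_pred, z_resid, s=10)
plt.axhline(0, color="black")
plt.xlabel("Standardized predicted value")
plt.ylabel("Standardized residual")
plt.title("Residual plot for checking homoscedasticity")
plt.show()
```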
Validation of the SLR
Outlier Analysis
o Outliers are observations whose values show a large deviation from the mean value, that is,
(Yi − Ȳ) is large.
o The presence of an outlier can have a significant influence on the values of the regression coefficients.
Thus, it is important to identify the existence of outliers in the data.
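A small NumPy sketch of a simple outlier screen based on deviation from the mean; the data and the |Z| > 2 cut-off are illustrative assumptions (|Z| > 3 is another commonly used threshold).

```python
import numpy as np

# Made-up response values with one suspiciously large observation
Y = np.array([120.0, 135.0, 128.0, 142.0, 131.0, 540.0, 125.0])

# Standardize the deviations from the mean and flag large ones
z_scores = (Y - Y.mean()) / Y.std(ddof=1)
outliers = np.where(np.abs(z_scores) > 2)[0]

print("Z-scores:", np.round(z_scores, 2))
print("Indices flagged as possible outliers:", outliers)
```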
Example
Use the data on body weight of patients and their treatment cost provided in the data file “DAD” and answer the following
questions:
1. Is there statistical evidence to support that the cost of treatment and body weight are related? Support your answer
with all necessary tests.
2. Comment on the value of R-square. Does a low R-square value indicate that the model is not useful?
3. Interpret the value of the coefficient of weight in the model developed in question 1. What will be the average difference
in the cost of treatment for a patient weighing 50 kg and a patient weighing 51 kg?
Example
1. Is there statistical evidence to support that the cost of treatment and the body weight are related?
Support your answer with all necessary tests.
Solution:
Let Y = cost of treatment and X = body weight of the patient. The corresponding simple linear regression model is
given by
Y = β0 + β1 × Body Weight
From the regression output for the model obtained using the software SPSS, the relationship between the
cost of treatment and the body weight is given by
Y = 127498.079 + 1678.933 × Body Weight -------Eq-1
The p-value for the coefficient “Body Weight” is 0.030, which is less than 0.05; thus, the independent variable
body weight is significant at α = 0.05, that is, at the 95% confidence level.
From the model we can interpret that the cost of treatment increases at the rate of INR 1678.933 per 1 kg
increase in the body weight.
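The slides report SPSS output; as a hedged alternative, the same model could be fitted with Python's statsmodels along the lines below. The file name DAD.csv and the column names BODY_WEIGHT and TOTAL_COST are assumptions that must be matched to the actual data file.

```python
import pandas as pd
import statsmodels.api as sm

# Assumed file and column names -- adjust to the actual "DAD" data set
dad = pd.read_csv("DAD.csv")

X = sm.add_constant(dad["BODY_WEIGHT"])   # adds the intercept term
y = dad["TOTAL_COST"]

model = sm.OLS(y, X).fit()
print(model.summary())   # coefficients, t-stats, p-values, R-square
# A p-value below 0.05 for BODY_WEIGHT supports a statistically
# significant linear relationship between body weight and cost.
```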
Example
However, before we accept the model, we have to check the important assumptions of normality and homoscedasticity.
Figure 5.1 below is the P-P plot that shows the observed cumulative probability of standardized residuals and expected
cumulative probability of a normal distribution (diagonal line). Figure 5.2 is a plot between the standardized residual
and the standardized response variable (Y). A plot of the residuals against the values of the independent variable
can also be used to detect heteroscedasticity.
It is evident from Figures 5.1 and 5.2 that neither the normality assumption nor the homoscedasticity assumption is
satisfied by the model defined in Eq-1, which casts doubt on the model.
FIGURE 5.1 P-P plot for the model
FIGURE 5.2 Plot of standardized predicted values versus standardized residuals for the model
Example
Whenever the assumptions of the regression model are not met, we have to use a remedial measure; one of the
popular remedial measures is Transformation of Variables (transformation of variables is discussed in Chapter
10). In this case, instead of Y, we build the model between ln(Y) and X, where ln(Y) is the natural logarithm of Y:
ln(Y) = a0 + a1 × Body Weight
That is, the relationship between the cost of treatment and the weight is given by
ln(Y) = 11.804 + 0.0074 × Body Weight ----Eq-2
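A hedged sketch of the corresponding log-transformed fit in Python/statsmodels, with the same assumed file and column names as before:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed file and column names, as in the earlier sketch
dad = pd.read_csv("DAD.csv")

X = sm.add_constant(dad["BODY_WEIGHT"])
ln_y = np.log(dad["TOTAL_COST"])          # model ln(Y) instead of Y

log_model = sm.OLS(ln_y, X).fit()
print(log_model.params)                   # estimates corresponding to a0 and a1
print(log_model.pvalues)                  # significance of the coefficients
```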
Example
The p-value for the coefficient ‘body weight’ is less than 0.05, thus the variable body weight is significant at 95%
confidence level.
Figures 5.3 and 5.4 provide the P-P plot and the residual plot between the standardized residual and the standardized
response variable ln(Y). Figure 5.3 (for normality) and Figure 5.4 (for homoscedasticity) look better than Figures
5.1 and 5.2.
Thus, the model in Eq. 2 may be used for predicting the cost of treatment since it satisfies important assumptions of
SLR model.
FIGURE 5.3 P-P plot for the model
FIGURE 5.4 Plot of standardized predicted values versus standardized residuals for the model
Example
2. Comment on the value of the R-square. Does a low R-square value indicate that the model is not useful?
Answer: The R-square value for the model ln(Y) = a0 + a1 × Body Weight is only 0.046. That is, the model is explaining
only 4.6% of the variation in the value of ln(Y).
Low R-square values do not imply that the model is not useful. The primary objective of regression is to find whether
there is a relationship between the response variable (cost of treatment) and the independent variable (body weight of
the patient).
The regression model establishes this relationship since the p-value of the weight coefficient is less than 0.05 and both
the normality and homoscedasticity assumptions are reasonably satisfied.
A low R-square may create problems when the model is used for prediction, since the prediction error is likely to be high.
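Since the accepted model (Eq-2) predicts ln(Y), a prediction of the treatment cost itself requires back-transforming with the exponential function. The sketch below uses the coefficients reported in Eq-2; the 60 kg body weight is a made-up input.

```python
import numpy as np

# Eq-2 from the slides: ln(Y) = 11.804 + 0.0074 * Body Weight
a0, a1 = 11.804, 0.0074

weight = 60.0                    # hypothetical body weight in kg
ln_cost = a0 + a1 * weight
cost = np.exp(ln_cost)           # back-transform to the original cost scale

print(f"Predicted ln(cost) = {ln_cost:.3f}")
print(f"Predicted cost of treatment = INR {cost:,.0f}")
# Because R-square is low, such point predictions carry wide uncertainty.
```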
Example
3. Interpret the value of the coefficient of weight in the model developed in question 1. What will be the
average difference in the cost of treatment for a patient weighing 50 kg and a patient weighing 51 kg?
a) Develop a simple linear regression model between winning margin (Y) and maximum flight delay (X)
and calculate the regression coefficients.
b) What is the value of R2?
c) Is the model statistically significant, what can you infer from the regression model?
Example
a) The model outputs for the regression equation are provided below
c) The estimated values of β0 and β1 from the SPSS output are given by β0 = -136368.738 and β1 = 851.227.
The t-stat value for β0 is -10.42 and the corresponding p-value is less than 0.001 (which is less than 0.05), hence β0 is
statistically significant.
Similarly, the t-stat value for β1 is 14.49 and the corresponding p-value is less than 0.001 (which is less than 0.05), hence β1 is
statistically significant. So, we can say that the model is statistically significant.
And the R-square value for the model is 0.921.
That means the model is explaining 92.10% of the variation in the value of Y (winning margin).