
8/12/25

BES

Business and Economics Statistics


Lecture 9
Simple Linear Regression

Intro Least Square method Assessing Model 1

Regression Analysis
Regression analysis is used to model relationships and predict the
value of one variable (the dependent variable) on the basis of
other variables (the independent variables).

Dependent variable: denoted Y


Independent variables: denoted X1 , X2 , …, Xk

Simple linear regression


We will first consider situations where there is
one independent variable X.

Simple linear regression Model

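A linear model with one independent variable is called a simple linear regression model. It is written as:
$y = \beta_0 + \beta_1 x + \varepsilon$
where
• $y$ is the dependent variable,
• $x$ is the independent variable,
• $\beta_0$ is the y-intercept,
• $\beta_1$ is the slope of the line,
• $\varepsilon$ is the error variable.
$\beta_0$ and $\beta_1$ are called the parameters of the model; $\varepsilon$ is a random variable called the error term.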
Assumptions about the error term 𝜀

1. The error 𝜀 is a random variable with a mean of zero.
2. The variance of 𝜀, denoted by σ², is the same for all values of the independent variable.
3. The values of 𝜀 are independent.
4. The error 𝜀 is a normally distributed random variable.

Regression Analysis

The simple linear regression equation is:
$E(Y) = \beta_0 + \beta_1 X$
where $E(Y)$ is the expected value of y for a given x value.

The estimated simple linear regression equation is:
$\hat{y} = b_0 + b_1 x$


Estimation process
1. Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
   Regression equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown parameters: $\beta_0$, $\beta_1$
2. Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$
3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$
   Sample statistics: $b_0$, $b_1$
4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Regression Analysis
Recent family home sales in San Antonio provided the data below (n = 20; only 10 of the observations are displayed). We wish to predict home prices using square footage.

Square Footage      1580    1572    1352    2224    1556    1435    1438    1089    1941    1698
Price             142500  145000  115000  155900   95000  128000  100000   55000  142000  115000

What is the dependent variable?
What is the independent variable?

Regression Analysis
[Figure: bar charts of the 10 displayed observations, with panel titles "Histograms of Square Footage" and "Histograms of Price".]

Regression Analysis
Descriptive statistics for the 10 displayed observations:

Statistic                  Square Footage        Price
Mean                               1588.5       119340
Median                               1564       121500
Mode                                 #N/A       115000
Sample Standard Deviation      315.022486  30301.63912
Sample Variance               99239.16667  918189333.3
Range                                1135       100900
Minimum                              1089        55000
Maximum                              2224       155900
Sum                                 15885      1193400
Count                                  10           10
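
As a cross-check, the summary statistics can be recomputed for the 10 displayed observations. This is a minimal sketch in Python using only the standard `statistics` module (the lecture itself uses R); the helper name `describe` is an illustrative assumption, not something from the slides.

```python
import statistics

# The 10 displayed observations (the full sample has n = 20).
square_footage = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
price = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]


def describe(values):
    """Return the summary statistics listed in the table for one variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_sd": statistics.stdev(values),           # divides by n - 1
        "sample_variance": statistics.variance(values),  # divides by n - 1
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": len(values),
    }


for name, values in [("Square Footage", square_footage), ("Price", price)]:
    print(name)
    for stat, value in describe(values).items():
        print(f"  {stat}: {value}")
```

The mode is left out on purpose: every square footage value occurs once, which is why the table shows #N/A for that column.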

Linear relationship
A scatter diagram can describe the relationship between Home price
and the square footage

Possible Regression Lines in Simple
Linear Regression


The least square method



Fact: We can draw a line through the scatter points to get an idea about the direction and strength of the relationship.
Problem: Different people will draw different lines. How do we choose the best?
➔ The best line is the one that minimizes the sum of squared deviations between the points and the line.
The least square method
[Figure: scatter diagram with a fitted line, annotated "This line minimizes the sum of the squared differences between the points and the line …"]

Least squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

where
• $y_i$: observed value of the dependent variable for the $i$th observation
• $\hat{y}_i$: predicted value of the dependent variable for the $i$th observation

Least squares method
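The least squares line is:
$\hat{y} = b_0 + b_1 x$
where
$b_1 = \frac{SS_{xy}}{SS_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
and
$SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
$SS_y = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$
$SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$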
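
To make the formulas concrete, the sketch below applies them to the 10 displayed home sales. It is written in Python rather than R, and the variable names are illustrative. Because only 10 of the 20 observations are displayed, the resulting estimates are expected to differ from the full-sample R output (58.96 and 22635.95).

```python
# The 10 displayed observations (assumption: used here only as a worked
# example; the lecture's R output is based on all 20 observations).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]  # square footage
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]  # price
n = len(x)

# Shortcut sums of squares, exactly as defined on the slide.
ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Least squares slope and intercept.
b1 = ss_xy / ss_x
b0 = sum(y) / n - b1 * sum(x) / n

print(f"SSx = {ss_x:.1f}, SSy = {ss_y:.1f}, SSxy = {ss_xy:.1f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```

A useful sanity check is that the fitted line always passes through the point of means, that is, $b_0 + b_1 \bar{x} = \bar{y}$.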
R output for linear model
Call:
lm(formula = Price ~ Square.Footage, data = Price)

Residuals:
   Min     1Q Median     3Q    Max
-31843 -17952   1784  13409  29680

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    22635.95   20460.07   1.106  0.28314
Square.Footage    58.96      12.08   4.880  0.00012 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19170 on 18 degrees of freedom
Multiple R-squared: 0.5695, Adjusted R-squared: 0.5456
F-statistic: 23.82 on 1 and 18 DF, p-value: 0.0001204
R output for linear model
Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
Estimated regression equation: $\hat{y} = 22635.95 + 58.96x$

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    22635.95   20460.07   1.106  0.28314
Square.Footage    58.96      12.08   4.880  0.00012 ***

Assessing the Regression Model
The least squares method will produce a regression line whether or not
there is a linear relationship between x and y.
Consequently, it is important to assess how well the linear model fits the
data.
We look at descriptive measurements and one test procedure to assess the model:
• Standard error of estimate
• Coefficient of determination ($R^2$)
• Test for the existence of a linear relationship between X and Y: is the x-variable important? (t-test of the slope)
• Coefficient of correlation (can be used to test for a linear relationship between two variables)
Sum of squares for errors
• This is the sum of the squared vertical differences between the points and the regression line.
• It can serve as a measure of how well the line fits the data.
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SS_y - \frac{SS_{xy}^2}{SS_x}$
• This statistic plays a role in every statistical technique we employ to assess the model.
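
The two expressions for SSE can be checked against each other numerically. The Python sketch below does this for the 10 displayed observations only (an assumption made for illustration; the full sample of 20 is not listed), computing SSE once from the residuals and once from the shortcut formula.

```python
import math

# The 10 displayed observations (not the full sample of 20).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]
n = len(x)

ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = ss_xy / ss_x
b0 = sum(y) / n - b1 * sum(x) / n

# Definition: sum of squared vertical distances to the fitted line.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Shortcut: SSy - SSxy^2 / SSx.
sse_shortcut = ss_y - ss_xy ** 2 / ss_x

print(f"SSE (residuals) = {sse_direct:.1f}")
print(f"SSE (shortcut)  = {sse_shortcut:.1f}")
print("agree:", math.isclose(sse_direct, sse_shortcut, rel_tol=1e-9))
```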

Standard error of estimate
• The mean error is equal to zero.
• If $\sigma_\varepsilon$ is small, the errors tend to be close to zero (close to the mean error). Then the model fits the data well.
• Therefore we can use $s_\varepsilon$ as a measure of the suitability of using a linear model.
• An unbiased estimator of $\sigma_\varepsilon^2$ is given by $s_\varepsilon^2$.
Standard error of estimate:
$s_\varepsilon = \sqrt{\frac{SSE}{n - 2}}$
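
As a small worked example, the standard error of estimate can be computed for the 10 displayed observations. This Python sketch assumes the shortcut form of SSE and uses illustrative variable names; since it uses only part of the sample, its result is not expected to equal the 19170 reported by R for all 20 observations.

```python
import math

# The 10 displayed observations (not the full sample of 20).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]
n = len(x)

ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

sse = ss_y - ss_xy ** 2 / ss_x       # sum of squares for error
s_eps_sq = sse / (n - 2)             # unbiased estimator of the error variance
s_eps = math.sqrt(s_eps_sq)          # standard error of estimate

print(f"degrees of freedom = {n - 2}")
print(f"standard error of estimate = {s_eps:.1f}")
```

The divisor is n − 2 rather than n − 1 because two parameters, the intercept and the slope, are estimated from the data.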

Coefficient of determination
• When we want to measure the strength of the linear relationship, we use the coefficient of determination.
$R^2 = \frac{SS_{xy}^2}{SS_x \, SS_y} = 1 - \frac{SSE}{SS_y}$
• $R^2$ takes on any value between zero and one.
• $R^2 = 1$: perfect match between the line and the data points.
• $R^2 = 0$: there is no linear relationship between x and y.
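
Both forms of the coefficient of determination can be evaluated side by side. The Python sketch below does so for the 10 displayed observations (an illustrative subset, so the value differs from the full-sample 0.5695 in the R output).

```python
import math

# The 10 displayed observations (not the full sample of 20).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]
n = len(x)

ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
sse = ss_y - ss_xy ** 2 / ss_x

r2_from_ss = ss_xy ** 2 / (ss_x * ss_y)  # first form
r2_from_sse = 1 - sse / ss_y             # second form

print(f"R^2 = {r2_from_ss:.4f} (from SSxy, SSx, SSy)")
print(f"R^2 = {r2_from_sse:.4f} (from SSE)")
```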

Coefficient of determination

The overall variability in y is explained in part by the regression model; the rest remains unexplained (the error).

Coefficient of determination
• The coefficient of determination measures the amount of
variation in the dependent variable that is explained by the
variation in the independent variable.
R output for linear model
Residual standard error: 19170 on 18 degrees of freedom
Multiple R-squared: 0.5695, Adjusted R-squared: 0.5456
F-statistic: 23.82 on 1 and 18 DF, p-value: 0.0001204

For example, $R^2 = 0.5695$ tells us that 56.95% of the variation in home prices is explained by the variation in square footage. The rest (43.05%) remains unexplained by this model.

Testing the slope


Testing for the existence of linear relationship
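• Hypotheses:
H0: $\beta_1 = 0$ (no linear relationship exists)
HA: $\beta_1 \neq 0$ (a linear relationship exists), two-tailed
• Test statistic:
$t = \frac{b_1 - \beta_1}{s_{b_1}}$
• Rejection region:
$t < -t_{\alpha/2,\, n-2}$ or $t > t_{\alpha/2,\, n-2}$
• Conclusion: reject H0 if t falls in the rejection region.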



Testing for the existence of linear relationship
• Test statistic:
$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{58.96 - 0}{12.08} = 4.88$
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22635.95 20460.07 1.106 0.28314
Square.Footage 58.96 12.08 4.880 0.00012 ***



Testing for the existence of linear relationship
• Rejection region:
o t-value: $t < -t_{\alpha/2,\, n-2} = -t_{0.05/2,\, 20-2} = -2.101$ or $t > t_{\alpha/2,\, n-2} = 2.101$
o p-value = 0.00012
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    22635.95   20460.07   1.106  0.28314
Square.Footage    58.96      12.08   4.880  0.00012 ***

• Conclusion: t = 4.88 > 2.101 (equivalently, p-value = 0.00012 < 0.05), so we reject H0: a linear relationship exists.
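
The test can be traced step by step from the numbers in the R coefficient table. The Python sketch below takes the slope estimate, its standard error, the critical value 2.101 and the p-value directly from the slides; the 5% significance level is the one implied by that critical value, and the variable names are illustrative.

```python
# Values copied from the R coefficient table on the slides.
b1 = 58.96         # estimated slope
se_b1 = 12.08      # standard error of the slope
beta1_h0 = 0.0     # slope under H0 (no linear relationship)
p_value = 0.00012  # reported by R

# Critical value t_{0.025, 18} quoted on the slide (alpha = 0.05, n = 20).
alpha = 0.05
t_crit = 2.101

t_stat = (b1 - beta1_h0) / se_b1
reject_by_t = abs(t_stat) > t_crit   # two-tailed rejection region
reject_by_p = p_value < alpha        # equivalent p-value rule

print(f"t = {t_stat:.2f}")
print("reject H0 (rejection region):", reject_by_t)
print("reject H0 (p-value):", reject_by_p)
```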



Coefficient of correlation
• Test the coefficient of correlation to determine if a linear
relationship exists.
• The coefficient of correlation is used to measure the strength of
a linear association between two variables.
• The coefficient values range between –1 and 1.
• If r = –1 (perfect negative linear association) or r = +1
(perfect positive linear association): every point falls on the
regression line.
• If r = 0: there is no linear association.
• The coefficient can be used to test for linear relationships
between two variables.


Testing the coefficient of correlation


• When there is no linear relationship between two variables, $\rho = 0$.
• The hypotheses are:
$H_0: \rho = 0$
$H_A: \rho \neq 0$
• The test statistic is:
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$
The statistic is Student t-distributed with d.f. = n – 2, provided the variables are bivariate normally distributed.
Here $r$ is the sample coefficient of correlation, calculated by
$r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = (\text{sign of } b_1) \sqrt{R^2}$
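
The test statistic can be evaluated from the sample correlation alone. The Python sketch below plugs in r = 0.7546824 and n = 20, the values reported in the lecture's R output, and also confirms that squaring r recovers the reported coefficient of determination. Variable names are illustrative.

```python
import math

r = 0.7546824  # sample correlation reported by R for the home sales data
n = 20         # sample size

# t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
df = n - 2

print(f"t = {t_stat:.4f} on {df} degrees of freedom")
print(f"r^2 = {r ** 2:.4f}")
```

The result matches the slope t-test: in simple linear regression, testing $\rho = 0$ and testing $\beta_1 = 0$ give the same t statistic.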

Test the coefficient of correlation to determine if a linear relationship
exists. R output is provided below:
Pearson's product-moment correlation

data: Price$Square.Footage and Price$Price
t = 4.8802, df = 18, p-value = 0.0001204
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4686849 0.8974788
sample estimates:
      cor
0.7546824
Using the Regression equation
• Before using the regression model, we need to
assess how well it fits the data.
• If we are satisfied with how well the model fits the
data and the model assumptions are satisfied, we
can use it to make predictions for y.
Example
• Predict the price of a home with square footage = 1500
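
The point prediction follows by substituting x = 1500 into the estimated regression equation. The Python sketch below uses the rounded coefficients printed in the R output, so its result can differ by a small rounding amount from the fitted value 111075.3 that R reports using unrounded coefficients. The function name `predict_price` is an illustrative assumption.

```python
# Rounded coefficients from the R output on the slides.
B0 = 22635.95  # intercept
B1 = 58.96     # slope (price change per additional square foot)


def predict_price(square_footage):
    """Point prediction from the estimated regression equation."""
    return B0 + B1 * square_footage


prediction = predict_price(1500)
print(f"predicted price for 1500 sq ft: {prediction:.2f}")
```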

Prediction interval and confidence interval
• Two intervals can be used to discover how closely the predicted value will match the true value of y.
• Prediction interval, for a particular value of y ($x_g$: the particular value of the independent variable x):
$\hat{y} \pm t_{\alpha/2,\, n-2} \, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
• Confidence interval, for the expected value of y:
$\hat{y} \pm t_{\alpha/2,\, n-2} \, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
• The prediction interval is wider than the confidence interval.
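
The two interval formulas differ only by the extra 1 under the square root, which the following Python sketch makes explicit. It is illustrative only: it uses the 10 displayed observations rather than the full sample, so the critical value is the standard table value $t_{0.025,\,8} = 2.306$ (an assumption tied to that subset), and the function name `intervals` is hypothetical.

```python
import math


def intervals(x, y, x_g, t_crit):
    """Return (y_hat, prediction interval, confidence interval) at x_g."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / ss_x          # shared term
    pi_half = t_crit * s_eps * math.sqrt(1 + core)    # particular value of y
    ci_half = t_crit * s_eps * math.sqrt(core)        # expected value of y
    return (y_hat,
            (y_hat - pi_half, y_hat + pi_half),
            (y_hat - ci_half, y_hat + ci_half))


# The 10 displayed observations (not the full sample of 20).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]

y_hat, pred_int, conf_int = intervals(x, y, x_g=1500, t_crit=2.306)
print(f"fit = {y_hat:.1f}")
print(f"95% prediction interval: ({pred_int[0]:.1f}, {pred_int[1]:.1f})")
print(f"95% confidence interval: ({conf_int[0]:.1f}, {conf_int[1]:.1f})")
```

Both intervals are narrowest at $x_g = \bar{x}$ and widen as $x_g$ moves away from the mean of x.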
Prediction interval and confidence interval
Provide a 95% prediction interval and a 95% confidence interval for the price of a home with square footage = 1500.

R output:
> predict(reg.ex1, data.frame(Square.Footage=1500), interval="predict")
       fit      lwr      upr
1 111075.3 69625.17 152525.3
> predict(reg.ex1, data.frame(Square.Footage=1500), interval="confidence")
       fit      lwr      upr
1 111075.3 101239.8 120910.8

Regression diagnostics
• The three important conditions required for the validity
of the regression analysis are:
• The error variable is normally distributed.
• The error variance is constant for all values of x.
• The errors are independent of each other.
• How can we diagnose violations of these conditions?
(Self study)

Outliers
• An outlier is an observation that is unusually small or
large.
• Several possibilities need to be investigated when an
outlier is observed:
• There was an error in recording the value.
• The point does not belong in the sample.
• The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect an observation is an outlier if its |standardized residual| > 2.
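
The rule of thumb can be applied mechanically once the residuals are available. The Python sketch below assumes the simplest definition of a standardized residual, the residual divided by the standard error of estimate (some texts also adjust for leverage), and applies it to the 10 displayed observations only.

```python
import math

# The 10 displayed observations (not the full sample of 20).
x = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
y = [142500, 145000, 115000, 155900, 95000, 128000, 100000, 55000, 142000, 115000]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed definition: residual / standard error of estimate (no leverage term).
standardized = [e / s_eps for e in residuals]

# Flag observations whose standardized residual exceeds 2 in absolute value.
suspects = [(xi, yi, round(z, 2))
            for xi, yi, z in zip(x, y, standardized) if abs(z) > 2]

for xi, z in zip(x, standardized):
    print(f"x = {xi}: standardized residual = {z:+.2f}")
print("suspected outliers:", suspects)
```

A flagged point is only a candidate: as the list above says, it still has to be checked for recording errors and for whether it belongs in the sample.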
Outliers
[Figure: two scatter diagrams. Panel "an outlier": the outlier causes a shift in the regression line ... Panel "an influential observation": ... but some outliers may be very influential.]


Procedure for simple linear regression analysis
• Develop a model that has a theoretical basis.
• Gather data for the two variables in the model.
• Draw the scatter diagram to determine whether a linear
model appears to be appropriate.
• Check the required conditions for the errors.
• Assess the model fit.
• If the model fits the data and the assumptions are satisfied,
use the regression equation.

Simple linear regression model

The estimated simple linear regression equation is: $\hat{y} = b_0 + b_1 x$

Coefficient of determination $R^2$: the coefficient of determination measures the amount of variation in the dependent variable that is explained by the variation in the independent variable. (In the home price example, 56.95% of the variation in home prices is explained by the variation in square footage. The rest (43.05%) remains unexplained by this model.)
