L9 - Simple Linear Regression
BES
Regression Analysis
Regression analysis is used to model relationships and predict the
value of one variable (the dependent variable) on the basis of
other variables (the independent variables).
Simple linear regression Model
8/12/25
Assumptions about the error term 𝜀
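A common way to write the model and its error-term assumptions is sketched here in standard textbook notation. The symbol $\sigma_\varepsilon$ for the error standard deviation and the exact form of the statement are assumptions, chosen to agree with the conditions listed under regression diagnostics.

```latex
y = \beta_0 + \beta_1 x + \varepsilon, \qquad
\varepsilon \sim N(0, \sigma_\varepsilon^2), \quad
\text{errors independent, with the same } \sigma_\varepsilon \text{ for every } x
```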
Regression Analysis
The sample statistics $b_0$ and $b_1$ provide estimates of the parameters $\beta_0$ and $\beta_1$.
Estimated regression equation: $\hat{y} = b_0 + b_1 x$
Regression Analysis
Recent family home sales in San Antonio provided the data below
(n = 20; only 10 observations are displayed). We wish to predict
home prices from square footage.
Square Footage    1580    1572    1352    2224    1556    1435    1438    1089    1941    1698
Price           142500  145000  115000  155900   95000  128000  100000   55000  142000  115000
Linear relationship
A scatter diagram can describe the relationship between home price
and square footage.
Possible Regression Lines in Simple Linear Regression
The least squares method
This line minimizes the sum of the squared differences between the
points and the line.
Least squares Criterion
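Written as a formula, the criterion chooses the intercept and slope that make the sum of squared vertical differences as small as possible. This is the standard statement, given here as a sketch in the notation used elsewhere in these notes:

```latex
\min_{b_0,\, b_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad \hat{y}_i = b_0 + b_1 x_i
```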
Least squares method
The least squares line is:
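$\hat{y} = b_0 + b_1 x$
$b_1 = \frac{SS_{xy}}{SS_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
$SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$
$SS_y = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$
$SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$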
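As an illustration of the least squares formulas, the sketch below applies them to the 10 home sales displayed earlier. The names `sqft`, `price`, and `least_squares` are made up for this example. Because the full sample has 20 observations, the estimates from these 10 points are not expected to match the R output for the full data.

```python
# The 10 observations displayed in the notes (the full sample has n = 20,
# so these estimates will differ from the R output for the full data).
sqft = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
price = [142500, 145000, 115000, 155900, 95000,
         128000, 100000, 55000, 142000, 115000]


def least_squares(x, y):
    """Return (b0, b1) from the shortcut formulas for SS_x and SS_xy."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = ss_xy / ss_x                    # slope
    b0 = sum(y) / n - b1 * sum(x) / n    # intercept: y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(sqft, price)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(sqft, price)]
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

A quick sanity check on any least squares fit is that the residuals sum to (numerically) zero.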
R output for linear model
Call:
lm(formula = Price ~ Square.Footage, data = Price)
Residuals:
Min 1Q Median 3Q Max
-31843 -17952 1784 13409 29680
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22635.95 20460.07 1.106 0.28314
Square.Footage 58.96 12.08 4.880 0.00012 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R output for linear model
Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
Estimated regression equation: $\hat{y} = 22635.95 + 58.96\,x$
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22635.95 20460.07 1.106 0.28314
Square.Footage 58.96 12.08 4.880 0.00012 ***
Assessing the Regression Model
The least squares method will produce a regression line whether or not
there is a linear relationship between x and y.
Consequently, it is important to assess how well the linear model fits the
data.
We look at descriptive measurements and one test procedure to assess
the model:
• Standard error of estimate
• Coefficient of determination ($R^2$).
• Test for the existence of linear relationship between X and Y - Is
the x - Variable Important? (t-test of the slope)
• Coefficient of correlation (can be used to test for linear
relationships between two variables.)
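For the t-test of the slope, the test statistic under the null hypothesis $\beta_1 = 0$ is the slope estimate divided by its standard error. The sketch below recomputes it from the rounded values in the R coefficient table. The critical value 2.101 is the two-sided 5% value for 18 degrees of freedom read from a t table (an assumption of this example, not something shown in the notes).

```python
# Values copied from the R coefficient table (rounded as printed).
b1 = 58.96        # estimated slope for Square.Footage
se_b1 = 12.08     # standard error of the slope
n = 20            # sample size, so n - 2 = 18 degrees of freedom

# Test statistic for H0: beta_1 = 0 against H1: beta_1 != 0.
t_stat = b1 / se_b1

# Two-sided 5% critical value for 18 df, taken from a t table (assumption).
T_CRIT_18DF = 2.101
reject_h0 = abs(t_stat) > T_CRIT_18DF

print(f"t = {t_stat:.3f}, reject H0 at 5%: {reject_h0}")
```

The recomputed statistic agrees with the `t value` column of the R output up to rounding.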
Sum of squares for errors
• This is the sum of the squared vertical differences
between the points and the regression line.
• It can serve as a measure of how well the line fits
the data.
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SS_y - \frac{SS_{xy}^2}{SS_x}$
• This statistic plays a role in every statistical
technique we employ to assess the model.
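The sum of squares for error can be computed directly from the residuals or from the shortcut $SS_y - SS_{xy}^2 / SS_x$. The sketch below checks that both routes agree, using only the 10 displayed observations, so the value is not the SSE of the full 20-home sample. The helper name `sums_of_squares` is made up for this example.

```python
import math

# The 10 observations displayed in the notes (the full sample has n = 20).
sqft = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
price = [142500, 145000, 115000, 155900, 95000,
         128000, 100000, 55000, 142000, 115000]


def sums_of_squares(x, y):
    """Return (SS_x, SS_y, SS_xy) using the shortcut formulas."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_x, ss_y, ss_xy


ss_x, ss_y, ss_xy = sums_of_squares(sqft, price)
b1 = ss_xy / ss_x
b0 = sum(price) / len(price) - b1 * sum(sqft) / len(sqft)

# Route 1: sum of squared vertical differences from the fitted line.
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(sqft, price))
# Route 2: shortcut formula.
sse_shortcut = ss_y - ss_xy ** 2 / ss_x

print(math.isclose(sse_direct, sse_shortcut, rel_tol=1e-9))
```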
Standard error of estimate
• The mean error is equal to zero.
• If $\sigma_\varepsilon$ is small, the errors tend to be close to zero (close to the
mean error), so the model fits the data well.
• Therefore we can use $s_\varepsilon$ as a measure of the suitability of
using a linear model.
• An unbiased estimator of $\sigma_\varepsilon^2$ is given by $s_\varepsilon^2$.
Standard error of estimate:
$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$
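The relation between SSE and the standard error of estimate can be illustrated with the figure R reports as "Residual standard error: 19170 on 18 degrees of freedom". The sketch below backs out an approximate SSE from that rounded value and then recovers the standard error. The function name is hypothetical, and the SSE is only as accurate as the rounded 19170.

```python
import math


def standard_error_of_estimate(sse, n):
    """s_eps = sqrt(SSE / (n - 2)) for simple linear regression."""
    return math.sqrt(sse / (n - 2))


n = 20                  # sample size (18 degrees of freedom)
s_eps_reported = 19170  # residual standard error as printed by R (rounded)

# Invert the formula: SSE = s_eps^2 * (n - 2). Approximate, because the
# reported standard error is rounded.
sse_approx = s_eps_reported ** 2 * (n - 2)

s_eps = standard_error_of_estimate(sse_approx, n)
print(f"SSE ~ {sse_approx:.4g}, s_eps = {s_eps:.0f}")
```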
Coefficient of determination
• When we want to measure the strength of the linear relationship, we
use the coefficient of determination.
$R^2 = \frac{SS_{xy}^2}{SS_x \, SS_y} = 1 - \frac{SSE}{SS_y}$
• $R^2$ takes on any value between zero and one.
• $R^2 = 1$: perfect match between the line and the data points.
• $R^2 = 0$: there is no linear relationship between x and y.
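The two expressions for the coefficient of determination agree because $SSE = SS_y - SS_{xy}^2 / SS_x$. Substituting that identity gives the short derivation below:

```latex
R^2 = 1 - \frac{SSE}{SS_y}
    = 1 - \frac{SS_y - SS_{xy}^2 / SS_x}{SS_y}
    = \frac{SS_{xy}^2}{SS_x \, SS_y}
```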
• The coefficient of determination measures the proportion of the
variation in the dependent variable that is explained by the
variation in the independent variable.
R output for linear model
Residual standard error: 19170 on 18 degrees of freedom
Multiple R-squared: 0.5695, Adjusted R-squared: 0.5456
F-statistic: 23.82 on 1 and 18 DF, p-value: 0.0001204
Coefficient of correlation
• Test the coefficient of correlation to determine if a linear
relationship exists.
• The coefficient of correlation is used to measure the strength of
a linear association between two variables.
• The coefficient values range between –1 and 1.
• If r = –1 (perfect negative linear association) or r = +1
(perfect positive linear association): every point falls on the
regression line.
• If r = 0: there is no linear association.
• The coefficient can be used to test for linear relationships
between two variables.
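In simple linear regression the coefficient of correlation has the same sign as the slope, and its square equals $R^2$. A standard test statistic for a linear relationship is $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2$ degrees of freedom. That formula is not shown in these notes, so treat it as an assumption of this sketch. The inputs are the rounded values from the R output.

```python
import math

# Rounded values copied from the R output.
r_squared = 0.5695   # Multiple R-squared
slope = 58.96        # estimated slope; r takes its sign
n = 20               # sample size

# r is the square root of R^2, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

# Test statistic for H0: no linear relationship (rho = 0), n - 2 df.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t_stat:.3f}")
```

Up to rounding, this statistic coincides with the t value reported for the slope, which is why the two tests lead to the same conclusion in simple linear regression.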
Using the Regression equation
• Before using the regression model, we need to
assess how well it fits the data.
• If we are satisfied with how well the model fits the
data and the model assumptions are satisfied, we
can use it to make predictions for y.
Example
• Predict the price of a home with square footage = 1500
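A point prediction simply plugs the square footage into the estimated regression equation. The sketch below uses the rounded coefficients from the R coefficient table; the function name `predict_price` is made up for this example.

```python
# Rounded coefficients from the R coefficient table.
B0 = 22635.95   # intercept
B1 = 58.96      # slope (price change per square foot)


def predict_price(square_footage):
    """Point prediction y_hat = b0 + b1 * x."""
    return B0 + B1 * square_footage


y_hat = predict_price(1500)
print(f"Predicted price: {y_hat:.2f}")
```

R's `predict` reports a fit of 111075.3 for the same input; the small gap is presumably due to R working with unrounded coefficients.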
Prediction interval and confidence interval
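• Two intervals can be used to discover how closely the predicted value
will match the true value of y:
• prediction interval: for a particular value of y
• confidence interval: for the expected value of y
$x_g$: the particular value of the independent variable x
Prediction interval:
$\hat{y} \pm t_{\alpha/2,\, n-2} \; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$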
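The sketch below computes a prediction interval for a particular value of y: the fit plus or minus $t \cdot s_\varepsilon \sqrt{1 + 1/n + (x_g - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. It also computes the standard confidence interval for the expected value of y, which drops the leading 1 under the square root. That second formula is not written out in these notes and is an assumption here. The data are the 10 displayed observations, so the results will differ from the R output for the full sample. The critical value 2.306 (two-sided 5%, 8 degrees of freedom) is read from a t table, and the function name `intervals` is hypothetical.

```python
import math

# The 10 observations displayed in the notes (the full sample has n = 20).
sqft = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
price = [142500, 145000, 115000, 155900, 95000,
         128000, 100000, 55000, 142000, 115000]


def intervals(x, y, x_g, t_crit):
    """Return (fit, prediction_interval, confidence_interval) at x_g."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))          # standard error of estimate

    fit = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / ss_x  # shared term under the root
    pred_half = t_crit * s_eps * math.sqrt(1 + core)  # particular value of y
    conf_half = t_crit * s_eps * math.sqrt(core)      # expected value of y
    return (fit,
            (fit - pred_half, fit + pred_half),
            (fit - conf_half, fit + conf_half))


T_CRIT_8DF = 2.306  # t_{0.025, 8} from a t table (assumption)
fit, pred_int, conf_int = intervals(sqft, price, 1500, T_CRIT_8DF)
print("fit:", round(fit), "prediction:", pred_int, "confidence:", conf_int)
```

The prediction interval is always the wider of the two, because it also accounts for the variability of an individual observation around the regression line.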
Prediction interval and confidence interval
Provide 95% prediction and confidence interval estimates for the
price of a home with square footage = 1500
R output:
> predict(reg.ex1,data.frame(Square.Footage=1500),interval="predict")
fit lwr upr
1 111075.3 69625.17 152525.3
> predict(reg.ex1,data.frame(Square.Footage=1500),interval="confidence")
fit lwr upr
1 111075.3 101239.8 120910.8
Regression diagnostics
• The three important conditions required for the validity
of the regression analysis are:
• The error variable is normally distributed.
• The error variance is constant for all values of x.
• The errors are independent of each other.
• How can we diagnose violations of these conditions?
(Self study)
Outliers
• An outlier is an observation that is unusually small or
large.
• Several possibilities need to be investigated when an
outlier is observed:
• There was an error in recording the value.
• The point does not belong in the sample.
• The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect an observation is an outlier if
its |standardized residual| > 2.
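The rule of thumb above can be sketched as follows. Here a standardized residual is taken to be the residual divided by the standard error of estimate, which is a common simplification and an assumption of this example (some software also adjusts for leverage). The data are the 10 displayed observations only, and the function name is made up.

```python
import math

# The 10 observations displayed in the notes (the full sample has n = 20).
sqft = [1580, 1572, 1352, 2224, 1556, 1435, 1438, 1089, 1941, 1698]
price = [142500, 145000, 115000, 155900, 95000,
         128000, 100000, 55000, 142000, 115000]


def standardized_residuals(x, y):
    """Residuals divided by s_eps (simplified; no leverage adjustment)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s_eps = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [r / s_eps for r in residuals]


z = standardized_residuals(sqft, price)
# Indices (0-based) of observations suspected to be outliers.
suspects = [i for i, zi in enumerate(z) if abs(zi) > 2]
print("suspected outliers at positions:", suspects)
```

A flagged point is only a candidate: it still has to be checked against the possibilities listed above (recording error, wrong sample, or a valid observation).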
[Figure: two scatter plots, one marking an outlier and one marking an influential observation]
… but some outliers may be very influential.