This document provides an overview of simple linear regression. It defines linear regression as describing the linear relationship between a predictor variable (X) and a response variable (Y). Key aspects covered include:
- the linear regression equation $Y = \beta_0 + \beta_1 X + \varepsilon$;
- how least squares regression fits the linear model by minimizing the sum of squared residuals;
- descriptive statistics used in linear regression, such as variance, covariance, and correlation;
- inferential regression statistics, such as $R^2$, standard error, ANOVA, and hypothesis tests on coefficients;
- assumptions of the linear regression model, such as independent and normally distributed errors.

The document also briefly introduces multiple linear regression, extending the single-predictor model to multiple predictors, along with the matrix algebra of ordinary least squares.


Simple Linear Regression (SLR)
Types of Correlation

(Scatter plots: positive correlation, negative correlation, no correlation.)


Simple linear regression describes the linear relationship between a predictor variable, plotted on the x-axis, and a response variable, plotted on the y-axis.

(Scatter plot: Independent Variable (X) against Dependent Variable (Y).)

$Y = \beta_0 + \beta_1 X$

(Plots: the regression line, with intercept $\beta_0$ where the line meets the y-axis and slope $\beta_1$ drawn as the rise over a run of 1.0; a further plot marks the residual $\varepsilon$ as the vertical distance between an observed point and the line.)
Fitting data to a linear model

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

intercept ($\beta_0$), slope ($\beta_1$), residuals ($\varepsilon_i$)


How do we fit data to a linear model?

The Ordinary Least Squares (OLS) method.


Least Squares Regression

Model line: $\hat{Y} = b_0 + b_1 X$

Residual: $\varepsilon = Y - \hat{Y}$

Sum of squared residuals: $\sum (Y - \hat{Y})^2$

• We must find the values of $b_0$ and $b_1$ that minimise $\sum (Y - \hat{Y})^2$.

Regression Coefficients

$b_1 = \dfrac{S_{xy}}{S_{xx}} = \dfrac{\sum xy}{\sum x^2}$  (x and y taken as deviations from their means)

$b_0 = \bar{Y} - b_1 \bar{X}$
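A minimal sketch of these formulas in Python (not part of the original slides), using NumPy and a small made-up dataset:

```python
import numpy as np

# Made-up example data (any paired X, Y sample works here)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

Xbar, Ybar = X.mean(), Y.mean()

# Sums of squares / cross-products of deviations from the means
Sxx = np.sum((X - Xbar) ** 2)
Sxy = np.sum((X - Xbar) * (Y - Ybar))

b1 = Sxy / Sxx          # slope
b0 = Ybar - b1 * Xbar   # intercept
print(b0, b1)
```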
Required Statistics

$n$ = number of observations

$\bar{X} = \dfrac{\sum X}{n}$

$\bar{Y} = \dfrac{\sum Y}{n}$
Descriptive Statistics

$\mathrm{Var}(X) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \dfrac{S_{xx}}{n-1}$

$\mathrm{Var}(Y) = \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1} = \dfrac{S_{yy}}{n-1}$  ($S_{yy}$ = SST)

$\mathrm{Covar}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \dfrac{S_{xy}}{n-1}$
Regression Statistics

$SST = \sum (Y - \bar{Y})^2$

$SSR = \sum (\hat{Y} - \bar{Y})^2$

$SSE = \sum (Y - \hat{Y})^2$

(Diagram: the variance to be explained in Y (SST) splits into variance explained by X1 (SSR) and variance NOT explained by X1 (SSE).)
Regression Statistics

$SST = SSR + SSE$


Regression Statistics

$R^2 = \dfrac{SSR}{SST}$

Coefficient of Determination: used to judge the adequacy of the regression model.
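A short sketch of these sums of squares, again with NumPy and made-up data (np.polyfit supplies the least-squares fit):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1, b0 = np.polyfit(X, Y, 1)    # least-squares slope and intercept

Yhat = b0 + b1 * X              # fitted values
SST = np.sum((Y - Y.mean()) ** 2)
SSR = np.sum((Yhat - Y.mean()) ** 2)
SSE = np.sum((Y - Yhat) ** 2)

R2 = SSR / SST                  # coefficient of determination
print(R2, np.isclose(SST, SSR + SSE))   # SST = SSR + SSE holds for OLS
```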
Regression Statistics

R R 2

S xy  xy
R 
S xx S yy  x y

Correlation
measures the strength of the linear association between
two variables.
Regression Statistics
Standard Error for the regression model

$S_e^2 = \hat{\sigma}^2 = \dfrac{SSE}{n-2}$, where $SSE = \sum (Y - \hat{Y})^2$

$S_e = \sqrt{MSE}$
ANOVA
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

Source       df     SS     MS        F          P-value
Regression   1      SSR    SSR/df    MSR/MSE    P(F)
Residual     n-2    SSE    SSE/df
Total        n-1    SST

If P(F) < α, then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.

ANOVA to test significance of regression
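As a hedged sketch of this table (not from the slides), using NumPy and SciPy with made-up data; scipy.stats.f.sf gives the upper-tail P(F):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
b1, b0 = np.polyfit(X, Y, 1)
Yhat = b0 + b1 * X
n = len(Y)

SSR = np.sum((Yhat - Y.mean()) ** 2)   # regression SS, df = 1
SSE = np.sum((Y - Yhat) ** 2)          # residual SS,  df = n - 2

MSR, MSE = SSR / 1, SSE / (n - 2)
F = MSR / MSE
p_value = stats.f.sf(F, 1, n - 2)      # P(F): upper-tail F probability
print(F, p_value)                      # small p => regression is significant
```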


Hypothesis Tests for Regression Coefficients

$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$

$t_{(n-k-1)} = \dfrac{b_i - \beta_i}{S_{b_i}}$
Hypothesis Tests for Regression Coefficients

$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

$t_{(n-k-1)} = \dfrac{b_1 - \beta_1}{S_e(b_1)} = \dfrac{b_1 - \beta_1}{\sqrt{S_e^2 / S_{xx}}}$
Confidence Interval on Regression Coefficients

$b_1 - t_{\alpha/2,(n-k-1)} \sqrt{\dfrac{S_e^2}{S_{xx}}} \;\le\; \beta_1 \;\le\; b_1 + t_{\alpha/2,(n-k-1)} \sqrt{\dfrac{S_e^2}{S_{xx}}}$

Confidence Interval for $\beta_1$
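A sketch of the slope test and its confidence interval, assuming made-up data and a 95% level (α = 0.05):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
b1, b0 = np.polyfit(X, Y, 1)
n = len(Y)

Yhat = b0 + b1 * X
Se2 = np.sum((Y - Yhat) ** 2) / (n - 2)   # S_e^2 (k = 1, so df = n - 2)
Sxx = np.sum((X - X.mean()) ** 2)
se_b1 = np.sqrt(Se2 / Sxx)                # standard error of b1

t_stat = b1 / se_b1                       # test of H0: beta1 = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)
print(t_stat, (b1 - t_crit * se_b1, b1 + t_crit * se_b1))  # 95% CI
```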


Hypothesis Tests on Regression Coefficients

$H_0: \beta_0 = 0$
$H_A: \beta_0 \neq 0$

$t_{(n-k-1)} = \dfrac{b_0 - \beta_0}{S_e(b_0)} = \dfrac{b_0 - \beta_0}{\sqrt{S_e^2 \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}} \right)}}$
Confidence Interval on Regression Coefficients

$b_0 - t_{\alpha/2,(n-k-1)} \sqrt{S_e^2 \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}} \right)} \;\le\; \beta_0 \;\le\; b_0 + t_{\alpha/2,(n-k-1)} \sqrt{S_e^2 \left( \dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{xx}} \right)}$

Confidence Interval for the intercept


Hypothesis Test for the Correlation Coefficient

$H_0: \rho = 0$
$H_A: \rho \neq 0$

$T_0 = \dfrac{R \sqrt{n-2}}{\sqrt{1 - R^2}}$

We would reject the null hypothesis if $|t_0| > t_{\alpha/2, n-2}$.
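The same test as a short sketch, with made-up data:

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n = len(X)

R = np.corrcoef(X, Y)[0, 1]                   # sample correlation
T0 = R * np.sqrt(n - 2) / np.sqrt(1 - R**2)   # test statistic
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)
print(T0, abs(T0) > t_crit)                   # True => reject H0: rho = 0
```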


Diagnostic Tests For Regressions

Residual plots ($\varepsilon_i$ against fitted values $\hat{Y}_i$):
- Expected distribution of residuals for a linear model with a normal distribution of residuals (errors): random scatter around zero.
- Residuals for a non-linear fit.
- Residuals for a quadratic function or polynomial.
- Residuals that are not homogeneous (increasing in variance).
Regression – important points

1. Ensure that the range of values sampled for the predictor variable is large enough to capture the full range of responses by the response variable.

2. Ensure that the distribution of predictor values is approximately uniform within the sampled range.
Assumptions of Regression

1. The linear model correctly describes the functional relationship between X and Y.
2. The X variable is measured without error.
3. For any given value of X, the sampled Y values are independent.
4. Residuals (errors) are normally distributed.
5. Variances are constant along the regression line.
Multiple Linear Regression (MLR)

The linear model with a single predictor variable X can easily be extended to two or more predictor variables.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
(Venn diagrams: the variance in Y is partitioned into unique variance explained by X1, unique variance explained by X2, common variance explained by X1 and X2, and variance NOT explained by X1 and X2. A "good" model is one where X1 and X2 overlap little with each other while each overlaps substantially with Y.)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$

intercept ($\beta_0$), partial regression coefficients ($\beta_1, \dots, \beta_p$), residuals ($\varepsilon$)

Partial Regression Coefficients (slopes): the regression coefficient of a predictor X after controlling for (holding all other predictors constant) the influence of the other variables on both X and Y.
The matrix algebra of Ordinary Least Squares

Intercept and slopes:
$\hat{\beta} = (X'X)^{-1} X'Y$

Predicted values:
$\hat{Y} = X\hat{\beta}$

Residuals:
$Y - \hat{Y}$
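A minimal NumPy sketch of these three matrix formulas, with made-up two-predictor data (np.linalg.solve is used in place of an explicit inverse):

```python
import numpy as np

# Made-up data with two predictors
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.2, 3.9, 7.1, 7.8, 11.2, 11.9])

# Design matrix: a leading column of ones carries the intercept
X = np.column_stack([np.ones_like(X1), X1, X2])

# beta_hat = (X'X)^(-1) X'Y; solving the normal equations avoids
# forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Yhat = X @ beta_hat     # predicted values
resid = Y - Yhat        # residuals
print(beta_hat)
```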
Regression Statistics
How good is our model?

$SST = \sum (Y - \bar{Y})^2$

$SSR = \sum (\hat{Y} - \bar{Y})^2$

$SSE = \sum (Y - \hat{Y})^2$
Regression Statistics

$R^2 = \dfrac{SSR}{SST}$

Coefficient of Determination: used to judge the adequacy of the regression model.
Regression Statistics

n 1
R 2
 1 (1  R )
2

n  k 1
adj

n = sample size
k = number of independent variables

Adjusted R2 are not biased!
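Continuing the same made-up two-predictor example, a sketch of adjusted R²:

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.2, 3.9, 7.1, 7.8, 11.2, 11.9])
X  = np.column_stack([np.ones_like(X1), X1, X2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

n, k = len(Y), 2                       # k = number of predictors
Yhat = X @ beta_hat
R2 = 1 - np.sum((Y - Yhat)**2) / np.sum((Y - Y.mean())**2)
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)
print(R2, R2_adj)                      # R2_adj penalises extra predictors
```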


Regression Statistics
Standard Error for the regression model

$S_e^2 = \hat{\sigma}^2 = \dfrac{SSE}{n-k-1}$, where $SSE = \sum (Y - \hat{Y})^2$

$S_e = \sqrt{MSE}$
ANOVA
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_A: \beta_i \neq 0$ for at least one $i$

Source       df       SS     MS        F          P-value
Regression   k        SSR    SSR/df    MSR/MSE    P(F)
Residual     n-k-1    SSE    SSE/df
Total        n-1      SST

If P(F) < α, then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.

ANOVA to test significance of regression


Hypothesis Tests for Regression Coefficients

$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$

$t_{(n-k-1)} = \dfrac{b_i - \beta_i}{S_{b_i}}$
Hypothesis Tests for Regression Coefficients

$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$

$t_{(n-k-1)} = \dfrac{b_i - \beta_i}{S_e(b_i)} = \dfrac{b_i - \beta_i}{\sqrt{S_e^2 C_{ii}}}$
Confidence Interval on Regression Coefficients

$b_i - t_{\alpha/2,(n-k-1)} \sqrt{S_e^2 C_{ii}} \;\le\; \beta_i \;\le\; b_i + t_{\alpha/2,(n-k-1)} \sqrt{S_e^2 C_{ii}}$

Confidence Interval for $\beta_i$


Here $\hat{\beta} = (X'X)^{-1} X'Y$, and $C_{ii}$ is the $i$-th diagonal element of $(X'X)^{-1}$, so the standard error of $b_i$ is $\sqrt{S_e^2 C_{ii}}$.
Diagnostic Tests For Regressions

Expected distribution of residuals for a linear model with a normal distribution of residuals (errors).

(X Residual Plot: residuals plotted against $X_i$; the expected pattern is random scatter around zero.)
Standardized Residuals

$d_i = \dfrac{e_i}{\sqrt{S_e^2}}$

(Plot: standardized residuals against observation number; most values fall between −2 and +2.)
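A sketch of standardized residuals, reusing the made-up single-predictor data from the earlier sketches:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
b1, b0 = np.polyfit(X, Y, 1)

e = Y - (b0 + b1 * X)               # raw residuals e_i
Se2 = np.sum(e**2) / (len(Y) - 2)   # S_e^2 (MSE) for one predictor
d = e / np.sqrt(Se2)                # standardized residuals d_i
print(d)                            # values beyond about +/-2 stand out
```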
Model Selection

Avoid predictors (Xs) that do not contribute significantly to model prediction.

- Forward selection: the "best" predictor variables are entered, one by one.

- Backward elimination: the "worst" predictor variables are eliminated, one by one.
Model Selection: The General Case
$H_0: \beta_{q+1} = \beta_{q+2} = \dots = \beta_k = 0$
$H_1:$ at least one is not zero

$F = \dfrac{\left[ SSE(x_1, x_2, \dots, x_q) - SSE(x_1, x_2, \dots, x_q, x_{q+1}, \dots, x_k) \right] / (k-q)}{SSE(x_1, x_2, \dots, x_q, x_{q+1}, \dots, x_k) / (n-k-1)}$

Reject $H_0$ if $F > F_{\alpha, k-q, n-k-1}$.
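A sketch of this partial F-test, with a hypothetical helper sse() and made-up data for a reduced (q = 1) versus full (k = 2) model:

```python
import numpy as np
from scipy import stats

def sse(X, Y):
    """Residual sum of squares for an OLS fit of Y on X (with intercept)."""
    X = np.column_stack([np.ones(len(Y)), X])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    return np.sum((Y - X @ beta) ** 2)

# Made-up data: the reduced model uses x1; the full model adds x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0])
y  = np.array([3.2, 3.9, 7.1, 7.8, 11.2, 11.9, 15.4])

n, k, q = len(y), 2, 1
sse_reduced = sse(x1[:, None], y)
sse_full = sse(np.column_stack([x1, x2]), y)

F = ((sse_reduced - sse_full) / (k - q)) / (sse_full / (n - k - 1))
F_crit = stats.f.ppf(1 - 0.05, k - q, n - k - 1)
print(F, F > F_crit)   # True => reject H0 (the added predictors matter)
```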
Multicollinearity

- The degree of correlation between the Xs.

- A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates (i.e., large sampling variation).

- Estimates of the slopes are imprecise, and even the signs of the coefficients may be misleading.

- t-tests may fail to reveal significant factors.
Multicollinearity

- If the F-test for significance of regression is significant, but tests on the individual regression coefficients are not, multicollinearity may be present.

- Variance Inflation Factors (VIFs) are very useful measures of multicollinearity. If any VIF exceeds 5, multicollinearity is a problem.

$VIF(\beta_i) = \dfrac{1}{1 - R_i^2} = C_{ii}$

where $R_i^2$ is the $R^2$ from regressing $X_i$ on the remaining predictors.
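A sketch of VIFs computed from this definition; vif() here is a hypothetical helper, and the near-duplicate predictors are made up to show large values:

```python
import numpy as np

def vif(Xmat):
    """VIF of each column: 1 / (1 - R_i^2), where R_i^2 comes from
    regressing column i on the remaining columns (with an intercept)."""
    n, p = Xmat.shape
    vifs = []
    for i in range(p):
        others = np.column_stack([np.ones(n), np.delete(Xmat, i, axis=1)])
        beta = np.linalg.lstsq(others, Xmat[:, i], rcond=None)[0]
        resid = Xmat[:, i] - others @ beta
        r2 = 1 - resid @ resid / np.sum((Xmat[:, i] - Xmat[:, i].mean())**2)
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# Made-up predictors: x2 nearly duplicates x1, so both VIFs come out large
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
print(vif(np.column_stack([x1, x2])))   # values above 5 flag a problem
```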
Model Evaluation

$PRESS = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2$

Prediction Error Sum of Squares (leave-one-out): $\hat{y}_{(i)}$ is the prediction of observation $i$ from a model fitted without observation $i$.
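A sketch of PRESS by direct leave-one-out refitting; press() is a hypothetical helper and the data are made up:

```python
import numpy as np

def press(X, Y):
    """Leave-one-out Prediction Error Sum of Squares for OLS with intercept."""
    n = len(Y)
    X = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                   # drop observation i
        beta = np.linalg.lstsq(X[keep], Y[keep], rcond=None)[0]
        total += (Y[i] - X[i] @ beta) ** 2         # squared LOO error
    return total

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
print(press(x[:, None], y))
```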
Thank You!
