11 SimpleRegression
Probability and statistic course content

DISC 203 – PROBABILITY & STATISTICS

SIMPLE LINEAR REGRESSION

Lecturer: Muhammad Asim


SIMPLE LINEAR REGRESSION MODEL

Yi = β0 + β1·Xi + εi

where Yi is the dependent variable, Xi the independent variable, β0 the y-intercept, β1 the slope coefficient, and εi the random error term. β0 + β1·Xi is the deterministic component; εi is the random error component.
EXAMPLE
 You are a marketing analyst for Teddy Bears. You gather the following data and want to find a simple relationship between advertising and sales.

Advertising – Sales Data

Month   Advertising Expenditure x ($100)   Sales Revenue y ($1,000)
1       1                                  1
2       2                                  1
3       3                                  2
4       4                                  2
5       5                                  4
SCATTERGRAM: SALES VS. ADVERTISING

[Scatterplot of sales revenue ($1,000, vertical axis, 0–4) against advertising expenditure ($100, horizontal axis, 0–5) for the five months; a second view of the same plot shows the fitted line.]
LEAST SQUARES ESTIMATORS
 Prediction equation
ŷi = β̂0 + β̂1·xi
 Sample slope
β̂1 = SSxy / SSxx
 Sample y-intercept
β̂0 = ȳ − β̂1·x̄

where
SSxy = Σ(xi − x̄)(yi − ȳ)
SSxx = Σ(xi − x̄)²
SSyy = Σ(yi − ȳ)²
COMPUTATIONS – LEAST SQUARES LINE

xi (adv)   yi (sales)   (xi − 3)²   (xi − 3)(yi − 2)
1          1            4           2
2          1            1           1
3          2            0           0
4          2            1           0
5          4            4           4
∑xi = 15   ∑yi = 10     SSxx = 10   SSxy = 7
x̄ = 3      ȳ = 2
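The computations in this table can be reproduced in a few lines of Python (a sketch using only the standard library; the variable names are my own):

```python
# Least squares estimates for the advertising-sales data
# (x in $100 units, y in $1,000 units).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

x_bar = sum(x) / n          # 3.0
y_bar = sum(y) / n          # 2.0

ss_xx = sum((xi - x_bar) ** 2 for xi in x)                        # 10.0
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 7.0

b1 = ss_xy / ss_xx          # sample slope: 0.7
b0 = y_bar - b1 * x_bar     # sample y-intercept: -0.1

print(f"y_hat = {b0:.1f} + {b1:.1f} x")
```

This gives the least squares line ŷ = −0.1 + 0.7x used on the following slides.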
COEFFICIENT INTERPRETATIONS

1. Slope (β̂1 = 0.7)
• Sales volume (y) is expected to increase by $700 for each $100 increase in advertising (x), over the sampled range of advertising expenditures from $100 to $500.

2. y-Intercept (β̂0 = −0.1)
• Since x = 0 is outside of the range of the sampled values of x, the y-intercept has no meaningful interpretation.
MEASURES OF VARIATION

 SST = total sum of squares
 Measures the variation of the yi values around their mean, ȳ
 SSR = regression sum of squares
 Explained variation attributable to the linear relationship between x and y
 SSE = error sum of squares
 Variation attributable to factors other than the linear relationship between x and y
MEASURES OF VARIATION

 Total variation is made up of two parts:

SST = SSR + SSE

Total Sum of Squares:      SST = Σ(yi − ȳ)²
Regression Sum of Squares: SSR = Σ(ŷi − ȳ)²
Error Sum of Squares:      SSE = Σ(yi − ŷi)²

where:
ȳ = average value of the dependent variable
yi = observed values of the dependent variable
ŷi = predicted value of y for the given value of xi
Advertising Expenditure x ($100)   Sales Revenue y ($1,000)   ŷ = β̂0 + β̂1·x   (y − ŷ)   (y − ŷ)²
1                                  1                          0.6              0.4       0.16
2                                  1                          1.3             −0.3       0.09
3                                  2                          2.0              0.0       0.00
4                                  2                          2.7             −0.7       0.49
5                                  4                          3.4              0.6       0.36

n = 5, SSE = 1.10
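The decomposition SST = SSR + SSE can be checked numerically (a sketch, reusing the fitted line ŷ = −0.1 + 0.7x from the earlier slides):

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7                      # least squares estimates
y_bar = sum(y) / len(y)                 # 2.0
y_hat = [b0 + b1 * xi for xi in x]      # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation, 6.0
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained, ~4.9
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained, ~1.1

print(sst, ssr, sse)
```

Up to floating-point rounding, SSR + SSE reproduces SST exactly.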
STANDARD ERROR OF THE REGRESSION MODEL

s² = SSE / (degrees of freedom for error) = SSE / (n − 2)

 We refer to s as the standard error of the regression model
 s measures the spread of the distribution of y values about the least squares line
 We expect most of the observed y-values to lie within 2s of their respective least squares predicted values
CALCULATING s² AND s

s² = SSE / (n − 2) = 1.1 / (5 − 2) = .36667

s = √.36667 = .6055

We would expect most of the observed revenues to fall within 2s, or about $1,211, of the least squares line.
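A quick check of these numbers (a sketch; SSE is recomputed from the data rather than hard-coded):

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
b0, b1 = -0.1, 0.7                      # least squares estimates

# Error sum of squares about the fitted line, ~1.1
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s2 = sse / (n - 2)      # estimated error variance, ~0.36667
s = math.sqrt(s2)       # standard error of the model, ~0.6055

print(f"s^2 = {s2:.5f}, s = {s:.4f}, 2s = {2 * s:.4f}")
```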
COEFFICIENT OF DETERMINATION, R²

 The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
 The coefficient of determination is also called R-squared and is denoted as R²

R² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ R² ≤ 1
R² INTERPRETATION

R² = SSR/SST = 4.9/6.0 = 0.82

 Interpretation: About 82% of the sample variation in sales can be explained by advertising expenditures, using the linear regression model.
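The 0.82 figure can be verified directly (a sketch; all quantities are recomputed from the five data points):

```python
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
y_bar = sum(y) / len(y)
b0, b1 = -0.1, 0.7                      # least squares estimates
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # regression sum of squares

r2 = ssr / sst          # ~0.8167, i.e. about 82%
print(f"R^2 = {r2:.4f}")
```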
RECAP

Yi = β0 + β1·Xi + εi
MAKING INFERENCES ABOUT SLOPE
 E(y) = β0 + β1x
 H0: β1 = 0
 Ha: β1 ≠ 0
 If β1 = 0, then x has no influence on y.
 If we reject H0, we say that x has a statistically significant effect on y.
 To test the null, we need to know the sampling distribution of β̂1.
MAKING INFERENCES ABOUT THE SLOPE β1
 Sampling distribution of β̂1 for large n:

β̂1 ~ N(β1, σβ̂1),  where σβ̂1 = σ / √SSxx

 Typically we approximate σβ̂1 by sβ̂1 = s / √SSxx
 So, when n is large, we use a z-statistic ~ N(0, 1)
 When n is small, we typically use a t-statistic ~ t(n − 2)
 For large n, the distributions of the z and t statistics are almost the same
MAKING INFERENCES ABOUT THE SLOPE β1

A Test of Model Utility: Simple Linear Regression

One-Tailed Test: H0: β1 = 0; Ha: β1 < 0 (or Ha: β1 > 0)
Two-Tailed Test: H0: β1 = 0; Ha: β1 ≠ 0

Test statistic: t = β̂1 / sβ̂1 = β̂1 / (s / √SSxx)

Rejection region (one-tailed): t < −tα (or t > tα when Ha: β1 > 0)
Rejection region (two-tailed): |t| > tα/2

where tα and tα/2 are based on (n − 2) degrees of freedom
EXAMPLE
 We estimated a simple relationship between advertising and sales based on a sample of 5 observations. Is the true relationship statistically significant at the .05 level of significance?
TEST OF SLOPE COEFFICIENT
SOLUTION
 H0: β1 = 0
 Ha: β1 ≠ 0
 α = .05
 df = 5 − 2 = 3
 Critical values: ±3.182 (reject H0 in either tail, with .025 in each)
TEST OF SLOPE COEFFICIENT
SOLUTION
 H0: β1 = 0
 Ha: β1 ≠ 0
 α = .05
 df = 5 − 2 = 3
 Test statistic: t = β̂1 / (s / √SSxx) = 0.7 / 0.1915 = 3.66
 Decision: Since |3.66| > 3.182, reject H0 at α = .05
 Conclusion: There is evidence of a relationship
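The test statistic on this slide can be computed as follows (a sketch; the critical value 3.182 is t.025 with 3 df, taken from tables rather than computed):

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))        # standard error of the model
se_b1 = s / math.sqrt(ss_xx)        # estimated std. dev. of b1, ~0.1915

t_stat = b1 / se_b1                 # ~3.66
t_crit = 3.182                      # t_{.025}, n-2 = 3 df (from tables)

print(f"t = {t_stat:.2f}; reject H0: {abs(t_stat) > t_crit}")
```

Since |t| exceeds the critical value, H0 is rejected at the .05 level.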
MAKING INFERENCES ABOUT THE SLOPE β1
 Confidence interval for β1: β̂1 ± tα/2 · sβ̂1

 [0.090, 1.309]

 We can be 95% confident that the true mean increase in monthly sales revenue per additional $100 of advertising expenditure is between $90 and $1,309.
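The interval can be reproduced up to rounding (a sketch; 3.182 is again the t.025 critical value with 3 df, from tables):

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)

t_crit = 3.182                      # t_{.025}, 3 df
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(f"95% CI for beta1: [{lo:.3f}, {hi:.3f}]")
```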
REGRESSION RESULTS IN R

[R summary output for the regression of sales on advertising: coefficient estimates, standard errors, t values, and R², matching the hand computations above.]
PREDICTION WITH REGRESSION MODELS
 Types of predictions
 Point estimates
 Interval estimates
 What is predicted
 Population mean response E(y) for a given x
 Point on the population regression line
 Individual response (yi) for a given x
WHAT IS PREDICTED

[Figure: the fitted line ŷi = b0 + b1x plotted alongside the population regression line E(y) = β0 + β1x; at x = xp the prediction ŷ estimates both the mean response E(y) and an individual response y.]
USING THE MODEL FOR ESTIMATION AND PREDICTION
 100(1−α)% confidence interval for the mean value of y at x = xp:

ŷ ± tα/2 · s · √(1/n + (xp − x̄)² / SSxx)

 100(1−α)% prediction interval for an individual new value of y at x = xp:

ŷ ± tα/2 · s · √(1 + 1/n + (xp − x̄)² / SSxx)

 where tα/2 is based on (n − 2) degrees of freedom
EXAMPLE
 Find a 95% confidence interval for the mean monthly sales when the store spends $400 on advertising.

EXAMPLE
 Predict the monthly sales for next month if $400 is spent on advertising. Use a 95% prediction interval.
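Both examples can be worked numerically from the formulas on the previous slide (a sketch; t.025 with 3 df is 3.182, from tables):

```python
import math

x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

xp = 4                               # $400 of advertising (x is in $100 units)
y_hat = b0 + b1 * xp                 # point prediction: 2.7, i.e. $2,700
t_crit = 3.182                       # t_{.025}, 3 df

ci_half = t_crit * s * math.sqrt(1 / n + (xp - x_bar) ** 2 / ss_xx)
pi_half = t_crit * s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / ss_xx)

print(f"mean sales CI:       [{y_hat - ci_half:.3f}, {y_hat + ci_half:.3f}]")
print(f"individual sales PI: [{y_hat - pi_half:.3f}, {y_hat + pi_half:.3f}]")
```

As expected, the prediction interval for an individual month is wider than the confidence interval for the mean, since it must also absorb the error variance of a single observation.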
CONFIDENCE INTERVALS V. PREDICTION INTERVALS

[Figure: confidence bands for the mean of y and wider prediction bands for individual y values around the fitted line ŷi = b0 + b1xi; both bands are narrowest at x = x̄.]
MODEL ASSUMPTIONS
So far we have only estimated the deterministic component. Now we turn our attention to the random error ε. We first need some modeling assumptions…

Assumption 1: E(ε|x) = E(ε) = 0
The mean of the probability distribution of ε is 0. This implies the mean value of y for a given value of x is β0 + β1x:
y = β0 + β1x + ε
Since E(ε|x) = E(ε) = 0,
E(y|x) = β0 + β1x
Sometimes this is just written as E(y) = β0 + β1x.
MODEL ASSUMPTIONS

Assumption 2: Homoskedasticity
• The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say σ², for all values of x.
• When this assumption does not hold, we say we have a problem of heteroskedasticity.
MODEL ASSUMPTIONS

Assumption 3: Normality
The probability distribution of ε is normal.

Assumption 4: No Autocorrelation
The values of ε associated with any two observed values of y are independent; that is, the value of ε associated with one value of y has no effect on the values of ε associated with other y values.
