COMM5005 Lecture 8
for Business
Statistics Flow Chart
[Figure: flow chart of the statistics topics. Recoverable labels: Statistics, Descriptive, Inferential, Probability, Lecture 5, Lecture 8.]
Weekly sales ($1,000s)   Customers
245                      1400
312                      1600
279                      1700
308                      1875
199                      1100
219                      1550
405                      2350
319                      1425
255                      1700
[Figure: scatter plot of weekly sales (vertical axis, 0 to 500) against number of customers (horizontal axis, 0 to 3000).]
Simple linear regression example (3 of 6)
Simple linear regression example (4 of 6)
Simple linear regression example (5 of 6)
• Predict the weekly sales for the local store for 2,000 customers
• The predicted weekly sales for the local computer games store for 2,000
customers is 317.85 ($1,000s) = $317,850.
Measures of variation
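• Total variation is made up of two parts: the variation explained by the regression (SSR) and the unexplained variation (SSE), so that SST = SSR + SSE.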
Independence of errors
• Error values are statistically independent
Normality of error
• Error values (ε) are normally distributed for any given value of X
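Multiple regression model:
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i \]
where β0 is the Y-intercept, β1, …, βk are the population slopes, and ε is the random error.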
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15
              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
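The coefficients in the output define the fitted equation Ŷ = b0 + b1(Price) + b2(Advertising). A minimal sketch of using it for prediction follows. The function name `predict_sales` and the input values are illustrative assumptions, not part of the slides, and the inputs must be in the same units as the original data.

```python
# Coefficients copied from the Excel output.
B0 = 306.52619            # intercept
B_PRICE = -24.97509       # slope for Price
B_ADVERTISING = 74.13096  # slope for Advertising


def predict_sales(price, advertising):
    """Fitted value: b0 + b1 * Price + b2 * Advertising."""
    return B0 + B_PRICE * price + B_ADVERTISING * advertising


# Hypothetical inputs, chosen only for illustration (not from the slides).
example = predict_sales(price=5.5, advertising=3.5)
print(round(example, 3))
```

Raising Price by one unit while holding Advertising fixed changes the prediction by b1, which is how each slope is interpreted.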
Adjusted r² (1 of 3)
• r² never decreases when a new X variable is added to the model
• this can be a disadvantage when comparing models
\[ r^2_{adj} = 1 - \left[ (1 - r^2) \left( \frac{n - 1}{n - k - 1} \right) \right] \]
(where: n = sample size, k = number of independent variables)
• Penalises excessive use of unimportant independent variables
• Smaller than r²
• Useful in comparing among models
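The adjusted r² formula can be checked numerically. This is a minimal sketch using the values reported in the Excel output for this lecture's example (r² = 0.52148, n = 15, k = 2). The function name `adjusted_r2` is an illustrative choice.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# r^2, n and k taken from the regression output (two X variables).
print(round(adjusted_r2(0.52148, n=15, k=2), 5))
```

The result agrees with the Adjusted R Square reported by Excel (0.44172) to four decimal places. Any remaining difference comes from the rounded r² used as input.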
Adjusted r² (3 of 3)
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15
ANOVA        df   SS          MS         F         Significance F
Regression   2    29460.027   14730.01   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333
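The ANOVA rows tie back to the regression statistics. A short sketch, using only the sums of squares in the table, recovers R Square as SSR/SST and the Standard Error as the square root of MSE. These are standard relationships rather than something stated on the slide, and the variable names are illustrative.

```python
import math

SSR = 29460.027  # regression (explained) sum of squares
SSE = 27033.306  # residual (unexplained) sum of squares
SST = 56493.333  # total sum of squares
n, k = 15, 2     # observations and number of X variables

r2 = SSR / SST                            # share of total variation explained
std_error = math.sqrt(SSE / (n - k - 1))  # square root of MSE

print(round(SSR + SSE, 3), round(r2, 5), round(std_error, 5))
```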
F test for overall significance (1 of 3)
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
Test statistic:
\[ F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)} \]
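As a rough check of the test statistic, the sketch below computes F from the SSR and SSE values in the ANOVA table of the Excel output. The p-value helper is an assumption added for illustration. It relies on a closed form of the F distribution's upper tail that holds only when the numerator degrees of freedom equal 2, which happens to be the case here (k = 2). For other values of k, a statistics package would be needed.

```python
SSR, SSE = 29460.027, 27033.306  # from the ANOVA table
n, k = 15, 2


def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


def f_pvalue_two_numerator_df(f, df2):
    """Upper-tail probability of F(2, df2). Valid only for numerator d.f. = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


f = f_statistic(SSR, SSE, n, k)
p = f_pvalue_two_numerator_df(f, n - k - 1)
print(round(f, 4), round(p, 4))
```

Both values agree with the F and Significance F columns of the output, so H0 is rejected at α = .05.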
F test for overall significance (3 of 3)
Are individual variables significant?
Hypotheses:
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist)
Test Statistic:
\[ t = \frac{b_j - 0}{S_{b_j}} \]
where the degrees of freedom for this statistic are n – k – 1, and S_{b_j} is the standard error of the estimate b_j.
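The t statistic is simply each coefficient divided by its standard error. A minimal sketch follows, using the Price and Advertising rows of the Excel output. The names `t_statistic` and `coefficients` are illustrative.

```python
# (coefficient b_j, standard error S_bj) pairs from the Excel output.
coefficients = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}


def t_statistic(b_j, s_bj, hypothesised=0.0):
    """t = (b_j - hypothesised value) / S_bj, with n - k - 1 d.f."""
    return (b_j - hypothesised) / s_bj


t_stats = {name: t_statistic(b, s) for name, (b, s) in coefficients.items()}
for name, t in t_stats.items():
    print(name, round(t, 3))
```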
Are individual variables significant?
• t-value for Price is t = -2.306, with p-value .0398
• t-value for Advertising is t = 2.855, with p-value .0145
Are individual variables significant?
From Excel output:
              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449
H0: βj = 0
H1: βj ≠ 0
d.f. = 15 - 2 - 1 = 12
α = .05, t_{α/2} = 2.1788
Rejection region (two-tailed, α/2 = .025 in each tail): reject H0 if t < -2.1788 or t > 2.1788; otherwise do not reject H0.
Check: The test statistic for each variable falls in the rejection region (p-values < .05)
Conclusion: Reject H0 for each variable. There is evidence that both Price and Advertising affect pie sales at α = .05
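The two-tailed decision rule can be written as a comparison of |t| with the critical value. This sketch takes the critical value 2.1788 and the t statistics directly from the slide. The function name `reject_h0` is an illustrative assumption.

```python
T_CRITICAL = 2.1788  # t_{alpha/2} with 12 d.f. at alpha = .05, as given on the slide


def reject_h0(t_stat, t_critical=T_CRITICAL):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


for name, t in {"Price": -2.30565, "Advertising": 2.85478}.items():
    print(name, "reject H0" if reject_h0(t) else "do not reject H0")
```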
Confidence Interval Estimate for the Slope
Confidence interval for the population slope βj:
\[ b_j \pm t_{\alpha/2} S_{b_j} \]
Here, t has (15 – 2 – 1) = 12 d.f.
Example: Form a 95% confidence interval for the effect of changes in price (X1) on pie sales: -24.975 ± (2.1788)(10.832), which gives the interval from -48.576 to -1.374.
              Coefficients   Standard Error   …   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        …   57.58835    555.46404
Price         -24.97509      10.83213         …   -48.57626   -1.37392
Advertising   74.13096       25.96732         …   17.55303    130.70888
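The interval for the Price slope can be reproduced by hand. A minimal sketch follows, using the coefficient, standard error and critical value from the slides. The function name is illustrative.

```python
def slope_confidence_interval(b_j, s_bj, t_critical):
    """Return (lower, upper) for b_j +/- t_critical * S_bj."""
    margin = t_critical * s_bj
    return b_j - margin, b_j + margin


# Price row of the Excel output, with t_{alpha/2} = 2.1788 for 12 d.f.
lower, upper = slope_confidence_interval(-24.97509, 10.83213, 2.1788)
print(round(lower, 3), round(upper, 3))
```

The limits match the Lower 95% and Upper 95% columns to three decimal places. The small remaining difference arises because Excel uses an unrounded critical value. Since the interval does not contain 0, it leads to the same conclusion as the t test for Price.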
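Ŷ = b0 + b1 X1 + b2 X2
Ŷ = b0 + b1 X1 + b2 (1) = (b0 + b2) + b1 X1   Holiday
Ŷ = b0 + b1 X1 + b2 (0) = b0 + b1 X1   No holiday
[Figure: Y (sales) against X1 (Price). Two parallel lines with different intercepts and the same slope: intercept b0 + b2 for Holiday (X2 = 1) and intercept b0 for No holiday (X2 = 0).]
If H0: β2 = 0 is rejected, then ‘Holiday’ has a significant effect on pie sales.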
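The effect of the dummy variable can be illustrated with a small sketch. The coefficient values below are hypothetical and chosen only for illustration, since the slides do not report estimates for the Holiday model. The function name `predict` is also an assumption.

```python
def predict(b0, b1, b2, price, holiday):
    """Y-hat = b0 + b1 * X1 + b2 * X2, with X2 = 1 for a holiday week and 0 otherwise."""
    x2 = 1 if holiday else 0
    return b0 + b1 * price + b2 * x2


# Hypothetical coefficients for illustration only (not from the slides).
b0, b1, b2 = 300.0, -30.0, 15.0

for price in (5.0, 6.0, 7.0):
    gap = predict(b0, b1, b2, price, True) - predict(b0, b1, b2, price, False)
    print(price, gap)  # the gap equals b2 at every price
```

The gap between the two lines is b2 at every price, which is why the lines share a slope and differ only in their intercepts.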
Notices
• The final online test will be held on 5 December 2022 from
9.00 am to 1.00 pm. See the Moodle site for details.
• Topics examined in the final exam will be mostly statistics
(80% of questions), as well as financial mathematics and
calculus.
• This is an open-book test, so the textbook and notes are
allowed. You can also use Excel and any calculator you
want.
• Last eLearning tutorial and online quiz #3 will be released
this week.
• Last seminar will be next week (in week 10).
• Group assignment due at 6.00 pm on Friday 18 November.
• Finally, I am available for consultation until the final exam.