14 Statistics and Probability
1) Introduction
2) Scatter Diagrams
3) Simple Linear Regression
4) Measuring the Fit of the Regression Model
5) Assumptions of the Regression Model
Introduction
• Regression analysis is a very valuable tool for a manager
• Regression can be used to
•Understand the relationship between variables
•Predict the value of one variable based on another variable
• Examples
•Determining best location for a new store
•Studying the effectiveness of advertising dollars in increasing sales
volume
• The variable to be predicted is called the dependent variable
•Sometimes called the response variable
• The value of this variable depends on the value of the independent
variable
•Sometimes called the explanatory or predictor variable
Dependent variable = Independent variable + Independent variable
Scatter Diagram
• Graphing is a helpful way to investigate the relationship between
variables.
• A scatter diagram or scatter plot is often used for this purpose.
• The independent variable is normally plotted on the X axis.
• The dependent variable is normally plotted on the Y axis.
Triple A Construction
• Triple A Construction renovates old homes.
• They have found that the dollar volume of renovation work is dependent
on the area payroll.
Figure 1: Scatter diagram of Sales ($100,000) against Payroll ($100 million)
Simple Linear Regression
◼ Regression models are used to test if there is a relationship between
variables (predict sales based on payroll)
◼ There is some random error that cannot be predicted
$Y = \beta_0 + \beta_1 X + \varepsilon$
where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
$\beta_0$ = intercept (value of Y when X = 0)
$\beta_1$ = slope of the regression line
$\varepsilon$ = random error
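As a rough illustration of this model, the Python sketch below generates Y values from a straight line plus a random error term. The function name `simulate_y`, the parameter values (intercept 2, slope 1.25, standard deviation 1) and the X values are assumptions chosen only for illustration; the slides do not specify a distribution for the error at this point, so a normal distribution is assumed here.

```python
import random

def simulate_y(xs, beta0, beta1, sigma, seed=0):
    """Return Y = beta0 + beta1*X + e for each X, with e drawn from N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same draws
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [2, 3, 4, 4, 5, 6]  # illustrative X values (assumption)

# With sigma = 0 the error vanishes and every Y lies exactly on the line.
noiseless = simulate_y(xs, beta0=2.0, beta1=1.25, sigma=0.0)
# With sigma > 0 the Y values scatter around the line.
noisy = simulate_y(xs, beta0=2.0, beta1=1.25, sigma=1.0)

for x, line_y, y in zip(xs, noiseless, noisy):
    print(f"X={x}  line={line_y:.2f}  simulated Y={y:.2f}")
```

The gap between the `line` and `simulated Y` columns is the random error that the model cannot predict.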
◼ True values for the slope and intercept are not known so they are
estimated using sample data
$\hat{Y} = b_0 + b_1 X$
where
$\hat{Y}$ = predicted value of the dependent variable (response)
X = independent variable (predictor or explanatory)
$b_0$ = estimate of the intercept (value of $\hat{Y}$ when X = 0)
$b_1$ = estimate of the slope of the regression line
Example: Triple A Construction
Y = Sales
X = Area payroll
◼ The line chosen in Figure 4.1 is the one that minimizes the sum of the squared errors
[Figure: Sales plotted against Payroll ($100,000,000's)]
• For the simple linear regression model, the values of the intercept and
slope can be calculated using the formulas below
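$\hat{Y} = b_0 + b_1 X$
$\bar{X} = \frac{\sum X}{n}$ = average (mean) of X values
$\bar{Y} = \frac{\sum Y}{n}$ = average (mean) of Y values
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$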
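The formulas above translate almost line for line into code. The following is a minimal Python sketch; the function name `fit_line` and the four sample points are hypothetical and were chosen so that the points lie exactly on the line Y = 1 + 2X, which makes the result easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired samples (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n                      # mean of X
    y_bar = sum(ys) / n                      # mean of Y
    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data only (not from the slides): points exactly on Y = 1 + 2X.
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the sample points have no scatter, the recovered intercept and slope match the line that generated them. Note that the function divides by the sum of squared X deviations, so it fails if all X values are identical.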
• Regression calculations
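| Y | X | $(X - \bar{X})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
|---|---|---|---|
| 6 | 3 | $(3 - 4)^2 = 1$ | $(3 - 4)(6 - 7) = 1$ |
| 8 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(8 - 7) = 0$ |
| 9 | 6 | $(6 - 4)^2 = 4$ | $(6 - 4)(9 - 7) = 4$ |
| 5 | 4 | $(4 - 4)^2 = 0$ | $(4 - 4)(5 - 7) = 0$ |
| 4.5 | 2 | $(2 - 4)^2 = 4$ | $(2 - 4)(4.5 - 7) = 5$ |
| 9.5 | 5 | $(5 - 4)^2 = 1$ | $(5 - 4)(9.5 - 7) = 2.5$ |
| $\sum Y = 42$ | $\sum X = 24$ | $\sum (X - \bar{X})^2 = 10$ | $\sum (X - \bar{X})(Y - \bar{Y}) = 12.5$ |
| $\bar{Y} = 42/6 = 7$ | $\bar{X} = 24/6 = 4$ | | |

Table 2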
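The columns of Table 2 can be recomputed with a few lines of Python, which is a convenient way to check the hand arithmetic. This is only a sketch; the variable names (`sales`, `payroll`, `sq_dev`, `cross`) are assumptions introduced here, while the data values are those in the table.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100 million

x_bar = sum(payroll) / len(payroll)
y_bar = sum(sales) / len(sales)

# One entry per table row: squared X deviation and cross-product of deviations.
sq_dev = [(x - x_bar) ** 2 for x in payroll]
cross = [(x - x_bar) * (y - y_bar) for x, y in zip(payroll, sales)]

for y, x, s, c in zip(sales, payroll, sq_dev, cross):
    print(f"{y:>4} {x:>2} {s:>5.1f} {c:>6.2f}")

# Column totals, matching the last rows of the table.
sum_sq = sum(sq_dev)
sum_cross = sum(cross)
print("means:", x_bar, y_bar, " totals:", sum_sq, sum_cross)
```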
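• Regression calculations
$\bar{X} = \frac{\sum X}{6} = \frac{24}{6} = 4$
$\bar{Y} = \frac{\sum Y}{6} = \frac{42}{6} = 7$
$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25$
$b_0 = \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2$
Therefore $\hat{Y} = 2 + 1.25X$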
• Regression calculations
sales = 2 + 1.25(payroll)
If the payroll next year is $600 million (X = 6):
$\hat{Y} = 2 + 1.25(6) = 9.5$, or $950,000
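The prediction step, including the unit conversion, can be sketched in Python as follows. The function name `predict_sales` and the constants `B0` and `B1` are names introduced here for illustration; the coefficient values and the units are taken from the Triple A Construction example.

```python
B0, B1 = 2.0, 1.25  # intercept and slope of the fitted line for Triple A Construction

def predict_sales(payroll_100m):
    """Predicted sales in units of $100,000, for payroll given in units of $100 million."""
    return B0 + B1 * payroll_100m

y_hat = predict_sales(6)      # payroll of $600 million means X = 6
dollars = y_hat * 100_000     # convert from units of $100,000 to dollars
print(f"{y_hat} -> ${dollars:,.0f}")   # prints "9.5 -> $950,000"
```

Keeping the units in the function's docstring is a small safeguard: the most likely mistake in this example is passing 600 (millions) instead of 6 (hundreds of millions).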
Measuring the Fit of the Regression Model
◼ Regression models can be developed for any variables X and Y
◼ How do we know the model is actually helpful in predicting Y based on X?
◼ We could just take the average error, but the positive and negative
errors would cancel each other out
◼ Three measures of variability are
◼ SST – Total variability about the mean
◼ SSE – Variability about the regression line
◼ SSR – Total variability that is explained by the model
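The cancellation of positive and negative errors can be seen directly on the Triple A Construction data. The short Python sketch below uses the fitted line with intercept 2 and slope 1.25 from the earlier calculation; the variable names are assumptions introduced here.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100 million

# Error for each observation: actual Y minus the value predicted by the line.
errors = [y - (2 + 1.25 * x) for x, y in zip(payroll, sales)]

mean_error = sum(errors) / len(errors)             # positives and negatives cancel
sum_squared_errors = sum(e ** 2 for e in errors)   # squaring removes the sign

print(errors)
print(mean_error, sum_squared_errors)
```

The average error comes out as zero even though several individual errors are large, which is why the measures of variability are built from squared deviations instead.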
◼ Sum of the squares total
$SST = \sum (Y - \bar{Y})^2$
◼ An important relationship
SST = SSR + SSE
| Y | X | $(Y - \bar{Y})^2$ | $\hat{Y}$ | $(Y - \hat{Y})^2$ | $(\hat{Y} - \bar{Y})^2$ |
|---|---|---|---|---|---|
| 6 | 3 | $(6 - 7)^2 = 1$ | $2 + 1.25(3) = 5.75$ | 0.0625 | 1.563 |

Table 3
◼ For Triple A Construction
SST = 22.5, SSR = 15.625, SSE = 22.5 − 15.625 = 6.875
◼ SSR – explained variability
◼ SSE – unexplained variability
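The three sums of squares can be computed for all six Triple A Construction observations in a few lines of Python. This sketch assumes SSE is the sum of $(Y - \hat{Y})^2$ and SSR is the sum of $(\hat{Y} - \bar{Y})^2$, which matches the column headers of Table 3; the variable names are introduced here for illustration.

```python
sales = [6, 8, 9, 5, 4.5, 9.5]   # Y, in $100,000
payroll = [3, 4, 6, 4, 2, 5]     # X, in $100 million
b0, b1 = 2.0, 1.25               # fitted intercept and slope

y_bar = sum(sales) / len(sales)
y_hat = [b0 + b1 * x for x in payroll]   # predicted value for each observation

sst = sum((y - y_bar) ** 2 for y in sales)               # total variability about the mean
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))  # variability about the line
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # variability explained by the line

print(sst, sse, ssr)
print("SST equals SSR + SSE:", abs(sst - (ssr + sse)) < 1e-9)
```

The identity SST = SSR + SSE holds here because the line was fitted by least squares; it is not guaranteed for an arbitrary line drawn through the data.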
Figure 4.2: Deviations $Y - \bar{Y}$, $Y - \hat{Y}$ and $\hat{Y} - \bar{Y}$ about the regression line $\hat{Y} = 2 + 1.25X$, with Sales ($100,000) plotted against Payroll ($100 million)
Coefficient of Determination
• The proportion of the variability in Y explained by the regression equation is called the coefficient of determination
• The coefficient of determination is $r^2$
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
◼ For Triple A Construction
$r^2 = \frac{15.625}{22.5} = 0.6944$
◼ About 69% of the variability in Y is explained by
the equation based on payroll (X)
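Both forms of the coefficient of determination give the same number, which the following Python sketch checks using the SST and SSR values quoted for Triple A Construction. SSE is derived here from the relationship SST = SSR + SSE; the variable names are assumptions.

```python
sst = 22.5       # total variability about the mean
ssr = 15.625     # variability explained by the regression
sse = sst - ssr  # unexplained variability, from SST = SSR + SSE

r2_from_ssr = ssr / sst        # explained share of the variability
r2_from_sse = 1 - sse / sst    # one minus the unexplained share

print(round(r2_from_ssr, 4), round(r2_from_sse, 4))
```

A value near 1 would mean the line accounts for almost all of the variability in sales, and a value near 0 would mean payroll explains very little of it.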
Assumptions of the Regression Model
Interpretation
The value of r shows that there is a positive correlation between the company
ranking and the customer ranking. It also indicates that both the company and
its customers consider these characteristics important for customer satisfaction,
which is why the association between the two rankings is positive.