14 Statistics and Probability

The learning objectives of this chapter are to teach students how to: 1) Develop simple and multiple linear regression models to predict dependent variables from independent variables 2) Interpret regression results such as slope, intercept, and statistical tests 3) Evaluate the fit of regression models and address assumptions like nonlinearity and outliers


Learning Objectives

After completing this chapter, students will be able to:

1. Identify variables and use them in a regression model


2. Develop simple linear regression equations from sample data and interpret the
slope and intercept
3. Compute the coefficient of determination and the coefficient of correlation and
interpret their meanings
4. Interpret the F-test in a linear regression model
5. List the assumptions used in regression and use residual plots to identify
problems

6. Develop a multiple regression model and use it to predict


7. Use dummy variables to model categorical data
8. Determine which variables should be included in a multiple regression model
9. Transform a nonlinear function into a linear one for use in regression
10. Understand and avoid common mistakes made in the use of regression analysis
Chapter Outline

1) Introduction
2) Scatter Diagrams
3) Simple Linear Regression
4) Measuring the Fit of the Regression Model
5) Assumptions of the Regression Model
Introduction
• Regression analysis is a very valuable tool for a manager
• Regression can be used to
•Understand the relationship between variables
•Predict the value of one variable based on another variable
• Examples
•Determining best location for a new store
•Studying the effectiveness of advertising dollars in increasing sales
volume
• The variable to be predicted is called the dependent variable
•Sometimes called the response variable
• The value of this variable depends on the value of the independent
variable
•Sometimes called the explanatory or predictor variable
Dependent variable = Independent variable + Independent variable
Scatter Diagram
• Graphing is a helpful way to investigate the relationship between
variables.
• A scatter diagram or scatter plot is often used for that relation.
• The independent variable is normally plotted on the X axis.
• The dependent variable is normally plotted on the Y axis.
Triple A Construction
• Triple A Construction renovates old homes.
• They have found that the dollar volume of renovation work is dependent
on the area payroll.

TRIPLE A’S SALES ($100,000s)    LOCAL PAYROLL ($100,000,000s)
6                               3
8                               4
9                               6
5                               4
4.5                             2
9.5                             5

Table 1
Triple A Construction
[Scatter diagram: Sales ($100,000) on the Y axis versus Payroll ($100 million) on the X axis]

Figure 1
Simple Linear Regression
◼ Regression models are used to test if there is a relationship between
variables (predict sales based on payroll)
◼ There is some random error that cannot be predicted

Y = β₀ + β₁X + ε
where
Y = dependent variable (response)
X = independent variable (predictor or explanatory)
β₀ = intercept (value of Y when X = 0)
β₁ = slope of the regression line
ε = random error
Simple Linear Regression

◼ True values for the slope and intercept are not known so they are
estimated using sample data

Ŷ = b₀ + b₁X

where
Ŷ = predicted value of the dependent variable (response)
X = independent variable (predictor or explanatory)
b₀ = estimated intercept (value of Ŷ when X = 0)
b₁ = estimated slope of the regression line
Example: Triple A Construction

• Triple A Construction is trying to predict sales based on area payroll

Y = Sales
X = Area payroll

◼ The line chosen in Figure 1 is the one that minimizes the errors

Error = (Actual value) – (Predicted value)

e = Y − Ŷ
Least Squares Regression
Errors can be positive or negative so the average error could be zero even
though individual errors could be large.
Least squares regression minimizes the sum of the squared errors.

[Payroll line fit plot: Sales ($100,000) versus Payroll ($100,000,000s), showing the data points and the fitted regression line]
• For the simple linear regression model, the values of the intercept and
slope can be calculated using the formulas below

Ŷ = b₀ + b₁X

X̄ = ΣX / n = average (mean) of X values

Ȳ = ΣY / n = average (mean) of Y values

b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

b₀ = Ȳ − b₁X̄
• Regression calculations

Y      X      (X − X̄)²         (X − X̄)(Y − Ȳ)
6      3      (3 − 4)² = 1     (3 − 4)(6 − 7) = 1
8      4      (4 − 4)² = 0     (4 − 4)(8 − 7) = 0
9      6      (6 − 4)² = 4     (6 − 4)(9 − 7) = 4
5      4      (4 − 4)² = 0     (4 − 4)(5 − 7) = 0
4.5    2      (2 − 4)² = 4     (2 − 4)(4.5 − 7) = 5
9.5    5      (5 − 4)² = 1     (5 − 4)(9.5 − 7) = 2.5
ΣY = 42   ΣX = 24   Σ(X − X̄)² = 10   Σ(X − X̄)(Y − Ȳ) = 12.5
Ȳ = 42/6 = 7      X̄ = 24/6 = 4

Table 2
• Regression calculations

X̄ = ΣX / n = 24/6 = 4

Ȳ = ΣY / n = 42/6 = 7

b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 12.5/10 = 1.25

b₀ = Ȳ − b₁X̄ = 7 − (1.25)(4) = 2

Therefore Ŷ = 2 + 1.25X
• In terms of the original variables: sales = 2 + 1.25(payroll)
• If the payroll next year is $600 million (X = 6):

Ŷ = 2 + 1.25(6) = 9.5, or sales of $950,000
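The least squares calculation above can be sketched in a few lines of Python using the Triple A data from Table 1; it reproduces the slope, intercept, and the payroll prediction.

```python
# Least squares fit for the Triple A Construction data (Table 1).
X = [3, 4, 6, 4, 2, 5]          # payroll ($100,000,000s)
Y = [6, 8, 9, 5, 4.5, 9.5]      # sales ($100,000s)

n = len(X)
x_bar = sum(X) / n              # 4
y_bar = sum(Y) / n              # 7

# b1 = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)   # slope = 1.25
b0 = y_bar - b1 * x_bar                 # intercept = 2.0

# Predict sales if next year's payroll is $600 million (X = 6)
y_hat = b0 + b1 * 6                     # 9.5, i.e. $950,000
print(b0, b1, y_hat)
```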
Measuring the Fit
of the Regression Model
◼ Regression models can be developed for any variables X and Y
◼ How do we know the model is actually helpful in predicting Y based on X?
◼ We could just take the average error, but the positive and negative
errors would cancel each other out
◼ Three measures of variability are
◼ SST – Total variability about the mean
◼ SSE – Variability about the regression line
◼ SSR – Total variability that is explained by the model
◼ Sum of squares total
SST = Σ(Y − Ȳ)²

◼ Sum of squares due to error
SSE = Σe² = Σ(Y − Ŷ)²

◼ Sum of squares due to regression
SSR = Σ(Ŷ − Ȳ)²

◼ An important relationship
SST = SSR + SSE
Y      X      (Y − Ȳ)²            Ŷ = 2 + 1.25X         (Y − Ŷ)²    (Ŷ − Ȳ)²
6      3      (6 − 7)² = 1        2 + 1.25(3) = 5.75    0.0625      1.5625
8      4      (8 − 7)² = 1        2 + 1.25(4) = 7.00    1           0
9      6      (9 − 7)² = 4        2 + 1.25(6) = 9.50    0.25        6.25
5      4      (5 − 7)² = 4        2 + 1.25(4) = 7.00    4           0
4.5    2      (4.5 − 7)² = 6.25   2 + 1.25(2) = 4.50    0           6.25
9.5    5      (9.5 − 7)² = 6.25   2 + 1.25(5) = 8.25    1.5625      1.5625

Ȳ = 7    Σ(Y − Ȳ)² = 22.5    Σ(Y − Ŷ)² = 6.875    Σ(Ŷ − Ȳ)² = 15.625
         SST = 22.5          SSE = 6.875           SSR = 15.625

Table 3
For Triple A Construction

◼ Sum of squares total: SST = Σ(Y − Ȳ)² = 22.5
◼ Sum of squares due to error: SSE = Σe² = Σ(Y − Ŷ)² = 6.875
◼ Sum of squares due to regression: SSR = Σ(Ŷ − Ȳ)² = 15.625
◼ An important relationship: SST = SSR + SSE (22.5 = 15.625 + 6.875)
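The decomposition SST = SSR + SSE can be verified directly on the Triple A data, using the fitted line Ŷ = 2 + 1.25X:

```python
# Verify SST = SSR + SSE for the Triple A data (Table 3).
X = [3, 4, 6, 4, 2, 5]
Y = [6, 8, 9, 5, 4.5, 9.5]
b0, b1 = 2.0, 1.25              # intercept and slope fitted earlier
y_bar = sum(Y) / len(Y)         # 7

Y_hat = [b0 + b1 * x for x in X]
SST = sum((y - y_bar) ** 2 for y in Y)                 # 22.5
SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # 6.875
SSR = sum((yh - y_bar) ** 2 for yh in Y_hat)           # 15.625
print(SST, SSE, SSR)            # SST equals SSR + SSE
```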
◼ SSR – explained variability
◼ SSE – unexplained variability
[Plot of the data with the regression line Ŷ = 2 + 1.25X, illustrating the deviations Y − Ȳ, Y − Ŷ, and Ŷ − Ȳ for a single point; Sales ($100,000) versus Payroll ($100 million)]

Figure 4.2
Coefficient of Determination
• The proportion of the variability in Y explained by the regression equation is called the coefficient of determination
• The coefficient of determination is r²

r² = SSR/SST = 1 − SSE/SST

◼ For Triple A Construction

r² = 15.625/22.5 = 0.6944

◼ About 69% of the variability in Y is explained by the equation based on payroll (X)
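Using the sums of squares computed above, the coefficient of determination is one division:

```python
# Coefficient of determination for Triple A: r^2 = SSR/SST = 1 - SSE/SST
SST, SSE, SSR = 22.5, 6.875, 15.625
r2 = SSR / SST
print(round(r2, 4))   # 0.6944
```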
Assumptions of the Regression Model

◼ If we make certain assumptions about the errors in a regression model,


we can perform statistical tests to determine if the model is useful
1. Errors are independent
2. Errors are normally distributed
3. Errors have a mean of zero
4. Errors have a constant variance
◼ A plot of the residuals (errors) will often highlight any glaring violations
of these assumptions
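As a minimal check of the zero-mean assumption, the residuals of a least squares fit always average to zero; a sketch with the Triple A data (a residual plot against X would reveal nonlinearity or non-constant variance):

```python
# Residuals e = Y - Yhat for the fitted line Yhat = 2 + 1.25X.
# Their mean is exactly zero for a least squares fit.
X = [3, 4, 6, 4, 2, 5]
Y = [6, 8, 9, 5, 4.5, 9.5]
residuals = [y - (2.0 + 1.25 * x) for x, y in zip(X, Y)]
mean_residual = sum(residuals) / len(residuals)
print(residuals, mean_residual)
```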
REGRESSION LINE OF Y ON X
The line that expresses the trend between two observed variables is called a
regression line. For example, given sample data, the value of 𝒚 corresponding
to a given value of 𝒙 can be estimated by the method of least squares. Because
the value of 𝒚 is estimated from a given value of 𝒙, the resulting line is
called the regression line of 𝒚 on 𝒙, which means that 𝒚 is dependent on 𝒙.
The general equation of 𝒚 on 𝒙 is

𝒀 = 𝒂 + 𝒃𝒙

where
𝒃 = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
𝒂 = ȳ − 𝒃x̄ = Σy/n − 𝒃 Σx/n
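These raw-sum formulas give the same line as the deviation formulas used earlier; a quick sketch on the Triple A data confirms this:

```python
# Slope b and intercept a via the raw-sum formulas; the result matches
# the deviation form (b1 = 1.25, b0 = 2) computed for the Triple A data.
x = [3, 4, 6, 4, 2, 5]
y = [6, 8, 9, 5, 4.5, 9.5]
n = len(x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx, sy = sum(x), sum(y)
sxx = sum(xi ** 2 for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # 1.25
a = sy / n - b * sx / n                          # 2.0
print(a, b)
```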
Question
Problem: The following table shows the chart of price and demand for an
item at different periods of time.
I. Forecast demand for the price of $ 25
• Solution:
Correlation
Correlation measures the degree of interdependence (association) between two
variables. If two variables are so related that an increase or decrease in one
is accompanied by an increase or decrease in the other, the two variables are
said to be correlated. Note, however, that two variables may show similar
movement, such as automobile sales and the demand for shoes, without having any
real connection; calculating a correlation for such variables is meaningless.
Care must therefore be taken that the two variables have some genuine
connection before the calculation can make sense.
Correlation Coefficient
• The correlation coefficient gives a mathematical value for measuring
the strength of the linear relationship between two variables.
• r lies between -1 and +1
• +1 indicates perfect positive relation
• -1 indicates perfect negative relation
• 0 shows no correlation
Pearson Product Moment Correlation Coefficient
• The formula for calculating the linear correlation coefficient is the
product-moment formula presented by Karl Pearson. Therefore it is also called
the Pearsonian coefficient of correlation. The formula is given as:

r = (n Σxy − Σx Σy) / √[(n Σx² − (Σx)²)(n Σy² − (Σy)²)]
Properties of the Coefficient of Correlation
• The coefficient of correlation lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
• The coefficient of correlation is independent of change of origin and scale.
• The coefficient of correlation possesses the property of symmetry, i.e. r_xy = r_yx.
• The coefficient of correlation is the geometric mean of the two regression
coefficients: r = ±√(b_xy · b_yx)
  r = +√(b_xy · b_yx), if b_xy and b_yx are positive.
  r = −√(b_xy · b_yx), if b_xy and b_yx are negative.
• Note: the magnitude of the correlation is the geometric mean of the absolute
values of the two regression coefficients, i.e. |r| = √(|b_xy| · |b_yx|).
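The geometric-mean property can be checked numerically; a sketch on the Triple A data, where both regression coefficients are positive so r takes the positive root:

```python
import math

# Verify r = +sqrt(b_xy * b_yx) on the Triple A data.
x = [3, 4, 6, 4, 2, 5]
y = [6, 8, 9, 5, 4.5, 9.5]
n = len(x)
xb, yb = sum(x) / n, sum(y) / n
sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))  # 12.5
sxx = sum((xi - xb) ** 2 for xi in x)                     # 10
syy = sum((yi - yb) ** 2 for yi in y)                     # 22.5

b_yx = sxy / sxx                     # regression coefficient of y on x
b_xy = sxy / syy                     # regression coefficient of x on y
r_direct = sxy / math.sqrt(sxx * syy)
r_geo = math.sqrt(b_xy * b_yx)       # geometric mean of the coefficients
print(r_direct, r_geo)               # both ≈ 0.8333; note r^2 ≈ 0.6944
```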
Question
Solution
• Rank correlation is used when the variable under consideration is not
directly measurable, e.g. intelligence, knowledge, experience, beauty. Such
variables are judged by two different people or by two procedures, so it is
necessary to find the correlation between the judgments of the two people or
procedures. For this purpose the observations are ranked, and the method is
called the rank correlation coefficient.
• The formula for calculating the (Spearman) rank correlation is given as:

r_s = 1 − 6Σd² / (n(n² − 1))

where d is the difference between the two ranks assigned to each observation
and n is the number of observations.
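A minimal sketch of Spearman's rank correlation, using hypothetical ranks assigned by a company and by customers (the rank data here is illustrative, not from the text):

```python
# Spearman rank correlation: r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)),
# on hypothetical company vs. customer rankings of six characteristics.
company_rank  = [1, 2, 3, 4, 5, 6]   # hypothetical data
customer_rank = [2, 1, 4, 3, 6, 5]   # hypothetical data

n = len(company_rank)
d2 = sum((a - b) ** 2 for a, b in zip(company_rank, customer_rank))
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(r_s)   # ≈ 0.829: a fairly strong positive association of the rankings
```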
Question
Solution

Interpretation
The value of r shows that there is a positive correlation between the company
ranking and the customer ranking. It also indicates that both the company and
the customers consider these characteristics important for customer
satisfaction, which is why the association between the two rankings is
positive.

You might also like