Session 19&20
Introduction
Regression analysis is the study of relationships between variables.
There are two potential objectives:
Drawing scatterplots is a good way to begin regression analysis.
A scatterplot is a graphical plot of two variables.
If there is any relationship between the two variables, it is usually apparent from the scatterplot.
Example 1: Sales versus Promotions at Pharmex
Objective: To use a scatterplot to examine the relationship between
promotional expenditures and sales at Pharmex.
Solution: Pharmex has collected data from 50 randomly selected
metropolitan regions.
There are two variables: Pharmex’s promotional expenditures as a
percentage of those of the leading competitor (“Promote”) and Pharmex’s
sales as a percentage of those of the leading competitor (“Sales”).
Example 2: Explaining Overhead Costs at Bendrix
Objective: To use scatterplots to examine the relationships
among overhead, machine hours, and production runs at Bendrix.
Solution: Data file contains observations of overhead costs,
machine hours, and number of production runs at Bendrix.
Each observation (row) corresponds to a single month.
Examine scatterplots between each explanatory
variable (Machine Hours and Production Runs) and
the dependent variable (Overhead).
Check for possible time series patterns by creating a time series graph for any of the variables.
Check for relationships among the multiple explanatory variables
(Machine Hours versus Production Runs).
Linear versus Nonlinear Relationships
Scatterplots are useful for detecting relationships that would not be obvious otherwise.
What we hope to see is a linear relationship.
This doesn't mean that all points lie on a straight line, but that the points tend to cluster around a straight line.
This scatterplot illustrates a nonlinear relationship.
Outliers
Occasionally, the variance of the dependent variable (DV) depends on the value of the explanatory variable (IV).
Unequal variance is a violation of the linear regression assumptions, but there are ways to deal with it: robust regression, weighted least squares regression, or a log or square root transformation of Y.
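Weighted least squares can be sketched for the one-variable case in plain Python. This is a minimal illustration, not the course's procedure: the function name `weighted_least_squares`, the data, and the weights 1/x² (which assume the standard deviation of Y grows in proportion to X) are all hypothetical. Each squared residual is multiplied by its weight, so the noisier observations count less in the fit.

```python
def weighted_least_squares(xs, ys, weights):
    """Fit y = a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    total_w = sum(weights)
    # Weighted means of X and Y
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total_w
    # Weighted cross-products and squared deviations
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data whose spread grows with x.
# Assumption: SD of Y is proportional to x, so each weight is 1 / x**2.
xs = [1, 2, 3, 4, 5, 6]
ys = [5.1, 7.8, 11.5, 13.2, 18.9, 18.0]
weights = [1 / x ** 2 for x in xs]

a, b = weighted_least_squares(xs, ys, weights)
print(f"intercept={a:.3f}, slope={b:.3f}")
```

With all weights equal, the function reduces to ordinary least squares.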
No Relationship
Another concept from descriptive statistics:
Correlations are numerical summary measures that indicate the strength of linear relationships between pairs of variables.
A correlation summarizes the information in a scatterplot, but it measures linear relationships only.
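The correlation can be computed directly from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. In the minimal sketch below (the function name and data are hypothetical), Y = X² is completely determined by X, yet the correlation is 0 because the relationship is not linear.

```python
import math


def pearson_correlation(xs, ys):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
r_linear = pearson_correlation(xs, [2 * x + 1 for x in xs])  # exact straight line
r_curved = pearson_correlation(xs, [x ** 2 for x in xs])     # exact U-shaped curve
print(r_linear, r_curved)
```

The first correlation is 1 (a perfect positive linear relationship); the second is 0 even though a scatterplot would show a clear U shape.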
The residual is the actual (observed) value minus the fitted (predicted) value.
Fundamental Equation for Regression: Observed value = Fitted value + Residual
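The residual definition and the fundamental equation can be checked numerically. The sketch below uses made-up data and a hypothetical helper `least_squares_line`: it fits a simple least squares line, computes each fitted value and residual, and prints them side by side. With an intercept in the model, least squares residuals also sum to zero (up to rounding).

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = least_squares_line(xs, ys)
fitted = [a + b * x for x in xs]                 # predicted values
residuals = [y - f for y, f in zip(ys, fitted)]  # observed minus fitted

for y, f, e in zip(ys, fitted, residuals):
    print(f"observed={y:5.2f}  fitted={f:5.2f}  residual={e:+.2f}")
```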
Regression summary
Multiple R   R-square   Adj R-square   Std Error
0.673        0.453      0.442          7.395
ANOVA table
Source      df   SS            MS            F        p-value
Regression   1   2172.880392   2172.880392   39.737   0.000
Error       48   2624.739608   54.68207516
Total       49   4797.62
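The summary measures follow from the ANOVA table by standard formulas: MS = SS/df, F = MS(Regression)/MS(Error), R-square = SS(Regression)/SS(Total), Multiple R = sqrt(R-square), Adj R-square = 1 - (1 - R-square)(n - 1)/(n - k - 1), and Std Error = sqrt(MS(Error)), where n is the number of observations and k the number of explanatory variables. The sketch below applies them to the sums of squares in the ANOVA table; the function name is hypothetical, and it assumes a model with an intercept, so that n = df(Total) + 1.

```python
import math


def regression_summary(ss_reg, ss_err, df_reg, df_err):
    """Derive regression summary measures from ANOVA sums of squares."""
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    ss_tot = ss_reg + ss_err
    n = df_reg + df_err + 1   # assumes an intercept: df(Total) = n - 1
    k = df_reg                # number of explanatory variables
    r_square = ss_reg / ss_tot
    return {
        "MS Regression": ms_reg,
        "MS Error": ms_err,
        "F": ms_reg / ms_err,
        "R-square": r_square,
        "Multiple R": math.sqrt(r_square),
        "Adj R-square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "Std Error": math.sqrt(ms_err),
    }


# Sums of squares and degrees of freedom from the ANOVA table
summary = regression_summary(2172.880392, 2624.739608, df_reg=1, df_err=48)
for name, value in summary.items():
    print(f"{name:>14}: {value:.3f}")
```

Rounded to three decimals, the results should match the reported values (F = 39.737, R-square = 0.453, Multiple R = 0.673, Adj R-square = 0.442, Std Error = 7.395).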
Several types of explanatory variables can be included in regression equations:
Dummy variables
Interaction variables
Nonlinear transformations
These techniques can produce much better fits than you could obtain without them.
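As a rough sketch of how these variables are built before fitting, the code below creates a 0/1 dummy, an interaction term, and a log transformation from hypothetical observations. The column names, the categories "A" and "B", and the choice of "A" as the baseline are assumptions for illustration only.

```python
import math

# Hypothetical observations: (numeric explanatory variable, category label)
rows = [(2.0, "A"), (5.0, "B"), (8.0, "A"), (3.0, "B")]


def build_features(x, group, baseline="A"):
    """Return the derived explanatory variables for one observation."""
    dummy = 1 if group != baseline else 0  # 0/1 dummy; the baseline category gets 0
    return {
        "X": x,
        "Dummy": dummy,
        "X*Dummy": x * dummy,   # interaction: lets the slope of X differ by group
        "ln(X)": math.log(x),   # nonlinear transformation (requires x > 0)
    }


features = [build_features(x, g) for x, g in rows]
for f in features:
    print(f)
```

A categorical variable with m categories needs m - 1 dummies; including all m alongside an intercept makes the explanatory variables exactly collinear.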
Dummy Variables
ANOVA table
Source      df   SS            MS            F        p-value
Regression   1   708085273.8   708085273.8   94.748   0.000
Error       34   254093815.2   7473347.506
Total       35   962179089
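When a single 0/1 dummy is the only explanatory variable, the least squares intercept equals the mean of Y in the baseline group (dummy = 0), and the slope equals the difference between the two group means. The minimal sketch below checks this on made-up data; the helper name `simple_regression` is hypothetical.

```python
def simple_regression(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: dummy = 1 for the non-baseline group, 0 for the baseline group
dummy = [0, 0, 0, 1, 1]
y = [10.0, 12.0, 14.0, 20.0, 24.0]

intercept, slope = simple_regression(dummy, y)
mean_0 = sum(v for d, v in zip(dummy, y) if d == 0) / dummy.count(0)
mean_1 = sum(v for d, v in zip(dummy, y) if d == 1) / dummy.count(1)
print(f"intercept={intercept:.3f}  baseline mean={mean_0:.3f}")
print(f"slope={slope:.3f}  difference in means={mean_1 - mean_0:.3f}")
```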