Evans - Analytics2e - PPT - 07 and 08
Statistical Inference
Chapter 8
Trendlines and Regression
Analysis
Statistical Inference
Statistical inference focuses
on drawing conclusions about populations from
samples.
◦ Statistical inference includes estimation of population
parameters and hypothesis testing, which
involves drawing conclusions about the value of the
parameters of one or more populations.
Hypothesis Testing
Hypothesis testing involves drawing inferences about
two contrasting propositions (each called a hypothesis)
relating to the value of one or more population
parameters.
H0: Null hypothesis: describes an existing theory
H1: Alternative hypothesis: the complement of H0
Using sample data, we either:
- reject H0 and conclude the sample data provides
sufficient evidence to support H1, or
- fail to reject H0 and conclude the sample data
does not support H1.
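The reject / fail-to-reject decision above can be sketched in code. This is a minimal illustration, not the procedure from the slides: it assumes a two-sided one-sample z-test with a known population standard deviation, and the sample values, mu0, sigma, and alpha are all made-up numbers chosen for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    n = len(sample)
    # Standardized distance between the sample mean and the hypothesized mean.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a test statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical sample; H0: mean = 50, H1: mean != 50.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
z, p_value, decision = one_sample_z_test(sample, mu0=50, sigma=3)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these illustrative numbers the p-value is larger than alpha, so the sketch reports "fail to reject H0". That is not the same as proving H0 is true.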
Example 7.1: A Legal Analogy for Hypothesis
Testing
In the U.S. legal system, a defendant is presumed
innocent until proven guilty.
◦ H0: Innocent
◦ H1: Guilty
If evidence (sample data) strongly indicates the
defendant is guilty, then we reject H0.
Note that we have not proven guilt or innocence!
Hypothesis Testing Procedure
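Linear function: y = a + bx
Linear functions show steady increases or decreases
and are commonly used in predictive models.
They are easy to understand and, over small ranges of x,
often approximate behavior well.
Simple regression involves a single
independent variable.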
Multiple regression involves two or more
independent variables.
Simple Linear Regression
Finds a linear relationship between:
- one independent variable X and
- one dependent variable Y
First prepare a scatter plot to verify the data has a
linear trend.
Use alternative approaches if the data is not linear.
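As a numerical complement to the scatter plot, the sample correlation coefficient gives a rough measure of how strong the linear association is. The sketch below is only an illustration: the function name pearson_r and the data are hypothetical. A value of r near +1 or -1 does not by itself confirm linearity, so the scatter plot should still be inspected.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient between paired observations xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data with a roughly linear pattern.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```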
Example 8.3: Home Market Value Data
Size of a house is typically
related to its market value.
X = square footage
Y = market value ($)
The scatter plot of the full
data set (42 homes)
indicates a linear trend.
Least-Squares Regression
Simple linear regression model: Y = β0 + β1X + ε
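A least-squares fit chooses the intercept b0 and slope b1 of the line ŷ = b0 + b1x that minimize the sum of squared vertical distances between the observed and fitted values. The sketch below uses the standard closed-form formulas. The function name and the five (square footage, market value) pairs are hypothetical and are not taken from the 42-home data set.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: X = square footage, Y = market value ($).
square_feet = [1500, 1800, 2000, 2200, 2500]
market_value = [80000, 90000, 95000, 100000, 110000]

b0, b1 = least_squares_fit(square_feet, market_value)
print(f"market value = {b0:.2f} + {b1:.4f} * square feet")
```

In this made-up example the slope is read as the estimated change in market value, in dollars, for each additional square foot.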
There is no problem of
Example 8.14: Identifying Potential Multicollinearity
Add 3 columns to
the data, one for
each of the tool
type variables
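Adding one 0/1 column per category level is a common way to bring a categorical variable such as tool type into a regression. The sketch below is illustrative only. It assumes four tool types labeled A to D with A treated as the baseline, which gives three indicator columns. The labels, column names, and rows are hypothetical.

```python
def add_indicator_columns(rows, column, levels):
    """Return copies of rows with one 0/1 indicator column per level."""
    encoded = []
    for row in rows:
        new_row = dict(row)  # copy so the original data is left unchanged
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical rows; tool type A is the assumed baseline and gets no column.
rows = [
    {"tool_type": "A"},
    {"tool_type": "B"},
    {"tool_type": "C"},
    {"tool_type": "D"},
]
encoded = add_indicator_columns(rows, "tool_type", levels=["B", "C", "D"])
for row in encoded:
    print(row)
```

A row of the baseline type has zeros in all three new columns, so each fitted coefficient is interpreted relative to that baseline.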
Example 8.17 Continued
Regression results