Stepwise Regression
In the stepwise regression procedure, a regression model is built from a set of candidate
regressor variables by entering regressors into, and removing them from, the model one step
at a time until there is no justifiable reason to enter or remove any more.
The list of candidate regressor variables must include all of the variables that actually predict
the response. Otherwise, one is sure to end up with a regression model that is underspecified
and therefore misleading.
First, start with no regressors in the "stepwise model." Then, at each step along the way, either
enter or remove a regressor based on the t-tests for the slope parameters. Stop when no more
regressors can be justifiably entered into or removed from the stepwise model, thereby leading
to a "final model."
The first thing that needs to be done is to set a significance level for deciding when to enter
a regressor into the stepwise model. Call this the Alpha-to-Enter significance level and denote
it by αE. Also set a significance level for deciding when to remove a regressor from the
stepwise model. Call this the Alpha-to-Remove significance level and denote it by αR. That
is, first:
• Specify an Alpha-to-Enter significance level. This will typically be greater than the
usual 0.05 level so that it is not too difficult to enter regressors into the model. Set this
significance level by default to αE = 0.15.
• Specify an Alpha-to-Remove significance level. This will typically be greater than the
usual 0.05 level so that it is not too easy to remove regressors from the model. Set this
significance level by default to αR = 0.15.
STEP 1
Once the starting significance levels have been specified:
1. Fit each of the one-regressor models, that is, regress Y on each candidate regressor Xk
in turn.
2. Of those regressors whose t-test p-value is less than αE = 0.15, the first regressor put
in the stepwise model is the regressor that has the smallest t-test p-value.
3. If no regressor has a t-test p-value less than αE = 0.15, stop.
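The entry rule in this step can be sketched in a few lines of Python. This is only an illustration: the function name `pick_entering` and the dictionary of p-values are assumptions made for the example, and the p-values themselves are taken as already computed from the fitted models.

```python
from typing import Dict, Optional


def pick_entering(pvalues: Dict[str, float], alpha_enter: float = 0.15) -> Optional[str]:
    """Return the candidate with the smallest p-value below alpha_enter.

    pvalues maps each candidate regressor's name to the t-test p-value of its
    slope in the model that adds that candidate. Returns None when no candidate
    qualifies, which corresponds to the "stop" rule.
    """
    # Keep only candidates that pass the Alpha-to-Enter threshold.
    eligible = {name: p for name, p in pvalues.items() if p < alpha_enter}
    if not eligible:
        return None
    # Among those, the entering regressor is the one with the smallest p-value.
    return min(eligible, key=eligible.get)
```

With illustrative (made-up) p-values such as `{"X1": 0.40, "X2": 0.01, "X3": 0.05}`, the function returns `"X2"`; if every p-value is 0.15 or larger it returns `None`.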
STEP 2
Suppose X2 had the smallest t-test p-value below αE = 0.15 and therefore was deemed the
"best" first regressor.
1. Fit each of the two-regressor models that include X2 as a regressor, that is, regress Y on
X2 and Xk.
2. Of those regressors whose t-test p-value is less than αE = 0.15, the second regressor
put in the stepwise model is the regressor that has the smallest t-test p-value.
3. If no regressor has a t-test p-value less than αE = 0.15, stop. The model with the one
regressor obtained from the first step is the final model.
4. But suppose instead that X3 was deemed the "best" second regressor and is therefore
entered into the stepwise model. Now step back and check whether entering X3 into the
stepwise model somehow affected the significance of the X2 regressor. That is, check the
t-test p-value for testing β2 = 0. If that p-value is greater than αR = 0.15, remove X2
from the stepwise model.
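The removal check in this step can be sketched as follows. The function name `regressors_to_remove` and its arguments are hypothetical, chosen only for this illustration; the p-values are assumed to come from the model refitted after the new regressor was entered.

```python
from typing import Dict, List


def regressors_to_remove(
    pvalues: Dict[str, float],
    just_entered: str,
    alpha_remove: float = 0.15,
) -> List[str]:
    """List previously entered regressors that should now be removed.

    pvalues maps every regressor in the current model to the t-test p-value of
    its slope. The regressor that was just entered is not re-checked here.
    """
    return [
        name
        for name, p in pvalues.items()
        if name != just_entered and p > alpha_remove
    ]
```

For example, if entering X3 pushed the p-value for X2 up to 0.30, then `regressors_to_remove({"X2": 0.30, "X3": 0.02}, "X3")` returns `["X2"]`; if X2 stays significant, the result is an empty list.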
STEP 3
1. Suppose both X2 and X3 made it into the two-regressor stepwise model and remained
there.
2. Now, fit each of the three-regressor models that include X2 and X3 as regressors, that is,
regress Y on X2, X3, and Xk.
3. Of those regressors whose t-test p-value is less than αE = 0.15, the third regressor put
in the stepwise model is the regressor that has the smallest t-test p-value.
4. If no regressor has a t-test p-value less than αE = 0.15, stop. The model containing the
two regressors obtained from the second step is the final model.
5. But suppose instead that X4 was deemed the "best" third regressor and is therefore
entered into the stepwise model. Now step back and check whether entering X4 into the
stepwise model somehow affected the significance of the X2 and X3 regressors. That is,
check the t-test p-values for testing β2 = 0 and β3 = 0. If the t-test p-value for either
β2 = 0 or β3 = 0 is greater than αR = 0.15, remove the corresponding regressor from the
stepwise model.
STEP 4
Continue the steps as described above until adding an additional regressor does not yield a
t-test p-value below αE = 0.15.
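Putting the steps together, the whole procedure can be sketched as a single loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `stepwise_select` is a hypothetical name, and `fit_pvalues` is an assumed caller-supplied function that fits Y on the listed regressors and returns the t-test p-value of each slope. The `max_steps` guard is an addition that does not appear in the procedure above; it only prevents an endless enter/remove cycle. The sketch also removes every previously entered regressor whose p-value exceeds the Alpha-to-Remove level in one pass, which is one reading of the removal rule.

```python
from typing import Callable, Dict, List, Sequence


def stepwise_select(
    candidates: Sequence[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha_enter: float = 0.15,
    alpha_remove: float = 0.15,
    max_steps: int = 100,
) -> List[str]:
    """Return the regressors in the final stepwise model.

    fit_pvalues(regressors) is a hypothetical callback: it must fit the model
    with exactly those regressors and return {name: p-value} for each of them.
    """
    model: List[str] = []  # start with no regressors
    for _ in range(max_steps):  # safety guard, not part of the described procedure
        # Entry: fit the current model plus each remaining candidate in turn
        # and record the p-value of the added candidate.
        trial = {
            name: fit_pvalues(model + [name])[name]
            for name in candidates
            if name not in model
        }
        eligible = {name: p for name, p in trial.items() if p < alpha_enter}
        if not eligible:
            break  # no regressor can be justifiably entered: final model
        best = min(eligible, key=eligible.get)
        model.append(best)

        # Removal: re-check the regressors entered earlier in the refitted model.
        pvals = fit_pvalues(model)
        model = [x for x in model if x == best or pvals[x] <= alpha_remove]
    return model
```

In practice `fit_pvalues` might wrap an ordinary least squares routine that reports slope p-values (for instance, fitted OLS results in statsmodels expose a `pvalues` attribute). Passing the fitting step in as a function keeps the selection logic separate from any particular statistics library.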