Session 3 - Chapter 06 Linear Reg

Chapter 6 discusses multiple linear regression, focusing on the distinction between explanatory and predictive modeling. It emphasizes the importance of selecting a subset of predictors to enhance model accuracy and robustness, using the example of predicting used Toyota Corolla prices. The chapter also outlines various methods for variable selection and the significance of metrics like AIC and BIC in model evaluation.


Chapter 6: Multiple Linear Regression

We assume a linear relationship between predictors and outcome:

Y = β0 + β1 x1 + β2 x2 + … + βp xp + ε

where Y is the outcome, β0 is the constant, β1, …, βp are the coefficients, x1, …, xp are the predictors, and ε is the error (noise).
Topics
• Explanatory vs. predictive modeling with regression
• Example: prices of Toyota Corollas
• Fitting a predictive model
• Assessing predictive accuracy
• Selecting a subset of predictors
Explanatory Modeling
Goal: Explain relationship between predictors (explanatory variables) and target

● Familiar use of regression in data analysis
● Model Goal: Fit the data well and understand the contribution of explanatory variables to the model
● Metrics: “goodness-of-fit” - R², residual analysis, p-values
Predictive Modeling
Goal: Predict target values in new data where we have predictor values, but not target values

● Classic data mining context
● Model Goal: Optimize predictive accuracy
● Train model on training data
● Assess performance on validation (hold-out) data
● Explaining the role of predictors is not the primary purpose (but useful)
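A minimal sketch of this train/validate workflow in R (the data frame car.df and the column Price are placeholder names, not from the slides):

# Minimal sketch: fit on training data, assess on a hold-out set.
# car.df and Price are placeholder names.
set.seed(1)
train.idx <- sample(nrow(car.df), round(0.6 * nrow(car.df)))  # 60% training
train.df <- car.df[train.idx, ]
valid.df <- car.df[-train.idx, ]

fit <- lm(Price ~ ., data = train.df)        # train on training data only
pred <- predict(fit, newdata = valid.df)     # predict the hold-out records

err <- valid.df$Price - pred                 # errors on the hold-out set
c(ME = mean(err), RMSE = sqrt(mean(err^2)))  # predictive accuracy measures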
Explanatory vs. Predictive Modeling
1. A good explanatory model: fits the data closely
2. A good predictive model: predicts new records accurately
3. In explanatory models, the entire dataset is used for estimating the best-fit model.
4. For predictive models, the data are typically split into a training set
and a validation set.
5. Performance measures for explanatory models measure how close
the data fit the model, whereas in predictive models, performance
is measured by predictive accuracy.
6. In explanatory models the focus is on the coefficients (β), whereas
in predictive models the focus is on the predictions (yˆ).
It is extremely important to know the goal of the analysis before
beginning the modeling process.
Estimating Regression Equation and Prediction
• Predictions are best if they are unbiased (a prediction is unbiased if its expected value equals the true value)
• Predictions will have the smallest mean squared error among unbiased estimates if we make the following assumptions:
1. Linearity: The relationship between X and Y is linear.
2. Independence of errors: The residuals are independent of one another.
3. Normality of errors: The residuals are approximately normally distributed.
4. Equal variances: The variance of the residuals is constant for all values of X.
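These assumptions can be checked visually from the residuals of a fitted model (fit is the lm object from the earlier sketch):

par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit),            # constant spread, no pattern => OK
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(fit))                       # points near the line => normality
qqline(resid(fit))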
Example: Prices of Toyota Corolla
Problem Statement: A large Toyota car dealership offers purchasers of new Toyota cars the option to buy their used car as part of a trade-in. In particular, a new promotion promises to pay high prices for used Toyota Corolla cars for purchasers of a new car. The dealer then sells the used cars for a small profit. To ensure a reasonable profit, the dealer needs to be able to predict the price that the dealership will get for the used cars.

Goal: Predict prices of used Toyota Corollas based on their specification
Data: Prices of used Toyota Corollas, with their specification information
Variables Used
Price in Euros
Age in months as of 8/04
KM (kilometers)
Fuel Type (diesel, petrol, CNG)
HP (horsepower)
Metallic color (1=yes, 0=no)
Automatic transmission (1=yes, 0=no)
CC (cylinder volume)
Doors
Quarterly_Tax (road tax)
Weight (in kg)
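A sketch of fitting this model in R; the file name and variable names follow the exhaustive-search output shown later in the deck and are assumptions, not confirmed by the slides:

toyota.df <- read.csv("ToyotaCorolla.csv")   # placeholder file name
# Fuel Type is categorical; lm() expands it into dummy variables
# (Diesel and Petrol, with CNG as the reference level)
fit <- lm(Price ~ Age_08_04 + KM + Fuel_Type + HP + Met_Color + Automatic +
            CC + Doors + Quarterly_Tax + Weight, data = toyota.df)
summary(fit)   # coefficients, R-squared, p-values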
Distribution of Residuals (Holdout Set)

[Histogram of residuals on the holdout set: roughly symmetric distribution, with a few outliers]
Observations
• Note that the mean error (ME) is $19.6 and the RMSE is $1325.
• A histogram of the residuals shows that most of the errors
are between ±$2000.
• This error magnitude might be small relative to the car
price but should be taken into account when considering
the profit.
• Measures such as the mean error, RMSE, and error percentiles are used to assess the predictive performance of a model and to compare models.
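The histogram and error percentiles can be reproduced from the hold-out errors err computed in the earlier sketch:

hist(err, breaks = 25, main = "Holdout residuals")      # most within ±2000
quantile(err, probs = c(0.05, 0.25, 0.50, 0.75, 0.95))  # error percentiles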
Variable Selection: Reducing No. of
Predictors
Why bother to select a subset? Can we use all the variables in the model?
• A previously hidden relationship might emerge.
• Ex: a company found that customers who had purchased anti-scuff
protectors for chair and table legs had lower credit risks.
Subset selection is needed.
• It may be expensive or not feasible to collect a full complement of
predictors.
• We may be able to measure fewer predictors more accurately.
• The more predictors, the higher the chance of missing values in the data.
• Parsimony is an important property of good models. We obtain more
insight into the influence of predictors in models with few parameters.
Variable Selection: Reducing No. of
Predictors
• One very rough rule of thumb is to have a number of records n
larger than 5(p + 2), where p is the number of predictors.
• Using predictors that are uncorrelated with the outcome
variable increases the variance of predictions.
• Dropping predictors that are actually correlated with the
outcome variable can increase the average error (bias) of
predictions.
• There is a trade-off between too few and too many predictors.
In general, accepting some bias can reduce the variance in
predictions.
• Methods for reducing the number of predictors p to a smaller
set are often used.
Feature (Variable, Predictor) Selection
• Why select a subset of attributes to predict the target?
• Problems with more predictors/attributes:
• Expensive data collection
• More missing data
• Multicollinearity – some predictors behave the same way
• Some predictors may be uncorrelated with the target variable
• The goal
• Find parsimonious model (simplest model that performs
sufficiently well)
• More robust & higher predictive accuracy
• Variable selection methods
• Exhaustive search
• Partial Subset selection: Forward
• Partial Subset selection: Backward
• Partial Subset selection: Stepwise
Exhaustive Search = Best Subset
● All possible subsets of predictors assessed (single, pairs, triplets, etc.)
● Computationally intensive, not feasible for big data
● Judge by “adjusted R²”
● Adjusted R² applies a penalty for the number of predictors:

R²adj = 1 − [(n − 1) / (n − p − 1)] (1 − R²)

● The factor (n − 1)/(n − p − 1) is the penalty for the number of predictors: it prevents the artificial increase in R² that occurs whenever a predictor is added, without discarding real explanatory information.
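summary() on a fitted lm reports both the plain and the adjusted R²; the penalty can be verified by hand from the formula above (fit is the lm object from earlier):

s <- summary(fit)
n <- nobs(fit)
p <- length(coef(fit)) - 1            # number of predictors
c(R2 = s$r.squared, adjR2 = s$adj.r.squared)
1 - (n - 1) / (n - p - 1) * (1 - s$r.squared)   # equals s$adj.r.squared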
Specialized Metrics Used in Regression
(lower values are better)
Criteria for balancing over-fitting and under-fitting:
• Akaike Information Criterion (AIC)
• AIC = n ln(SSE/n) + n(1 + ln(2π)) + 2(p + 1)
• Bayesian Information Criterion (BIC)
• BIC = n ln(SSE/n) + n(1 + ln(2π)) + ln(n)(p + 1)
• AIC and BIC measure the goodness of fit of a model, but also
include a penalty that is a function of the number of parameters
in the model.
• Smaller AIC and BIC values are considered better.
• Mallows’ Cp
• Cp = SSE/σ²_full + 2(p + 1) − n, where σ²_full is the estimated error variance of the full model (with all predictors)
• Mallows’ Cp is equivalent to AIC for large samples
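In R, both criteria are available directly for a fitted model; computing the slide's AIC formula by hand shows the penalty at work (fit from earlier):

AIC(fit)   # Akaike Information Criterion
BIC(fit)   # Bayesian Information Criterion

n <- nobs(fit)
p <- length(coef(fit)) - 1            # number of predictors
SSE <- sum(resid(fit)^2)
n * log(SSE / n) + n * (1 + log(2 * pi)) + 2 * (p + 1)   # slide's AIC
# R's AIC() also counts the error variance as a parameter, so it is
# larger by a constant 2; model rankings are unaffected.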
Exhaustive output shows best model for each
number of predictors
sum$which
(Intercept) Age_08_04 KM HP Met_Color Auto CC Doors Q_Tax Weight Diesel Petrol
1 TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2 TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
3 TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
4 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
5 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
6 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
7 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
8 TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
9 TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
10 TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
11 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Each row is the best model for a given # of predictors, “TRUE” and
“FALSE” show whether the variable is included
Adjusted R2 and CP for the models with 1 predictor, 2 predictors, 3
predictors, etc. (exhaustive search method)

> sum$adjr2
[1] 0.753 0.794 0.843 0.862 0.865 0.868 0.869 0.868 0.868 0.868 0.868
> sum$cp
[1] 520.44 333.23 114.69 28.75 18.29 4.16 3.96 5.26 7.08 9.01 11.00

Metrics improve until you hit 6-7 predictors, then stabilize, so choose a model with 6-7 predictors.
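Output of this kind can be produced with an exhaustive search such as regsubsets() from the leaps package (a sketch; train.df is the training data frame from the earlier split):

library(leaps)
search <- regsubsets(Price ~ ., data = train.df, nbest = 1, nvmax = 11,
                     method = "exhaustive")
sum <- summary(search)
sum$which    # best model for each number of predictors (TRUE/FALSE)
sum$adjr2    # adjusted R-squared for each model size
sum$cp       # Mallows' Cp for each model size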
Exhaustive search may be computationally
infeasible - some alternatives:

FORWARD SELECTION
● Start with no predictors
● Add them one by one (add the one with the largest contribution)
● Stop when the addition is not statistically significant

BACKWARD ELIMINATION
● Start with all predictors
● Successively eliminate the least useful predictors one by one
● Stop when all remaining predictors have statistically significant contributions

STEPWISE
● Like Forward Selection
● Except at each step, also consider dropping non-significant predictors
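In R, all three searches can be run with step(), which by default uses AIC rather than p-values as the stopping criterion (a sketch continuing from the training data above):

full <- lm(Price ~ ., data = train.df)   # model with all predictors
null <- lm(Price ~ 1, data = train.df)   # intercept-only model

back <- step(full, direction = "backward")                        # backward
fwd  <- step(null, scope = formula(full), direction = "forward")  # forward
both <- step(null, scope = formula(full), direction = "both")     # stepwise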
Summary
● Linear regression models are very popular tools, not only for explanatory modeling, but also for prediction
● A good predictive model has high predictive accuracy (to a useful practical level)
● Predictive models are fit to training data, and predictive accuracy is evaluated on a separate validation data set
● Removing redundant predictors is key to achieving predictive accuracy and robustness
● Subset selection helps find “good” candidate models.
