R-Squared and Higher-Order Polynomial Regression

The document explains key statistical concepts related to regression analysis, including R, R-Squared, Adjusted R-Squared, and Predicted R-Squared, highlighting their roles in measuring relationships and model performance. It also discusses linear, multiple, and polynomial regression, emphasizing the importance of avoiding overfitting and underfitting through techniques like regularization. Additionally, it covers the bias-variance tradeoff and approaches to mitigate overfitting in regression models.


R, R-Squared, Adjusted R-Squared and Predicted R-Squared
Linear and Multivariate Polynomial Regression
Dr. S. N. Ahsan
What is R (statistical correlation)?
• R is a correlation coefficient that measures the strength and direction of the
relationship between two variables, as seen on a scatterplot. The value of r
always lies between -1 and +1.
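As a quick illustration (my addition, not from the original slides), r can be
computed with NumPy on made-up paired data:

    import numpy as np

    # Hypothetical paired observations (e.g. hours studied vs. exam score)
    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([52, 60, 63, 71, 80], dtype=float)

    # np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry
    r = np.corrcoef(x, y)[0, 1]
    print(f"r = {r:.3f}")   # close to +1: strong positive linear relationship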
What is R-Squared?
• R-Squared (R² or the coefficient of determination) is a statistical measure in a
regression model that gives the proportion of variance in the dependent
variable that can be explained by the independent variable(s). In other words, R-
squared shows how well the data fit the regression model (the goodness of fit).
• The most common interpretation of r-squared is how well the regression model
explains observed data. For example, an r-squared of 60% reveals that 60% of the
variability observed in the target variable is explained by the regression model.
Generally, a higher r-squared indicates more variability is explained by the model.
• However, it is not always the case that a high r-squared is good for the regression
model. The quality of the statistical measure depends on many factors, such as
the nature of the variables employed in the model, the units of measure of the
variables, and the applied data transformation. Thus, sometimes, a high r-
squared can indicate problems with the regression model.
• A low r-squared figure is generally a bad sign for predictive models. However, in
some cases, a good model may show a small value.
R-Squared (Example)
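The worked example on the original slide is a figure; as a hedged substitute, the
sketch below computes R-squared both from its definition (1 − SS_res / SS_tot) and
with scikit-learn, using made-up data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # Made-up data with a roughly linear trend
    x = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

    model = LinearRegression().fit(x, y)
    y_pred = model.predict(x)

    # R^2 from its definition: 1 - SS_res / SS_tot
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    print("manual R^2 :", 1 - ss_res / ss_tot)
    print("sklearn R^2:", r2_score(y, y_pred))   # same value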
PROBLEMS WITH R-SQUARED
• Problem 1: Every time you add a predictor to a model, the R-squared
increases, even if due to chance alone. It never decreases.
Consequently, a model with more terms may appear to have a better
fit simply because it has more terms.
• Problem 2: If a model has too many predictors and higher order
polynomials, it begins to model the random noise in the data. This
condition is known as overfitting the model and it produces
misleadingly high R-squared values and a lessened ability to make
predictions.
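A quick way to see Problem 1 (this demo is my addition, not from the slides): keep
adding purely random predictors and watch the training R-squared creep upward even
though the new columns carry no information.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 50
    X = rng.normal(size=(n, 1))
    y = 3 * X[:, 0] + rng.normal(scale=1.0, size=n)   # only the first column matters

    for extra in range(0, 9, 2):
        noise_cols = rng.normal(size=(n, extra))       # pure-noise predictors
        X_aug = np.hstack([X, noise_cols])
        r2 = LinearRegression().fit(X_aug, y).score(X_aug, y)
        print(f"{1 + extra:2d} predictors -> training R^2 = {r2:.4f}")
    # R^2 never goes down as predictors are added, even though they are noise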
WHAT IS THE ADJUSTED R-SQUARED?
• The adjusted R-squared compares the explanatory power of regression models
that contain different numbers of predictors.
• Suppose you compare a five-predictor model with a higher R-squared to a one-
predictor model. Does the five predictor model have a higher R-squared because
it’s better? Or is the R-squared higher because it has more predictors? Simply
compare the adjusted R-squared values to find out!
• The adjusted R-squared is a modified version of R-squared that has been adjusted
for the number of predictors in the model. The adjusted R-squared increases only
if the new term improves the model more than would be expected by chance. It
decreases when a predictor improves the model by less than expected by chance.
The adjusted R-squared can be negative, but it’s usually not. It is always lower
than the R-squared.
• In simplified Best Subsets Regression output, you can see the adjusted
R-squared peak and then decline as predictors are added, while the R-squared
continues to increase.
WHAT IS THE ADJUSTED R-SQUARED?
• You might want to include only three predictors in this model.
However, an overspecified model (one that's too complex) is more
likely to reduce the precision of coefficient estimates and predicted
values. Consequently, you don’t want to include more terms in the
model than necessary.
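For reference, adjusted R-squared can be computed directly from R-squared, the
sample size n, and the number of predictors p; a minimal helper (my own sketch,
not from the slides, with made-up numbers):

    def adjusted_r2(r2: float, n: int, p: int) -> float:
        """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
        return 1 - (1 - r2) * (n - 1) / (n - p - 1)

    # A five-predictor model with a higher raw R^2 can still lose on adjusted R^2.
    print(adjusted_r2(0.88, n=30, p=1))   # ~0.876
    print(adjusted_r2(0.89, n=30, p=5))   # ~0.867 -- higher raw R^2, lower adjusted R^2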
WHAT IS THE PREDICTED R-SQUARED?
• The predicted R-squared indicates how well a regression model predicts
responses for new observations. This statistic helps you determine when
the model fits the original data but is less capable of providing valid
predictions for new observations.
• Like adjusted R-squared, predicted R-squared can be negative and it is
always lower than R-squared.
• The predicted R-squared is not a standard textbook statistic; it is
“calculated by systematically removing each observation from the data set,
estimating the regression equation, and determining how well the model
predicts the removed observation”.
• A key benefit of predicted R-squared is that it can prevent you from
overfitting a model. As mentioned earlier, an overfit model contains too
many predictors and it starts to model the random noise.
WHAT IS THE PREDICTED R-SQUARED?
Statistical software calculates predicted R-squared using the following procedure:
1. Remove a data point from the dataset.
2. Calculate the regression equation.
3. Evaluate how well the model predicts the missing observation.
4. Repeat this for all data points in the dataset.
WHAT IS THE PREDICTED R-SQUARED?
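The slides do not give a formula for predicted R-squared; a common way to compute
it is via the PRESS statistic from leave-one-out refits, with predicted
R² = 1 − PRESS / SS_tot. A small sketch of that procedure (my assumption of the
implementation, using scikit-learn and made-up data):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut

    def predicted_r2(X, y):
        """Predicted R^2 = 1 - PRESS / SS_tot, with PRESS from leave-one-out refits."""
        press = 0.0
        for train_idx, test_idx in LeaveOneOut().split(X):
            model = LinearRegression().fit(X[train_idx], y[train_idx])
            press += (y[test_idx][0] - model.predict(X[test_idx])[0]) ** 2
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1 - press / ss_tot

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2))
    y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=30)
    print(predicted_r2(X, y))   # a bit lower than the ordinary training R^2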
Linear Regression & Multiple Linear Regression
• Linear Regression is a supervised machine learning model that finds the best-fit
straight line between the independent and dependent variables, i.e., it models
the linear relationship between the dependent and independent variable.
• In Multiple Linear Regression, there is more than one independent variable for
the model to use when finding the relationship.
• Equation of Multiple Linear Regression: y = b0 + b1x1 + b2x2 + ... + bnxn, where
b0 is the intercept, b1, b2, b3, ..., bn are the coefficients (slopes) of the
independent variables x1, x2, x3, ..., xn, and y is the dependent variable.
Linear Regression (Example)
Multiple Linear Regression (Example)
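The original example slides are figures; as a stand-in, here is a minimal fit of a
multiple linear regression (two independent variables) with scikit-learn on
made-up data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up data: y depends on two independent variables x1 and x2
    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(100, 2))                      # columns: x1, x2
    y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=100)

    model = LinearRegression().fit(X, y)
    print("intercept b0:", model.intercept_)    # close to 5
    print("slopes b1,b2:", model.coef_)         # close to [2.0, -1.5]
    print("R^2:", model.score(X, y))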
Polynomial Regression
A simple linear regression algorithm only works when the relationship between
the variables is linear. If the data are non-linear, linear regression cannot draw
a good best-fit line, and simple regression analysis fails in such conditions.
Hence, polynomial regression is used to overcome this problem: it can capture the
curvilinear relationship between the independent and dependent variables.
Equation of the Polynomial Regression Model
• Simple Linear Regression equation:
y = b0 + b1x
• Multiple Linear Regression equation:
y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
• Polynomial Regression equation:
y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
• Multivariate Polynomial Regression equation:
y = b0 + b1x1 + b2x2 + b11x1² + b22x2² + b12x1x2 + ...
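A minimal sketch of fitting the single-variable polynomial equation above with
scikit-learn (PolynomialFeatures expands x into 1, x, x², ..., and plain linear
regression then fits the coefficients b0, b1, ...; the data below are made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Made-up curvilinear data
    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = 1 + 2 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.3, size=50)

    # Degree-2 polynomial regression: y = b0 + b1*x + b2*x^2
    poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    poly_model.fit(x, y)
    print(poly_model.predict(np.array([[1.5]])))   # prediction for a new x value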
Polynomial Regression (higher order)
• Polynomial regression is basic linear regression applied to higher-order
(higher-degree) terms of the input. These higher-order terms allow the equation
to fit more advanced relationships, such as curves and sudden jumps. As the order
increases in polynomial regression, we increase the chances of overfitting and
creating weak models.
Can Polynomial Regression Be Used For Multiple Variables?
Polynomial regression can be used for multiple independent variables,
which is called multivariate polynomial regression. These equations are
usually very complex but give us more flexibility and higher accuracy
due to utilizing multiple variables in the same equation.

Multiple polynomial regression results of daily energy consumption as a function of outdoor temperature
and humidity (two views from different angles)
Fitting Multiple Polynomial Regression
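The fitting slide itself is a figure. Below is a hedged sketch of multivariate
polynomial regression (two inputs, degree 2, matching the b0 + b1x1 + b2x2 + b11x1²
+ b22x2² + b12x1x2 form) using the same PolynomialFeatures approach; the
energy-consumption data from the caption above is not available, so random
stand-in data is used:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Stand-in data: two independent variables (e.g. temperature, humidity)
    rng = np.random.default_rng(7)
    X = rng.uniform(0, 1, size=(200, 2))
    y = (1 + 2 * X[:, 0] - X[:, 1]
         + 0.5 * X[:, 0] ** 2 + 0.3 * X[:, 1] ** 2
         + 0.8 * X[:, 0] * X[:, 1]
         + rng.normal(scale=0.05, size=200))

    # Degree-2 expansion of two variables gives the terms 1, x1, x2, x1^2, x1*x2, x2^2
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.score(X, y))   # R^2 on the training data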
Polynomial Regression (Underfitting vs. Overfitting)
The plot shows the function we want to approximate, which is part of the cosine function, along
with samples from the true function and the approximations produced by models with polynomial
features of different degrees. A linear function (polynomial of degree 1) is not sufficient to fit the
training samples; this is called underfitting. A polynomial of degree 4 approximates the true
function almost perfectly. For higher degrees, however, the model overfits the training data, i.e. it
learns the noise of the training data. We evaluate overfitting / underfitting quantitatively using
cross-validation: we calculate the mean squared error (MSE) on the validation set, and the higher
the MSE, the less likely the model generalizes correctly from the training data.
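A short sketch of the quantitative check described above (my own reconstruction of
the scikit-learn underfitting/overfitting example rather than its exact code): fit
polynomials of degree 1, 4, and 15 to noisy cosine samples and compare the
cross-validated MSE.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(30, 1))
    y = np.cos(1.5 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=30)

    for degree in (1, 4, 15):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        # Negative MSE is sklearn's scoring convention; flip the sign to report MSE
        mse = -cross_val_score(model, X, y, cv=10,
                               scoring="neg_mean_squared_error").mean()
        print(f"degree {degree:2d}: cross-validated MSE = {mse:.4f}")
    # Degree 1 underfits and degree 15 overfits; degree 4 gives the lowest MSE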
Underfitting
• A machine learning model is said to be “underfitting” when it fails to produce
good results because the model is oversimplified. Such a model can neither model
the training data nor generalize over new data. When such a situation occurs, we
say that the model has “high bias”.
• “Bias is the difference between the average prediction of our model and
the correct value which we are trying to predict. A model with high bias
pays very little attention to the training data and oversimplifies the model.
It always leads to a high error on training and test data.”
• Hence, an underfit model performs poorly on training as well as testing
data.
Overfitting
• It is the opposite case of underfitting. Here, our model produces good
results on training data but performs poorly on testing data. This
happens because our model fits the training data so well that it leaves
very little or no room for generalization over new data. When
overfitting occurs, we say that the model has “high variance”.
• “Variance is the amount that the estimate of the target function will
change if different training data was used.”
Under & Over fitting (Example)
Training & Testing Error
A model that is underfitted will have high training and high testing
error while an overfit model will have extremely low training error but
a high testing error.
Bias and Variance
Bias Definition: Bias refers to the error introduced by approximating a real-world problem
(which may be complex) by a much simpler model.
• High Bias: Means the model is too simple, underfitting the data. It fails to capture the
underlying patterns. Example: Using a linear model to fit a nonlinear relationship. Effect:
Leads to systematically wrong predictions across the dataset.
Variance Definition: Variance refers to the error introduced by the model's sensitivity to
small fluctuations in the training data.
• High Variance: Means the model is too complex, overfitting the data. It captures noise as
if it were a real pattern. Example: A very deep decision tree on a small dataset. Effect:
Performs very well on training data but poorly on unseen test data.
Bias-Variance Tradeoff
• There is a tradeoff between bias and variance:
• Simple models: High bias, low variance.
• Complex models: Low bias, high variance.
• The goal is to find the sweet spot that minimizes total error, which includes both bias
and variance.
Bias and Variance
Approach to Solving Overfitting Problem
Some of the approaches to solving the problem of overfitting are:
1. Cross-Validation
2. Train with more data
3. Remove features
4. Early stopping
5. Regularization
6. Ensembling
Cross-Validation for Solving the Overfitting Problem
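The slide's illustration is a figure. As a minimal sketch of the idea, k-fold
cross-validation scores a model on held-out folds, so an overfit model is exposed
by its poor validation scores (data below are made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, cross_val_score

    rng = np.random.default_rng(3)
    X = rng.normal(size=(60, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)

    # 5-fold cross-validation: train on 4 folds, validate on the held-out fold
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
    print("validation R^2 per fold:", np.round(scores, 3))
    print("mean validation R^2   :", scores.mean())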
Regularization
• The word “regularize” means to make things regular or acceptable, and that is
exactly what we use it for. Regularization is a form of regression used to reduce
error by fitting a function appropriately to the given training set and avoiding
overfitting. It discourages fitting an overly complex model, thus reducing the
variance and the chances of overfitting. It is also used in the case of
multicollinearity (when independent variables are highly correlated).
• Regularization can be of two kinds, Ridge and Lasso Regression. Both start from
the Residual Sum of Squares (RSS) loss function, which can be written as:
RSS = Σ (yi − ŷi)², where ŷi = b0 + b1xi1 + b2xi2 + ... + bnxin is the model's
prediction for observation i.
Ridge Regression / L2 Regularization
• In this regression, we add a penalty term to the RSS loss function. Our modified
loss function now becomes:
Loss = RSS + λ Σ bj²   (the sum runs over the coefficients b1, ..., bn)
• Here, λ is called the “tuning parameter”, which decides how heavily we want to
penalize the flexibility of our model. If λ = 0, ridge regression performs like
ordinary linear regression; as λ → ∞, the impact of the shrinkage penalty grows
and the ridge regression coefficient estimates approach zero. Selecting a good
value of λ is therefore critical. The penalty used by this method is based on the
“L2 norm” of the coefficients.
Lasso Regression / L1 Regularization
• This regression adopts the same idea as Ridge Regression with a change in the
penalty term, which uses absolute values instead of squares:
Loss = RSS + λ Σ |bj|
In statistics, this penalty is based on the “L1 norm” of the coefficients.
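A minimal sketch of both penalties with scikit-learn, on made-up data (note that
sklearn's Ridge and Lasso call the tuning parameter λ "alpha"):

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=50)  # only 2 useful features

    ols   = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero
    lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set coefficients exactly to zero

    print("OLS  :", np.round(ols.coef_, 2))
    print("Ridge:", np.round(ridge.coef_, 2))
    print("Lasso:", np.round(lasso.coef_, 2))  # noise coefficients driven to 0.0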
