Machine Learning
Unit 3
Regularization of Linear Models
Course: Machine Learning TY B. Tech(CSIT)
Faculty: Mrs. S. P. Patil, IT Dept., RIT
Errors in Machine Learning
Figure 1: Errors in Machine Learning
Src: [Link]
Bias
■ Bias is the difference between our model's predictions and the actual values.
■ Bias comes from the simplifying assumptions our model makes about the data in order to predict new data.
Figure 2: Bias
Bias
■ Q. When bias is high, what will happen?
■ The assumptions made by our model are too basic.
■ The model can't capture the important features of our data.
■ The model hasn't captured the patterns in the training data, so it cannot perform well on the testing data either.
■ In this case, our model cannot perform on new data and cannot be sent into production.
■ This is called Underfitting. [Link]
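A minimal sketch (not from the slides) of underfitting: a straight line fitted to quadratic data cannot capture the curve, so the fit is poor even on the training data. The data, seed, and model choice here are illustrative assumptions, using NumPy and Scikit-Learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=100)  # quadratic target + noise

# A straight line is too simple for this curve: high bias,
# low R^2 even on the data the model was trained on.
lin = LinearRegression().fit(X, y)
print("train R^2:", lin.score(X, y))
```

A low training score like this is the typical signature of high bias: the model's assumptions are too basic to fit even the data it has already seen.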
Variance
■ Variance is the model's sensitivity to fluctuations in the data.
■ A high-variance model may learn from noise.
■ During training, we allow our model to 'see' the data a certain number of times to find patterns in it.
■ It will capture most patterns in the data, but it will also learn from the unnecessary data present, i.e., from the noise.
Fig. Example of Variance
Variance
■ The model will perform really well on the training data and achieve high accuracy,
■ but will fail to perform on new, unseen data.
■ New data may not have exactly the same features, and the model won't be able to predict it very well.
■ This is called Overfitting.
Fig. Over-fitted model, where we see model performance on: a) training data, b) new data
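A minimal sketch (not from the slides) of overfitting: a degree-15 polynomial fitted to 20 noisy points scores almost perfectly on the training data but much worse on fresh data from the same distribution. The dataset, degree, and seed are made-up illustrative choices.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(20, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=20)
X_new = rng.uniform(-3, 3, size=(100, 1))
y_new = np.sin(X_new[:, 0]) + rng.normal(0, 0.3, size=100)

# A very flexible model memorizes the noise in the 20 training points...
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

print("train R^2:", model.score(X_train, y_train))  # near perfect
print("new-data R^2:", model.score(X_new, y_new))   # clearly worse
```

The large gap between the two scores is the signature of high variance: the model tracked the noise in its particular training sample rather than the underlying pattern.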
Bias-Variance Tradeoff
■ Find the right balance between bias and variance.
■ This ensures that we capture the essential patterns in our model while ignoring the noise present in it. This is called the Bias-Variance Tradeoff.
■ It helps optimize the error in our model and keeps it as low as possible.
■ An optimized model is sensitive to the patterns in our data, but is still able to generalize to new data.
■ Both the bias and the variance should be low, so as to prevent overfitting and underfitting.
Fig. Error in training and testing with high bias and variance
Bull’s Eye Graph for Bias and Variance
Different Combinations of Bias-Variance
■ Low-Bias, Low-Variance: The
combination of low bias and low variance
shows an ideal machine learning model.
However, it is very difficult to achieve in
practice.
■ Low-Bias, High-Variance: With low bias
and high variance, model predictions are
inconsistent but accurate on average.
This case occurs when the model learns
with a large number of parameters, and
hence leads to overfitting.
■ High-Bias, Low-Variance: With high
bias and low variance, predictions are
consistent but inaccurate on average. This
case occurs when a model does not learn
well from the training dataset or uses too
few parameters. It leads to
underfitting problems in the model.
■ High-Bias, High-Variance: With high
bias and high variance, predictions are
inconsistent and also inaccurate on
average.
Ways to reduce High Bias
■ High bias mainly occurs due to an overly simple model.
Below are some ways to reduce high bias:
• Increase the input features, as the model is underfitted.
• Decrease the regularization term.
• Use more complex models, e.g., include some polynomial features.
Ways to Reduce High Variance
• Reduce the input features or the number of parameters, as the model is overfitted.
• Do not use an overly complex model.
• Increase the training data.
• Increase the regularization term.
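A minimal sketch (not from the slides) of the last remedy: keeping the same degree-12 polynomial features but increasing the Ridge penalty α trades a little training accuracy for a more constrained model. The data, degrees, and α values are illustrative assumptions; a StandardScaler is inserted so the penalty treats all polynomial features comparably.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X_train = rng.uniform(-3, 3, size=(30, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=30)
X_test = rng.uniform(-3, 3, size=(200, 1))
y_test = np.sin(X_test[:, 0]) + rng.normal(0, 0.3, size=200)

scores = {}
for alpha in (1e-4, 1.0, 100.0):
    # Same complex feature set each time; only the penalty strength changes.
    model = make_pipeline(PolynomialFeatures(degree=12),
                          StandardScaler(),
                          Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    scores[alpha] = (model.score(X_train, y_train),
                     model.score(X_test, y_test))
    print(f"alpha={alpha}: train/test R^2 =", scores[alpha])
```

As α grows, the training score can only go down (the weights are more constrained), which is exactly the mechanism by which a stronger regularization term reduces variance.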
What is Polynomial Regression
■ One way to regularize a polynomial model: reduce the number of polynomial degrees.
■ For a linear model, regularization is typically achieved by constraining the weights of the model.
■ Ridge Regression, Lasso Regression, and Elastic Net use three different ways to constrain the weights.
Ridge Regression
■Tikhonov regularization
■ A regularization term equal to α Σi θi² is added to the cost function (during training only).
■ This forces the learning algorithm to fit the data while keeping the model weights as small as possible.
■ The hyperparameter α controls how much you want to regularize the model.
– If α = 0, then Ridge Regression is just Linear Regression.
– If α is very large, then all weights end up very close to
zero and the result is a flat line going through the data’s
mean
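A minimal sketch (not from the slides) of these two extremes, on made-up linear data: with a tiny α, Ridge recovers essentially the plain Linear Regression slope; with a huge α, the slope is shrunk toward 0 and the prediction is nearly a flat line at the mean of y.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 1))
y = 5 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# alpha ~ 0: behaves like plain Linear Regression (slope near the true 5)
small = Ridge(alpha=1e-9).fit(X, y)
# huge alpha: weights driven toward 0, intercept ends up near mean(y)
large = Ridge(alpha=1e6).fit(X, y)

print("alpha ~ 0:  coef =", small.coef_, "intercept =", small.intercept_)
print("alpha huge: coef =", large.coef_, "intercept =", large.intercept_)
```

Note that only the weight is shrunk; the intercept survives, which is why the heavily regularized model still passes through the data's mean.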
Ridge Regression
■ Ridge Regression cost function:
J(θ) = MSE(θ) + α (1/2) Σ(i=1..n) θi²
– the bias term θ0 is not regularized (the sum starts at i = 1)
Fig. linear model (left) and a polynomial model (right), both with various levels of
Ridge regularization
Ridge Regression
■ Increasing α leads to flatter (i.e., less extreme, more reasonable) predictions, thus reducing the model's variance but increasing its bias.
■ Ridge Regression closed-form solution:
θ̂ = (Xᵀ X + α A)⁻¹ Xᵀ y
where A is the (n+1)×(n+1) identity matrix with a 0 in the top-left cell (so the bias term is not regularized)
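A minimal NumPy sketch (not from the slides) of this closed-form solution on made-up data, checked against Scikit-Learn's Ridge with the Cholesky solver; the dataset and seed are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 1))
y = 3 * X[:, 0] + 4 + rng.normal(0, 0.1, size=50)
alpha = 1.0

# Add the bias column, then solve (X_b^T X_b + alpha*A) theta = X_b^T y,
# where A is the identity with a 0 in the top-left (bias term unpenalized).
X_b = np.c_[np.ones((len(X), 1)), X]
A = np.eye(X_b.shape[1])
A[0, 0] = 0
theta = np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

ridge = Ridge(alpha=alpha, solver="cholesky").fit(X, y)
print("closed form: ", theta)                    # [intercept, slope]
print("scikit-learn:", ridge.intercept_, ridge.coef_)
```

Both routes solve the same penalized least-squares problem, so the intercept and slope should agree to numerical precision.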
Ridge Regression
>>> from sklearn.linear_model import Ridge
>>> ridge_reg = Ridge(alpha=1, solver="cholesky")
>>> ridge_reg.fit(X, y)
>>> ridge_reg.predict([[1.5]])
array([[1.55071465]])
■ And using Stochastic Gradient Descent:
>>> from sklearn.linear_model import SGDRegressor
>>> sgd_reg = SGDRegressor(penalty="l2")
>>> sgd_reg.fit(X, y.ravel())
>>> sgd_reg.predict([[1.5]])
array([1.47012588])
Lasso Regression
■Least Absolute Shrinkage and Selection Operator Regression
■ It adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.
■ Lasso Regression cost function:
J(θ) = MSE(θ) + α Σ(i=1..n) |θi|
■ It tends to eliminate the weights of the least important features (i.e., set them to zero).
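A minimal sketch (not from the slides) of this feature-elimination behavior: on made-up data where only the first of five features actually drives the target, Lasso sets the other weights to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(0, 0.1, size=200)  # only feature 0 matters

# The l1 penalty shrinks small weights all the way to exactly 0,
# effectively performing feature selection.
lasso = Lasso(alpha=0.1).fit(X, y)
print("coefficients:", lasso.coef_)
```

This is the practical difference from Ridge, which shrinks all weights toward zero but almost never makes them exactly zero.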
Lasso Regression
>>> from sklearn.linear_model import Lasso
>>> lasso_reg = Lasso(alpha=0.1)
>>> lasso_reg.fit(X, y)
>>> lasso_reg.predict([[1.5]])
array([1.53788174])
References
■ Textbook : Chapter 4
■ [Link]