A Complete Understanding of LASSO Regression


By Great Learning Team / Published on May 30, 2023


Contributed by: Dinesh Kumar

Introduction
LASSO regression, which applies L1 regularization, is a popular technique used in statistical
modeling and machine learning to estimate the relationships between variables and make
predictions. LASSO stands for Least Absolute Shrinkage and Selection Operator.

The primary goal of LASSO regression is to find a balance between model simplicity and
accuracy. It achieves this by adding a penalty term to the traditional linear regression model,
which encourages sparse solutions where some coefficients are forced to be exactly zero.
This feature makes LASSO particularly useful for feature selection, as it can automatically
identify and discard irrelevant or redundant variables.

What is Lasso Regression?


Lasso regression is a regularization technique. It is used over ordinary regression methods to
obtain more accurate predictions. This model uses shrinkage, where data values are shrunk
towards a central point, such as the mean. The lasso procedure encourages simple, sparse models
(i.e. models with fewer parameters). This particular type of regression is well-suited for
models showing high levels of multicollinearity or when you want to automate certain parts
of model selection, like variable selection/parameter elimination.
Lasso regression uses the L1 regularization technique (discussed later in this article). It is
particularly useful when we have many features, because it automatically performs feature selection.

Here’s a step-by-step explanation of how LASSO regression works:

1. Linear regression model: LASSO regression starts with the standard linear regression model,
which assumes a linear relationship between the independent variables (features) and the
dependent variable (target). The linear regression equation can be represented as follows:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε

Where:
y is the dependent variable (target).
β₀, β₁, β₂, ..., βₚ are the coefficients (parameters) to be estimated.
x₁, x₂, ..., xₚ are the independent variables (features).
ε represents the error term.
2. L1 regularization: LASSO regression introduces an additional penalty term based on the absolute
values of the coefficients. The L1 regularization term is the sum of the absolute values of the
coefficients multiplied by a tuning parameter λ:

L₁ = λ * (|β₁| + |β₂| + ... + |βₚ|)

Where:
λ is the regularization parameter that controls the amount of regularization applied.
β₁, β₂, ..., βₚ are the coefficients.
3. Objective function: The objective of LASSO regression is to find the values of the coefficients that
minimize the sum of the squared differences between the predicted values and the actual
values, while also minimizing the L1 regularization term:

Minimize: RSS + L₁

Where:
RSS is the residual sum of squares, which measures the error between the predicted values
and the actual values.
4. Shrinking coefficients: By adding the L1 regularization term, LASSO regression can shrink the
coefficients towards zero. When λ is sufficiently large, some coefficients are driven to exactly
zero. This property of LASSO makes it useful for feature selection, as the variables with zero
coefficients are effectively removed from the model.
5. Tuning parameter λ: The choice of the regularization parameter λ is crucial in LASSO regression.
A larger λ value increases the amount of regularization, leading to more coefficients being
pushed towards zero. Conversely, a smaller λ value reduces the regularization effect, allowing
more variables to have non-zero coefficients.
6. Model fitting: To estimate the coefficients in LASSO regression, an optimization algorithm is used
to minimize the objective function. Coordinate Descent is commonly employed, which iteratively
updates each coefficient while holding the others fixed; see the sketch just after this list.
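To make steps 2–6 concrete, here is a minimal sketch of lasso coordinate descent in plain NumPy. It is illustrative only: it assumes standardized features, omits the intercept, and uses the objective ½·RSS + λ·Σ|βⱼ|; a production implementation (such as scikit-learn's) adds convergence checks and many optimizations.

import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding operator: shrinks rho toward zero by lam,
    # returning exactly zero when |rho| <= lam
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    # Minimize 0.5 * ||y - X @ beta||^2 + lam * sum(|beta_j|)
    # by cyclically updating one coefficient at a time
    n_features = X.shape[1]
    beta = np.zeros(n_features)
    for _ in range(n_iters):
        for j in range(n_features):
            # Partial residual with feature j's contribution removed
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return beta

A sufficiently large λ zeroes out the weaker coefficients, which is exactly the feature-selection behavior described in steps 4 and 5.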

LASSO regression offers a powerful framework for both prediction and feature selection,
especially when dealing with high-dimensional datasets where the number of features is
large. By striking a balance between simplicity and accuracy, LASSO can provide
interpretable models while effectively managing the risk of overfitting.

It’s worth noting that LASSO is just one type of regularization technique, and there are other
variants such as Ridge regression (L2 regularization) and Elastic Net (which combines the L1 and L2 penalties).

Lasso Meaning

The word “LASSO” stands for Least Absolute Shrinkage and Selection Operator. It is a
statistical formula for the regularization of data models and feature selection.

Regularization
Regularization is an important concept used to avoid overfitting the data, especially
when model performance on the training and test data differs widely.

Regularization is implemented by adding a “penalty” term to the best fit derived from the
training data, to achieve lower variance on the test data; it also restricts the
influence of the predictor variables on the output variable by compressing their coefficients.

In regularization, we normally keep the same number of features but reduce the magnitude of
the coefficients. This can be done with different types of regression techniques that use
regularization to overcome this problem. So, let us discuss them. Before we move further, you can also upskill with the help of online
courses on Linear Regression in Python and enhance your skills.

Lasso Regularization Techniques


There are two main regularization techniques, namely Ridge Regression and Lasso
Regression. They differ in the way they assign a penalty to the coefficients. In this blog,
we will try to understand more about the Lasso regularization technique.

L1 Regularization
If a regression model uses the L1 regularization technique, it is called Lasso Regression. If
it uses the L2 regularization technique, it is called Ridge Regression. We will study more about
these in later sections.
L1 regularization adds a penalty equal to the absolute value of the magnitude of each
coefficient. This regularization type can result in sparse models with few coefficients: some
coefficients may become zero and be eliminated from the model. Larger penalties result in
coefficient values that are closer to zero (ideal for producing simpler models). L2 regularization,
on the other hand, does not produce sparse models and does not eliminate any coefficients.
Thus, lasso regression is easier to interpret than ridge; a quick numeric comparison of the two
penalty terms follows below. While there are ample resources available online to help you
understand the subject, there’s nothing quite like a certificate. Check out Great Learning’s best
artificial intelligence course online to upskill in the domain. This course will help you learn from
a top-ranking global school to build job-ready AIML skills. This 12-month program offers a
hands-on learning experience with top faculty and mentors. On completion, you will receive a
Certificate from The University of Texas at Austin, and Great Lakes Executive Learning.
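As a quick numeric illustration of the two penalty terms (the coefficient values here are arbitrary, chosen only for the example):

import numpy as np

# A toy coefficient vector to compare the two penalties
beta = np.array([0.5, -1.2, 0.0, 3.0])

l1_penalty = np.sum(np.abs(beta))  # lasso: 0.5 + 1.2 + 0.0 + 3.0 = 4.7
l2_penalty = np.sum(beta ** 2)     # ridge: 0.25 + 1.44 + 0.0 + 9.0 = 10.69
print(l1_penalty, l2_penalty)

The L1 penalty grows linearly in each coefficient, which is what lets the optimizer push small coefficients exactly to zero; the L2 penalty grows quadratically and merely shrinks them.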

Also Read: Python Tutorial for Beginners

Mathematical equation of Lasso Regression


Minimize: Residual Sum of Squares + λ * (sum of the absolute values of the coefficients)

Where,

λ denotes the amount of shrinkage.


λ = 0 implies all features are considered; this is equivalent to plain linear regression, where only
the residual sum of squares is used to build the predictive model
λ = ∞ implies no feature is considered; as λ approaches infinity, more and more features are
eliminated
Bias increases as λ increases
Variance increases as λ decreases
The example below shows this effect on the coefficients.
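Here is a minimal sketch on synthetic data using scikit-learn's Lasso, whose alpha parameter plays the role of λ (the data and penalty values are illustrative assumptions, not from the article's dataset):

import numpy as np
from sklearn.linear_model import Lasso, LassoCV

# Synthetic data: only the first two features actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

for alpha in [0.01, 0.1, 1.0, 10.0]:  # alpha plays the role of λ
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: {np.round(coefs, 3)}")  # more zeros as alpha grows

# In practice, λ is usually chosen by cross-validation
print("CV-chosen alpha:", LassoCV(cv=5).fit(X, y).alpha_)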

Lasso Regression in Python


For this example code, we will consider a dataset from MachineHack’s Predicting Restaurant
Food Cost Hackathon.

About the Data Set

The task here is about predicting the average price for a meal. The data consists of the
following features.
Size of training set: 12,690 records

Size of test set: 4,231 records

Columns/Features

TITLE: The title of the restaurant, which can help identify what it offers and whom it is
suitable for.

RESTAURANT_ID: A unique ID for each restaurant.

CUISINES: The variety of cuisines that the restaurant offers.

TIME: The open hours of the restaurant.

CITY: The city in which the restaurant is located.

LOCALITY: The locality of the restaurant.

RATING: The average rating of the restaurant by customers.

VOTES: The overall votes received by the restaurant.

COST: The average cost of a two-person meal.

After completing all the steps up to (but excluding) feature scaling, we can proceed to building a
lasso regression. We skip a separate feature-scaling step here: older scikit-learn versions
exposed a normalize parameter on Lasso for this purpose, but it was removed in scikit-learn 1.2,
so below we standardize the data in a pipeline while fitting it to the model.

Also Read: Top Machine Learning Interview Questions

Lasso regression example

import numpy as np

Creating New Training and Validation Datasets


from sklearn.model_selection import train_test_split
data_train, data_val = train_test_split(new_data_train, test_size=0.2, random_state=2)

Classifying Predictors and Target

#Classifying Independent and Dependent Features
#_______________________________________________
#Dependent Variable
Y_train = data_train.iloc[:, -1].values
#Independent Variables
X_train = data_train.iloc[:, 0:-1].values
#Independent Variables for Test Set
X_test = data_val.iloc[:, 0:-1].values

Evaluating the Model with RMSLE

def score(y_pred, y_true):
    # RMSLE: root mean squared logarithmic error between predictions and actuals
    error = np.square(np.log10(y_pred + 1) - np.log10(y_true + 1)).mean() ** 0.5
    return 1 - error

actual_cost = np.asarray(data_val['COST'])

Building the Lasso Regressor

#Lasso Regression

from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

#Initializing the Lasso regressor; scikit-learn removed Lasso's normalize
#option in version 1.2, so we standardize the features in a pipeline instead
lasso_reg = make_pipeline(StandardScaler(), Lasso())
#Fitting the Training data to the Lasso regressor
lasso_reg.fit(X_train, Y_train)
#Predicting for X_test
y_pred_lass = lasso_reg.predict(X_test)
#Printing the Score with RMSLE
print("\n\nLasso SCORE : ", score(y_pred_lass, actual_cost))
Output
0.7335508027883148

The lasso regression attained a score of about 0.73 (1 − RMSLE) on the given dataset.

Also Read: What is Linear Regression in Machine Learning?

Lasso Regression in R
Let us take “The Big Mart Sales” dataset, which contains product-wise sales for multiple outlets
of a chain.

In the dataset, we can see characteristics of the sold item (fat content, visibility, type, price)
and some characteristics of the outlet (year of establishment, size, location, type), along with
the number of units sold for each item. Let’s see if we can predict sales using these
features.


Quick check – Deep Learning Course

Ridge and Lasso Regression


Lasso regression differs from ridge regression in that it uses absolute coefficient values in its
penalty term.

Because the penalty considers only the absolute values of the coefficients (weights), the
optimization algorithm penalizes large coefficients. This penalty is known as the L1 norm.

In the standard geometric picture, the constraint regions (blue areas) are a diamond for the
lasso and a circle for the ridge, drawn together with the contours (green ellipses) of the loss
function, i.e., the RSS.

For both regression techniques, the coefficient estimates are given by the first point at which
a contour ellipse touches the constraint region (circle or diamond). Because the lasso
constraint is diamond-shaped, with corners on the axes, the ellipse will often touch it on an
axis, in which case at least one of the coefficients equals zero.

Thus, when λ is sufficiently large, lasso regression will shrink some of the coefficient
estimates exactly to 0. That is why the lasso provides sparse solutions.

The main problem with lasso regression is that, when we have correlated variables, it tends to
retain only one variable and set the other correlated variables to zero. That may lead to some
loss of information, resulting in lower accuracy of our model, as the sketch below illustrates.
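A minimal sketch of that behaviour on synthetic, illustrative data (two nearly identical features), contrasting the lasso with ridge:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Two almost perfectly correlated features, both related to the target
rng = np.random.default_rng(1)
x = rng.normal(size=300)
X = np.column_stack([x, x + rng.normal(scale=0.01, size=300)])
y = x + rng.normal(scale=0.1, size=300)

print("Lasso:", Lasso(alpha=0.1).fit(X, y).coef_)  # tends to zero out one feature
print("Ridge:", Ridge(alpha=0.1).fit(X, y).coef_)  # spreads weight across both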

That was the lasso regularization technique; I hope you can now understand it better. You can
use it to improve the accuracy of your machine learning models.

Difference Between Ridge Regression and Lasso Regression

1. Penalty term: Ridge uses the sum of the squares of the coefficients (L2 regularization);
Lasso uses the sum of the absolute values of the coefficients (L1 regularization).

2. Ridge shrinks the coefficients but doesn’t set any coefficient to zero; Lasso can shrink
some coefficients to zero, effectively performing feature selection.

3. Ridge helps to reduce overfitting by shrinking large coefficients; Lasso helps to reduce
overfitting by shrinking coefficients and selecting out features with less importance.

4. Ridge works well when most features contribute to the target; Lasso works well when only
a small number of features are truly relevant.

5. Ridge shrinks all coefficients proportionally without zeroing any; Lasso performs “soft
thresholding”, shifting coefficients toward zero and setting the small ones exactly to zero.

In short, Ridge is a shrinkage model, and Lasso is a feature selection model. Ridge tries to
balance the bias-variance trade-off by shrinking the coefficients, but it does not select any
feature and keeps all of them. Lasso tries to balance the bias-variance trade-off by shrinking
some coefficients to zero. In this way, lasso performs feature selection as part of its optimization.

Quick check – Free Machine Learning Course

Interpretations and Generalizations


Interpretations:

1. Geometric Interpretations
2. Bayesian Interpretations
3. Convex relaxation Interpretations
4. Making λ easier to interpret with an accuracy-simplicity tradeoff

Generalizations

1. Elastic Net (see the sketch after this list)
2. Group Lasso
3. Fused Lasso
4. Adaptive Lasso
5. Prior Lasso
6. Quasi-norms and bridge regression
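As a quick illustration of the first generalization, scikit-learn's ElasticNet blends the L1 and L2 penalties through an l1_ratio parameter (a sketch with arbitrary, illustrative parameter values; l1_ratio=1.0 recovers the lasso):

import numpy as np
from sklearn.linear_model import ElasticNet

# Illustrative data: only the first feature matters
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X[:, 0] + rng.normal(scale=0.1, size=100)

# alpha sets the overall penalty strength; l1_ratio mixes L1 vs L2
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)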

Conclusion
LASSO regression is a valuable statistical modeling and machine learning technique that
balances model simplicity and accuracy. By adding a penalty term based on the absolute
values of the coefficients, LASSO encourages sparsity in the model, leading to automatic
feature selection and the identification of relevant variables. The regularization parameter λ
controls the amount of regularization applied, and a larger λ value pushes more coefficients
toward zero. LASSO regression is especially useful when dealing with high-dimensional datasets,
as it can effectively manage overfitting and provide interpretable models. Overall, LASSO
regression is a powerful tool for prediction and feature selection, offering a practical solution
for various data analysis and machine learning applications.

FAQs Related to Lasso Regression


What is Lasso regression used for?
Lasso regression is used for automatic variable elimination and feature selection.

What is lasso and ridge regression?


Lasso regression can shrink coefficients all the way to zero, while ridge regression is a
model-tuning method used for analyzing data that suffers from multicollinearity.

What is Lasso Regression in machine learning?


Lasso regression is a regularized form of linear regression that adds an L1 penalty to the loss
function, shrinking some coefficients exactly to zero and thereby performing automatic feature selection.
Why does Lasso shrink coefficients to zero?
The L1 regularization performed by lasso causes the regression coefficients of the less
contributing variables to shrink to zero or near zero.

Is lasso better than Ridge?


Lasso is often preferred when feature selection matters, as it keeps only some features and
shrinks the coefficients of the others to zero; ridge can perform better when most features are relevant.

How does Lasso regression work?


Lasso regression uses shrinkage, where the data values are shrunk towards a central point
such as the mean value.

What is the Lasso penalty?


The Lasso penalty shrinks or reduces the coefficient value towards zero. The less contributing
variable is therefore allowed to have a zero or near-zero coefficient.

Is lasso L1 or L2?
A regression model using the L1 regularization technique is called Lasso Regression, while a
model using L2 is called Ridge Regression. The difference between the two lies in the
penalty term.

Is lasso supervised or unsupervised?


Lasso is a supervised regularization method used in machine learning.

If you are a beginner in the field, take up the artificial intelligence and machine learning
online course offered by Great Learning.

Great Learning Team
Great Learning's Blog covers the latest developments and innovations in technology
that can be leveraged to build rewarding careers. You'll find career guides, tech
tutorials and industry news to keep yourself updated with the fast-changing world of
tech and business.
