Assign3 Lasso
In this study, we consider a dataset of claim sizes (severities) from Allstate. The dataset has 130 features, and we are uncertain about what the features represent. Hence, the regression models in this study are based purely on shrinkage and selection techniques.
Since the data is large (188,318 records and 130 features), we use a training set consisting of 20% of the data to keep runtimes manageable. We want to develop models to predict loss. We begin with the most basic candidate for predictive modeling, ordinary least squares (OLS) linear regression. This OLS model is regressed on all 130 features and is compared to the LASSO regression model later in this report.
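As a rough illustration of this setup, the sketch below draws the 20% training subset and fits the baseline OLS model. The file name train.csv and the column names id and loss are assumptions rather than details taken from the report.

```r
# Minimal sketch of the data split and OLS baseline (file and column names assumed).
set.seed(1)
claims <- read.csv("train.csv", stringsAsFactors = TRUE)  # treat categorical features as factors
claims$id <- NULL                                         # drop a record identifier, if present

# Keep 20% of the 188,318 records to make runtimes manageable
train_idx <- sample(seq_len(nrow(claims)), size = floor(0.20 * nrow(claims)))
train <- claims[train_idx, ]

# Baseline OLS model regressed on all 130 features
ols_fit <- lm(loss ~ ., data = train)
summary(ols_fit)
```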
We then apply LASSO regression, fitted with glmnet, for our predictive modeling purposes. Given the number of features in our model, LASSO regression helps with selection as well as shrinkage. The LASSO model adds a penalty term weighted by a tuning parameter λ. Figure 1 visualizes the relationship between the feature coefficients and λ. It should be noted that for large values of λ, the feature coefficients are essentially set to zero. We must reach a compromise where the feature coefficients and the value of λ make the most sense.
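The coefficient paths summarized in Figure 1 can be produced along the following lines with glmnet. This is a sketch that reuses the train data frame from the previous snippet and again assumes the response column is named loss.

```r
library(glmnet)

# glmnet expects a numeric matrix, so factor features are expanded
# into dummy variables; the intercept column is dropped
x <- model.matrix(loss ~ ., data = train)[, -1]
y <- train$loss

# alpha = 1 selects the LASSO penalty; glmnet fits a whole path of lambda values
lasso_fit <- glmnet(x, y, alpha = 1)

# Coefficient profiles against log(lambda): as lambda grows, more and more
# coefficients are shrunk exactly to zero
plot(lasso_fit, xvar = "lambda", label = TRUE)
```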
Figure 2. Mean squared error of the training and test sets as a function of model complexity.
Furthermore, Figure 3 shows the relationship between the mean-squared error and λ. It is clear that the mean-squared error increases sharply once log(λ) goes beyond 4. In order to tune the parameter λ, we use cross-validation with 5 folds. With this technique we obtain a minimum λ = 1.285.
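A sketch of that tuning step is given below, reusing x and y from the previous snippet. The fold assignment is random, so the λ of 1.285 reported above would only be reproduced approximately.

```r
# 5-fold cross-validation over the lambda path (mean-squared error is the
# default loss for a Gaussian response)
set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

plot(cv_fit)          # cross-validated MSE versus log(lambda), cf. Figure 3
cv_fit$lambda.min     # lambda with the smallest cross-validated error

# Coefficients at the selected lambda; many are exactly zero
coef(cv_fit, s = "lambda.min")
```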