6th Lecture Note
Seyoung Park
Objective
• Linear Regression Analysis with R
• Learn "Estimation", "Inference" and "Shrinkage methods (Lasso regression, Ridge regression)" with
linear regression model
Regression Analysis
• Regression analysis: used for modeling the relationship between explanatory variables (X) and a
dependent variable (Y)
• Simple linear regression analysis: linear regression model with a single explanatory variable (X), i.e.
Y = β0 + β1 X + ϵ. Here ϵ represents a random error, e.g. ϵ ∼ N(0, σ²). In terms of the observations,
yi = β0 + β1 xi + ϵi , i = 1, . . . , n
• Is the obtained β1 statistically significant?
“stat500” Example
# use "stat500" data
# stat500 is 55 by 4 dimensional data
library("faraway")
data(stat500)
head(stat500)
[Figure: scatterplot of final exam score versus midterm score for the stat500 data]
##
## Call:
## lm(formula = final ~ midterm, data = stat500)
##
## Coefficients:
## (Intercept) midterm
## 15.0462 0.5633
[Figures: scatterplots of final versus midterm, with the fitted regression line added]
##
## Call:
## lm(formula = final ~ midterm, data = stat500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.932 -2.657 0.527 2.984 9.286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.0462 2.4822 6.062 1.44e-07 ***
## midterm 0.5633 0.1190 4.735 1.67e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.192 on 53 degrees of freedom
## Multiple R-squared: 0.2973, Adjusted R-squared: 0.284
## F-statistic: 22.42 on 1 and 53 DF, p-value: 1.675e-05
Multiple Linear Regression
• Multiple linear regression analysis is based on multiple explanatory variables X1 , X2 , · · · , Xp and
Y = β0 + β1 X1 + β2 X2 + · · · + βp Xp + ϵ
• How to interpret β1 , · · · , βp ? βj is the effect of Xj on Y when the other p − 1 explanatory variables are
fixed.
• How to analyze when there are too many explanatory variables (i.e. p is too large) ?
“stat500” Example
The fitted linear regression model is final = 16.81 + 0.58 · midterm − 0.08 · hw
# use "stat500" data
# stat500 is 55 by 4 dimensional data
data(stat500)
lm(final ~ midterm + hw, data = stat500)
##
## Call:
## lm(formula = final ~ midterm + hw, data = stat500)
##
## Coefficients:
## (Intercept) midterm hw
## 16.81061 0.58179 -0.08157
##
## Call:
## lm(formula = final ~ midterm + hw, data = stat500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0388 -2.5964 0.3714 3.0063 9.3497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.81061 4.08112 4.119 0.000137 ***
## midterm 0.58179 0.12445 4.675 2.12e-05 ***
## hw -0.08157 0.14916 -0.547 0.586836
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.22 on 52 degrees of freedom
## Multiple R-squared: 0.3013, Adjusted R-squared: 0.2744
## F-statistic: 11.21 on 2 and 52 DF, p-value: 8.948e-05
In vector form, the least squares problem is

minimize over β:  ∑_{i=1}^n (yi − x′i β)² ,
• The obtained minimizer β̂ = (β̂0 , β̂1 , · · · , β̂p ) is called the "Ordinary Least Squares" estimator (OLS
estimator) for β.
OLS estimator
Let X be an n by (p + 1) matrix and y an n-dimensional vector such that

X = [−x′1 −; −x′2 −; · · · ; −x′n −] (the i-th row of X is x′i ) and y = (y1 , y2 , · · · , yn )′ .

Then the normal equations give the OLS estimator:

X′X β̂ = X′y  ⇒  β̂ = (X′X)⁻¹ X′y
# explanatory variables (midterm, hw) and response (final) from "stat500"
X = stat500[, c("midterm", "hw")]; y = stat500$final
X = cbind(rep(1, nrow(X)), X)  # add an intercept column
# OLS estimator via the normal equations
OLS = solve(t(X)%*%X, t(X)%*%y)
OLS
## [,1]
## rep(1, nrow(X)) 16.81060740
## midterm 0.58178957
## hw -0.08156661
lm(final ~ midterm + hw, data = stat500)$coefficients
## (Intercept) midterm hw
## 16.81060740 0.58178957 -0.08156661
Goodness of fit
• It is essential to measure how well the linear regression model fits the data
Goodness of fit
• Decompose the total variation as SST = ∑_{i=1}^n (yi − ȳ)² , SSR = ∑_{i=1}^n (ŷi − ȳ)² and SSE = ∑_{i=1}^n (yi − ŷi )² . It holds that SSR + SSE = SST . Why?
• R² (R-squared):

R² = SSR/SST = 1 − SSE/SST
• R² ranges from 0 to 1. A value of 0 indicates that ŷi = ȳ for all i; a value of 1 indicates ŷi = yi , i.e. the
linear regression predictions perfectly fit the data (perfectly explain the observed variation)
• A larger R² indicates a better fit to the data
• For simple linear regression (i.e. p = 1), R² is equal to r² , the square of the sample correlation
between X and Y , as checked below
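A quick check with the stat500 example above (a small sketch; it assumes stat500 is still loaded):
# squared sample correlation between midterm and final
cor(stat500$final, stat500$midterm)^2
# equals the R-squared of the simple linear regression
summary(lm(final ~ midterm, data = stat500))$r.squared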
Adjusted R2
• Note that R² , i.e.,

R² = 1 − SSE/SST ,
is based on biased estimates of the variances of the dependent variable and of the errors. Why?
Adjusted R² = 1 − [SSE/(n − p − 1)] / [SST/(n − 1)] = 1 − (1 − R²) · (n − 1)/(n − p − 1) ,
where n − 1 and n − p − 1 are the degrees of freedom of the estimate of the variance of the dependent
variable and of the estimate of the error variance, respectively
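"Galapagos Islands" Example
A minimal sketch of the fitting step for this example, assuming the gala data from the faraway package (consistent with the Call shown in the output below):
# use "gala" data: species diversity on 30 Galapagos islands
data(gala)
fit = lm(Species ~ Area + Elevation + Scruz + Adjacent, data = gala)
fit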
##
## Call:
## lm(formula = Species ~ Area + Elevation + Scruz + Adjacent, data = gala)
##
## Coefficients:
## (Intercept) Area Elevation Scruz Adjacent
## 7.07538 -0.02398 0.31957 -0.23936 -0.07485
“Galapagos Islands” Example (cont.)
# compute R-squared
y = gala$Species
R2 = 1 - deviance(fit)/ sum((y-mean(y))^2)
R2
## [1] 0.7658462
# compute Adjusted R-squared
n=30; p=4
R2_adjusted = 1 - (1-R2)*(n-1)/(n-p-1)
R2_adjusted
## [1] 0.7283816
# "Multiple R-squared" is the R-squared value
summary(fit)
##
## Call:
## lm(formula = Species ~ Area + Elevation + Scruz + Adjacent, data = gala)
##
## Residuals:
## Min 1Q Median 3Q Max
## -111.637 -34.930 -7.864 33.432 182.524
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.07538 18.74982 0.377 0.709093
## Area -0.02398 0.02151 -1.115 0.275554
## Elevation 0.31957 0.05113 6.250 1.54e-06 ***
## Scruz -0.23936 0.16464 -1.454 0.158434
## Adjacent -0.07485 0.01663 -4.501 0.000136 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.74 on 25 degrees of freedom
## Multiple R-squared: 0.7658, Adjusted R-squared: 0.7284
## F-statistic: 20.44 on 4 and 25 DF, p-value: 1.39e-07
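"Galapagos Islands" Example (cont.)
A minimal sketch of the second fit, adding Nearest to the model (consistent with the Call shown in the output below):
fit2 = lm(Species ~ Area + Elevation + Scruz + Adjacent + Nearest, data = gala)
fit2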
##
## Call:
## lm(formula = Species ~ Area + Elevation + Scruz + Adjacent +
## Nearest, data = gala)
##
## Coefficients:
## (Intercept) Area Elevation Scruz Adjacent Nearest
## 7.068221 -0.023938 0.319465 -0.240524 -0.074805 0.009144
“Galapagos Islands” Example (cont.)
# compute R-squared
y = gala$Species
R2 = 1 - deviance(fit2)/ sum((y-mean(y))^2)
R2
## [1] 0.7658469
# compute Adjusted R-squared
n=30; p=5
R2_adjusted = 1 - (1-R2)*(n-1)/(n-p-1)
R2_adjusted
## [1] 0.7170651
# "Multiple R-squared" is the R-squared value
summary(fit2)
##
## Call:
## lm(formula = Species ~ Area + Elevation + Scruz + Adjacent +
## Nearest, data = gala)
##
## Residuals:
## Min 1Q Median 3Q Max
## -111.679 -34.898 -7.862 33.460 182.584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.068221 19.154198 0.369 0.715351
## Area -0.023938 0.022422 -1.068 0.296318
## Elevation 0.319465 0.053663 5.953 3.82e-06 ***
## Scruz -0.240524 0.215402 -1.117 0.275208
## Adjacent -0.074805 0.017700 -4.226 0.000297 ***
## Nearest 0.009144 1.054136 0.009 0.993151
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 60.98 on 24 degrees of freedom
## Multiple R-squared: 0.7658, Adjusted R-squared: 0.7171
## F-statistic: 15.7 on 5 and 24 DF, p-value: 6.838e-07
Inference of model
• Are any of the p predictors X1 , · · · , Xp useful for predicting the dependent variable Y ? That is, we test
H0 : β1 = β2 = · · · = βp = 0 versus Ha : at least one βj ≠ 0
• The corresponding F-statistic is

F = (SSR/p) / (SSE/(n − p − 1)) = MSR/MSE ,
where MSR and MSE are "regression mean square" and "mean square error", respectively
Table 1: Analysis of variance table

Source of variation | df        | Sum of squares              | Mean of squares       | F-statistic
Regression          | p         | SSR = ∑_{i=1}^n (ŷi − ȳ)²   | MSR = SSR/p           | F = MSR/MSE
Residual            | n − p − 1 | SSE = ∑_{i=1}^n (yi − ŷi )² | MSE = SSE/(n − p − 1) |
Total               | n − 1     | SST = ∑_{i=1}^n (yi − ȳ)²   |                       |
• The null hypothesis H0 is rejected if the F value computed from the data is greater than the critical value.
More specifically, given a significance level α such as 0.01, 0.05, or 0.1, check whether F > F1−α (p, n − p − 1),
where F1−α (p, n − p − 1) is the 1 − α quantile of the Fp,n−p−1 distribution
• The F value is related to R² :

F = [R²/(1 − R²)] · [(n − p − 1)/p]
• What if we get a very small F statistic? We can try nonlinear transformations of the variables or apply
other models, e.g. xj ← log(xj + 1)
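A minimal sketch of how the overall F-test producing the output below could be computed, assuming the Galapagos model fit from the earlier example (the names Fstat and crit_value match the comments in the output):
n = nrow(gala); p = 4
SSE = deviance(fit)
SST = sum((gala$Species - mean(gala$Species))^2)
Fstat = ((SST - SSE)/p) / (SSE/(n - p - 1))
crit_value = qf(1 - 0.05, p, n - p - 1)  # critical value at significance level 0.05
Fstat > crit_value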
## [1] TRUE
# Fstat > crit_value! This means we reject H0
# Compute p-value
pvalue = 1-pf(Fstat, p, n-p-1)
pvalue < 0.05
## [1] TRUE
# pvalue < significance level!
# This means we reject H0
• Now we are interested in whether one particular explanatory variable (say Xj ) can be dropped from the
linear regression model, i.e. consider
H0 : βj = 0 versus Ha : βj ≠ 0
• More generally, let SSE1 and SSE2 be the residual sums of squares of two nested models M1 and M2 ,
where M1 uses p1 predictors and is nested in M2 , which uses p2 > p1 predictors:
• Then the F-statistic is

F = [(SSE1 − SSE2 )/(p2 − p1 )] / [SSE2 /(n − p2 − 1)]
H0 : βj = 0 versus Ha : βj ≠ 0
• Recall that SSE1 and SSE2 are the residual sums of squares of the models M1 (the model without Xj )
and M2 (the full model), respectively:
• Then the F-statistic is

F = (SSE1 − SSE2 ) / [SSE2 /(n − p − 1)]
• The reference distribution is F1,n−p−1 =d [t(n − p − 1)]² , where t(n − p − 1) denotes Student's t-distribution
with n − p − 1 degrees of freedom, as checked below
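A small numerical check of the identity F1,m =d [t(m)]² via quantiles (m = 45 is just an illustrative choice of degrees of freedom):
m = 45
qt(0.975, m)^2  # squared 0.975 quantile of the t(m) distribution
qf(0.95, 1, m)  # 0.95 quantile of the F(1, m) distribution; equals the value above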
“savings” Example
# use "savings" data
# savings is an old economic dataset on 50
# different countries (50 by 5 dimensional data)
library("faraway")
data(savings)
head(savings)
# apply linear regression model
fit2 = lm(sr ~ ., data = savings)
# Compute F-Statistic
n=nrow(savings); p=ncol(savings)-1
SST2 = sum((savings$sr - mean(savings$sr))^2)
SSE2 = deviance(fit2)
MSE2 = SSE2/fit2$df.residual  # fit2$df.residual = n-p-1
SSR2 = SST2 - SSE2
MSR2 = SSR2/p
Fstat2 = MSR2/MSE2
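The p-value printed below matches a test of a single coefficient (likely pop75, whose t-value in the full model is about −1.561); a sketch of the omitted nested-model comparison (the name fit1 is an assumption):
fit1 = lm(sr ~ pop15 + dpi + ddpi, data = savings)  # model without pop75
Fstat = (deviance(fit1) - SSE2) / (SSE2/(n - p - 1))
1 - pf(Fstat, 1, n - p - 1)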
## [1] 0.1255298
# compute p-value using t-distribution
2*pt(-1.561,n-p-1)
## [1] 0.1255297
Questions
1. Perform the following hypothesis test:
Questions (cont.)
# [1.] compare two nested models
fit2 = lm(sr ~ ., data = savings)
fit1 = lm(sr ~ pop15 + pop75, savings)
anova(fit1, fit2)
Questions (cont.)
# [2.] compare two nested models
fit2 = lm(sr ~ pop15 + pop75 + ddpi, data = savings)
fit1 = lm(sr ~ pop15 + pop75, savings)
anova(fit1, fit2)
Testing a subspace
One can consider the following hypothesis test:
Questions
Q1 . Perform the following hypothesis test:
which implies
βpop15 = β1 + β2 , βpop75 = 4β1 , βdpi = β2 , βddpi = β3 .
This can be rewritten as the following compact form:
• However, (1) when the number of explanatory variables p is much larger than the sample size n, or (2)
when the columns of the design matrix X are highly correlated, the obtained β̂ can be unstable and is
often hard to interpret
Ridge regression
• Ridge regression is based on limiting ∑_{j=1}^p βj² , i.e. it solves

minimize over β:  ∑_{i=1}^n (yi − x′i β)² + λ ∑_{j=1}^p βj² ,

where λ > 0 is a user-determined tuning parameter that controls the trade-off between fit and penalty
library("MASS")
# ridge regression
lm.ridge(sr ~ ., data= savings, lambda = 1)
Ridge regression
[Figure: ridge coefficient paths of βpop15 , βpop75 , βdpi , βddpi versus λ from 0 to 1000]
Appendix
# Ridge regression
lam_set = seq(0, 1000, 1)
result = lm.ridge(sr ~ .,data=savings,lambda=lam_set)
plot(lam_set, result$coef["pop15",],type = "l",
xlim=range(lam_set),ylim=range(result$coef),lwd=2,
xlab=expression(lambda),ylab="Coefficients",cex.lab=2)
lines(lam_set,result$coef["pop75",],col="blue",lty=2,lwd=2)
lines(lam_set,result$coef["dpi",],col="red",lty=3,lwd=2)
lines(lam_set,result$coef["ddpi",],col="green",lty=4,lwd=2)
abline(h = 0, lwd = 2)
# Add legend
legend(300,-1,legend=expression(beta[pop15],beta[pop75],
beta[dpi],beta[ddpi]),
col=c("black", "blue", "red", "green"), lty=1:4, cex=1)
which clearly illustrates the shrinkage effect of ridge regression
• Ridge regression shrinks the estimates of β toward zero, which introduces bias but reduces the variance
of the estimator. Think about MSE(β̂) = Bias(β̂)² + Variance(β̂)!
Lasso regression
• Lasso regression is based on limiting ∑_{j=1}^p |βj | , i.e. it solves

minimize over β:  ∑_{i=1}^n (yi − x′i β)² + λ ∑_{j=1}^p |βj | ,

where λ > 0 is a user-determined tuning parameter that controls the trade-off between fit and penalty
• Lasso regression has the effect of setting some of the estimates β̂j exactly to zero, which introduces bias
but reduces the variance of the estimator.
Ridge regression (not efficient)
[Figure: ridge coefficient paths of βpop15 , βpop75 , βdpi , βddpi versus λ from 0 to 100]
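The plotting code below uses objects ridge and lasso and grids lam_set1 and lam_set2 whose defining chunk is not shown; a sketch of how they may have been produced with glmnet on the savings data (grid ranges are read off the plot axes, the step sizes are assumptions):
library("glmnet")
x = as.matrix(savings[, c("pop15", "pop75", "dpi", "ddpi")])
y = savings$sr
lam_set1 = seq(0, 100, 0.5)   # ridge grid
lam_set2 = seq(0, 10, 0.05)   # lasso grid
ridge = glmnet(x, y, family = "gaussian", alpha = 0, lambda = lam_set1)
lasso = glmnet(x, y, family = "gaussian", alpha = 1, lambda = lam_set2)
# glmnet stores solutions in decreasing order of lambda, hence the rev() calls below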
Appendix (not efficient)
par(mfrow = c(1,2))
# ridge regression
plot(lam_set1, rev(ridge$beta[1,]),type = "l",
xlim=range(lam_set1),ylim=range(ridge$beta),lwd=2,
main = "Ridge",
xlab=expression(lambda),ylab="Coefficients",cex.lab=2)
lines(lam_set1,rev(ridge$beta[2,]),col="blue",lty=2,lwd=2)
lines(lam_set1,rev(ridge$beta[3,]),col="red",lty=3,lwd=2)
lines(lam_set1,rev(ridge$beta[4,]),col="green",lty=4,lwd=2)
abline(h = 0, lwd = 2)
# Add legend
legend(20,-0.5,legend=expression(beta[pop15],beta[pop75],
beta[dpi],beta[ddpi]),
col=c("black", "blue", "red", "green"), lty=1:4, cex=1)
Ridge/Lasso regression (efficient)
[Figure: two panels of coefficient paths, Ridge (λ from 0 to 100) and Lasso (λ from 0 to 10), each showing βpop15 , βpop75 , βdpi , βddpi ]
Ridge/Lasso regression (efficient)
par(mfrow = c(1,2))
names = c("Ridge", "Lasso");result = list(ridge, lasso)
lam = list(lam_set1,lam_set2)
for (i in 1:2){
plot(lam[[i]], rev(result[[i]]$beta[1,]),type = "l",
xlim=range(lam[[i]]),ylim=range(result[[i]]$beta),
main = names[i],
xlab=expression(lambda),ylab="Coefficients",cex.lab=2)
lines(lam[[i]],rev(result[[i]]$beta[2,]),col="blue",lty=2)
lines(lam[[i]],rev(result[[i]]$beta[3,]),col="red",lty=3)
lines(lam[[i]],rev(result[[i]]$beta[4,]),col="green",lty=4)
abline(h = 0, lwd = 2)
# Add legend
legend("bottomright",legend=expression(beta[pop15],
beta[pop75], beta[dpi],beta[ddpi]),
col=c("black", "blue", "red", "green"), lty=1:4, cex=1)}
• "Sparsity assumption" is essential in the high-dimensional model
# column normalizing
norm_vec = sqrt(apply(X^2, 2, mean))
X = X / matrix(rep(norm_vec, each = n), nrow = n)
lasso=glmnet(X[,2:(p+1)],y,family="gaussian", alpha=1,
lambda=lam_Lasso)
lasso_beta = rbind(lasso$a0,lasso$beta)
err_ridge = NULL
for (i in 1:length(lam_Ridge)){
err_ridge=c(err_ridge,sqrt(sum((ridge_beta[,i]-beta)ˆ2)))}
err_ridge = rev(err_ridge)
err_lasso = NULL
for (i in 1:length(lam_Lasso)){
err_lasso=c(err_lasso,sqrt(sum((lasso_beta[,i]-beta)ˆ2)))}
err_lasso = rev(err_lasso)
Estimation errors
[Figure: estimation error versus λ for Ridge (left panel) and Lasso (right panel)]
Appendix
# Drawing error plots
par(mfrow = c(1,2))
names = c("Errors plot (Ridge)", "Errors plot (Lasso)");
result = list(err_ridge, err_lasso);
lam = list(lam_Ridge,lam_Lasso)
for (i in 1:2){
# ridge regression
plot(lam[[i]], result[[i]],type = "l",
xlim=range(lam[[i]]), ylim=range(result[[i]]),lwd=2,
main = names[i],
xlab=expression(lambda),ylab="Estimation errors")
points(0,err_ls, col = "blue")
# Add legend
legend("bottomright",legend=c(names[[i]],"Least squares"),
col=c("black","blue"),lty=c(1,0),pch = c("","o"),cex=1)}
Make a function
# load the uploaded "generating_plot" function instead
generating_plot = function(X, y, beta, lam_Ridge, lam_Lasso){
# (the steps inside the function that fit the models and compute
#  ridge_beta, lasso_beta and err_ls are omitted in these notes)
err_ridge = NULL
for (i in 1:length(lam_Ridge)){
err_ridge = c(err_ridge, sqrt(sum((ridge_beta[,i] - beta)ˆ2)))}
err_ridge = rev(err_ridge)
err_lasso = NULL
for (i in 1:length(lam_Lasso)){
err_lasso = c(err_lasso, sqrt(sum((lasso_beta[,i] - beta)ˆ2)))}
err_lasso = rev(err_lasso)
par(mfrow = c(1,2))
names = c("Errors plot (Ridge)", "Errors plot (Lasso)"); result = list(err_ridge, err_lasso);
lam = list(lam_Ridge,lam_Lasso)
for (i in 1:2){
# ridge regression
plot(lam[[i]], result[[i]],type = "l",
xlim=range(lam[[i]]), ylim=range(result[[i]]),lwd=2,
main = names[i],
xlab=expression(lambda),ylab="Estimation errors",cex.lab=2)
#abline(h = 0, lwd = 2)
points(0, err_ls, col = "blue")
# Add legend
legend("bottomright", legend=c(names[[i]], "Least squares"),
col=c("black", "blue"), lty=c(1,0), pch = c("", "o") , cex=1)}
return(list(err_lasso=err_lasso, err_ridge=err_ridge,
lasso_beta=lasso_beta[, ], ridge_beta=ridge_beta))}
results = generating_plot(X,y,beta,lam_Ridge,lam_Lasso)
Estimation errors
[Figure: estimation error versus λ for Ridge (left panel) and Lasso (right panel), produced by generating_plot]
Comparisons of regression methods (highly correlated matrix X)
## Now we consider correlated design matrix case
set.seed(3000)
n = 100; p = 70
# Generating covariance (correlation) matrix
sigma = 0.99
A = array(0,c(p,p))
for (i in 1:p){for (j in 1:p){A[i,j] = sigma^(abs(i-j))}}
Z = matrix(rnorm(n*p), ncol = p)
# The generating X has independent rows but dependent
# columns whose population covariance matrix is A
# library("expm")
X = Z %*% sqrtm(A); X = cbind(rep(1,n), X)
beta = c(rep(1,5), rep(0, p-4))
y = X %*% beta + 0.1*rnorm(n)
lam_Ridge = seq(0, 20, 0.05); lam_Lasso = seq(0, 1, 0.005)
results = generating_plot(X, y, beta, lam_Ridge, lam_Lasso)
[Figure: estimation error versus λ for Ridge (left panel) and Lasso (right panel) under the correlated design]
Model selection criterion
• Among many obtained linear models (by using different λ values), we could choose the best one based
on some criterion
• "R-squared" and "Adjusted R-squared" are one of such criteria, but not often used in the high-dimensional
model
• More popular criterion are "Akaike information criterion" (AIC) and "Bayesian information criterion"
(BIC)
AIC and BIC
• AIC/BIC consider trade-off between goodness of fit and simplicity of the model.
• AIC/BIC only provide a relative measure of model quality, i.e. they do not provide a statistical test of
a model
• Lower AIC/BIC indicates a better model!
• For the model M , let L̂ be the maximum value of the likelihood function for the model M , and let |M |
denote the number of parameters (selected variables) in M . Then, up to an additive constant,

AIC(M ) = −2 log L̂ + 2|M | = n log( ∑_{i=1}^n (yi − x′i β̂)² / n ) + 2|M |

BIC(M ) = −2 log L̂ + |M | log n = n log( ∑_{i=1}^n (yi − x′i β̂)² / n ) + |M | log n
• BIC penalizes larger models more aggressively, i.e. BIC prefers smaller models compared to AIC
• Since σ̂² = (1/n) ∑_{i=1}^n (yi − x′i β̂)² , the maximum value of the log-likelihood function is

−(n/2) log(σ̂²) + constant = −(n/2) log( (1/n) ∑_{i=1}^n (yi − x′i β̂)² ) + constant,
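In R, AIC and BIC of fitted lm models can also be compared directly with the built-in AIC() and BIC() functions; a small sketch using the two Galapagos fits from earlier (these values include the likelihood constants dropped above, so only differences between models are meaningful):
AIC(fit); AIC(fit2)
BIC(fit); BIC(fit2)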
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
## Apply AIC/BIC to the simulation model
# column normalizing
norm_vec = sqrt(apply(X^2, 2, mean))
X = X / matrix(rep(norm_vec, each = n), nrow = n)
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
# Apply Lasso regression
lam_Lasso = seq(0.05, 5, 0.01)*sqrt(log(p)/n)*0.1
lasso=glmnet(X[,2:(p+1)],y,family="gaussian", alpha=1,
lambda=lam_Lasso)
lasso_beta = rbind(lasso$a0,lasso$beta)
# recompute estimation errors for the new lambda grid (same loop as before)
err_lasso = NULL
for (i in 1:length(lam_Lasso)){
err_lasso = c(err_lasso, sqrt(sum((lasso_beta[,i] - beta)^2)))}
err_lasso = rev(err_lasso)
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
[Figure: Errors plot (Lasso), estimation error versus λ (about 0.05 to 0.30)]
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 0.608 0.969 0.588 0.645 0.226 0.688 0.770 0.149 0.000 0.006 0.000 0.000 0.000 0.000 0.
## V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.004 0.000 0.000 0.000 0.000 -0.004 0.000 0.
## V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51
## 0.000 0.000 0.000 0.000 0.000 -0.005 0.003 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.
## V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.005 0.000 0.000 0.
## V76 V77 V78 V79 V80 V81 V82 V83 V84 V85 V86 V87 V88 V89
## 0.000 0.002 0.000 0.000 0.000 -0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.004 0.
## V95 V96 V97 V98 V99 V100 V101 V102 V103 V104 V105 V106 V107 V108 V
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.005 0.000 0.000 0.001 0.000 -0.001 0.000 0.
## V114 V115 V116 V117 V118 V119 V120 V121 V122 V123 V124 V125 V126 V127 V
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.
## V133 V134 V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145 V146 V
## 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.007 0.000 0.000 0.000 0.004 0.
## V152 V153 V154 V155 V156 V157 V158 V159 V160 V161 V162 V163 V164 V165 V
## 0.000 0.000 0.000 0.000 0.013 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.
## V171 V172 V173 V174 V175 V176 V177 V178 V179 V180 V181 V182 V183 V184 V
## -0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.006 0.
## V190 V191 V192 V193 V194 V195 V196 V197 V198 V199 V200
## 0.000 0.000 0.000 0.001 0.000 0.000 0.000 -0.006 -0.010 0.000 0.000
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
# Drawing error plots
names = "Errors plot (Lasso)" ; result = err_lasso;
lam = lam_Lasso
ind = which(err_lasso==min(err_lasso))
round(lasso_beta[,ncol(lasso_beta)-ind+1],3)
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
−200
−200
−300
−250
−400
−300
AIC
BIC
−500
−350
−600
−400
Index Index
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 0.60696 0.96263 0.58212 0.63603 0.21744 0.68105 0.76318 0.14181 0.00000 0.00000 0.00000 0
## V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
## 0.00000 -0.00094 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -0.00220 0.00000 0.00000 0
## V75 V76 V77 V78 V79 V80 V81 V82 V83 V84 V85
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V90 V91 V92 V93 V94 V95 V96 V97 V98 V99 V100
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V105 V106 V107 V108 V109 V110 V111 V112 V113 V114 V115
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V120 V121 V122 V123 V124 V125 V126 V127 V128 V129 V130
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V135 V136 V137 V138 V139 V140 V141 V142 V143 V144 V145
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V150 V151 V152 V153 V154 V155 V156 V157 V158 V159 V160
## -0.00281 0.00000 0.00000 0.00000 0.00000 0.00000 0.01067 0.00000 0.00000 0.00000 0.00000 0
## V165 V166 V167 V168 V169 V170 V171 V172 V173 V174 V175
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V180 V181 V182 V183 V184 V185 V186 V187 V188 V189 V190
## 0.00000 0.00000 0.00000 0.00000 0.00348 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0
## V195 V196 V197 V198 V199 V200
## 0.00000 0.00000 0.00000 -0.00636 0.00000 0.00000
Appendix
# Computing AIC/BIC
AIC = NULL; BIC = NULL; BIC_fit = NULL; BIC_pen = NULL
for (i in 1:length(lam_Lasso)){
AIC=c(AIC, n*log(sum((y - X%*% lasso_beta[,i])^2)/n)+
2*sum(abs(lasso_beta[2:(p+1),i])>0.00001))
BIC=c(BIC, n*log(sum((y - X%*% lasso_beta[,i])^2)/n)+
log(n)*sum(abs(lasso_beta[2:(p+1),i])>0.00001))
BIC_fit=c(BIC_fit,n*log(sum((y-X%*%lasso_beta[,i])^2)/n))
BIC_pen=c(BIC_pen, log(n)*
sum(abs(lasso_beta[2:(p+1),i])>0.00001))}
AIC = rev(AIC); BIC = rev(BIC)
BIC_fit = rev(BIC_fit); BIC_pen = rev(BIC_pen)
par(mfrow = c(1,2))
plot(AIC); plot(BIC)
ind_B = which(BIC==min(BIC))
round(lasso_beta[,ncol(lasso_beta)-ind_B+1],5)
Selecting model using AIC/BIC via simulation model (p>n and sparse case)
[Figure: BIC plot showing −2L (fit), Penalty (simplicity), and BIC versus λ]
Appendix
# Drawing BIC plot
names = "BIC plot" ;
lam = lam_Lasso
par(mfrow=c(1,1))
plot(lam, BIC_fit,type = "l",
xlim=range(lam),lwd=2,main=names,ylim=
c(min(BIC_fit), max(BIC_pen)),
xlab=expression(lambda),ylab="Value",cex.lab=2)
# remaining curves and legend (colors and position are assumptions,
# labels taken from the plot legend)
lines(lam, BIC_pen, col = "blue", lty = 2, lwd = 2)
lines(lam, BIC, col = "red", lty = 3, lwd = 2)
legend("topright", legend = c("-2L (fit)", "Penalty(simplicity)", "BIC"),
col = c("black", "blue", "red"), lty = 1:3, cex = 1)