Lecture 6: Regression continued

C4B Machine Learning Hilary 2011 A. Zisserman

• Lasso
  • L1 regularization
  • other regularizers

• SVM regression
  • epsilon-insensitive loss

• More loss functions

Regression

• Suppose we are given a training set of N observations
  ((x_1, y_1), . . . , (x_N, y_N)) with x_i ∈ R^d, y_i ∈ R

• The regression problem is to estimate f(x) from this data such that
  y_i = f(x_i)
Regression cost functions

Minimize with respect to w:

    Σ_{i=1}^{N} ℓ(f(x_i, w), y_i)  +  λ R(w)
        loss function                 regularization

• There is a choice of both loss function and regularizer

• So far we have seen – “ridge” regression
  • squared loss: Σ_{i=1}^{N} (y_i − f(x_i, w))^2
  • squared regularizer: λ ||w||^2

• Now, consider other losses and regularizers
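Not part of the slides: as a reference point before moving on, here is a minimal numpy sketch of ridge regression for a linear model f(x, w) = w^T x, using the closed-form solution w = (X^T X + λI)^{-1} X^T y. The data and the value of λ are made up for illustration.

```python
import numpy as np

# Toy data (invented for illustration): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))
w_true = np.array([2.0, -1.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=N)

# Ridge regression: minimize sum_i (y_i - w^T x_i)^2 + lam * ||w||^2.
# Setting the gradient to zero gives w = (X^T X + lam * I)^{-1} X^T y.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_ridge)
```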

The “Lasso” or L1 norm regularization

• LASSO = Least Absolute Shrinkage and Selection Operator

Minimize with respect to w ∈ R^d:

    Σ_{i=1}^{N} (y_i − f(x_i, w))^2  +  λ Σ_{j=1}^{d} |w_j|
        loss function                    regularization

• This is a quadratic optimization problem

• There is a unique solution

• p-norm definition: ||w||_p = ( Σ_{j=1}^{d} |w_j|^p )^{1/p}
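Not part of the slides: one standard way to solve the lasso objective above (for a linear model f(x, w) = w^T x) is coordinate descent with soft-thresholding. The sketch below is a minimal numpy version; the data, λ, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator: sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize sum_i (y_i - w^T x_i)^2 + lam * sum_j |w_j| by cycling over coordinates."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]             # residual with coordinate j removed
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            w[j] = soft_threshold(rho, lam / 2.0) / z  # exact minimizer of the 1-D subproblem
    return w

# Toy example (invented): only two of five features actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)
print(lasso_coordinate_descent(X, y, lam=5.0))   # most weights come out exactly zero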
Sparsity property of the Lasso

• Contour plots for d = 2 of the loss Σ_{i=1}^{N} (y_i − f(x_i, w))^2 and of the
  two regularizers: λ ||w||^2 (ridge regression) and λ Σ_j |w_j| (lasso)

[Figure: contours of the loss and of the two regularizers in the (w_1, w_2) plane]

• The minimum occurs where the loss contours are tangent to the regularizer’s contours

• For the lasso case, minima occur at “corners” of the regularizer’s contours
• Consequently one of the weights is zero
• In high dimensions many weights can be zero
Example: Lasso for polynomial basis functions regression

• The red curve is the true function (which is not a polynomial)

• The data points are samples from the curve with added noise in y

• N = 9, M = 7

[Figure: “ideal fit” – the sample points and the ideal fit to the true curve]

    f(x, w) = Σ_{j=0}^{M} w_j x^j = w^T Φ(x)

where w is an (M+1)-dimensional vector.
[Figures: “Variation of weights with lambda” – the weights w_j plotted against log λ for ridge regression (left) and the lasso (right), with zoomed-in “detail” panels below]
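Not from the slides: a sketch of how such weight paths could be reproduced with scikit-learn. The true curve, noise level, and grid of λ values are invented (the slides do not specify them), and scikit-learn calls the regularization weight alpha rather than λ.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Hypothetical setup: N = 9 noisy samples of a smooth (non-polynomial) curve,
# fitted with a degree M = 7 polynomial basis Phi(x) = (1, x, ..., x^M).
rng = np.random.default_rng(0)
N, M = 9, 7
x = np.linspace(0, 1, N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=N)   # invented true curve + noise
Phi = np.vander(x, M + 1, increasing=True)             # design matrix, one row per sample

# Trace the weights as the regularization parameter varies.
for lam in [1e-8, 1e-6, 1e-4, 1e-2]:
    ridge = Ridge(alpha=lam, fit_intercept=False).fit(Phi, y)
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=100000).fit(Phi, y)
    n_zero = np.sum(np.abs(lasso.coef_) < 1e-8)
    print(f"lambda={lam:.0e}  ridge |w|max={np.abs(ridge.coef_).max():.1f}  lasso zeros={n_zero}")
```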
Second example – lasso in action

[Figure: the fitted weights plotted against the regularization parameter λ; as λ grows, weights are successively driven to zero]

Sparse weight vectors

• Weights being zero is a method of “feature selection” – zeroing out the unimportant features

• The SVM classifier also has this property (sparse alpha in the dual representation)

• Ridge regression does not

• AdaBoost achieves feature selection by a different, greedy approach
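Not in the slides: with a fitted lasso model, the selected features are simply those with nonzero weights. A minimal scikit-learn sketch (the data and the value of alpha are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: 20 features, only three of which influence y.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
y = X[:, [0, 5, 12]] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)
print("selected features:", selected)   # typically recovers {0, 5, 12}
```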
Other regularizers

    Σ_{i=1}^{N} (y_i − f(x_i, w))^2  +  λ Σ_{j=1}^{d} |w_j|^q

• For q ≥ 1, the cost function is convex and has a unique minimum.
  The solution can be obtained by quadratic optimization.

• For q < 1, the problem is not convex, and obtaining the global
  minimum is more difficult.
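Not in the slides: a small matplotlib sketch of the contour |w_1|^q + |w_2|^q = 1 for a few values of q, which makes the “corners” argument above visual. The particular q values are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot the unit "ball" {w : |w1|^q + |w2|^q = 1} for several q:
# q = 2 gives a circle (ridge), q = 1 a diamond with corners on the axes (lasso),
# and q < 1 a non-convex, star-like set.
w1, w2 = np.meshgrid(np.linspace(-1.5, 1.5, 400), np.linspace(-1.5, 1.5, 400))
for q in [0.5, 1, 2, 4]:
    plt.contour(w1, w2, np.abs(w1) ** q + np.abs(w2) ** q, levels=[1.0])
plt.gca().set_aspect("equal")
plt.xlabel("w1")
plt.ylabel("w2")
plt.title("contours |w1|^q + |w2|^q = 1")
plt.show()
```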
SVMs for Regression

Use the ε-insensitive error measure

    V_ε(r) = 0         if |r| ≤ ε
             |r| − ε   otherwise

This can also be written as

    V_ε(r) = (|r| − ε)_+

where (·)_+ indicates the positive part of (·), or equivalently as

    V_ε(r) = max(|r| − ε, 0)

[Figure: V_ε(r) compared with the square loss, as a function of r – the cost is zero inside the epsilon “tube”]

• As before, introduce slack variables for points that violate the ε-insensitive error

• For each data point x_i, two slack variables ξ_i, ξ̂_i are required (depending on
  whether f(x_i) is above or below the tube)

• Learning is by the optimization

    min_{w ∈ R^d, ξ_i, ξ̂_i}   C Σ_{i=1}^{N} (ξ_i + ξ̂_i)  +  (1/2) ||w||^2

  subject to

    y_i ≤ f(x_i, w) + ε + ξ_i,   y_i ≥ f(x_i, w) − ε − ξ̂_i,   ξ_i ≥ 0,   ξ̂_i ≥ 0   for i = 1 . . . N

• Again, this is a quadratic programming problem


• It can be dualized
• Some of the data points will become support vectors
• It can be kernelized
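Not from the lecture: a sketch of the primal optimization above written out with cvxpy for a plain linear model f(x, w) = w^T x (no bias term, for brevity). The toy data and the values of C and ε are made up.

```python
import numpy as np
import cvxpy as cp

# Invented toy data with a linear relationship plus noise.
rng = np.random.default_rng(0)
N, d = 30, 3
X = rng.normal(size=(N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

C, eps = 1.0, 0.1
w = cp.Variable(d)
xi = cp.Variable(N, nonneg=True)      # slack above the tube
xi_hat = cp.Variable(N, nonneg=True)  # slack below the tube

f = X @ w                             # f(x_i, w) = w^T x_i
objective = cp.Minimize(C * cp.sum(xi + xi_hat) + 0.5 * cp.sum_squares(w))
constraints = [y <= f + eps + xi, y >= f - eps - xi_hat]
cp.Problem(objective, constraints).solve()
print(w.value)
```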
Example: SV regression with Gaussian basis functions

• The red curve is the true function (which is not a polynomial)

• Regression function – Gaussians centred on the data points

• Parameters are: C, epsilon, sigma

[Figure: “ideal fit” – the sample points and the ideal fit to the true curve]

    f(x, w) = Σ_{i=1}^{N} w_i e^{−(x − x_i)^2 / σ^2} = w^T Φ(x)

    Φ : x → Φ(x),   R → R^N,   w is an N-vector

epsilon = 0.01

[Figure: the sample points, the validation set fit, and the support vectors for epsilon = 0.01]

• The validation set fit is a search over both C and sigma

epsilon = 0.5 and epsilon = 0.8

[Figures: the corresponding fits and support vectors for epsilon = 0.5 and epsilon = 0.8]

As epsilon increases:
• the fit becomes looser
• fewer data points are support vectors
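Not from the slides: a scikit-learn sketch of the same experiment. SVR with an RBF kernel corresponds to the Gaussian-basis model above (scikit-learn parameterizes the kernel width as gamma ≈ 1/σ^2); the data and the parameter values are invented.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical data: noisy samples of a smooth curve on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + 0.1 * rng.normal(size=40)

# Fit an RBF-kernel SVR for several widths of the epsilon tube.
for eps in [0.01, 0.5, 0.8]:
    svr = SVR(kernel="rbf", C=10.0, gamma=10.0, epsilon=eps).fit(x, y)
    print(f"epsilon={eps}: {len(svr.support_)} of {len(y)} points are support vectors")
```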

Loss functions for regression

• quadratic (square) loss:   ℓ(y, f(x)) = (1/2) (y − f(x))^2

• ε-insensitive loss:        ℓ(y, f(x)) = max(|r| − ε, 0),  where r = y − f(x)

• Huber loss (mixed quadratic/linear), for robustness to outliers:

    ℓ(y, f(x)) = h(y − f(x)),  with
    h(r) = r^2          if |r| ≤ c
           2c|r| − c^2  otherwise

• all of these are convex

[Figure: the square, ε-insensitive and Huber losses plotted against r = y − f(x)]
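Not in the lecture: minimal numpy versions of the three losses above, with ε and c left as parameters (their default values here are arbitrary).

```python
import numpy as np

def square_loss(r):
    """Quadratic loss: 0.5 * r^2."""
    return 0.5 * r ** 2

def eps_insensitive_loss(r, eps=0.5):
    """Zero inside the epsilon tube, linear outside it."""
    return np.maximum(np.abs(r) - eps, 0.0)

def huber_loss(r, c=1.0):
    """Quadratic for |r| <= c, linear (2c|r| - c^2) beyond that."""
    return np.where(np.abs(r) <= c, r ** 2, 2 * c * np.abs(r) - c ** 2)

# Evaluate all three on a grid of residuals r = y - f(x).
r = np.linspace(-3, 3, 7)
print(square_loss(r))
print(eps_insensitive_loss(r))
print(huber_loss(r))
```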
Final notes on cost functions

Regressors and classifiers can be constructed by a “mix ‘n’ match” of loss
functions and regularizers to obtain a learning machine suited to a
particular application, e.g. for a classifier f(x) = w^T x + b:

• L1 logistic regression

    min_{w ∈ R^d}  Σ_{i=1}^{N} log(1 + e^{−y_i f(x_i)})  +  λ ||w||_1

• L1-SVM

    min_{w ∈ R^d}  Σ_{i=1}^{N} max(0, 1 − y_i f(x_i))  +  λ ||w||_1

• Least squares SVM

    min_{w ∈ R^d}  Σ_{i=1}^{N} [max(0, 1 − y_i f(x_i))]^2  +  λ ||w||^2
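Not part of the lecture: both L1-regularized classifiers above have close counterparts in scikit-learn; a rough sketch (scikit-learn uses C ≈ 1/λ rather than λ, and the data here is made up).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Invented two-class data: only a few of 20 features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sign(X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200))

# L1 logistic regression: log-loss + L1 penalty.
logreg = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

# "L1-SVM": LinearSVC does not offer the plain hinge loss with an L1 penalty,
# so this uses the squared hinge, the closest built-in option.
svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5).fit(X, y)

print("nonzero weights (logistic):", np.count_nonzero(logreg.coef_))
print("nonzero weights (svm):     ", np.count_nonzero(svm.coef_))
```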

Background reading

• Bishop, chapters 3.1 & 7.1.4

• Hastie et al, chapters 3.4 & 12.3.5

• More on web page: [Link]
