
REGULARIZATION

SPARSE REPRESENTATIONS
SPARSE REPRESENTATIONS
• Direct and Indirect Penalties
• Direct Penalty
• Weight decay penalizes parameters directly
• L1 penalization induces sparse parameterization
• Indirect Penalty
• Another strategy is to place a penalty on the
activations of the units in the neural network,
encouraging their activations to be sparse
• This indirectly imposes a complicated penalty on
the model parameters
• Representational sparsity describes a
representation where many of the elements of
the representation are zero (or close to zero)
SPARSE REPRESENTATIONS
• Direct versus Representational Sparsity
• Parameter regularization: a sparsely parameterized
linear model y = Ax, where most entries of the weight
matrix A are zero
• Representational regularization: a linear model
h = Bx whose representation h of the data x has
most entries equal to zero
SPARSE REPRESENTATIONS
• Representational Regularization
• Accomplished using the same sort of mechanisms used in parameter
regularization
• Norm penalty regularization of representation
• Performed by adding to the loss function J a norm
penalty Ω(h) on the representation, giving the
regularized loss J̃(θ; X, y) = J(θ; X, y) + α Ω(h),
with α ≥ 0 weighting the penalty (see the sketch below)
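A minimal sketch of such a representational penalty, assuming PyTorch; the encoder/decoder sizes, the reconstruction loss, and the penalty weight alpha are illustrative. An L1 norm on the hidden activations h is added to the usual loss J, in contrast to weight decay, which penalizes the weights themselves.

```python
# Minimal sketch: L1 penalty on the hidden representation (PyTorch assumed).
import torch
import torch.nn as nn

encoder = nn.Linear(100, 50)   # produces the representation h
decoder = nn.Linear(50, 100)   # reconstructs the input from h
alpha = 1e-3                   # strength of the sparsity penalty (illustrative)

optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.01)

x = torch.randn(32, 100)       # a dummy minibatch

h = torch.relu(encoder(x))     # hidden representation of the data
x_hat = decoder(h)

data_loss = ((x_hat - x) ** 2).mean()      # the usual loss J
sparsity_penalty = alpha * h.abs().sum()   # alpha * ||h||_1, pushes activations toward 0
loss = data_loss + sparsity_penalty        # J~ = J + alpha * Omega(h)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```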
SPARSE REPRESENTATIONS
• Placing a Hard Constraint on Activation Values
• Another approach to representational sparsity:
• place a hard constraint on the activation values
• Called orthogonal matching pursuit (OMP)
• Encode x with the h that solves the constrained
optimization problem
arg min over h with ||h||0 < k of ||x − Wh||²
• where ||h||0 is the number of nonzero entries of h
• The problem can be solved efficiently when W is
constrained to be orthogonal
• Often called OMP-k, where k is the number of nonzero
entries allowed
• Essentially, any model with hidden units can be made
sparse (a greedy OMP-k sketch follows)
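A minimal NumPy sketch of greedy OMP-k; the function name and shapes are illustrative, and it assumes the columns of W are the dictionary atoms. At each of k steps it selects the column most correlated with the current residual, then refits the selected coefficients by least squares.

```python
import numpy as np

def omp_k(W, x, k):
    """Greedy OMP-k: find h with at most k nonzero entries minimizing ||x - W h||^2."""
    n_features = W.shape[1]
    residual = x.copy()
    support = []                              # indices of selected (nonzero) entries of h
    h = np.zeros(n_features)
    for _ in range(k):
        # pick the column of W most correlated with the current residual
        correlations = np.abs(W.T @ residual)
        correlations[support] = -np.inf       # do not reselect an already-chosen column
        support.append(int(np.argmax(correlations)))
        # refit the selected coefficients by least squares
        coeffs, *_ = np.linalg.lstsq(W[:, support], x, rcond=None)
        residual = x - W[:, support] @ coeffs
    h[support] = coeffs
    return h

# Example usage with random data
rng = np.random.default_rng(0)
W = rng.standard_normal((20, 50))
x = rng.standard_normal(20)
h = omp_k(W, x, k=3)
print(np.count_nonzero(h))  # at most 3
```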
REGULARIZATION
BAGGING AND OTHER ENSEMBLE METHODS
BAGGING AND OTHER ENSEMBLE METHODS
• What is bagging?
• It is short for Bootstrap Aggregating
• It is a technique for reducing generalization error
by combining several models
• Idea is to train several models separately, then have all
the models vote on the output for test examples
• This strategy is called model averaging
• Techniques employing this strategy are known as
ensemble methods
• Model averaging works because different models
will usually not all make the same errors on the test set
BAGGING AND OTHER ENSEMBLE METHODS
• Ex: Ensemble error rate
• Consider set of k regression models
• Each model i = 1, …, k makes an error εi on each example
• Errors drawn from a zero-mean multivariate normal
with variances E[εi²] = v and covariances E[εiεj] = c
• Error of the average prediction of all ensemble models:
(1/k) Σi εi
• Expected squared error of the ensemble prediction:
E[((1/k) Σi εi)²] = (1/k) v + ((k−1)/k) c
• If errors are perfectly correlated, c = v, and the mean
squared error reduces to v, so model averaging does
not help
• If errors are perfectly uncorrelated and c = 0, the expected
squared error of the ensemble is only v/k
(verified numerically in the sketch below)
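A small NumPy check of the formula above by Monte Carlo sampling of correlated errors; the values of k, v, and c are illustrative.

```python
import numpy as np

k, v, c = 10, 1.0, 0.3          # number of models, error variance, error covariance
n_trials = 200_000

# covariance matrix with variance v on the diagonal and covariance c elsewhere
cov = np.full((k, k), c)
np.fill_diagonal(cov, v)

rng = np.random.default_rng(0)
errors = rng.multivariate_normal(np.zeros(k), cov, size=n_trials)  # shape (n_trials, k)

ensemble_error = errors.mean(axis=1)          # error of the averaged prediction
empirical = np.mean(ensemble_error ** 2)      # estimate of the expected squared error
predicted = v / k + (k - 1) * c / k           # closed-form value from the slide

print(empirical, predicted)  # both close to 0.37; with c = v this would be v, with c = 0, v/k
```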
BAGGING AND OTHER ENSEMBLE METHODS
• Ensemble vs Bagging
• Different ensemble methods construct the ensemble of
models in different ways
• Ex: each member of ensemble could be formed by training a
completely different kind of model using a different algorithm or
objective function
• Bagging is a method that allows the same kind of model,
training algorithm, and objective function to be reused
several times
• The Bagging Technique
• Given a training set D of size N, generate k data sets
Di with the same number of examples as the original by
sampling with replacement
• Some observations are repeated in each Di while others
are omitted (on average about two-thirds of the original
examples appear in each Di); see the resampling sketch below
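A minimal NumPy sketch of the resampling and averaging steps; train_model and predict are hypothetical placeholders for whatever base learner is used, and X, y are assumed to be NumPy arrays.

```python
import numpy as np

def bagging_fit(X, y, k, train_model, rng):
    """Train k models, each on a bootstrap sample of (X, y) drawn with replacement."""
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)            # sample N indices with replacement
        models.append(train_model(X[idx], y[idx]))  # some examples repeat, others are omitted
    return models

def bagging_predict(models, X, predict):
    """Model averaging: mean of the individual predictions (a vote, for classification)."""
    return np.mean([predict(m, X) for m in models], axis=0)
```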
BAGGING AND OTHER ENSEMBLE METHODS
• Example of Bagging Principle
• Task of training an 8 detector
• Bagging training procedure
• make different data sets by resampling the given data
set

• Each detector is brittle, but their average is robust,
achieving maximal confidence only when both loops
of the 8 are present
BAGGING AND OTHER ENSEMBLE METHODS
• Neural nets and bagging
• Neural nets reach a wide variety of solution points
• Thus they benefit from model averaging even when
all models are trained on the same dataset
• Differences in:
• random initialization
• random selection of minibatches
• hyperparameters
• are often enough to cause different members of the
ensemble to make partially independent errors
BAGGING AND OTHER ENSEMBLE METHODS
• Model averaging is powerful
• Model averaging is a reliable method for reducing
generalization error
• Machine learning contests are usually won by model averaging
over dozens of models
• Since the performance of model averaging comes at the
expense of increased computation and memory, benchmark
comparisons of algorithms are usually made using a single model
• Boosting
• Constructs an ensemble by incrementally adding models
(a sketch of one residual-fitting flavor follows)
• Has been applied to ensembles of neural networks, by
incrementally adding neural networks to the ensemble
• Also by interpreting an individual neural network as an
ensemble, incrementally adding hidden units to the network
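One common flavor of boosting for regression, sketched below with a hypothetical base learner supplied via fit_model and predict: each new model is fit to the residual of the current ensemble prediction, so models are added incrementally rather than trained independently as in bagging. This is only an illustrative sketch; boosting variants differ in how each new model is targeted and weighted.

```python
import numpy as np

def boosting_fit(X, y, k, fit_model, predict, lr=0.1):
    """Incrementally add k models, each fit to the residual of the current ensemble."""
    models = []
    current = np.zeros_like(y, dtype=float)     # current ensemble prediction
    for _ in range(k):
        residual = y - current                  # what the ensemble still gets wrong
        m = fit_model(X, residual)              # the new model targets the residual
        models.append(m)
        current = current + lr * predict(m, X)  # add the new model's contribution
    return models

def boosting_predict(models, X, predict, lr=0.1):
    # must use the same lr as during fitting
    return lr * sum(predict(m, X) for m in models)
```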
REGULARIZATION
DROPOUT
DROPOUT

• Regularization with unlimited computation


• Best way to regularize a fixed size model is:
• Average the predictions of all possible settings of the parameters
• Weighting each setting with the posterior probability given the
training data
• This would be the Bayesian approach
• Dropout does this using considerably less computation
• By approximating an equally weighted geometric mean of the
predictions of an exponential number of learned models that share
parameters
• Dropout approximates bagging
• Bagging is a method of averaging over several models to improve
generalization
• It is impractical to train and evaluate an ensemble of many large
neural networks, since doing so is expensive in terms of runtime
and memory; dropout provides an inexpensive approximation
DROPOUT
• Removing units creates sub-networks
• Dropout trains an ensemble of all subnetworks
• Subnetworks formed by removing non-output units from an
underlying base network
• We can effectively remove a unit by multiplying its output
value by zero
• This works directly for networks based on performing a series
of affine transformations and nonlinearities
• Some modification is needed for models such as radial basis
function networks, which are based on the difference between
the unit state and a reference value
• Dropout Neural Net
• A simple way to prevent neural net overfitting
DROPOUT
• Dropout Neural Net
DROPOUT
• Performance with/without Dropout
DROPOUT
• Dropout as bagging
• In bagging we define k different models, construct k
different data sets by sampling from the dataset with
replacement, and train model i on dataset i
• Dropout aims to approximate this process, but with
an exponentially large number of neural networks
DROPOUT
• Dropout as an ensemble method
DROPOUT
• Mask for dropout training
• To train with dropout, we use a minibatch-based learning
algorithm that takes small steps, such as SGD
• At each step, randomly sample a binary mask to apply to
all of the input and hidden units in the network
• The probability of including a unit is a hyperparameter
• typically 0.5 for hidden units and 0.8 for input units
• We then run forward propagation, back-propagation, and
the learning update as usual
DROPOUT
• Forward Propagation with dropout
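A minimal NumPy sketch of a training-time forward pass with dropout masks on the input and hidden units of a two-layer net; the layer sizes, weight names, and ReLU choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, W1, b1, W2, b2):
    """One training-time forward pass with dropout masks on input and hidden units."""
    p_input, p_hidden = 0.8, 0.5                      # include probabilities (hyperparameters)

    mu_x = rng.binomial(1, p_input, size=x.shape)     # binary mask for the input units
    x_dropped = x * mu_x                              # "remove" a unit by multiplying by zero

    h = np.maximum(0.0, x_dropped @ W1 + b1)          # hidden layer (ReLU)
    mu_h = rng.binomial(1, p_hidden, size=h.shape)    # binary mask for the hidden units
    h_dropped = h * mu_h

    y = h_dropped @ W2 + b2                           # output units are never dropped
    return y

# Illustrative shapes: 10 inputs, 20 hidden units, 1 output, minibatch of 32
W1, b1 = rng.standard_normal((10, 20)), np.zeros(20)
W2, b2 = rng.standard_normal((20, 1)), np.zeros(1)
x = rng.standard_normal((32, 10))
y = dropout_forward(x, W1, b1, W2, b2)
```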
DROPOUT

• Formal description of dropout
• Suppose that a mask vector μ specifies which units to
include
• The cost of the model is specified by J(θ, μ)
• Dropout training consists of minimizing Eμ[J(θ, μ)]
• The expected value contains an exponential number of
terms, but we can obtain an unbiased estimate of its
gradient by sampling values of μ
DROPOUT
• Bagging training vs Dropout training
• Dropout training not same as bagging training
• In bagging, the models are all independent
• In dropout, models share parameters
• Models inherit subsets of parameters from parent network
• Parameter sharing makes it possible to represent an exponential
number of models with a tractable amount of memory
• In bagging, each model is trained to convergence on its
respective training set
• In dropout, most models are not explicitly trained
• A tiny fraction of the possible sub-networks are each trained
for a single step
• Parameter sharing causes the remaining sub-networks to arrive
at good settings of the parameters
