Module - 2 Ver 1.4

Regularization techniques are used in machine learning to reduce overfitting and improve generalization. Some common regularization strategies include parameter norm penalties, data augmentation, and early stopping. The goal is to find a balanced model that has low bias and low variance by reducing model flexibility for overfitted models and increasing flexibility for underfitted models. Regularization terms like L1 and L2 norms are added to the objective function to penalize high parameter values and improve generalization to new data.


Regularization for Deep Learning

Module - 2
Regularization: Definition 2

• Algorithm should perform well on
new inputs, not just on training
data.
• Strategies are used in ML to reduce
test error, possibly by allowing an increase in
training error.
• Such strategies are collectively
known as REGULARIZATION.
• Low bias and low variance is the goal.

[Figure: accuracy of each model with respect to training data and test data, shown for three models]
Regularization Strategies 3

1. Parameter Norm Penalties
2. Norm Penalties as Constrained Optimization
3. Regularization and Under-constrained Problems
4. Data Set Augmentation
5. Noise Robustness
6. Semi-supervised Learning
7. Multi-task Learning
8. Early Stopping
9. Parameter Tying and Parameter Sharing
10. Sparse Representations
11. Bagging and Other Ensemble Methods
12. Dropout
13. Adversarial Training
14. Tangent Methods
Bias and errors 4

1. Bias is the difference between the predicted value and the expected/true value.
2. The model makes certain assumptions about the data to make the target function simple,
but those assumptions may not always be correct.
3. A high bias model makes more assumptions about the target function.
4. High bias can cause an algorithm to miss the correct relationship between features and the
target output (underfitting).
5. The bias error is the error due to wrong/inaccurate assumptions that the learning
algorithm makes during training.
6. Zero bias may sound good because the model perfectly fits the training data, but this means
the model has learned too much from the training data; this is called overfitting, and the
model will not be able to do a good job with new/test data.
Variance and errors 5

• Variance occurs when the model considers the fluctuations/noise
in the data during training.
• Variance is the error due to sensitivity to small fluctuations in the
dataset.
• A high variance model learns too much from the data as it still
considers the noise as something to learn from, as a result, it
becomes very sensitive to any small fluctuation, and it overfits
the training data.
• High variance can cause an algorithm to model the random noise
in the training data, rather than the intended outcome.
Bias and Variance Trade-off 6
Generalization Error & Regularization 7

• Generalization error (also known as the out-of-sample error or
the risk) is a measure of how accurately an algorithm can predict
outcome values for previously unseen data.

• Regularization = any modification made to a learning algorithm
that is intended to reduce its generalization error but not its
training error.
Regularization strategies 8

• Option-1: put extra constraints on a machine learning model,
such as adding restrictions on the parameter values.
• Option-2: add extra terms in the objective function that can be
thought of as corresponding to a soft constraint on the parameter
values.
• Other forms of Regularizations: ensemble methods, dropout etc.
Strategies to create large, deep, regularized
model for deep learning. 9

1. Parameter Norm Penalties: penalizing the estimation by adding an
extra term to the error function.
2. L1 and L2 regularizations are some of the techniques used to
address the overfitting issues.
Context of the example. 10

• Trying to predict the number of matches won based on age.

[Figure: underfit and overfit models fitted to the example data]
Balanced fit 11
Improving the performance of model-1 12

Scenario-1: The model's poor performance on the training data
could be because the model is too simple.
• Increase the model flexibility.
• Add new domain-specific features.
• Decrease the amount of regularization used.
Improving the performance of model-2 13

Scenario-2: The model is overfitting the training data.
• Reduce the model flexibility.
• Consider using fewer feature combinations.
• Increase the amount of regularization used.
Reduce the overfitting issue 14
Error computation 15

• Using the mean squared error (MSE) function.
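A standard form of the MSE objective over m training examples, reconstructed here because the slide's formula did not survive extraction (ŷ denotes the model's prediction):

J(\theta) = \mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2}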


Substituting for the predicted value 16

The predicted value is a
higher-order polynomial, and its form varies
depending on the problem domain.

X1 and X2 represent the
age of a person in the
given example.
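As an illustrative sketch (this exact form is assumed, not shown on the slide), a second-order polynomial prediction substituted into the MSE above would give:

\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2, \qquad
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_1^{(i)} + \theta_2 x_2^{(i)} - y^{(i)} \right)^{2}

where x1 could be the age and x2 a higher-order feature of the age, following the slide's example.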
Objective of the regularization 17

• To minimize the error in each iteration.


• L2 and L1 regularizations are used.
L2 Regularization 18

• Add a new penalty term to the objective so that the model is penalized heavily.

• Penalize higher values of theta, which makes the error bigger every time theta
gets bigger.
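A common form of the L2-regularized objective, sketched here with λ as the regularization strength (the symbol is an assumption; the slide's own formula is not available):

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2} + \lambda \sum_{j} \theta_j^{2}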
L1 Regularization 19

• Consider the absolute value of theta in the added penalty term.
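The corresponding L1-regularized objective, again a sketch with λ as the regularization strength:

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^{2} + \lambda \sum_{j} \left| \theta_j \right|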


Norm Penalties and Norm? 20

• The norm is a quantity which describes the size of a vector.


• When a vector is stretched, the norm is multiplied by the stretching factor
• The norm of the sum of two vectors is less than or equal to the sum of the
norm of each individually
• The norm can never be negative
• The zero vector has norm 0
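These properties can be written compactly in standard notation (not taken from the slide):

\|\alpha x\| = |\alpha| \, \|x\|, \qquad \|x + y\| \le \|x\| + \|y\|, \qquad \|x\| \ge 0, \qquad \|\mathbf{0}\| = 0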
L2 Norm, L1 Norm etc… 21

• The L2 norm is the most common norm function in machine learning. Its
definition is the same as the Euclidean distance formula between the endpoint
of the vector and the origin:

• The commonly used L1 norm is simply the sum of the absolute values of the elements of the vector:
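The standard definitions, reconstructed here because the slide's formulas did not survive extraction:

\|x\|_2 = \sqrt{\sum_i x_i^{2}}, \qquad \|x\|_1 = \sum_i |x_i|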

• In machine learning, norms are used for:


• Defining a loss function in terms of the magnitude of the distance between predicted and
actual points
• Defining a regularization term which includes the magnitude of the weights, to
encourage small weights
Norm Penalties as Constrained Optimization 22
Generalized Lagrange Function 23

• The constrained optimization problem requires us to minimize the
function while ensuring the point discovered belongs to the
feasible set.

[Equation annotation: the original function; arbitrary constants multiplying the equality-constraint functions; arbitrary constants multiplying the inequality-constraint functions; the solution to the generalized Lagrange function]
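The slide's equation itself is not recoverable; the standard generalized Lagrangian that these labels appear to annotate (f is the original function, g^{(i)} the equality constraints, h^{(j)} the inequality constraints, λ and α the arbitrary multiplier constants) is:

L(x, \lambda, \alpha) = f(x) + \sum_i \lambda_i \, g^{(i)}(x) + \sum_j \alpha_j \, h^{(j)}(x), \qquad
x^{*} = \arg\min_{x} \max_{\lambda} \max_{\alpha \ge 0} L(x, \lambda, \alpha)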
Insight into the effect of constraint 24

• Constraining the norm of each layer separately prevents any one
hidden unit from having very large weights.
• When using high learning rates, it is possible to encounter a
positive feedback loop in which large weights induce large
gradients which then induce a large update to the weights.
Underconstrained Problems 25

• Many linear models in ML depend on inverting the matrix X^T X
(built from the data) to solve regression problems.

• When X^T X is singular, it cannot be inverted, so the closed-form
solution to the regression problem in linear algebra breaks down.
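For reference, a standard remedy (not spelled out on the slide): regularized linear regression inverts X^T X + αI instead, which is invertible for any α > 0:

w = \left( X^{\top} X + \alpha I \right)^{-1} X^{\top} y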
Solution for underconstrained problems 26

• Weight decay
• Weight decay is a regularization technique of adding a small penalty,
usually the L2 norm of the weights (all the weights of the model), to the
loss function.
loss = loss + weight decay parameter * L2 norm of the weights
Loss = MSE(y_hat, y) + wd * sum(w^2)
• Regularization helps stop the iteration when the slope of the likelihood
equals the weight decay coefficient.
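A minimal NumPy sketch of the weight-decay expression above (the names wd, w, and y_hat are illustrative assumptions, not from the slides):

import numpy as np

def weight_decay_loss(y_hat, y, w, wd=0.01):
    """Mean squared error plus an L2 (weight decay) penalty on the weights."""
    mse = np.mean((y_hat - y) ** 2)      # data-fitting term
    l2_penalty = wd * np.sum(w ** 2)     # penalize large weight values
    return mse + l2_penalty

# Toy usage with made-up numbers
w = np.array([0.5, -1.2, 3.0])
y_hat = np.array([1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 2.7])
print(weight_decay_loss(y_hat, y, w))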
Regularization in linear algebra problems 27

• We can solve underdetermined linear equations using the
Moore-Penrose pseudoinverse.
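One way to see the link to regularization (a standard identity, stated here as a sketch): the pseudoinverse is the limit of the weight-decay-regularized solution as the regularization coefficient vanishes:

X^{+} = \lim_{\alpha \to 0} \left( X^{\top} X + \alpha I \right)^{-1} X^{\top}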
Dataset Augmentation 28

• The best way to make a machine learning model generalize better
is to train it on more data.
• In practice, sample data is limited.
• One way to get around this is – CREATE FAKE DATA (data synthesis)
and ADD IT TO DATA SET = Data Augmentation
Augmentation scenarios 29

• Data augmentation for classification => generate new samples (x, y)
by transforming the inputs.
• Not readily applicable to density estimation unless the density-estimation
problem has already been solved.
• Effective for object recognition, speech recognition.
• Injecting noise in the input for neural network is also a form of
augmentation.
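A toy NumPy sketch of the two augmentation styles mentioned above, transforming inputs and injecting input noise (array shapes and the noise level are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)

def flip_horizontal(image):
    """Mirror an image left-right; the class label stays unchanged."""
    return image[:, ::-1]

def add_input_noise(x, std=0.05):
    """Inject small Gaussian noise into the inputs (noise-as-augmentation)."""
    return x + rng.normal(0.0, std, size=x.shape)

image = rng.random((4, 4))   # stand-in for a 4x4 grayscale image
augmented = [flip_horizontal(image), add_input_noise(image)]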
Dataset augmentation sample 30
Noise Robustness 31

• Noise applied at inputs


• Noise applied to weights
• Injecting noise at the output targets
Noise injection is powerful 32

• Noise applied to the inputs is a form of data augmentation.
• For some models, the addition of noise with infinitesimal variance at the input is
equivalent to imposing a penalty on the norm of the weights.
• Noise applied to hidden units
– Noise injection can be much more powerful than simply shrinking the
parameters
– Noise applied to hidden units is so important that it merits its own separate
discussion
• Dropout is the main development of this approach
Adding Noise to Weights 33

• This technique is primarily used with RNNs (recurrent neural networks).
• This can be interpreted as a stochastic implementation of Bayesian
inference over the weights.
• Bayesian learning considers model weights to be uncertain and representable
via a probability distribution p(w) that reflects that uncertainty.
• This can be seen in a regression setting with a labelled dataset, as sketched below.
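In that regression setting, injecting weight noise with small variance η is approximately equivalent to adding a penalty that favours weights where small perturbations change the output little; a standard statement of this result (reconstructed, not from the slide) is:

\tilde{J} \approx J + \eta \, \mathbb{E}_{p(x,y)} \left[ \left\| \nabla_{W} \, \hat{y}(x) \right\|^{2} \right]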
Adding noise to output units 34

• Most datasets have some mistakes in the labels, so maximizing the predicted
probability of a mistaken label can be harmful.
• One way to prevent this is to explicitly model the label noise.
• Label smoothing is a mechanism to regularize a model based on a
softmax output.
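A sketch of label smoothing in its standard form (ε and the number of classes k are not given on the slide): for a k-class softmax, the hard 0/1 targets are replaced by

y_{\text{correct}} = 1 - \epsilon, \qquad y_{\text{incorrect}} = \frac{\epsilon}{k - 1}

so the model is never asked to produce probabilities of exactly 0 or 1.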
Questions? 35
NEXT CLASS:
Semi-Supervised Learning, Multi-Task Learning
