CNN Regularization

CO 44251: DEEP LEARNING

Trishna Saikia
Pruning
• It is a technique used in deep learning to reduce the size of a neural network by eliminating less important units
(neurons) or connections between them, with the goal of improving efficiency and reducing overfitting.

• The term "pruning" refers to the process of selectively removing parts of a model that are considered less
significant to its performance.
Weight pruning

• Set individual weights in the weight matrix to zero. This corresponds to deleting connections, as in the figure.

• Here, to achieve sparsity of k%, we rank the individual weights in the weight matrix W according to their magnitude, and then set the smallest k% to zero (see the sketch below).
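
A minimal NumPy sketch of this magnitude-based weight pruning, assuming a dense weight matrix W; the function name and the k parameter are illustrative, not part of any particular library:

```python
import numpy as np

def weight_prune(W, k):
    """Set the smallest k% of weights (by magnitude) to zero."""
    W = W.copy()
    magnitudes = np.abs(W).ravel()
    n_prune = int(magnitudes.size * k / 100)        # how many weights to remove
    if n_prune == 0:
        return W
    threshold = np.sort(magnitudes)[n_prune - 1]    # largest magnitude among the smallest k%
    W[np.abs(W) <= threshold] = 0.0                 # delete (zero out) those connections
    return W

# Example: 50% sparsity on a random 4x4 weight matrix
print(weight_prune(np.random.randn(4, 4), k=50))
```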

Unit/Neuron pruning

• Set entire columns of the weight matrix to zero, in effect deleting the corresponding output neuron.

• Here, to achieve sparsity of k%, we rank the columns of the weight matrix according to their L2-norm and delete the smallest k% (see the sketch below).

For more details: https://towardsdatascience.com/pruning-deep-neural-network-56cae1ec5505
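
A corresponding NumPy sketch of unit/neuron pruning under the same assumptions, ranking columns by their L2-norm:

```python
import numpy as np

def unit_prune(W, k):
    """Zero out the k% of columns (output neurons) with the smallest L2-norm."""
    W = W.copy()
    col_norms = np.linalg.norm(W, axis=0)           # L2-norm of each column
    n_prune = int(W.shape[1] * k / 100)             # how many neurons to remove
    if n_prune > 0:
        weakest = np.argsort(col_norms)[:n_prune]   # indices of the weakest columns
        W[:, weakest] = 0.0                         # delete those output neurons
    return W

# Example: prune 25% of the output neurons of a random 8x8 weight matrix
print(unit_prune(np.random.randn(8, 8), k=25))
```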
Stochastic pooling

• Stochastic pooling operates over local regions of the input feature map, just like other pooling methods.
• However, instead of deterministically selecting the maximum value (max pooling) or the average value (average
pooling), it selects a value based on a probability distribution derived from the values in the pooling region.
• This introduces a layer of randomness, acting as a regularizer and helping the network to avoid overfitting.

The Stochastic Pooling Process:

1. Pooling Region: Define a local region (e.g., 2x2 or 3x3) in the input feature map.
2. Probability Calculation: Sum all the values in the pooling region and divide each value by this sum to obtain the probability of selecting that value.
3. Sampling: Randomly select one activation from the region according to these probabilities; the selected value becomes the pooled output (see the sketch below).
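
A minimal NumPy sketch of this process over non-overlapping regions, assuming a 2D feature map with non-negative activations (e.g., after ReLU) whose dimensions are divisible by the pooling size; the function name and arguments are illustrative:

```python
import numpy as np

def stochastic_pool(feature_map, size=2):
    """Sample one activation per size x size region, with probability proportional to its value."""
    H, W = feature_map.shape
    out = np.zeros((H // size, W // size))
    for i in range(0, H, size):
        for j in range(0, W, size):
            region = feature_map[i:i + size, j:j + size].ravel()
            total = region.sum()
            if total > 0:
                probs = region / total              # probability of selecting each value
                out[i // size, j // size] = np.random.choice(region, p=probs)
            # if the region is all zeros, the pooled output stays 0
    return out

# Example: pool a random non-negative 4x4 feature map into a 2x2 output
fmap = np.abs(np.random.randn(4, 4))
print(stochastic_pool(fmap, size=2))
```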
Benefits of Stochastic Pooling:

• Regularization: By introducing randomness, stochastic pooling serves as a form of regularization, reducing overfitting
and enhancing the generalization capability of the model.

• Robustness: It prevents the network from becoming overly reliant on specific features, making it more robust to
variations and noise in the input data.

• Feature Diversity: Stochastic pooling can capture a more diverse set of features compared to deterministic pooling
methods, potentially leading to richer representations.
Synthetic Data

• Synthetic data is essentially artificial data created algorithmically.


• It is designed to mimic the characteristics of real-world data without containing any actual information.
• Used widely in data science and machine learning, synthetic data enables algorithms to be tested and improved without
risking the privacy or security of real-world data.
• It can also be used to augment existing datasets, especially in cases where the original data is limited or biased.

Demo: https://openai.com/index/sora/
Early Stopping

It is a regularization technique for deep neural networks that stops training when parameter updates no longer yield improvements on a validation set.

Process:

• During training, a model iteratively updates its weights to minimize a loss function, improving its performance on the
training data.
• However, after a certain point, the model may start to "overfit," learning patterns that are specific to the training data and
not generalizable to new data.
• To track the model's generalization ability, the training process typically includes a validation set (a subset of data not used
for training).
• The performance of the model on the validation set is monitored after each epoch (a complete pass through the training
data).
• As training progresses, the model’s accuracy on the training data may continue to improve, but the validation accuracy might
plateau and then start to degrade. This indicates that the model is overfitting to the training data.
• Early stopping prevents this by stopping the training when the validation performance stops improving for a specified
number of epochs (patience). The model’s weights at the point of best validation performance are retained.
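
A toy, self-contained sketch of the patience logic described above. The training and validation functions here are stand-ins (the validation curve is simulated so that it improves and then degrades), not a real training loop:

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def train_one_epoch(weights):
    """Stand-in for one pass over the training data (hypothetical helper)."""
    return weights - 0.1 * rng.normal(size=weights.shape)

def validation_loss(epoch):
    """Stand-in for held-out validation loss: improves early, then degrades (overfitting)."""
    return (epoch - 10) ** 2 / 100.0 + rng.normal(scale=0.01)

weights = rng.normal(size=4)
best_loss, best_weights = float("inf"), None
patience, bad_epochs = 5, 0

for epoch in range(100):
    weights = train_one_epoch(weights)              # update weights on the training data
    val_loss = validation_loss(epoch)               # monitor generalization after each epoch

    if val_loss < best_loss:                        # validation still improving
        best_loss, best_weights = val_loss, copy.deepcopy(weights)
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                  # no improvement for `patience` epochs: stop
            break

weights = best_weights                              # retain the best-validation checkpoint
print(f"Stopped at epoch {epoch}, best validation loss {best_loss:.3f}")
```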
Weight Decay

• It is a regularization technique used in deep learning to prevent overfitting by penalizing large weights in the model.
• It works by adding a penalty term to the loss function during training, encouraging the model to learn smaller, more
generalized weights.
• This helps in improving the model’s generalization on unseen data.

❖ In weight decay, a regularization term is added to the original loss function (e.g., mean squared error or cross-entropy
loss). The modified loss function becomes:

$$L_{new} = L_{original} + \lambda \sum_{i} w_i^2$$

Where,
➢ $L_{original}$ is the original loss function.
➢ $w_i$ represents the model’s weights.
➢ $\lambda$ is a hyperparameter that controls the strength of the regularization.
➢ The added term $\lambda \sum_{i} w_i^2$ encourages the weights $w_i$ to be smaller, penalizing large weight values.
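
A minimal NumPy sketch of adding this L2 penalty to an existing loss; the mean-squared-error example, the toy data, and the value of $\lambda$ are illustrative assumptions:

```python
import numpy as np

def weight_decay_loss(original_loss, weights, lam):
    """L_new = L_original + lambda * sum_i w_i^2."""
    return original_loss + lam * np.sum(weights ** 2)

# Example: mean squared error on toy data, plus the decay term with lambda = 0.01
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
true_w = np.array([0.4, -1.0, 1.5])
y = x @ true_w
w = np.array([0.5, -1.2, 2.0])                      # current model weights
mse = np.mean((x @ w - y) ** 2)                     # original loss
print(weight_decay_loss(mse, w, lam=0.01))
```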
Controlling Weight Decay:

• If 𝜆 is too small, the regularization effect will be negligible, and the model may overfit.
• If 𝜆 is too large, the weights will shrink too much, and the model might underfit (not learning the data well enough).
• A carefully tuned 𝜆 helps strike a balance between overfitting and underfitting.

Purpose of Weight Decay

• Prevents Overfitting: Models with very large weights tend to fit the training data too closely, leading to poor generalization
on unseen data. By penalizing large weights, weight decay reduces overfitting.

• Improves Generalization: Encouraging smaller weights can lead to simpler models that generalize better on new data.

• Smoothing the Loss Landscape: Weight decay has the effect of smoothing the model’s loss landscape, making it less
sensitive to small variations in the data, which can help in avoiding overfitting.
Thank You
