CNN Regularization
Trishna Saikia
Pruning
• It is a technique used in deep learning to reduce the size of a neural network by eliminating less important units
(neurons) or connections between them, with the goal of improving efficiency and reducing overfitting.
• The term "pruning" refers to the process of selectively removing parts of a model that are considered less
significant to its performance.
Weight pruning
• Set individual weights in the weight matrix to zero, in effect deleting the corresponding connections.
Unit/Neuron pruning
• Set entire columns of the weight matrix to zero, in effect deleting the corresponding output neuron (a brief sketch follows).
For more details: https://towardsdatascience.com/pruning-deep-neural-network-56cae1ec5505
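As a rough illustration, the sketch below applies both styles of pruning to a single linear layer using PyTorch's torch.nn.utils.prune utilities; the layer size, pruning amounts, and norm choice are arbitrary placeholders, not values from the slides.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(128, 64)  # placeholder layer

# Weight pruning: zero the 30% of individual weights with the smallest magnitude
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Unit/neuron pruning: zero 50% of output neurons, i.e. entire rows of the PyTorch
# weight matrix (the "columns" in frameworks that store the weight matrix transposed)
prune.ln_structured(layer, name="weight", amount=0.5, n=2, dim=0)

# Fold the pruning masks into the weight tensor permanently
prune.remove(layer, "weight")

# Fraction of weights that are now exactly zero
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2f}")
```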
Stochastic Pooling
• Stochastic pooling operates over local regions of the input feature map, just like other pooling methods.
• However, instead of deterministically selecting the maximum value (max pooling) or the average value (average
pooling), it selects a value based on a probability distribution derived from the values in the pooling region.
• This introduces a layer of randomness, acting as a regularizer and helping the network to avoid overfitting.
1. Pooling Region: Define a local region (e.g., 2x2 or 3x3) in the input feature map.
2. Probability Calculation: Sum all the values in the pooling region, then divide each value by this sum to obtain its probability of being selected.
3. Sampling: Randomly select one value from the region according to these probabilities; the sampled value becomes the pooled output (see the sketch after this list).
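A minimal NumPy sketch of this procedure for non-overlapping 2x2 regions, assuming non-negative activations (e.g., after ReLU); the function name and interface are illustrative, not a standard library API.

```python
import numpy as np

def stochastic_pool_2x2(feature_map, rng=None):
    """Stochastic pooling over non-overlapping 2x2 regions of a 2D feature map.

    Assumes non-negative values (e.g., post-ReLU) and even height/width.
    """
    rng = rng or np.random.default_rng()
    h, w = feature_map.shape
    out = np.zeros((h // 2, w // 2), dtype=feature_map.dtype)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            region = feature_map[i:i + 2, j:j + 2].astype(np.float64).ravel()
            total = region.sum()
            if total == 0:
                continue                      # all-zero region: output stays 0
            probs = region / total            # probability of selecting each value
            out[i // 2, j // 2] = rng.choice(region, p=probs)  # sample one value
    return out

# Example: pool a 4x4 activation map down to 2x2
activations = np.abs(np.random.randn(4, 4)).astype(np.float32)
print(stochastic_pool_2x2(activations))
```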
Benefits of Stochastic Pooling:
• Regularization: By introducing randomness, stochastic pooling serves as a form of regularization, reducing overfitting
and enhancing the generalization capability of the model.
• Robustness: It prevents the network from becoming overly reliant on specific features, making it more robust to
variations and noise in the input data.
• Feature Diversity: Stochastic pooling can capture a more diverse set of features compared to deterministic pooling
methods, potentially leading to richer representations.
Synthetic Data
• Artificially generated training examples (e.g., from generative models or data augmentation) can expand a limited training set and act as a regularizer by exposing the model to more varied inputs.
Demo: https://openai.com/index/sora/
Early Stopping
Process:
• During training, a model iteratively updates its weights to minimize a loss function, improving its performance on the
training data.
• However, after a certain point, the model may start to "overfit," learning patterns that are specific to the training data and
not generalizable to new data.
• To track the model's generalization ability, the training process typically includes a validation set (a subset of data not used
for training).
• The performance of the model on the validation set is monitored after each epoch (a complete pass through the training
data).
• As training progresses, the model’s accuracy on the training data may continue to improve, but the validation accuracy might
plateau and then start to degrade. This indicates that the model is overfitting to the training data.
• Early stopping prevents this by halting training when the validation performance stops improving for a specified number of epochs (patience). The model’s weights at the point of best validation performance are retained (a minimal loop is sketched after this list).
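A minimal, framework-agnostic sketch of the patience logic described above; the EarlyStopping class, its parameters, and the toy loss curve are illustrative, and in a real training loop model_state would be the model's weights (e.g., model.state_dict() in PyTorch).

```python
import copy

class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience          # epochs to wait after the last improvement
        self.min_delta = min_delta        # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_state = None            # snapshot of the best weights seen so far
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss, model_state):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model_state)  # keep the best weights
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop

# Toy usage with a fake validation-loss curve (improves, then degrades):
stopper = EarlyStopping(patience=3)
fake_val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.54, 0.60]
for epoch, loss in enumerate(fake_val_losses):
    if stopper.step(loss, model_state={"epoch": epoch}):  # pass model.state_dict() in practice
        print(f"Stopping at epoch {epoch}; best val loss {stopper.best_loss:.2f}")
        break
```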
Weight Decay
• It is a regularization technique used in deep learning to prevent overfitting by penalizing large weights in the model.
• It works by adding a penalty term to the loss function during training, encouraging the model to learn smaller, more
generalized weights.
• This helps in improving the model’s generalization on unseen data.
❖ In weight decay, a regularization term is added to the original loss function (e.g., mean squared error or cross-entropy loss). The modified loss function becomes:
L_total = L_original + λ ∑ᵢ wᵢ²
Where,
➢ L_original is the original loss function.
➢ wᵢ represents the model’s weights.
➢ λ is a hyperparameter that controls the strength of the regularization.
➢ The added term λ∑ wᵢ² encourages the weights wᵢ to be smaller, penalizing large weight values (a brief code sketch follows).
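A short PyTorch-flavoured sketch of the formula above on a toy linear model; the data, layer, learning rate, and λ value are placeholders. Most optimizers also accept a weight_decay argument that applies the same idea directly to the gradients (the constants can differ by a factor of 2 from this penalty).

```python
import torch
import torch.nn as nn

# Toy data and model (illustrative only)
x = torch.randn(64, 10)
y = torch.randn(64, 1)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

lam = 1e-4  # λ: regularization strength

for epoch in range(100):
    optimizer.zero_grad()
    pred = model(x)
    original_loss = criterion(pred, y)                        # L_original
    # L2 penalty over all parameters (in practice biases are often excluded)
    l2_penalty = sum((w ** 2).sum() for w in model.parameters())
    loss = original_loss + lam * l2_penalty                   # L_total = L_original + λ Σ wᵢ²
    loss.backward()
    optimizer.step()

# Common equivalent in practice:
# torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```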
Controlling Weight Decay:
• If 𝜆 is too small, the regularization effect will be negligible, and the model may overfit.
• If 𝜆 is too large, the weights will shrink too much, and the model might underfit (not learning the data well enough).
• A carefully tuned 𝜆 helps strike a balance between overfitting and underfitting.
Benefits of Weight Decay:
• Prevents Overfitting: Models with very large weights tend to fit the training data too closely, leading to poor generalization on unseen data. By penalizing large weights, weight decay reduces overfitting.
• Improves Generalization: Encouraging smaller weights can lead to simpler models that generalize better on new data.
• Smoothing the Loss Landscape: Weight decay has the effect of smoothing the model’s loss landscape, making it less
sensitive to small variations in the data, which can help in avoiding overfitting.
Thank You