UNIT-IV: Improving Deep Neural Networks: Data Augmentation - Under-fitting vs.
over-fitting. Training Aspects of CNNs, Regularization, Weight Initialization, Activation
Functions, Normalization, Hyperparameters in CNNs, Transfer Learning, and Fine Tuning in
CNNs
Bias
● Bias is the error from wrong assumptions in the model.
● A high-bias model is too simple to learn the true patterns.
● It underfits the data (poor performance on both training and test sets).
📌 Example: Using a straight line to fit curved data.
Variance
● Variance is the error from the model being too sensitive to training data.
● A high-variance model memorizes the training data and overfits.
● It performs well on training data but poorly on new data.
📌 Example: A deep neural net trained on very few images.
Bias-Variance Trade-off
● You need to balance bias and variance to build a good model.
○ Too much bias → underfitting.
○ Too much variance → overfitting.
● The goal is to find a sweet spot where the model generalizes well to unseen data.
Student Analogy – Understanding Bias & Variance
| Student | Behavior | Outcome in Class Test | Outcome in Monthly Test | Conclusion |
|---------|----------|-----------------------|-------------------------|------------|
| A | Distracted, pays no attention | ~50% (random guessing) | ~50% | Underfitting |
| B | Memorizes everything | 98% | Low score | Overfitting |
| C | Learns concepts & solves problems | 92% | Consistent score | Good Generalization |
➡️ Similar to:
● A = Simple model (can’t learn)
● B = Overly complex model (memorizes)
● C = Balanced model (generalizes well)
Underfitting
● The model is too simple to learn the underlying pattern in the data.
● Occurs due to high bias.
● Poor performance on both training and test data.
● Fails to capture important trends in the data.
● Causes: Simple model, insufficient training, too few parameters.
● Example: Using a linear model to classify complex image data (a 1-layer CNN on MNIST).
Overfitting
● The model is too complex and learns noise along with patterns.
● Occurs due to high variance.
● Performs very well on training data, but poorly on test data.
● Fails to generalize to unseen data.
● Causes: Complex model, small training dataset, too many parameters.
● Example: A deep neural network memorizing a small set of images (a deep CNN on 100 cat images).
Real-Life Analogy: Fruit Recognition
● Underfitting: Seen only one apple & banana → can't identify new fruit.
● Overfitting: Memorized exact training images → fails on rotated banana.
● Good Fit: Seen various types → generalizes to unseen examples.
Data Augmentation
● Data augmentation is a technique used to artificially increase the size and
diversity of the training dataset.
● It involves applying various transformations to the original data to create new,
modified examples.
● Helps improve model generalization and prevents overfitting, especially when the
dataset is small.
● Commonly used in training Convolutional Neural Networks (CNNs) for image
classification tasks.
Common Techniques of Data Augmentation
1. Flipping – Horizontally or vertically flipping the image.
2. Rotation – Rotating the image by a small angle (e.g., ±10°).
3. Scaling – Zooming in or out of the image.
4. Cropping – Randomly cutting parts of the image.
5. Brightness/Contrast Adjustment – Modifying lighting conditions.
6. Translation – Shifting the image left, right, up, or down.
7. Noise Injection – Adding random noise to make the model robust.
Purpose:
● Prevents overfitting
● Exposes model to variations
● Encourages robust feature learning
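A minimal sketch of such an augmentation pipeline using torchvision.transforms is shown below; the specific parameter values (rotation angle, crop size, jitter strengths) are illustrative assumptions, not prescribed settings.
import torchvision.transforms as T

# Illustrative augmentation pipeline for image classification
train_transforms = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                      # flipping
    T.RandomRotation(degrees=10),                       # rotation by ±10°
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),         # scaling + cropping
    T.ColorJitter(brightness=0.2, contrast=0.2),        # brightness/contrast adjustment
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),    # translation (shifting)
    T.ToTensor(),                                       # convert PIL image to tensor
])
# Applied to each training image, e.g. augmented = train_transforms(pil_image)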
Training Aspects of CNNs:
1. Regularization Techniques
Purpose: Prevents overfitting by limiting the complexity of the model.
1. Dropout
○ Definition: Randomly disables a percentage of neurons during training.
○ Goal: Prevents overfitting by ensuring that the model doesn't rely too much
on any single neuron.
2. L2 Regularization (Weight Decay)
○ Definition: Adds a penalty term to the loss function based on the squared
values of the weights.
○ Goal: Prevents the model from learning excessively large weights, which
could lead to overfitting.
○ Formula: Loss_total = Loss + λ Σ wᵢ², where λ is the regularization parameter.
3. L1 Regularization
○ Definition: Adds a penalty term to the loss function based on the absolute
values of the weights.
○ Goal: Encourages sparsity in the model (many weights become zero), leading
to simpler models.
○ Formula: Loss_total = Loss + λ Σ |wᵢ|, where λ is the regularization parameter.
4. Data Augmentation
○ Definition: Artificially increases the size of the training dataset by applying
random transformations to the images (e.g., flipping, rotation, scaling,
cropping).
○ Goal: Increases the variety of data the model sees, preventing it from
memorizing the training examples and improving generalization.
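A minimal PyTorch sketch combining Dropout and L2 weight decay; the layer sizes, dropout rate, and λ value here are illustrative assumptions.
import torch.nn as nn
import torch.optim as optim

# Dropout inside a small classifier head (illustrative sizes)
model = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly disables 50% of neurons during training
    nn.Linear(128, 10),
)

# L2 regularization (weight decay) is applied through the optimizer's weight_decay term
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)  # λ = 1e-4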
2. Weight Initialization Techniques in CNNs
1. Zero Initialization
○ Definition: All weights are initialized to zero.
○ Problem: Fails to break symmetry: every neuron in the layer receives identical updates and learns the same thing, making the network ineffective.
○ Conclusion: Not used in practice.
2. Random Initialization
○ Definition: Weights are initialized with small random values drawn from a
normal or uniform distribution.
○ Problem: If the values are too small, it can lead to vanishing gradients; if
too large, it can lead to exploding gradients.
3. Normal Distribution Initialization
○ Definition: Weights are drawn from a normal (Gaussian) distribution,
centered around a mean (usually 0) with a certain standard deviation.
○ Example: torch.nn.init.normal_(tensor, mean=0.0, std=0.02)
○ Advantage: Helps in maintaining variance in the model and prevents both
vanishing and exploding gradients.
4. Uniform Distribution Initialization
○ Definition: Weights are initialized using a uniform distribution where all
values within a range are equally likely.
○ Example: torch.nn.init.uniform_(tensor, a=-0.1, b=0.1)
○ Advantage: Ensures that all neurons start with different values, which is
important for symmetry breaking.
5. Xavier Initialization (Glorot Initialization)
○ Used For: Sigmoid and tanh activation functions.
○ Definition: Weights are initialized with values that maintain the variance
across layers. The goal is to keep the variance of activations and gradients
the same across layers.
○ Formula (n_in = number of inputs to the layer, n_out = number of outputs):
■ For Uniform Distribution: W ~ U[-√(6 / (n_in + n_out)), +√(6 / (n_in + n_out))]
■ For Normal Distribution: W ~ N(0, 2 / (n_in + n_out))
○ Advantage: Helps in preventing vanishing/exploding gradients.
6. He Initialization (Kaiming Initialization)
○ Used For: ReLU and its variants (Leaky ReLU, ELU).
○ Definition: Weights are initialized with higher variance to account for the fact
that ReLU neurons "kill" half the activations (negative ones).
○ Formula: W ~ N(0, 2 / n_in), where n_in is the number of inputs to the layer.
○ Advantage: Keeps the variance large enough to prevent dying ReLU
problems.
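A short PyTorch sketch of the initializers discussed above; the layer shapes are illustrative assumptions.
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3)   # ReLU layer → He initialization
fc = nn.Linear(512, 10)                  # tanh/sigmoid layer → Xavier initialization

nn.init.kaiming_normal_(conv.weight, nonlinearity='relu')   # He (Kaiming)
nn.init.xavier_uniform_(fc.weight)                          # Xavier (Glorot)
nn.init.normal_(fc.bias, mean=0.0, std=0.02)                # plain normal initialization
nn.init.zeros_(conv.bias)                                   # biases often start at zero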
3. Activation Functions
Sigmoid:
● Formula: σ(x) = 1 / (1 + e^(-x))
● Range: 0 to 1
● Use: Binary classification
● Issue: Vanishing gradients.
Tanh:
● Range: -1 to 1
● Use: Hidden layers
● Issue: Vanishing gradients.
ReLU:
● Formula: ReLU(x)=max(0,x)
● Range: 0 to ∞
● Use: Hidden layers
● Issue: Dying ReLU problem.
Leaky ReLU:
● Formula: Leaky ReLU(x)=max(0.01x,x)
● Range: (-∞ to ∞)
● Use: Hidden layers, solves dying ReLU.
Softmax:
● Formula: softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ)
● Range: 0 to 1 (probabilities)
● Use: Multi-class classification.
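The functions above can be compared directly in PyTorch; the input values below are arbitrary placeholders.
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])

print(torch.sigmoid(x))        # squashes values into (0, 1)
print(torch.tanh(x))           # squashes values into (-1, 1)
print(F.relu(x))               # negatives become 0
print(F.leaky_relu(x, 0.01))   # negatives scaled by 0.01
print(F.softmax(x, dim=0))     # probabilities that sum to 1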
4. Normalization
Normalization refers to the technique of adjusting the input data or activations in neural networks so that they lie within a certain range, helping to improve the efficiency and stability of the training process.
Common Normalization Techniques:
1. Batch Normalization (BN):
○ Purpose: Reduces internal covariate shift by normalizing activations for each
mini-batch.
○ How: For each layer, normalize the input features to have zero mean and unit
variance.
○ Benefits: Speeds up training, allows higher learning rates, reduces
overfitting.
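A minimal sketch of Batch Normalization inside a convolutional block; the channel counts are illustrative assumptions.
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),   # normalizes each channel to zero mean, unit variance per mini-batch
    nn.ReLU(),
)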
5. Loss Function
● Purpose: Measures how far the model’s predictions are from the true labels.
● Examples:
○ Cross-Entropy Loss: For classification.
○ Mean Squared Error (MSE): For regression.
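In PyTorch these loss functions look as follows; the logits and targets are random placeholder values.
import torch
import torch.nn as nn

criterion_cls = nn.CrossEntropyLoss()   # classification (expects raw logits + class indices)
criterion_reg = nn.MSELoss()            # regression

logits = torch.randn(4, 3)              # 4 samples, 3 classes (placeholder values)
targets = torch.tensor([0, 2, 1, 0])
loss = criterion_cls(logits, targets)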
6. Optimizer
● Purpose: Adjusts the weights based on the gradient of the loss function.
● Common Optimizers:
○ Adam
○ SGD (Stochastic Gradient Descent)
○ RMSprop
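A minimal sketch of creating these optimizers; the model and learning rates are placeholders.
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)   # placeholder model

optimizer = optim.Adam(model.parameters(), lr=0.001)
# Alternatives:
# optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optim.RMSprop(model.parameters(), lr=0.001)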
7. Learning Rate
● Purpose: Controls the step size when updating the weights.
● Impact:
○ Too high → Can lead to unstable learning.
○ Too low → Training can be very slow and might get stuck in suboptimal
solutions.
8. Early Stopping
● Purpose: Stops training when the performance on the validation set stops improving.
● Goal: Prevents overfitting by halting training before the model memorizes the training
data.
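A minimal sketch of early-stopping logic; the per-epoch validation losses and the patience value are hypothetical placeholders.
# Stop when validation loss has not improved for `patience` consecutive epochs
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]  # placeholder per-epoch values
best_val_loss = float('inf')
patience, wait = 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        wait = 0                 # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:
            print(f"Early stopping at epoch {epoch}")
            break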
9. Hyperparameters in CNNs
Hyperparameters are parameters set before training that greatly affect how well a CNN
learns and performs. Tuning them properly is essential for optimal model performance.
1. Number of Layers
● Defines the depth of the CNN (how many convolutional + pooling layers).
● Deeper networks can learn more complex features.
● But too deep → risk of overfitting.
● Start with fewer layers and increase gradually based on performance.
2. Filter Size
● Determines the receptive field (how much of the image is seen by a neuron).
● Larger filters (e.g., 5x5) → capture more spatial info, but more parameters.
● Smaller filters (e.g., 3x3) → fewer parameters, but might miss global features.
● Common practice: use 3x3 filters in most modern CNNs.
3. Stride
● Controls how much the filter moves across the input.
● Stride = 1 (default) → more detailed output.
● Stride > 1 → reduces feature map size (downsampling).
● Trade-off: smaller stride = more info, larger stride = faster but info loss.
4. Padding
● Adds zeros around the input to preserve its size after convolution.
● Types:
○ Same padding: output size = input size.
○ Valid padding: no padding; output shrinks.
● Helps retain edge information.
● Slightly increases computation and memory use.
5. Learning Rate
● Controls the step size during weight updates.
● High learning rate → fast but unstable learning.
● Low learning rate → stable but slow learning.
● Needs fine-tuning; often adjusted dynamically during training.
6. Batch Size
● Number of training samples processed before the model updates weights.
● Large batch size:
○ Stable gradients.
○ Higher memory usage.
● Small batch size:
○ Lower memory.
○ Faster updates, but noisier gradients.
● Common values: 16, 32, 64, 128 (depending on GPU capacity).
7. Number of Epochs
● One epoch = one full pass over the entire dataset.
● Too few epochs → underfitting (model hasn’t learned enough).
● Too many epochs → overfitting (model memorizes training data).
● Use early stopping to halt training when validation performance stops improving.
8. Dropout Rate
● Regularization method to prevent overfitting.
● Randomly "drops" a fraction of neurons during training.
● Common dropout rates: 0.2 to 0.5.
● Helps ensure neurons don’t become overly reliant on each other (co-adaptation).
Hyperparameter Optimization Techniques
These techniques help in finding the best set of hyperparameters for training a CNN
effectively.
1. Grid Search
● Definition: Exhaustively tries all possible combinations of hyperparameters from a
predefined set (grid).
● ✅ Simple and systematic.
● ❌ Computationally expensive, especially with many parameters or large datasets.
2. Random Search
● Definition: Randomly samples a fixed number of hyperparameter combinations from
defined ranges.
● ✅ More efficient than grid search in high-dimensional spaces.
● ✅ Often finds good results faster.
● ❌ May miss optimal combinations.
3. Bayesian Optimization
● Definition: Builds a probabilistic model of the objective function and uses it to
choose the next set of hyperparameters.
● ✅ Smart and efficient search.
● ✅ Focuses on promising regions of the parameter space.
● ❌ More complex to implement.
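A minimal random-search sketch in plain Python; the search space and the number of trials are hypothetical assumptions, and the score here is a placeholder for a real train-and-validate run.
import random

search_space = {
    'learning_rate': [1e-4, 1e-3, 1e-2],
    'batch_size': [16, 32, 64, 128],
    'dropout_rate': [0.2, 0.3, 0.5],
}

best_score, best_config = 0.0, None
for trial in range(10):   # number of random trials
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = random.random()   # placeholder: replace with real validation accuracy
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)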
Transfer Learning with Convolutional Neural Networks (CNNs)
Transfer learning allows a CNN trained on a large dataset (like ImageNet) to be reused for a
related but different task, saving time, computation, and data.
Why Use Transfer Learning?
● Pre-trained CNNs learn general visual features (edges, textures, shapes).
● Saves time and resources.
● Improves performance on small datasets.
● Reduces the need for extensive training.
Key Idea:
● Use pre-trained CNN as a feature extractor.
● Freeze pre-trained layers (do not update them).
● Add new task-specific layers on top and train them on your dataset.
Common Pre-trained Models:
● VGG
● ResNet
● Inception
● MobileNet
(Available in TensorFlow, PyTorch, etc.)
Steps to Implement Transfer Learning
1. Select Pre-trained Model
○ Choose a model suited to your problem (e.g., image classification, detection).
2. Load Model without Top Layers
○ Remove the original fully-connected layers (used for previous task).
3. Customize the Model
○ Add your own layers (e.g., Dense, Dropout, Softmax) for the new
classification task.
4. Freeze Pre-trained Layers
○ Prevent updates to these layers during training.
5. Prepare the Dataset
○ Resize, normalize, and augment images to match the input format of the
model.
6. Train the Model
○ Only newly added layers are trained. Use appropriate optimizer and loss
function.
7. Fine-tune (Optional)
○ Unfreeze some top layers of the pre-trained model and re-train with a low
learning rate.
8. Evaluate the Model
○ Use validation/test data. Assess performance using metrics like accuracy,
loss, precision, or recall.
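A minimal sketch of these steps using a pre-trained ResNet-18 from torchvision; the 5-class target task is an assumption, and the weights argument requires a recent torchvision version (older releases used pretrained=True).
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Steps 1-2: load a pre-trained model without its original top classifier behaviour
model = models.resnet18(weights='IMAGENET1K_V1')

# Step 4: freeze all pre-trained layers
for param in model.parameters():
    param.requires_grad = False

# Step 3: replace the final fully-connected layer for the new task (assume 5 classes)
model.fc = nn.Linear(model.fc.in_features, 5)   # new layer has requires_grad=True by default

# Step 6: train only the newly added layer
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()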
Fine-tuning Strategies in CNNs (Transfer Learning)
1. Freezing Layers
● Purpose: Preserve previously learned features.
● How:
○ Freeze early layers (low-level features like edges/textures).
○ Optionally freeze all layers except the final one.
○ Only train the last few or new layers.
2. Modifying Layers
● Output Layer:
○ Replace if the number of classes is different from the original task.
● Input Layer:
○ Adjust if the input feature size or shape has changed.
● Add New Layers:
○ Add custom layers (Dense, Dropout, etc.) and train only these initially.
3. Adjusting the Learning Rate
● Use a smaller learning rate during fine-tuning:
○ Allows gradual adaptation.
○ Prevents drastic changes to useful learned features.
● Typical strategy:
○ Use 1/10th of the original learning rate.
○ Example:
■ Original task: lr = 0.01
■ Fine-tuning: lr = 0.001
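A short sketch of these strategies, continuing the ResNet-18 example from the transfer-learning steps above; which block to unfreeze and the exact learning rates are assumptions.
# Unfreeze the last residual block so its weights can adapt to the new task
for param in model.layer4.parameters():
    param.requires_grad = True

# Fine-tune with a learning rate roughly 1/10th of the one used for the new head
optimizer = optim.Adam(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.0001,
)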
Why are Hyperparameters Important in CNNs?
● Influence training speed – Determines how fast the model learns.
● Affect model accuracy – Key to achieving high prediction performance.
● Control overfitting and underfitting – Help balance model complexity.
● Crucial for generalization – Ensure good performance on unseen/test data.
● Enable faster convergence – Reduce training time by optimizing learning steps.
● Improve overall model performance – Well-tuned settings lead to better results.
Example: CNN Hyperparameter Setup
# Sample CNN Hyperparameters
learning_rate = 0.001
batch_size = 32
epochs = 50
optimizer = 'Adam'
num_filters = [32, 64, 128]
filter_size = (3, 3)
stride = 1
padding = 'same'
activation = 'ReLU'
dropout_rate = 0.5
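A hedged PyTorch sketch of a small CNN wired to these settings; the 32x32 RGB input size, two conv blocks, and 10 output classes are assumptions (the third entry of num_filters is omitted for brevity), and padding='same' requires PyTorch 1.9+.
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding='same'),   # num_filters[0], 3x3 filter, stride 1, same padding
    nn.ReLU(),                                                   # activation
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding='same'),  # num_filters[1]
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),                                           # dropout_rate
    nn.Linear(64 * 8 * 8, 10),                                   # assumes 32x32 inputs, 10 classes
)

optimizer = optim.Adam(model.parameters(), lr=0.001)             # optimizer, learning_rate
# batch_size and epochs would be used in the DataLoader and the training loop respectively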