What is Batch Normalization in Deep Learning?
Batch Normalization is used to reduce the problem of internal covariate shift in neural networks. It works by normalizing the data within each mini-batch: it computes the mean and variance of the activations in a batch and adjusts the values so that they fall in a similar range. It then scales and shifts the normalized values using learnable parameters so that the model can still learn effectively.
In traditional neural networks, as the input data propagates through the network, the distribution of each layer's inputs changes. This phenomenon is known as internal covariate shift and it can slow down the training process. Batch Normalization aims to reduce this issue by normalizing the inputs of each layer.
This process keeps the inputs to each layer of the network in a stable range even if the outputs of earlier layers change during training. As a result, training becomes faster and more stable.
Need for Batch Normalization
Batch Normalization keeps the outputs of each layer in a steady range as the model learns. This helps the model train faster and learn more effectively.
- Solves the problem of internal covariate shift.
- Makes training faster and more stable.
- Allows use of higher learning rates.
- Helps avoid vanishing or exploding gradients.
- Can act as a regularizer, sometimes reducing the need for dropout.
Fundamentals of Batch Normalization
In this section we discuss the steps taken to perform batch normalization.
Step 1: Compute the Mean and Variance of Mini-Batches
For a mini-batch of activations x_1, x_2, \ldots, x_m, the mean \mu_{B} and variance \sigma_{B}^{2} of the mini-batch are computed.
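These are the standard mini-batch statistics:
\mu_{B} = \frac{1}{m}\sum_{i=1}^{m} x_i
\sigma_{B}^{2} = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_{B})^2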
Step 2: Normalization
Each activation x_i is normalized using the computed mean and variance of the mini-batch. The normalization process subtracts the mean \mu_B from each activation and divides by the square root of the variance \sigma_{B}^{2}, ensuring that the normalized activations have a zero mean and unit variance.
Additionally, a small constant \epsilon is added to the denominator for numerical stability, particularly to prevent division by zero.
\widehat{x_i} = \frac{x_i - \mu_{B}}{\sqrt{\sigma_{B}^{2} +\epsilon}}
Step 3: Scale and Shift the Normalized Activations
The normalized activations \widehat{x_i} are then scaled by a learnable parameter \gamma and shifted by another learnable parameter \beta. These parameters allow the model to learn the optimal scaling and shifting of the normalized activations, giving the network additional flexibility.
y_i = \gamma \widehat{x_i} + \beta
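To make the three steps concrete, here is a minimal NumPy sketch of the batch normalization forward pass. The function batch_norm_forward and the toy data are illustrative only, not part of any library API.
Python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Step 1: per-feature mean and variance over the mini-batch (axis 0)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize to zero mean and unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Step 3: scale by gamma and shift by beta (learnable in a real network)
    return gamma * x_hat + beta

# Example: a mini-batch of 4 samples with 3 features each
x = np.random.randn(4, 3)
y = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=0))  # close to 0 for each feature
print(y.std(axis=0))   # close to 1 for each feature
In a real network \gamma and \beta are trained by backpropagation along with the other weights; here they are fixed to 1 and 0, which leaves the normalized activations unchanged.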
Batch Normalization in TensorFlow
In the code below we build a simple neural network using TensorFlow and add a Batch Normalization layer using tf.keras.layers.BatchNormalization(). This layer normalizes the outputs (activations) of the previous layer.
Python
import tensorflow as tf

# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,)),
    # Add Batch Normalization layer
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation('softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model (x_train and y_train are assumed to be defined,
# e.g. flattened 28x28 images and integer class labels)
model.fit(x_train, y_train, epochs=5, batch_size=32)
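Note that BatchNormalization behaves differently during training and inference: during training it normalizes with the statistics of the current mini-batch, while at inference it uses moving averages of the mean and variance accumulated during training. Keras switches between the two automatically via the layer's training argument.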
Batch Normalization in PyTorch
In the following code we build a simple neural network with batch normalization using PyTorch. We define a subclass of nn.Module and add nn.BatchNorm1d after the first fully connected layer to normalize the activations.
We use nn.BatchNorm1d because the input to the layer is a flat feature vector; for image-like inputs in Convolutional Neural Networks, nn.BatchNorm2d is used instead.
Python
import torch
import torch.nn as nn

# Define a simple model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(784, 64)
        # Add Batch Normalization layer
        self.bn = nn.BatchNorm1d(64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.fc1(x)
        # Apply Batch Normalization
        x = self.bn(x)
        x = self.relu(x)
        # Return raw logits; nn.CrossEntropyLoss applies softmax internally,
        # so no Softmax layer is needed here
        x = self.fc2(x)
        return x

# Instantiate the model
model = Model()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train the model (train_loader is assumed to be a DataLoader
# yielding batches of flattened inputs and integer labels)
model.train()
for epoch in range(5):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
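Because nn.BatchNorm1d tracks running statistics during training, call model.eval() before running inference so that the stored running mean and variance are used instead of the current batch's statistics; model.train() switches back to batch statistics for training.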
Benefits of Batch Normalization
- Faster Convergence: Batch Normalization reduces internal covariate shift, allowing for faster convergence during training.
- Higher Learning Rates: With Batch Normalization, higher learning rates can be used without the risk of divergence.
- Regularization Effect: Batch Normalization introduces a slight regularization effect that can reduce the need for other regularization techniques such as dropout.
By normalizing the inputs to each layer, Batch Normalization stabilizes the learning process and allows for faster convergence, making training more effective and reducing the need for careful weight initialization and learning rate tuning.