Initialize weights in PyTorch
When building a neural network, the layers must be initialized with some starting weights, which the training process then optimizes. The way those weights are initialized affects how quickly the model reaches a good solution and whether it runs into vanishing or exploding gradients. In this article, we look at how to initialize weights effectively using the PyTorch machine learning framework.
Why initialize weights?
Initializing the weights of a neural network is a vital step in the training process, since the choice of initialization directly affects how quickly and how well the network converges. If all weights start from the same value, every neuron in a layer computes the same output and receives the same gradient, so the model can get stuck in the same suboptimal solution regardless of the optimization algorithm used (the short sketch below illustrates this symmetry problem).
Weights initialized to very large values can lead to vanishing or exploding gradients, depending on the activation function, which makes the model converge slowly or not at all. Small random initial values generally lead to more stable training, because the early updates stay well scaled. Different initialization methods suit different types of problems and model architectures.
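As a quick, self-contained illustration of the symmetry problem (the tiny network and constant value below are illustrative, not from a real experiment), the following sketch initializes every weight of a two-layer network to the same constant and shows that both hidden neurons receive exactly the same gradient, so no amount of training can make them differ:
Python3
import torch

torch.manual_seed(0)

# A tiny network whose weights are all set to the same constant
net = torch.nn.Sequential(
    torch.nn.Linear(3, 2, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1, bias=False),
)
torch.nn.init.constant_(net[0].weight, 0.5)
torch.nn.init.constant_(net[2].weight, 0.5)

x = torch.randn(4, 3)   # a small random batch
loss = net(x).sum()     # dummy loss, just to produce gradients
loss.backward()

# Both rows (both hidden neurons) receive exactly the same gradient,
# so they stay identical after every update
print(net[0].weight.grad)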
Using the nn.init Module for Weights Initialization
The PyTorch nn.init module is the conventional way to initialize weights in a neural network; it provides a range of weight initialization methods, such as:
- Uniform initialization
- Xavier initialization
- Kaiming initialization
- Zeros initialization
- Ones initialization
- Normal initialization
An example implementation of each is provided below.
Uniform Initialization
Using a uniform distribution to initialize the weights can help prevent the ‘vanishing gradient’ problem, as the distribution has a finite range and the weights are distributed evenly across that range. However, this method can suffer from the ‘exploding gradient’ problem if the range is too large.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.uniform_(linear_layer.weight)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[-0.1768, -0.4942],
[ 0.0756, -0.0967],
[-0.3923, 0.3283]], requires_grad=True)
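The bounds of the distribution can also be set explicitly. Below is a minimal sketch (the bounds -0.1 and 0.1 are just an example choice) that keeps the initial weights small and centred at zero:
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)

# uniform_ defaults to the range [0, 1); passing a and b explicitly
# controls how large the initial weights can be
torch.nn.init.uniform_(linear_layer.weight, a=-0.1, b=0.1)
print(linear_layer.weight)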
Xavier Initialization
Using Xavier initialization can help prevent the ‘vanishing gradient’ problem, as it scales the weights such that the variance of the outputs of each layer is the same as the variance of the inputs.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.xavier_uniform_(linear_layer.weight)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[ 0.4442, -0.3890],
[-0.2876, -0.3379],
[-0.5261, 0.5227]], requires_grad=True)
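nn.init also provides a normal-distribution variant, xavier_normal_, along with a gain argument that rescales the variance for the activation following the layer. A short sketch (the tanh choice is just an example):
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)

# calculate_gain returns the recommended scaling factor for the given
# activation; xavier_normal_ draws from a normal distribution instead
gain = torch.nn.init.calculate_gain("tanh")
torch.nn.init.xavier_normal_(linear_layer.weight, gain=gain)
print(linear_layer.weight)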
Kaiming Initialization
Using Kaiming initialization can help prevent the ‘vanishing gradient’ problem, as it scales the weights such that the variance of the outputs is the same as the variance of the inputs, taking into account the nonlinearity of the activation function.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.kaiming_uniform_(linear_layer.weight,
                               a=0, mode="fan_in",
                               nonlinearity="relu")
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[ 0.0582, 0.4701],
[ 0.4982, 0.5452],
[-0.0384, 0.5999]], requires_grad=True)
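A normal-distribution variant, kaiming_normal_, is also available, and the mode argument can be switched to "fan_out" to preserve the variance of the gradients in the backward pass rather than the activations in the forward pass. A small sketch with a convolutional layer (the layer sizes are arbitrary):
Python3
import torch

conv_layer = torch.nn.Conv2d(3, 16, kernel_size=3)

# mode="fan_out" scales by the number of output connections, a common
# choice for conv layers followed by ReLU
torch.nn.init.kaiming_normal_(conv_layer.weight,
                              mode="fan_out",
                              nonlinearity="relu")
print(conv_layer.weight.shape)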
Zeros and Ones Initialization
Initializing the weights to zeros makes every neuron in a layer identical, so they all receive the same update and the model converges slowly, if at all. This can also lead to the ‘vanishing gradient’ problem.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.zeros_(linear_layer.weight)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[0., 0.],
[0., 0.],
[0., 0.]], requires_grad=True)
Initializing the weights to ones suffers from the same symmetry problem, since all of the weights are updated in the same way; because the values are larger, it can also lead to the ‘exploding gradient’ problem.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.ones_(linear_layer.weight)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[1., 1.],
[1., 1.],
[1., 1.]], requires_grad=True)
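Although constant values are a poor choice for weights, zero initialization is routinely used for biases together with a random scheme for the weights. A minimal sketch of that common pattern:
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)

# Random weights and zero biases, a typical default combination
torch.nn.init.kaiming_uniform_(linear_layer.weight, nonlinearity="relu")
torch.nn.init.zeros_(linear_layer.bias)
print(linear_layer.bias)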
Normal Initialization
Using a normal distribution to initialize the weights keeps the initial values centred around the mean, with most samples falling within a few standard deviations of it; choosing a small standard deviation therefore helps reduce the risk of the ‘exploding gradient’ problem. It must be noted that the neural network’s performance is not determined by the weights alone; the learning rate, the optimization algorithm and the other hyperparameters also play a crucial role in how efficiently the network trains.
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)
torch.nn.init.normal_(linear_layer.weight,
                      mean=0, std=1)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[-0.1759, 0.5192],
[-0.5621, -0.3871],
[-0.6071, 0.3538]], requires_grad=True)
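If the unbounded tails of the normal distribution are a concern, recent PyTorch versions also provide trunc_normal_, which redraws any sample falling outside a fixed interval. A short sketch (the standard deviation and bounds below are just example values):
Python3
import torch

linear_layer = torch.nn.Linear(2, 3)

# Truncated normal: values outside [a, b] are resampled, so the
# largest initial weight is bounded
torch.nn.init.trunc_normal_(linear_layer.weight,
                            mean=0.0, std=0.02,
                            a=-0.04, b=0.04)
print(linear_layer.weight)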
Applying a Custom Function for Weights Initialization
An alternative method is to write a custom function that initializes the weights and apply it to the layer (or to a whole model) with the apply method.
Python3
import torch

def custom_weights(m):
    torch.nn.init.uniform_(m.weight, -0.5, 0.5)

linear_layer = torch.nn.Linear(2, 3)
linear_layer.apply(custom_weights)
print(linear_layer.weight)
Output:
Parameter containing:
tensor([[ 0.4341, -0.3424],
[ 0.2095, 0.1782],
[-0.4244, 0.1719]], requires_grad=True)
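In practice, apply is usually called on a whole model rather than a single layer; since it visits every submodule, the function should check the module type so that only the intended layers are touched. A sketch of that pattern (the init_weights name and the layer sizes are illustrative):
Python3
import torch

def init_weights(m):
    # apply() visits every submodule, so only touch the Linear layers
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        torch.nn.init.zeros_(m.bias)

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)
model.apply(init_weights)
print(model[0].weight)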
Using a user-defined Layer Class for Weights Initialization
Another method is to create a user-defined class that inherits from torch.nn.Module and initialize the weights inside its constructor.
Python3
import torch

class MyLayer(torch.nn.Module):
    def __init__(self, independent, dependent):
        super(MyLayer, self).__init__()
        self.linear = torch.nn.Linear(independent, dependent)
        torch.nn.init.uniform_(self.linear.weight, -0.5, 0.5)

    def forward(self, x):
        return self.linear(x)

linear_layer = MyLayer(2, 3)
print(linear_layer.linear.weight)
Output:
Parameter containing:
tensor([[-0.1566, 0.2461],
[-0.3361, -0.0551],
[ 0.4607, 0.3077]], requires_grad=True)
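A related convention, mirroring PyTorch’s built-in layers, is to put the initialization logic in a reset_parameters method called from the constructor, so the weights can be re-initialized on demand. A sketch of that variant (the same layer as above, with an added bias initialization):
Python3
import torch

class MyLayer(torch.nn.Module):
    def __init__(self, independent, dependent):
        super().__init__()
        self.linear = torch.nn.Linear(independent, dependent)
        self.reset_parameters()

    def reset_parameters(self):
        # Keeping the init logic here lets the weights be re-initialized
        # at any time, just like PyTorch's built-in layers
        torch.nn.init.uniform_(self.linear.weight, -0.5, 0.5)
        torch.nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return self.linear(x)

layer = MyLayer(2, 3)
print(layer.linear.weight)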
In conclusion, initializing the weights of a neural network model is an important step in the training process, as it can have a significant impact on the model’s performance. PyTorch provides several built-in initialization methods, including uniform, normal, Xavier, Kaiming, ones, and zeros. Each of these methods has its own advantages and disadvantages, and the choice of method will depend on the specific problem and model architecture being used. It is important to choose an initialization method that is suitable for the problem at hand, as it can help prevent vanishing or exploding gradient problems and improve the convergence speed and final accuracy of the model.