
Long Short Term Memory (LSTM) Networks using PyTorch

Last Updated : 28 May, 2025

Long Short-Term Memory (LSTM) networks were designed to mitigate the vanishing gradient problem that traditional RNNs face when learning long-term dependencies in sequential data. LSTMs can retain information over long periods by using memory cells and gating mechanisms. Each memory cell is controlled by three gates: the input gate, the forget gate and the output gate. This structure allows LSTMs to selectively store, update and discard information, which makes them highly effective for sequence modeling. In this article we will explore how to implement LSTMs using PyTorch.
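
To make the gating mechanism concrete, the following is a minimal sketch of a single LSTM time step computed by hand and compared against PyTorch's nn.LSTMCell. The sizes (input_size=3, hidden_size=4, batch of 2) are arbitrary values chosen for illustration and are not used elsewhere in this article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_size, hidden_size = 3, 4  # illustrative sizes, not the article's
cell = nn.LSTMCell(input_size, hidden_size)

x = torch.randn(2, input_size)        # one time step for a batch of 2
h_prev = torch.randn(2, hidden_size)  # previous hidden state
c_prev = torch.randn(2, hidden_size)  # previous cell state (the "memory")

# nn.LSTMCell stacks the gate weights in the order:
# input gate, forget gate, cell candidate, output gate.
gates = x @ cell.weight_ih.T + cell.bias_ih + h_prev @ cell.weight_hh.T + cell.bias_hh
i, f, g, o = gates.chunk(4, dim=1)

i = torch.sigmoid(i)  # input gate: how much new information to store
f = torch.sigmoid(f)  # forget gate: how much old memory to keep
g = torch.tanh(g)     # candidate values to write into the cell
o = torch.sigmoid(o)  # output gate: how much of the cell to expose

c_next = f * c_prev + i * g          # update the memory cell
h_next = o * torch.tanh(c_next)      # compute the new hidden state

# Reference result from PyTorch's own implementation.
h_ref, c_ref = cell(x, (h_prev, c_prev))
print(torch.allclose(h_next, h_ref, atol=1e-5), torch.allclose(c_next, c_ref, atol=1e-5))
```

The forget gate scales the old cell state, the input gate scales the new candidate values, and the output gate decides how much of the updated cell is passed on as the hidden state.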

1. Import Libraries and Prepare Data

In this step we import the necessary libraries (NumPy, Matplotlib and PyTorch) and generate synthetic sine wave data for the model. We create sequences of length 10 from the sine data, each used to predict the next value, and then convert the inputs and targets into PyTorch tensors for training.

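  • np.linspace() generates 1000 evenly spaced points between 0 and 100
  • np.sin() computes the sine of each point
  • create_sequences() builds input-output pairs: 10 consecutive values as input and the following value as target
  • torch.tensor() converts the data to float32 tensors of shape (990, 10, 1) for the inputs and (990, 1) for the targets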
Python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
torch.manual_seed(0)

t = np.linspace(0, 100, 1000)
data = np.sin(t)

def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data)-seq_length):
        x = data[i:(i+seq_length)]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

seq_length = 10
X, y = create_sequences(data, seq_length)

trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(y[:, None], dtype=torch.float32)
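
As an alternative to the explicit loop in create_sequences(), the same input-output pairs can be built without a Python loop. The sketch below assumes NumPy 1.20 or newer, where sliding_window_view is available; the names X_vec and y_vec are introduced here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

t = np.linspace(0, 100, 1000)
data = np.sin(t)
seq_length = 10

# All windows of length seq_length. The last window has no following
# value to serve as its target, so it is dropped.
X_vec = sliding_window_view(data, seq_length)[:-1]
y_vec = data[seq_length:]  # the value right after each window

print(X_vec.shape, y_vec.shape)  # (990, 10) (990,)
```

The resulting arrays hold the same values as the output of create_sequences(data, seq_length). Note that sliding_window_view returns a read-only view, so copy it before modifying it in place.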

2. Defining the LSTM Model

  • The LSTMModel class inherits from nn.Module.
  • Constructor: initializes the LSTM layer and the fully connected layer. The LSTM layer processes the sequences and the fully connected layer maps the hidden state to the output.
  • forward() function: checks whether the hidden states (h0 and c0) are provided. If not, they are initialized to zeros.
  • The LSTM output at the last time step is passed through the fully connected layer, which produces the final prediction. The final hidden and cell states are returned alongside it.
Python
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, h0=None, c0=None):
        if h0 is None or c0 is None:
            h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).to(x.device)
            c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).to(x.device)
        
        out, (hn, cn) = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out, hn, cn
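
To see why forward() selects out[:, -1, :], it helps to inspect the tensor shapes that nn.LSTM returns. The sketch below uses illustrative sizes (batch of 4, hidden size 16, 2 layers) that differ from the model in this article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only
batch_size, seq_length, input_dim, hidden_dim, layer_dim = 4, 10, 1, 16, 2
lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)

x = torch.randn(batch_size, seq_length, input_dim)
out, (hn, cn) = lstm(x)  # omitted initial states default to zeros

print(out.shape)  # torch.Size([4, 10, 16]): top-layer output at every time step
print(hn.shape)   # torch.Size([2, 4, 16]): final hidden state of each layer
print(cn.shape)   # torch.Size([2, 4, 16]): final cell state of each layer

# For a unidirectional LSTM, the last time step of `out` matches
# the final hidden state of the top layer.
print(torch.allclose(out[:, -1, :], hn[-1]))
```

With batch_first=True the input and output are laid out as (batch, sequence, feature), while hn and cn are always laid out as (layers, batch, hidden).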

3. Initializing Model, Loss Function and Optimizer

We create an instance of the LSTMModel class with the specified input and output dimensions, number of hidden units and number of LSTM layers.

  • Loss Function: We use Mean Squared Error (MSE) loss because predicting the next value is a regression task.
  • Optimizer: We use the Adam optimizer, which is a popular choice for training deep learning models.
Python
model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
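
To get a feel for the size of this configuration, the sketch below counts the trainable parameters of an LSTM layer and a linear layer built with the same dimensions as above. The helper count_parameters is a hypothetical name introduced here, not part of PyTorch.

```python
import torch.nn as nn

# Same dimensions as the model above: 1 input feature, 100 hidden units, 1 layer
lstm = nn.LSTM(input_size=1, hidden_size=100, num_layers=1, batch_first=True)
fc = nn.Linear(100, 1)

def count_parameters(module):
    """Sum the number of elements over all trainable parameters."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# LSTM: 4 gates x (100x1 input weights + 100x100 recurrent weights + 2 bias vectors of 100)
print(count_parameters(lstm))  # 41200
# Linear: 100 weights + 1 bias
print(count_parameters(fc))    # 101
```

Almost all of the parameters sit in the recurrent weights, which grow quadratically with the hidden size.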

4. Training the LSTM Model

We train for 100 epochs. Because the whole dataset is passed to the model as a single batch, each epoch corresponds to exactly one weight update.

  • In each epoch we set the model to training mode, perform a forward pass, compute the loss and then backpropagate the error to update the weights.
  • We detach the hidden states (h0 and c0) after each iteration so that the next backward pass does not try to backpropagate through the computation graphs of earlier iterations.
  • Every 10 epochs we print the current loss value to monitor the model's progress.
Python
num_epochs = 100
h0, c0 = None, None

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()

    outputs, h0, c0 = model(trainX, h0, c0)

    loss = criterion(outputs, trainY)
    loss.backward()
    optimizer.step()

    h0 = h0.detach()
    c0 = c0.detach()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Output:

[Figure: Training the LSTM Model]
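
The loop above feeds the whole dataset as one batch. A common variation is mini-batch training with a DataLoader. The sketch below is one possible way to do that, not the method used in this article: SmallLSTM, its hidden size of 32, the batch size of 64 and the 5 epochs are all assumptions made for illustration. Hidden states are not carried between batches, so nn.LSTM falls back to zero initial states.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Same synthetic sine data and window length as in the article
data = np.sin(np.linspace(0, 100, 1000))
seq_length = 10
X = np.stack([data[i:i + seq_length] for i in range(len(data) - seq_length)])
y = data[seq_length:]
X = torch.tensor(X[:, :, None], dtype=torch.float32)
y = torch.tensor(y[:, None], dtype=torch.float32)

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

class SmallLSTM(nn.Module):  # hypothetical model, smaller than the article's
    def __init__(self, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # initial states default to zeros
        return self.fc(out[:, -1, :])  # predict from the last time step

small_model = SmallLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(small_model.parameters(), lr=0.01)

epoch_losses = []
for epoch in range(5):
    small_model.train()
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(small_model(xb), yb)
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)  # weight by batch size
    epoch_losses.append(running / len(loader.dataset))
    print(f"Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

Mini-batches give several weight updates per epoch, which often speeds up convergence on larger datasets. For data this small, the full-batch loop above is also perfectly workable.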

5. Evaluating and Plotting Predictions

We switch the model to evaluation mode with model.eval() and compute the predicted outputs.

  • We adjust the original data and predicted data to align them for plotting.
  • We use matplotlib to plot both the original sine wave data and the predicted values to visualize the performance of the LSTM model.
  • To simulate noisy predictions we intentionally introduce outliers: 0.2 is added to every 30th predicted value and 0.2 is subtracted from every 70th predicted value.
Python
model.eval()
with torch.no_grad():  # gradients are not needed for inference
    predicted, _, _ = model(trainX, h0, c0)

original = data[seq_length:]
time_steps = np.arange(seq_length, len(data))

predicted[::30] += 0.2 
predicted[::70] -= 0.2

plt.figure(figsize=(12, 6))
plt.plot(time_steps, original, label='Original Data')
plt.plot(time_steps, predicted.detach().numpy(), label='Predicted Data', linestyle='--')
plt.title('LSTM Model Predictions vs. Original Data')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()

Output:

[Figure: Evaluating and Plotting Predictions]

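The plot compares the original sine wave data with the predicted values from the LSTM model. The blue line represents the original data while the orange dashed line shows the model's predictions. Despite the injected outliers, the predictions closely follow the original sine wave pattern, showing the model's ability to learn temporal dependencies from the data. Note that these predictions are made on the same data the model was trained on, so the plot shows how well the model fits that data rather than how well it generalizes.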
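
Besides the visual comparison, the fit can be summarized with numeric error metrics. The sketch below shows the general pattern of inference under torch.no_grad() followed by MSE and MAE. It uses an untrained LSTM and random tensors as hypothetical stand-ins for the trained model and data, so the printed numbers are not results from this article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for a trained model and its evaluation data
lstm = nn.LSTM(1, 8, batch_first=True)
head = nn.Linear(8, 1)
x = torch.randn(5, 10, 1)   # 5 sequences of length 10
target = torch.randn(5, 1)  # 5 target values

lstm.eval()
head.eval()
with torch.no_grad():       # no computation graph is built during inference
    out, _ = lstm(x)
    pred = head(out[:, -1, :])

mse = torch.mean((pred - target) ** 2).item()     # mean squared error
mae = torch.mean(torch.abs(pred - target)).item()  # mean absolute error

print(pred.requires_grad)  # False
print(f"MSE: {mse:.4f}, MAE: {mae:.4f}")
```

Because the forward pass runs under torch.no_grad(), the predictions can be converted with pred.numpy() directly, without calling detach() first.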


You can also implement them using TensorFlow: Long short-term memory (LSTM) in TensorFlow

