Long Short Term Memory (LSTM) Networks using PyTorch
Last Updated: 28 May, 2025
Long Short-Term Memory (LSTM) networks were designed to overcome the vanishing gradient problem that traditional RNNs face when learning long-term dependencies in sequential data. LSTMs can retain information over long periods by using memory cells and gating mechanisms. Each memory cell is controlled by three gates: the input gate, the forget gate and the output gate. This structure allows LSTMs to selectively store, update and discard information, which makes them highly effective on sequence tasks. In this article we will explore how to implement LSTMs using PyTorch.
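To make the role of the three gates concrete, here is a minimal sketch of a single LSTM time step written by hand and compared against PyTorch's nn.LSTMCell. The function name manual_lstm_step and the tiny dimensions are illustrative assumptions, not part of the article's model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_dim, hidden_dim, batch = 3, 4, 2
cell = nn.LSTMCell(input_dim, hidden_dim)

x = torch.randn(batch, input_dim)        # one time step of input
h_prev = torch.randn(batch, hidden_dim)  # previous hidden state
c_prev = torch.randn(batch, hidden_dim)  # previous cell state

def manual_lstm_step(x, h_prev, c_prev, cell):
    # Hypothetical helper: nn.LSTMCell stacks its four weight blocks in the
    # order input gate, forget gate, candidate (g), output gate.
    gates = (x @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i = torch.sigmoid(i)  # input gate: how much new information to write
    f = torch.sigmoid(f)  # forget gate: how much of the old cell state to keep
    g = torch.tanh(g)     # candidate values for the cell state
    o = torch.sigmoid(o)  # output gate: how much of the cell state to expose
    c_next = f * c_prev + i * g
    h_next = o * torch.tanh(c_next)
    return h_next, c_next

with torch.no_grad():
    h_manual, c_manual = manual_lstm_step(x, h_prev, c_prev, cell)
    h_ref, c_ref = cell(x, (h_prev, c_prev))

print((h_manual - h_ref).abs().max())
print((c_manual - c_ref).abs().max())
```

The forget gate scales the old cell state, the input gate scales the new candidate values, and the output gate decides how much of the updated cell state becomes the hidden state.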
1. Import Libraries and Prepare Data
In this step we import the necessary libraries (NumPy, Matplotlib and PyTorch) and generate synthetic sine wave data for the model. We create sequences of length 10 from the sine data to predict the next value, then convert the inputs and targets into PyTorch tensors for model training.
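- np.linspace() generates 1000 evenly spaced time points between 0 and 100
- np.sin() computes the sine value at each point
- create_sequences() builds input-output pairs: each input is a window of seq_length values and the target is the value that follows it
- torch.tensor() converts the data to float32 tensors of shape (samples, seq_length, 1) for inputs and (samples, 1) for targets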
Python
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

# Fix the random seeds for reproducibility
np.random.seed(0)
torch.manual_seed(0)

# 1000 evenly spaced points on [0, 100] and their sine values
t = np.linspace(0, 100, 1000)
data = np.sin(t)

def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data) - seq_length):
        x = data[i:(i + seq_length)]  # window of seq_length values
        y = data[i + seq_length]      # the value right after the window
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

seq_length = 10
X, y = create_sequences(data, seq_length)

# Add a trailing feature dimension: (samples, seq_length, 1) and (samples, 1)
trainX = torch.tensor(X[:, :, None], dtype=torch.float32)
trainY = torch.tensor(y[:, None], dtype=torch.float32)
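The sliding-window logic is easier to see on a tiny input. The sketch below applies the same windowing to a toy array of six integers (toy, X_toy and y_toy are illustrative names); the list-comprehension form is only a compact rewrite of the loop above.

```python
import numpy as np

def create_sequences(data, seq_length):
    # Same sliding-window logic as in the article, written compactly.
    n = len(data) - seq_length
    xs = [data[i:i + seq_length] for i in range(n)]
    ys = [data[i + seq_length] for i in range(n)]
    return np.array(xs), np.array(ys)

toy = np.arange(6)  # [0, 1, 2, 3, 4, 5]
X_toy, y_toy = create_sequences(toy, seq_length=3)

print(X_toy)                    # windows [0 1 2], [1 2 3], [2 3 4]
print(y_toy)                    # targets 3, 4, 5
print(X_toy[:, :, None].shape)  # (3, 3, 1): samples, seq_length, features
```

An array of length N with window size seq_length yields N - seq_length pairs, so the 1000-point sine wave with seq_length = 10 gives 990 training samples.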
2. Defining the LSTM Model
- LSTMModel class inherits from nn.Module.
- Constructor: initializes the LSTM layer and the fully connected layer. The LSTM layer processes the sequences and the fully connected layer maps the hidden state to the output.
- forward() method: checks whether the hidden and cell states (h0 and c0) are provided. If not, they are initialized to zeros.
- The output of the LSTM layer at the last time step is passed through the fully connected layer, which produces the final prediction.
Python
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim
        # batch_first=True means inputs are shaped (batch, seq_length, input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, h0=None, c0=None):
        # Start from zero hidden and cell states when none are supplied
        if h0 is None or c0 is None:
            h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).to(x.device)
            c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).to(x.device)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Use only the last time step's output for the prediction
        out = self.fc(out[:, -1, :])
        return out, hn, cn
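The tensor shapes returned by nn.LSTM explain why forward() indexes out[:, -1, :]. The following sketch uses small, arbitrary dimensions (chosen here only for illustration) to print the shapes and check that the last time step of the output equals the final hidden state of the top layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, seq_length, input_dim, hidden_dim, num_layers = 4, 10, 1, 8, 2
lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
x = torch.randn(batch, seq_length, input_dim)

with torch.no_grad():
    # Omitting (h0, c0) makes nn.LSTM start from zero states.
    out, (hn, cn) = lstm(x)

print(out.shape)  # (batch, seq_length, hidden_dim)
print(hn.shape)   # (num_layers, batch, hidden_dim)
print(cn.shape)   # (num_layers, batch, hidden_dim)

# For a unidirectional LSTM, the last time step of `out` is the final
# hidden state of the top layer.
same = torch.allclose(out[:, -1, :], hn[-1])
print(same)
```

Because nn.LSTM already defaults to zero initial states, the explicit zero initialization in forward() mainly serves to make the state handling visible.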
3. Initializing Model, Loss Function and Optimizer
We create an instance of the LSTMModel class with the specified input and output dimensions, number of hidden units and number of LSTM layers.
- Loss Function: We use Mean Squared Error (MSE) loss, a standard choice for regression tasks.
- Optimizer: We use the Adam optimizer which is a popular choice for training deep learning models.
Python
model = LSTMModel(input_dim=1, hidden_dim=100, layer_dim=1, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
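As a sanity check on the chosen sizes, the number of trainable parameters can be worked out by hand and compared with what PyTorch reports. This sketch rebuilds the two layers with the same dimensions as above; count_params is a hypothetical helper introduced for illustration.

```python
import torch.nn as nn

input_dim, hidden_dim, output_dim = 1, 100, 1
lstm = nn.LSTM(input_dim, hidden_dim, 1, batch_first=True)
fc = nn.Linear(hidden_dim, output_dim)

def count_params(module):
    # Hypothetical helper: total number of elements across all parameters.
    return sum(p.numel() for p in module.parameters())

# Four gate blocks, each with input weights and recurrent weights,
# plus two bias vectors (bias_ih and bias_hh) per block.
expected_lstm = 4 * hidden_dim * (input_dim + hidden_dim) + 2 * 4 * hidden_dim
# Linear layer: one weight per hidden unit plus one bias.
expected_fc = hidden_dim * output_dim + output_dim

print(count_params(lstm), expected_lstm)
print(count_params(fc), expected_fc)
```

Almost all of the parameters sit in the recurrent weights, which grow with the square of hidden_dim.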
4. Training the LSTM Model
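We train for 100 epochs. Because the whole dataset is passed as a single batch, each epoch performs exactly one weight update.
- In each epoch we set the model to training mode, perform a forward pass, compute the loss and then backpropagate the error to update the weights.
- The final hidden states (h0 and c0) are reused as the initial states of the next epoch. We detach them after each iteration so that the next backward pass does not reach into the computation graph of the previous iteration.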
- Every 10 epochs we print the current loss value to monitor the model's progress.
Python
num_epochs = 100
h0, c0 = None, None

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()

    # Forward pass; the final states become the next epoch's initial states
    outputs, h0, c0 = model(trainX, h0, c0)
    loss = criterion(outputs, trainY)

    # Backward pass and weight update
    loss.backward()
    optimizer.step()

    # Cut the states off from this epoch's computation graph
    h0 = h0.detach()
    c0 = c0.detach()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
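The effect of detach() can be inspected directly. The sketch below, using a small stand-alone nn.LSTM with arbitrary sizes, shows that the returned hidden state is attached to the computation graph, while its detached copy holds the same values with no graph history.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
x = torch.randn(2, 5, 1)

out, (hn, cn) = lstm(x)

# hn is produced by the LSTM, so it keeps a reference to the graph.
print(hn.requires_grad, hn.grad_fn is not None)

# detach() returns a tensor with the same values but no graph history.
hn_detached = hn.detach()
cn_detached = cn.detach()
print(hn_detached.requires_grad, hn_detached.grad_fn is None)

# The detached states can seed another forward pass; gradients from that
# pass stop at the detached tensors instead of flowing into the first pass.
out2, _ = lstm(x, (hn_detached, cn_detached))
out2.sum().backward()
print(lstm.weight_ih_l0.grad.shape)
```

In the training loop this is what keeps each epoch's backward pass confined to that epoch's forward pass.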
Output:
[Image: Training the LSTM Model]
5. Evaluating and Plotting Predictions
We switch the model to evaluation mode with model.eval() and get the predicted outputs.
- We align the original data with the predictions for plotting by dropping the first seq_length values, which have no prediction.
- We use matplotlib to plot both the original sine wave data and the predicted values to visualize the performance of the LSTM model.
- To make the results look more realistic we intentionally add outliers: 0.2 is added to every 30th predicted value and subtracted from every 70th.
Python
model.eval()
# Gradients are not needed for inference
with torch.no_grad():
    predicted, _, _ = model(trainX, h0, c0)

# The first seq_length points have no prediction, so drop them
original = data[seq_length:]
time_steps = np.arange(seq_length, len(data))

# Inject artificial outliers into the predictions
predicted[::30] += 0.2
predicted[::70] -= 0.2

plt.figure(figsize=(12, 6))
plt.plot(time_steps, original, label='Original Data')
plt.plot(time_steps, predicted.numpy(), label='Predicted Data', linestyle='--')
plt.title('LSTM Model Predictions vs. Original Data')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.legend()
plt.show()
Output:
[Image: Evaluating and Plot Predictions]
The plot compares the original sine wave data with the predicted values from the LSTM model. The blue line represents the original data while the orange dashed line shows the model's predictions. Despite the presence of outliers, the predictions closely follow the original sine wave pattern, showing the model's ability to learn temporal dependencies from the data.
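A plot gives a visual impression, but a number is often useful alongside it. The sketch below measures how much error the injected outliers contribute on their own. It assumes a hypothetical perfect prediction as a stand-in for the model output, so it is not a measurement of the trained model.

```python
import numpy as np
import torch

t = np.linspace(0, 100, 1000)
data = np.sin(t)
seq_length = 10

target = torch.tensor(data[seq_length:, None], dtype=torch.float32)

# Hypothetical stand-in for the model output: a perfect prediction,
# then the same artificial outliers as in the plotting code.
predicted = target.clone()
predicted[::30] += 0.2
predicted[::70] -= 0.2

mse = torch.mean((predicted - target) ** 2).item()
max_err = torch.max(torch.abs(predicted - target)).item()

print(f'MSE from injected outliers alone: {mse:.6f}')
print(f'Largest absolute error: {max_err:.2f}')
```

Applying the same two metrics to the real predicted tensor, computed before the outliers are added, would give a quantitative counterpart to the plot.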
You can also implement LSTMs using TensorFlow: Long short-term memory (LSTM) in TensorFlow