Time Series Prediction With Recurrent Neural Networks
Shubhayan
Dept. of Mathematics, Jadavpur University
12 Nov 2023
1 Introduction
Time series prediction is a crucial task in various domains, from finance to
weather forecasting. Accurate predictions enable better decision-making. This
document explores the use of Recurrent Neural Networks (RNNs) for effective
time series forecasting. We’ll delve into the mathematical foundation, present
algorithmic steps, and provide practical Python code examples.
Traditional forecasting approaches often require the analyst to manually specify the features and interactions to include in the model, which can be difficult and time-consuming. Neural networks can instead learn such relationships directly from the data.
Mathematically, a neural network for time series analysis can be described as a function that maps an input sequence $x_{1:T} = (x_1, x_2, \ldots, x_T)$ to an output sequence $y_{1:T} = (y_1, y_2, \ldots, y_T)$, where $T$ is the length of the sequence. The function can be composed of multiple layers, each with a different activation function and parameters. For example, a recurrent neural network can be defined as:
\[ h_t = f(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \]
\[ y_t = g(W_{hy} h_t + b_y) \]
where $h_t$ is the hidden state at time $t$, $f$ and $g$ are activation functions, and $W_{hh}$, $W_{xh}$, $b_h$, $W_{hy}$, $b_y$ are the parameters of the network.
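To make the roles of these parameters concrete, the following sketch fixes their shapes for a hidden size of 4 and scalar inputs and outputs (sizes chosen purely for illustration):

import numpy as np

hidden_size, input_size, output_size = 4, 1, 1  # illustrative sizes

W_hh = np.zeros((hidden_size, hidden_size))  # hidden-to-hidden (recurrent) weights
W_xh = np.zeros((hidden_size, input_size))   # input-to-hidden weights
b_h = np.zeros(hidden_size)                  # hidden-layer bias
W_hy = np.zeros((output_size, hidden_size))  # hidden-to-output weights
b_y = np.zeros(output_size)                  # output bias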
Algorithmically, a neural network for time series analysis can be trained using
a variant of gradient descent, which is an optimization method that iteratively
updates the parameters of the network to minimize a loss function that measures
the difference between the predicted output and the true output. For example,
a common loss function for time series prediction is the mean squared error
(MSE), which is defined as:
\[ \mathrm{MSE} = \frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 \]
where $\hat{y}_t$ is the predicted output at time $t$. The gradient descent algorithm can be summarized as:
1. Initialize the parameters of the network randomly.
2. For each epoch (a full pass over the data):
(a) For each input-output pair $(x_{1:T}, y_{1:T})$ in the data:
i. Feed the input sequence $x_{1:T}$ to the network and compute the output sequence $\hat{y}_{1:T}$.
ii. Compute the loss function $\mathrm{MSE}(y_{1:T}, \hat{y}_{1:T})$ and its gradient with respect to the parameters of the network.
iii. Update the parameters of the network using a learning rate $\alpha$ and the gradient: $W \leftarrow W - \alpha \frac{\partial \mathrm{MSE}}{\partial W}$, where $W$ represents any parameter of the network (a code sketch of this loop follows the list).
(b) Evaluate the performance of the network on a validation set and
adjust the learning rate or stop the training if necessary.
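As a concrete illustration of steps i-iii, here is a minimal training loop using TensorFlow's GradientTape; the sine-wave data, network size, learning rate, and epoch count are illustrative assumptions, not choices prescribed above:

import numpy as np
import tensorflow as tf

# Toy next-step prediction task on a sine wave (illustrative data)
series = np.sin(np.linspace(0, 20, 500)).astype("float32")
x = series[:-1].reshape(1, -1, 1)  # input sequence x_1:T
y = series[1:].reshape(1, -1, 1)   # target sequence y_1:T

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(16, return_sequences=True),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
alpha = 0.01  # learning rate

for epoch in range(5):
    with tf.GradientTape() as tape:
        y_hat = model(x)          # step i: forward pass over the sequence
        loss = loss_fn(y, y_hat)  # step ii: MSE between targets and predictions
    grads = tape.gradient(loss, model.trainable_variables)  # step ii: gradient
    for W, g in zip(model.trainable_variables, grads):
        W.assign_sub(alpha * g)   # step iii: W <- W - alpha * dMSE/dW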
3 Mathematical Formulation
3.1 RNN Update Equation
In an RNN, the hidden state ht at time t evolves through the following update
equation:
\[ h_t = \sigma(W_{hh} h_{t-1} + W_{xh} x_t + b_h) \]
Here, $h_t$ is the hidden state, $\sigma$ is the activation function, $W_{hh}$ and $W_{xh}$ are weight matrices, $x_t$ is the input, and $b_h$ is the bias for the hidden layer.
The predicted output at each step is then read off the hidden state:
\[ \hat{y}_t = W_{hy} h_t + b_y \]
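These two equations translate directly into code. Below is a minimal NumPy sketch of the forward pass (tanh is assumed for $\sigma$, and the parameter values in the usage example are random placeholders):

import numpy as np

def rnn_forward(x_seq, W_hh, W_xh, W_hy, b_h, b_y, h0):
    # Apply the update equation at each time step, then read off the output
    h = h0
    y_hats = []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # hidden-state update
        y_hats.append(W_hy @ h + b_y)             # predicted output at time t
    return np.array(y_hats), h

# Usage with random parameters: hidden size 4, scalar inputs and outputs
rng = np.random.default_rng(0)
y_hats, h_T = rnn_forward(
    rng.normal(size=(10, 1)),                     # a length-10 input sequence
    W_hh=rng.normal(size=(4, 4)), W_xh=rng.normal(size=(4, 1)),
    W_hy=rng.normal(size=(1, 4)), b_h=np.zeros(4), b_y=np.zeros(1),
    h0=np.zeros(4),
)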
4 Algorithmic Overview
4.1 RNN Training Algorithm
The training of an RNN involves iterative steps:
1. Initialization: Set the initial hidden state $h_0$ and configure model parameters.
2. Forward Pass: For each time step $t$,
(a) Update the hidden state using the RNN update equation.
(b) Compute the predicted output using the updated hidden state.
3. Backward Pass: Calculate the loss between predicted and actual values.
4. Gradient Descent: Update weights and biases to minimize the loss.
This iterative process fine-tunes the model parameters to make more accurate predictions over time.
5 Graphical Representation
5.1 Recurrent Structure
The graphical representation in Figure 1 illustrates the recurrent connec-
tions within an RNN. Each block corresponds to a time step, and arrows
depict the flow of information through time. These connections allow the
RNN to capture dependencies in the time series data.
[Figure 1: An RNN unrolled over three time steps, with inputs $x_t$, $x_{t+1}$, $x_{t+2}$ feeding hidden states that produce outputs $y_t$, $y_{t+1}$, $y_{t+2}$.]
6 Python Implementation
The listing below sketches an end-to-end example: a synthetic sine-wave series is windowed into input-output pairs, a small SimpleRNN model is trained on them, and the last 10 observed steps seed an iterative forecast of the next steps (the data and hyperparameters are illustrative).

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Synthetic sine-wave series, windowed into (10 past steps -> next step) pairs
series = np.sin(np.linspace(0, 50, 500))
window = 10
X = np.array([series[i:i + window] for i in range(len(series) - window)])
X = X.reshape(-1, window, 1)
y = series[window:]

# Define and train a small RNN model
model = Sequential([
    SimpleRNN(32, input_shape=(window, 1)),
    Dense(1)
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, verbose=0)

# Generate predictions, feeding each new prediction back in as input
future_steps = 10
future_data = series[-10:]  # Use the last 10 steps as input for predicting the next steps
predictions = []
for _ in range(future_steps):
    next_val = model.predict(future_data.reshape(1, window, 1), verbose=0)[0, 0]
    predictions.append(next_val)
    future_data = np.append(future_data[1:], next_val)

# Plot the observed series and the forecast
plt.plot(series)
plt.plot(range(len(series), len(series) + future_steps), predictions)
plt.show()
7 Appendix
7.1 Introduction to Neural Networks
Neural networks are a class of machine learning models inspired by the
human brain. They consist of interconnected nodes, or neurons, organized
into layers: input, hidden, and output layers. The network learns to make
predictions or decisions based on input data.
The output of a single neuron is a weighted sum of its inputs, shifted by a bias and passed through an activation function:
\[ \text{Output} = \text{Activation}\left( \sum_{i=1}^{n} \text{input}_i \times \text{weight}_i + \text{bias} \right) \]
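As a minimal sketch, this is a one-line computation in NumPy (tanh is an illustrative choice of activation):

import numpy as np

def neuron_output(inputs, weights, bias, activation=np.tanh):
    # Weighted sum of the inputs plus the bias, passed through the activation
    return activation(np.dot(inputs, weights) + bias)

print(neuron_output(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1))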
7.6 Backpropagation
Training a neural network involves minimizing a loss function, typically
calculated using the mean squared error for regression or cross-entropy
for classification. Backpropagation is used to adjust weights and biases to
minimize this loss. It consists of a forward pass to calculate outputs and
a backward pass to update weights using the chain rule.
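For intuition about the chain rule at work, consider a single neuron with one weight, where $L = (a - y)^2$ and $a = \tanh(wx + b)$; the values below are arbitrary placeholders:

import numpy as np

x, y, w, b = 0.5, 1.0, 0.3, 0.0   # arbitrary input, target, weight, bias
z = w * x + b
a = np.tanh(z)

# Chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)
dL_da = 2 * (a - y)
da_dz = 1 - np.tanh(z) ** 2
dz_dw = x
dL_dw = dL_da * da_dz * dz_dw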
7.6.1 Forward Pass
During the forward pass, the output of each neuron is calculated layer by layer until the final output is obtained. For a given layer $l$, the weighted sum $z_i^{(l)}$ and the activated output $a_i^{(l)}$ are computed as:
\[ z_i^{(l)} = \sum_{j=1}^{n^{(l-1)}} w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)} \]
\[ a_i^{(l)} = \text{Activation}(z_i^{(l)}) \]
Here, $w_{ij}^{(l)}$ represents the weight from neuron $j$ in layer $(l-1)$ to neuron $i$ in layer $l$, and $b_i^{(l)}$ is the bias for neuron $i$ in layer $l$.
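Computed for a whole layer at once, this is a matrix-vector product. A minimal NumPy sketch (layer sizes and random parameters are illustrative):

import numpy as np

def layer_forward(a_prev, W, b, activation=np.tanh):
    # z = W a_prev + b computes every z_i of the layer at once
    return activation(W @ a_prev + b)

# Two stacked layers with random parameters (sizes 3 -> 4 -> 2)
rng = np.random.default_rng(0)
a0 = rng.normal(size=3)                          # input activations
a1 = layer_forward(a0, rng.normal(size=(4, 3)), np.zeros(4))
a2 = layer_forward(a1, rng.normal(size=(2, 4)), np.zeros(2))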
7.6.2 Backward Pass
The backward pass involves computing the gradients of the loss with re-
spect to the weights and biases, starting from the output layer and mov-
ing backward through the network. The gradients are used to update the
weights and biases to minimize the loss.
The update rule for weights using gradient descent is given by:
\[ \Delta w_{ij}^{(l)} = -\eta \, \frac{\partial L}{\partial w_{ij}^{(l)}} \]
Here, $\eta$ is the learning rate, and $\frac{\partial L}{\partial w_{ij}^{(l)}}$ represents the partial derivative of the loss with respect to the weight $w_{ij}^{(l)}$.
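Applied to a whole weight matrix at once, the update is a single array operation; the gradient values below are arbitrary placeholders rather than computed derivatives:

import numpy as np

eta = 0.1  # learning rate
W = np.array([[0.5, -0.3], [0.2, 0.7]])
dL_dW = np.array([[0.1, 0.0], [-0.2, 0.05]])  # placeholder gradient dL/dW

W -= eta * dL_dW  # delta_w = -eta * dL/dw applied elementwise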
7.7 Algorithm