Recurrent Neural Networks (RNNs)
• Recurrent neural networks are another type of neural network that is
dominating difficult machine learning problems involving sequences of
inputs.
• Recurrent Neural Networks (RNNs) are a special type of neural
network designed for “sequence problems”.
• Recurrent neural networks have connections that form loops,
adding feedback and memory to the network over time.
• This memory allows this type of network to learn and generalize
across sequences of inputs rather than individual patterns.
• A powerful type of Recurrent Neural Network is the Long Short-Term
Memory (LSTM) network.
• Use cases: a diverse array of problems, e.g., language translation and
automatic captioning of images and videos.
Support For Sequences in Neural
Networks
• Certain problem types are best framed with a sequence as either the
input or the output. Example → a univariate time series problem, such
as the price of a stock over time.
• This dataset can be framed as a prediction problem for a classical
feedforward Multilayer Perceptron network by defining a window size
(e.g., 5) and training the network to make short-term predictions
from the fixed-size window of inputs (see the windowing sketch below).
• This strategy would work, but is very limited. The window of inputs adds
memory to the problem, but is limited to just a fixed number of points
and must be chosen with sufficient knowledge of the problem.
• A naive window would not capture the broader trends over minutes,
hours and days that might be relevant to making a prediction. From one
prediction to the next, the network only knows about the specific inputs
it is provided.
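As a rough illustration of the fixed-window framing described above, the sketch below turns a univariate series into (input window, next value) pairs suitable for a feedforward network; the window size of 5 and the series values are arbitrary assumptions for the example.

```python
import numpy as np

def make_windows(series, window_size=5):
    """Frame a univariate series as (window, next-value) pairs
    for a feedforward (MLP-style) predictor."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])   # fixed-size input window
        y.append(series[i + window_size])     # value to predict
    return np.array(X), np.array(y)

# Example: a made-up "stock price" series
prices = np.array([10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.9, 11.0])
X, y = make_windows(prices, window_size=5)
print(X.shape, y.shape)  # (3, 5) (3,) -> 3 training pairs, 5 inputs each
```

The fixed window is the only "memory" such a model gets, which is exactly the limitation noted above: anything outside the window is invisible to the network.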
Problems that involve sequences
Following is the taxonomy of sequence problems that require a
mapping of an input to an output:
✓ One-to-Many: sequence output, for image captioning.
✓ Many-to-One: sequence input, for sentiment classification.
✓ Many-to-Many: sequence in and out, for machine translation.
✓ Synchronized Many-to-Many: synced sequences in and out,
for video classification.
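As a rough sketch of how these four patterns differ in practice, the array shapes below are illustrative assumptions (batch size, sequence length, and feature sizes are made up) showing what the inputs and outputs of each mapping might look like:

```python
import numpy as np

batch, steps, features = 32, 10, 8   # assumed sizes for illustration

# One-to-Many (e.g., image captioning): one input, a sequence of outputs
x_one_to_many  = np.zeros((batch, features))          # single input per example
y_one_to_many  = np.zeros((batch, steps, features))   # output sequence

# Many-to-One (e.g., sentiment classification): sequence in, one label out
x_many_to_one  = np.zeros((batch, steps, features))
y_many_to_one  = np.zeros((batch, 1))

# Many-to-Many (e.g., machine translation): sequence in, sequence out
# (input and output lengths may differ)
x_many_to_many = np.zeros((batch, steps, features))
y_many_to_many = np.zeros((batch, steps + 2, features))

# Synchronized Many-to-Many (e.g., video classification): one output per input step
x_synced       = np.zeros((batch, steps, features))
y_synced       = np.zeros((batch, steps, 1))
```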
Challenges / Issues → Solutions
For recurrent networks to be effective on real problems, two major
issues had to be resolved:
1) How to train the network with Backpropagation?
2) How to stop gradients from vanishing or exploding during
training?
How to Train Recurrent Neural
Networks?
• Backpropagation breaks down in a recurrent neural network, because
of the recurrent or loop connections. This was addressed with a
modification of the Backpropagation technique called
Backpropagation Through Time or BPTT.
• Instead of performing Backpropagation on the recurrent network
directly, the structure of the network is unrolled over time, creating
copies of the neurons that have recurrent connections.
• For example: a single neuron with a connection to itself (A → A) could
be represented as two neurons with the same weight values (A → B).
This allows the cyclic graph of a recurrent neural network to be turned
into an acyclic graph like a classic feedforward neural network, and
Backpropagation can be applied.
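A minimal sketch of the unrolling idea, assuming a vanilla RNN cell with made-up sizes: the same weight matrices are reused at every time step, so the unrolled computation looks like a feedforward network whose "layers" all share weights, and ordinary Backpropagation can then be applied to it.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 4, 3, 5   # assumed sizes for illustration

# One set of weights, shared by every unrolled copy of the cell
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h  = np.zeros(hidden_size)

x_seq = rng.normal(size=(steps, input_size))   # an input sequence
h = np.zeros(hidden_size)                      # initial state

# Unrolled forward pass: each loop iteration is one "copy" of the neuron
hidden_states = []
for t in range(steps):
    h = np.tanh(W_xh @ x_seq[t] + W_hh @ h + b_h)
    hidden_states.append(h)

# BPTT would now run standard Backpropagation backwards through these
# `steps` copies, summing the gradients for the shared W_xh, W_hh, b_h.
```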
How to Have Stable Gradients During
Training?
• When Backpropagation is used in very deep neural networks and
in unrolled recurrent neural networks, the gradients that are
calculated in order to update the weights can become unstable.
• They can become very large (the exploding gradient problem) or
very small (the vanishing gradient problem). These unstable gradients
are in turn used to update the weights in the network, making training
unstable and the network unreliable.
• This problem is alleviated in deep Multilayer Perceptron networks
through the use of the Rectifier transfer (activation) function.
• In recurrent neural network architectures, this problem has been
alleviated using a new type of architecture called the Long Short-
Term Memory (LSTM) network.
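As a rough numerical illustration of why unrolled networks are prone to this (the recurrent weight magnitudes here are arbitrary assumptions), the backpropagated gradient picks up one factor of the recurrent weight per unrolled step, so it shrinks or grows geometrically with sequence length:

```python
steps = 50

for w in (0.5, 1.5):                 # assumed recurrent weight magnitudes
    g = 1.0
    for _ in range(steps):
        g *= w                       # one factor of w per unrolled step
    print(f"|w| = {w}: gradient after {steps} steps ~ {g:.3e}")

# |w| = 0.5 -> ~8.882e-16  (vanishing gradient)
# |w| = 1.5 -> ~6.376e+08  (exploding gradient)
```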
Long Short-Term Memory Networks
• LSTM network is a recurrent neural network that is trained using
Backpropagation through Time and overcomes the vanishing
gradient problem.
• Instead of neurons, LSTM networks have memory blocks that are
connected into layers.
• A block has components that make it smarter than a classical
neuron and a memory for recent sequences. A block contains gates
that manage the block's state and output. A unit operates upon an
input sequence, and each gate within a unit uses the sigmoid
activation function to control whether it is triggered, making the
change of state and the addition of information flowing through the
unit conditional.
Contd…
There are three types of gates within a memory unit:
1. Forget Gate: conditionally decides what information to discard
from the unit.
2. Input Gate: conditionally decides which values from the input
should update the memory state.
3. Output Gate: conditionally decides what to output based on input
and the memory of the unit.
Each unit is like a mini state machine where the gates of the units
have weights that are learned during the training procedure.
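A minimal sketch of a single LSTM step, assuming made-up weight shapes and a NumPy-only implementation (real libraries fuse these operations), showing how the three sigmoid gates condition what is forgotten, stored, and output:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM memory-block step.
    W, U, b each hold parameters for the four internal transforms:
    forget gate (f), input gate (i), candidate state (g), output gate (o)."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # what to discard
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # what to write
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # what to output
    c = f * c_prev + i * g          # updated memory (cell) state
    h = o * np.tanh(c)              # block output
    return h, c

# Example with assumed sizes: 4 inputs, 3 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.normal(scale=0.1, size=(n_hid, n_in)) for k in "figo"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):      # walk over a 5-step input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

The weights inside W, U, and b are exactly what is learned during training, which is why each unit behaves like a mini state machine whose transitions are tuned to the data.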