MODULE 4
A Recurrent Neural Network (RNN) is a type of neural network in which the output from the
previous step is fed as input to the current step. In traditional neural networks, all the inputs
and outputs are independent of each other, but when the task is to predict the next word of a
sentence, the previous words are required, and hence there is a need to remember them. RNNs
solve this issue with the help of a hidden layer. The main and most important feature of an RNN
is its hidden state, which remembers information about the sequence. This state is also referred
to as the memory state, since it remembers the previous inputs to the network. An RNN uses the
same parameters for every input because it performs the same task at each step to produce the
output; this parameter sharing reduces the number of parameters compared with other neural networks.
How RNN works: The Recurrent Neural Network consists of multiple units with a fixed activation
function, one for each time step. Each unit has an internal state, called the hidden state of the
unit, which signifies the past knowledge that the network holds at a given time step. This hidden
state is updated at every time step to reflect the change in the network's knowledge about the
past, using the following recurrence relation.
Formula for calculating the current state:
h_t = f(h_(t-1), x_t)
where h_t is the current state, h_(t-1) is the previous state and x_t is the input at the current time step.
Formula for applying the activation function (tanh):
h_t = tanh(W_hh * h_(t-1) + W_xh * x_t)
where W_hh is the recurrent (hidden-to-hidden) weight matrix and W_xh is the input-to-hidden weight matrix.
These parameters are updated using backpropagation. However, since an RNN works on
sequential data, an extended form of backpropagation is used, known as Backpropagation
Through Time (BPTT).
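The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notes' own code; the weight names W_xh and W_hh and the vector sizes are assumptions chosen to match the formulas given earlier.

import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh):
    # Run a vanilla RNN over a sequence of input vectors, reusing the same weights at every step.
    h = h0
    hidden_states = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)   # h_t = tanh(W_xh * x_t + W_hh * h_(t-1))
        hidden_states.append(h)
    return hidden_states

# Toy usage with assumed sizes: 3 time steps, 4-dimensional inputs, 5-dimensional hidden state.
rng = np.random.default_rng(0)
xs = [rng.standard_normal(4) for _ in range(3)]
states = rnn_forward(xs, np.zeros(5), rng.standard_normal((5, 4)), rng.standard_normal((5, 5)))
print(len(states), states[-1].shape)   # 3 (5,)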
Applications of RNN
1. Autocomplete
2. Translation
3. Sentiment Analysis
Problems in RNN
Because of the vanishing gradient problem, an RNN does not remember what happened at the
beginning of a sentence, i.e. it has only a short-term memory.
Eg:
• Today, due to my current job situation and family conditions, I need to take a loan.
• Last year, due to my current job situation and family conditions, I had to take a loan.
In the first sentence, the word "need" is determined by the word "Today", so the RNN has to
remember the word "Today" in order to predict "need". Because of the vanishing gradient
problem, an RNN fails in such situations.
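A toy numerical sketch of why the gradient vanishes over long sequences (the factor values 0.5 and 0.9 and the length of 30 steps are arbitrary assumptions for illustration): during Backpropagation Through Time the error signal is multiplied at every step by the recurrent weight times the tanh derivative, and a long product of factors smaller than 1 shrinks towards zero, so words from the start of the sentence stop influencing learning.

# Toy illustration of the vanishing gradient: repeated multiplication by factors < 1.
recurrent_weight = 0.5    # assumed magnitude of the recurrent weight
tanh_derivative = 0.9     # tanh'(z) is always <= 1
grad = 1.0
for step in range(30):    # propagate the error 30 time steps back
    grad *= recurrent_weight * tanh_derivative
print(grad)               # ~4e-11: almost no signal reaches the word "Today"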
LSTM networks are a type of RNN that uses special units in addition to standard units.
LSTM units include a ‘memory cell’ that can maintain information in memory for long periods
of time. This memory cell lets them learn longer-term dependencies.
LSTMs deal with the vanishing and exploding gradient problems by introducing new gates,
such as the input and forget gates, which allow better control over the gradient flow and
enable better preservation of long-range dependencies. Long-range dependencies are preserved
because the repeating module of an LSTM is a gated cell with several interacting layers, rather
than the single layer of a standard RNN.
LSTM Architecture:
The basic difference between the architectures of RNNs and LSTMs is that the hidden layer
of an LSTM is a gated unit or gated cell. It consists of four layers that interact with one another
to produce the output of that cell along with the cell state. These two things are then passed
on to the next hidden layer. Unlike an RNN, which has only a single neural network layer of
tanh, an LSTM comprises three logistic sigmoid gates and one tanh layer. The gates have been
introduced in order to limit the information that is passed through the cell. They determine
which part of the information will be needed by the next cell and which part is to be discarded.
Each gate outputs a value in the range 0-1, where '0' means 'reject all' and '1' means 'include all'.
• The key to LSTMs is the cell state.
• It stores information of the past → long-term memory.
• It is passed along the time steps with only minor linear interactions → "additive".
• This results in an uninterrupted gradient flow → errors in the past persist and impact learning in the future.
• The LSTM cell manipulates the input information with three gates.
• Input gate → controls the intake of new information.
• Forget gate → determines what part of the cell state is to be updated.
• Output gate → determines what part of the cell state to output.
Forget Gate
• Step 1: Decide what information to throw away from the cell state (memory):
  f_t = σ(W_f · [h_(t-1), x_t] + b_f)
• The output of the previous state h_(t-1) and the new information x_t jointly determine what to forget.
• h_(t-1) contains selected features from the memory C_(t-1).
• The forget gate f_t ranges between [0, 1].
Input Gate
• Step 2: Prepare the updates for the cell state from the input:
  i_t = σ(W_i · [h_(t-1), x_t] + b_i)
  C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)
• Step 3: Update the cell state by combining the old state and the prepared update:
  C_t = f_t * C_(t-1) + i_t * C̃_t
Output Gate
• Step 4: Decide the filtered output from the new cell state:
  o_t = σ(W_o · [h_(t-1), x_t] + b_o)
  h_t = o_t * tanh(C_t)
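The gate equations above can be combined into a single cell update. The following NumPy sketch is an illustrative implementation under assumed weight shapes (each W has shape hidden × (hidden + input)); it is not taken from the notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    # One LSTM time step: every gate reads the concatenation [h_(t-1), x_t].
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)               # forget gate: what to erase from C_(t-1)
    i_t = sigmoid(W_i @ z + b_i)               # input gate: how much new information to admit
    c_tilde = np.tanh(W_c @ z + b_c)           # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # additive cell-state update (Step 3)
    o_t = sigmoid(W_o @ z + b_o)               # output gate: what part of the state to expose
    h_t = o_t * np.tanh(c_t)                   # new hidden state passed to the next step
    return h_t, c_t

Because the cell state is updated additively rather than through repeated matrix multiplications, gradients can flow through c_t with far less shrinkage than in a plain RNN.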
GRUs (Gated Recurrent Units) are very similar to Long Short Term Memory (LSTM) networks. Just like an
LSTM, a GRU uses gates to control the flow of information. GRUs are relatively new compared to LSTMs;
they offer some improvements over LSTMs and have a simpler architecture.
Unlike an LSTM, a GRU does not have a separate cell state (Ct). It only has a hidden state (Ht). Due to
the simpler architecture, GRUs are faster to train.
A GRU cell is more or less similar to an LSTM cell or an RNN cell.
At each timestamp t, the GRU takes an input Xt and the hidden state Ht-1 from the previous timestamp
t-1, and it outputs a new hidden state Ht, which is again passed to the next timestamp. There are
primarily two gates in a GRU, as opposed to three gates in an LSTM cell: the first gate is
the Reset gate and the other one is the Update gate.
The Reset Gate is responsible for the short-term memory of the network, i.e. the hidden state
(Ht). The equation of the Reset gate is:
rt = σ(Wr · Xt + Ur · Ht-1)
It is very similar to the LSTM gate equations. The value of rt will range from 0 to 1 because of
the sigmoid function. Here Wr and Ur are the weight matrices for the reset gate.
Similarly, we have an Update gate for the long-term memory, and its equation has the same form:
ut = σ(Wu · Xt + Uu · Ht-1)
where Wu and Uu are the corresponding weight matrices for the update gate.
Reset gate: determines how much of the old information is needed to compute the candidate (alternative) state h̃t.
New state: selected old information is replaced with new information in the new hidden state, with the update gate deciding how much of each to keep.
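A matching sketch of one GRU cell step, using the standard candidate-state formulation (the weight names Wh and Uh for the candidate state are assumptions, and numpy and the sigmoid helper are as in the LSTM sketch above):

def gru_step(x_t, h_prev, Wr, Ur, Wu, Uu, Wh, Uh):
    # One GRU time step: only a hidden state, no separate cell state.
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate (short-term memory)
    u_t = sigmoid(Wu @ x_t + Uu @ h_prev)              # update gate (long-term memory)
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev))  # candidate state uses the reset old state
    h_t = u_t * h_prev + (1.0 - u_t) * h_tilde         # blend old state and new information
    return h_t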
Encoder – Decoder sequence to sequence architectures
[Figure: encoder-decoder architecture. The input sequence is read by the encoder, its encoded semantics are passed to the decoder, and the decoder produces the decoded sequence y0, y1, y2, …, ym.]
There are three main blocks in the encoder-decoder model,
• Encoder
• Hidden Vector
• Decoder
The Encoder converts the input sequence into a single fixed-length vector (the hidden vector),
and the Decoder converts this hidden vector into the output sequence.
Encoder
• Multiple RNN cells can be stacked together to form the encoder. The RNN reads each
input sequentially.
• For every timestep (each input) t, the hidden state (hidden vector) h is updated
according to the input at that timestep X[i].
• After all the inputs are read by encoder model, the final hidden state of the model
represents the context/summary of the whole input sequence.
• At the first timestep t1, the previous hidden state h0 is taken to be zero or randomly
chosen. So the first RNN cell updates the current hidden state using the first input and
h0. Each step outputs two things: the updated hidden state and an output for that stage.
The outputs at each stage are discarded and only the hidden states are propagated to the
next step.
• The hidden states h_i are computed using the formula:
  h_t = f(W_hh * h_(t-1) + W_xh * x_t)
  where f is the activation function (typically tanh).
• At the second timestep t2, the hidden state h1 and the second input X[2] are given as
input, and the hidden state is updated according to both, producing the hidden state h2.
This happens for every stage of the input sequence (all four stages in the example considered).
• A stack of several recurrent units (LSTM or GRU cells for better performance)
where each accepts a single element of the input sequence, collects information for
that element, and propagates it forward.
This simple formula represents the result of an ordinary recurrent neural network: we just
apply the appropriate weights to the previous hidden state h_(t-1) and the input vector x_t.
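A minimal sketch of the encoder loop described above (reusing numpy and the vanilla-RNN update; the function name encode and the argument names are illustrative assumptions). The per-step outputs are discarded and only the final hidden state, the context vector, is returned.

def encode(xs, h0, W_xh, W_hh):
    # Read the whole input sequence and return the final hidden state (the context/summary vector).
    h = h0
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
    return h   # this encoder vector becomes the decoder's initial hidden state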
Encoder Vector
• This is the final hidden state produced from the encoder part of the model. It is
calculated using the formula above.
• This vector aims to encapsulate the information for all input elements in order to
help the decoder make accurate predictions.
• It acts as the initial hidden state of the decoder part of the model.
Decoder
• The Decoder generates the output sequence by predicting the next output Yt given
the hidden state ht.
• The input for the decoder is the final hidden vector obtained at the end of encoder
model.
• Each layer (timestep) has three inputs: the hidden vector from the previous layer ht-1, the
previous layer's output yt-1, and the original hidden vector h.
• At the first layer, the encoder's output vector, the special START symbol, and an empty
hidden state ht-1 are given as input; the outputs obtained are y1 and the updated hidden
state h1 (the information already emitted as output is removed from the hidden vector).
• The second layer takes the updated hidden state h1, the previous output y1 and the
original hidden vector h as its current inputs, and produces the hidden vector h2 and
the output y2.
• The output produced at each timestep of the decoder is part of the actual output sequence.
The model keeps predicting outputs until the END symbol occurs.
• A stack of several recurrent units where each predicts an output y_t at a time step t.
• Each recurrent unit accepts a hidden state from the previous unit and produces an
output as well as its own hidden state.
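A minimal sketch of the decoder loop with greedy prediction (the embedding table, the weight names, the START/END token ids and the max_len cut-off are illustrative assumptions; numpy as in the earlier sketches):

def decode(context, embed, W_yh, W_hh, W_hy, start_id, end_id, max_len=20):
    # Generate outputs one step at a time, starting from the encoder's context vector.
    h = context                                   # encoder vector as the initial hidden state
    y = start_id                                  # START symbol fed in at the first step
    outputs = []
    for _ in range(max_len):
        x = embed[y]                              # previous output fed back as the next input
        h = np.tanh(W_yh @ x + W_hh @ h)          # update the hidden state
        y = int(np.argmax(W_hy @ h))              # pick the most likely next symbol
        if y == end_id:                           # stop once the END symbol is produced
            break
        outputs.append(y)
    return outputs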
Recursive neural networks are adaptive models that are capable of learning deeply structured
information. Therefore, you may say that recursive neural networks can represent complex
hierarchical structures rather than simple chains.
A recurrent neural network is a special case of a recursive neural network. Both are denoted
by the same acronym, RNN. If the network recurs over time, i.e. forms a chain, it is a recurrent
neural network; in general, it belongs to the family of recursive networks.
In a recursive neural network, the children of each parent node are nodes quite similar to the
parent node itself. Therefore, a recursive neural network is essentially a hierarchical network.
There is no concept of time-ordered input and output processing here; the computation is simply
performed in a tree-like hierarchical manner, with no time specifications or dependencies involved.
Hence, the major difference between recursive neural networks and recurrent neural networks is
not very sharply defined. The efficiency of a recursive neural network is seen to be far better
than that of a feed-forward network. Recurrent neural networks are created in a chain-like
structure with no branching, whereas recursive neural networks are created in the form of a
deep tree structure. A recurrent network does not really differ from a recursive neural network;
in fact, it is a special case of one. Recursive networks are, however, inherently complex and are
therefore not widely adopted; they are also more expensive at all computational learning stages
and phases.
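A minimal sketch of the tree-structured composition a recursive neural network performs (the weight name W_comp, the binary-tree assumption and the toy word vectors are illustrative; numpy as above):

def compose(left, right, W_comp, b):
    # Combine two child node vectors into a parent node vector with shared weights.
    return np.tanh(W_comp @ np.concatenate([left, right]) + b)

# Toy usage: build ("not", "good") into a phrase vector, then combine it with "movie",
# reusing the same W_comp at every node of the tree.
rng = np.random.default_rng(0)
d = 4
W_comp, b = rng.standard_normal((d, 2 * d)), np.zeros(d)
not_v, good_v, movie_v = rng.standard_normal(d), rng.standard_normal(d), rng.standard_normal(d)
phrase = compose(not_v, good_v, W_comp, b)
sentence = compose(phrase, movie_v, W_comp, b)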
Sentiment analysis of sentences is among the major tasks of NLP (Natural Language
Processing); it identifies the writer's tone and sentiment in a specific sentence. When a writer
expresses a sentiment, basic labels describing the tone of the writing are identified, for
instance whether the wording is constructive (positive) or uses negative word choices.
For instance, a labelled dataset of this kind assigns every sentence an emotion from a set of
distinct classes.
Sentiment analysis of this kind can be implemented with the help of Recursive Neural Network
algorithms. This RNN is a form of recursive neural net that has a tree structure: the sentence is
parsed into a tree and the sentiment is composed from the leaves upwards.