
MODULE 4

Recurrent Neural Networks (RNNs) are designed to process sequential data by using previous outputs as inputs for current steps, allowing them to remember past information through a hidden state. They face challenges like the vanishing gradient problem, which affects their ability to retain information over long sequences, leading to the development of advanced architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) that better manage memory retention. RNNs and their variants are widely used in applications such as language translation, sentiment analysis, and autocomplete features.


Recurrent Neural Networks

A Recurrent Neural Network (RNN) is a type of neural network in which the output from the
previous step is fed as input to the current step. In traditional neural networks, all inputs
and outputs are independent of each other, but when it is required to predict the next word of
a sentence, the previous words are needed, and hence there is a need to remember them. RNNs
were introduced to solve this issue with the help of a hidden state. The main and most
important feature of an RNN is its hidden state, which remembers information about the
sequence. This state is also referred to as the memory state, since it remembers the previous
inputs to the network. An RNN uses the same parameters at every time step, because it performs
the same task on each input (or hidden state) to produce the output. This parameter sharing
reduces the number of parameters, unlike other neural networks.

How RNN works: The Recurrent Neural Network consists of multiple fixed activation
function units, one for each time step. Each unit has an internal state which is called the
hidden state of the unit. This hidden state signifies the past knowledge that the network
currently holds at a given time step. This hidden state is updated at every time step to signify
the change in the knowledge of the network about the past. The hidden state is updated using
the following recurrence relation.

The formula for calculating the current state:

ht = f(ht-1, xt)

The formula for applying the activation function (tanh):

ht = tanh(Whh · ht-1 + Wxh · xt)

where: Whh → weight at the recurrent neuron
Wxh → weight at the input neuron

The formula for calculating the output:

Yt = Why · ht

where: Yt → output; Why → weight at the output layer

These parameters are updated using backpropagation. However, since an RNN works on
sequential data, a modified form of the algorithm is used, known as Backpropagation Through
Time (BPTT).
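The recurrence above can be made concrete with a short sketch. The following is a minimal NumPy forward pass (not code from this module): the names Whh, Wxh and Why follow the weight definitions given earlier, while the sizes and random values are assumptions for illustration only.

```python
import numpy as np

def rnn_forward(x_seq, Whh, Wxh, Why, h0):
    """Run a vanilla RNN over a sequence of input vectors.

    Whh : weight at the recurrent neuron (hidden-to-hidden)
    Wxh : weight at the input neuron (input-to-hidden)
    Why : weight at the output layer (hidden-to-output)
    """
    h = h0
    outputs = []
    for x_t in x_seq:
        # ht = tanh(Whh · ht-1 + Wxh · xt)  -- the recurrence relation
        h = np.tanh(Whh @ h + Wxh @ x_t)
        # Yt = Why · ht  -- output at this time step
        outputs.append(Why @ h)
    return outputs, h

# Toy usage: 3-dimensional inputs, a 4-unit hidden state, 2 outputs per step
rng = np.random.default_rng(0)
Whh, Wxh, Why = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x_seq = [rng.normal(size=3) for _ in range(5)]          # a sequence of 5 time steps
y_seq, h_final = rnn_forward(x_seq, Whh, Wxh, Why, np.zeros(4))
print(len(y_seq), h_final.shape)                        # 5 outputs, final hidden state (4,)
```

Training these weights with BPTT would unroll exactly this loop and propagate gradients backwards through it.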

Computational Graph of RNN


A computational graph is essentially a directed graph with functions and operations as nodes.
Computing the outputs from the inputs is called the forward pass, and it’s customary to show
the forward pass above the edges of the graph. In the backward pass, we compute the gradients
of the output with respect to the inputs and show them below the edges.

Applications of RNN
1. Autocomplete

2. Translation

3. Named entity recognition

e.g. Roger Federer, Honda City, Samsung Galaxy S10

4. Sentiment Analysis

e.g. classifying a product review as 1 star, 2 stars, 3 stars, etc.

Problems in RNN
Because of the vanishing gradient problem, an RNN does not remember what happened at the
beginning of a sentence, i.e. it has only short-term memory.

Eg:

• Today, due to my current job situation and family conditions, I need to take a loan.
• Last year, due to my current job situation and family conditions, I had to take a loan.

In the first sentence, the word "need" can be identified using the word "Today". So the RNN has
to remember the word "Today" in order to predict the word "need". Because of the vanishing
gradient problem, RNNs fail in such situations.
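The effect can be illustrated numerically. In a vanilla RNN the gradient of the current hidden state with respect to an early hidden state is a product of per-step Jacobians, and with small recurrent weights that product typically shrinks towards zero (with large weights it explodes instead). The sketch below uses purely illustrative values, not anything from this module:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
Whh = rng.normal(scale=0.2, size=(n, n))   # deliberately small recurrent weights
h = rng.normal(size=n)
grad = np.eye(n)                           # d h_t / d h_0, starts as the identity

for t in range(50):
    h = np.tanh(Whh @ h)
    # Jacobian of one step: diag(1 - h_t^2) @ Whh
    J = np.diag(1.0 - h**2) @ Whh
    grad = J @ grad
    if t % 10 == 9:
        # the norm of the accumulated gradient shrinks with every extra step
        print(f"step {t+1:2d}: ||d h_t / d h_0|| = {np.linalg.norm(grad):.2e}")
```

Because the gradient carrying information about early words decays like this, the network effectively forgets "Today" by the time it must predict "need".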

Modern RNNs: LSTM & GRU

LSTM (Long Short-Term Memory)


The main difference between an RNN and an LSTM is how long information can be maintained in
memory. The LSTM has an advantage over the plain RNN because it can keep information in memory
for a long period of time. LSTMs are capable of maintaining long-term temporal dependencies
(remembering information for long periods of time).
Difference between RNN & LSTM

LSTM networks are a type of RNN that uses special units in addition to standard units.
LSTM units include a "memory cell" that can maintain information in memory for long periods
of time. This memory cell lets them learn longer-term dependencies.

LSTMs deal with the vanishing and exploding gradient problems by introducing new gates,
such as the input and forget gates, which allow better control over the gradient flow and
enable better preservation of long-range dependencies. Whereas the repeating unit of a plain
RNN is a single layer, the repeating unit of an LSTM contains several interacting layers, and
this gated structure is what resolves the long-range dependency problem.

Architecture of LSTM:

The basic difference between the architectures of RNNs and LSTMs is that the hidden layer
of an LSTM is a gated unit, or gated cell. It consists of four layers that interact with one
another to produce the output of the cell along with the cell state. These two things are then
passed on to the next hidden layer. Unlike an RNN, which has only a single neural-net layer of
tanh, an LSTM comprises three logistic sigmoid gates and one tanh layer. Gates have been
introduced in order to limit the information that is passed through the cell. They determine
which part of the information will be needed by the next cell and which part is to be
discarded. Gate outputs lie in the range 0-1, where 0 means "reject all" and 1 means
"include all".
• The key to LSTMs is the cell state.
• It stores information from the past → long-term memory.
• It passes along time steps with only minor linear interactions → "additive".
• This results in an uninterrupted gradient flow → errors from the past persist and impact learning in the future.
• The LSTM cell manipulates input information with three gates.
• Input gate → controls the intake of new information.
• Forget gate → determines what part of the cell state is to be kept or discarded.
• Output gate → determines what part of the cell state to output.

Forget Gate

• Step 1: Decide what information to throw away from the cell state (memory).
• The output of the previous state ht-1 and the new information xt jointly determine what to forget.
• ht-1 contains selected features from the memory Ct-1.
• The forget gate ft ranges between [0, 1].
Input Gate

• Step 2: Prepare the updates for the cell state from the input.
• An alternative (candidate) cell state C̃t is created from the new information xt with the guidance of ht-1.
• The input gate it ranges between [0, 1].

Output Gate

• Step 3 (the cell state update) combines the two: the forget gate decides what to keep from Ct-1 and the input gate decides how much of C̃t to add, producing the new cell state Ct.
• Step 4: Decide the filtered output from the new cell state.
• A tanh function filters the new cell state to characterize the stored information.
• Significant information in Ct → ±1.
• Minor details → 0.
• The output gate ot ranges between [0, 1].
• ht serves as a control signal for the next time step.
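To make the four steps concrete, here is a minimal single-step LSTM cell in NumPy. It is a sketch under the usual LSTM gate equations, not code from this module: the weight names (one W, U, b triple per gate) and the dimensions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by gate name
    ('f', 'i', 'o', 'c'); names and sizes are illustrative."""
    # Step 1 (forget gate): decide what to throw away from the cell state
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])
    # Step 2 (input gate + candidate state): prepare updates from the input
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    # Step 3 (cell state update): keep part of the old memory, add new information
    c_t = f_t * c_prev + i_t * c_tilde
    # Step 4 (output gate): decide the filtered output from the new cell state
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy usage: 3-dimensional input, 4-unit hidden and cell state
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 3)) for k in 'fioc'}
U = {k: rng.normal(size=(4, 4)) for k in 'fioc'}
b = {k: np.zeros(4) for k in 'fioc'}
h, c = np.zeros(4), np.zeros(4)
h, c = lstm_step(rng.normal(size=3), h, c, W, U, b)
print(h.shape, c.shape)   # (4,) (4,)
```

Running this step repeatedly over a sequence, carrying (h, c) forward, gives a full LSTM layer.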
Gated Recurrent Unit (GRU)
GRU, or Gated Recurrent Unit, is an advancement of the standard recurrent neural network
(RNN). It was introduced by Kyunghyun Cho et al. in 2014.

GRUs are very similar to Long Short-Term Memory (LSTM) networks. Just like an LSTM, a GRU uses
gates to control the flow of information. GRUs are relatively new compared to LSTMs; they offer
some improvements over the LSTM and have a simpler architecture.

Unlike an LSTM, a GRU does not have a separate cell state (Ct); it only has a hidden state (Ht).
Because of this simpler architecture, GRUs are faster to train.

The architecture of Gated Recurrent Unit

A GRU cell is more or less similar to an LSTM cell or a basic RNN cell.

At each timestamp t, it takes an input Xt and the hidden state Ht-1 from the previous timestamp
t-1. It then outputs a new hidden state Ht, which is again passed to the next timestamp. There
are primarily two gates in a GRU, as opposed to three gates in an LSTM cell: the first is the
reset gate and the other is the update gate.

Reset Gate (Short term memory)

The reset gate is responsible for the short-term memory of the network, i.e. the hidden state
(Ht). The equation of the reset gate is:

rt = σ(Ur · Xt + Wr · Ht-1)

It is very similar to an LSTM gate equation. The value of rt ranges from 0 to 1 because of the
sigmoid function. Here Ur and Wr are the weight matrices for the reset gate.

Update Gate (Long Term memory)

Similarly, we have an update gate for long-term memory, and the equation of the gate is:

ut = σ(Uu · Xt + Wu · Ht-1)

The only difference is the weight matrices, i.e. Uu and Wu.

Update gate: controls the composition of the new state.

Reset gate: determines how much old information is needed in the alternative state h̃t.

Alternative state: contains the new information.

New state: replaces selected old information with the new information.
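As a companion to the LSTM sketch above, here is a minimal single-step GRU in NumPy. The reset- and update-gate weight names follow Ur/Wr and Uu/Wu from the text; the candidate-state weights and the blending convention Ht = ut · Ht-1 + (1 - ut) · h̃t are common choices assumed for illustration, not taken from this module.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, U, W, b):
    """One GRU time step; U holds input weights, W holds hidden weights."""
    # Reset gate: how much old information enters the alternative state
    r_t = sigmoid(U['r'] @ x_t + W['r'] @ h_prev + b['r'])
    # Update gate: controls the composition of the new state
    u_t = sigmoid(U['u'] @ x_t + W['u'] @ h_prev + b['u'])
    # Alternative (candidate) state built from the new information
    h_tilde = np.tanh(U['h'] @ x_t + W['h'] @ (r_t * h_prev) + b['h'])
    # New state: blend old state and new information via the update gate
    h_t = u_t * h_prev + (1.0 - u_t) * h_tilde
    return h_t

# Toy usage: 3-dimensional input, 4-unit hidden state
rng = np.random.default_rng(0)
U = {k: rng.normal(size=(4, 3)) for k in 'ruh'}
W = {k: rng.normal(size=(4, 4)) for k in 'ruh'}
b = {k: np.zeros(4) for k in 'ruh'}
h = gru_step(rng.normal(size=3), np.zeros(4), U, W, b)
print(h.shape)   # (4,)
```

Note that there is no separate cell state: the single hidden state h plays both roles, which is what makes the GRU simpler and faster to train than the LSTM.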
Encoder – Decoder sequence to sequence architectures

[Figure: an encoder RNN reads the input sequence and compresses it into the encoded semantics
(context vector); a decoder RNN expands that vector into the decoded sequence y0, y1, y2, …, ym.]
There are three main blocks in the encoder-decoder model,

• Encoder

• Hidden Vector

• Decoder

The encoder converts the input sequence into a single fixed-length vector (the hidden vector),
and the decoder converts that hidden vector into the output sequence.

Encoder

• Multiple RNN cells can be stacked together to form the encoder. The RNN reads each
input sequentially.

• For every timestep (each input) t, the hidden state (hidden vector) h is updated
according to the input at that timestep X[i].

• After all the inputs are read by encoder model, the final hidden state of the model
represents the context/summary of the whole input sequence.

• Example: Consider the input sequence "I am a Student" to be encoded. There will be a
total of 4 timesteps (4 tokens) for the encoder model. At each time step, the hidden
state h will be updated using the previous hidden state and the current input.

• At the first timestep t1, the previous hidden state h0 is taken to be zero or randomly
chosen. So the first RNN cell updates the current hidden state using the first input and
h0. Each cell outputs two things: the updated hidden state and an output for that stage.
The outputs at each stage are discarded and only the hidden states are propagated to the
next step.
• The hidden states h_i are computed using the formula:

h_t = f(W_hh · h_(t-1) + W_xh · x_t)

• At the second timestep t2, the hidden state h1 and the second input X[2] are given as
input, and the hidden state h2 is updated according to both. The same happens for all
four timesteps of the example.

• A stack of several recurrent units (LSTM or GRU cells for better performance)
where each accepts a single element of the input sequence, collects information for
that element, and propagates it forward.

• In the question-answering problem, the input sequence is a collection of all the words
from the question. Each word is represented as x_i, where i is the order of that word.

This simple formula represents the result of an ordinary recurrent neural network. As you can
see, we just apply the appropriate weights to the previous hidden state h_(t-1) and the input
vector x_t.

Encoder Vector

• This is the final hidden state produced from the encoder part of the model. It is
calculated using the formula above.

• This vector aims to encapsulate the information for all input elements in order to
help the decoder make accurate predictions.

• It acts as the initial hidden state of the decoder part of the model.

Decoder

• The Decoder generates the output sequence by predicting the next output Yt given
the hidden state ht.

• The input for the decoder is the final hidden vector obtained at the end of the encoder
model.

• Each step has three inputs: the hidden vector from the previous step ht-1, the previous
step's output yt-1, and the original hidden vector h.

• At the first step, the encoder's output vector, the special START symbol, and an empty
hidden state ht-1 are given as input; the outputs obtained are y1 and the updated hidden
state h1 (the information already emitted as output is effectively removed from the
hidden vector).
• The second step takes the updated hidden state h1, the previous output y1 and the
original hidden vector h as its current inputs, and produces the hidden vector h2 and
the output y2.

• The outputs produced at each timestep of the decoder are the actual outputs. The model
keeps predicting outputs until the END symbol occurs.

• A stack of several recurrent units where each predicts an output y_t at a time step t.

• Each recurrent unit accepts a hidden state from the previous unit and produces an
output as well as its own hidden state.

• In the question-answering problem, the output sequence is a collection of all the words
from the answer. Each word is represented as y_i, where i is the order of that word.
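The encoder-decoder data flow can be sketched end to end using the simple RNN step from earlier. The code below uses untrained random weights, so the generated tokens are meaningless; the token ids, the START/END symbols and the dimensions are assumptions chosen only to show how the context vector passes from encoder to decoder.

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh):
    # Same recurrence as before: h_t = tanh(Wxh · x_t + Whh · h_(t-1))
    return np.tanh(Wxh @ x_t + Whh @ h_prev)

def encode(x_seq, Wxh, Whh, hidden_size):
    """Encoder: read the whole input sequence, return the final hidden
    state, which serves as the context / encoder vector."""
    h = np.zeros(hidden_size)
    for x_t in x_seq:
        h = rnn_step(x_t, h, Wxh, Whh)
    return h

def decode(context, embed, Wxh, Whh, Why, start_id, end_id, max_len=10):
    """Decoder: start from the encoder vector and the START token, then keep
    predicting the next token until END appears (greedy decoding)."""
    h, y_prev, out = context, start_id, []
    for _ in range(max_len):
        h = rnn_step(embed[y_prev], h, Wxh, Whh)
        y_prev = int(np.argmax(Why @ h))     # pick the most likely next token
        if y_prev == end_id:
            break
        out.append(y_prev)
    return out

# Toy usage: vocabulary of 6 tokens, 4-dimensional embeddings, 5 hidden units
rng = np.random.default_rng(0)
vocab, emb_dim, hid = 6, 4, 5
embed = rng.normal(size=(vocab, emb_dim))
enc_Wxh, enc_Whh = rng.normal(size=(hid, emb_dim)), rng.normal(size=(hid, hid))
dec_Wxh, dec_Whh = rng.normal(size=(hid, emb_dim)), rng.normal(size=(hid, hid))
Why = rng.normal(size=(vocab, hid))
src = [embed[i] for i in [2, 3, 4, 5]]       # "I am a Student" as token vectors
context = encode(src, enc_Wxh, enc_Whh, hid)
print(decode(context, embed, dec_Wxh, dec_Whh, Why, start_id=0, end_id=1))
```

In practice the recurrent units would be LSTM or GRU cells, as noted above, and all the weights would be learned from paired input/output sequences.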

Deep recurrent networks


Up until now, we have focused on defining networks consisting of a sequence input, a single
hidden RNN layer, and an output layer. Despite having just one hidden layer between the input
at any time step and the corresponding output, there is a sense in which these networks are
deep. Inputs from the first time step can influence the outputs at the final time step T (often
100s or 1000s of steps later). These inputs pass through T applications of the recurrent layer
before reaching the final output. However, we often also wish to retain the ability to express
complex relationships between the inputs at a given time step and the outputs at that same time
step. Thus, we often construct RNNs that are deep not only in the time direction but also in the
input-to-output direction. This is precisely the notion of depth that we have already encountered
in our development of MLPs and deep CNNs.
The standard method for building this sort of deep RNN is strikingly simple: we stack the
RNNs on top of each other. Given a sequence of length T, the first RNN produces a sequence
of outputs, also of length T. These, in turn, constitute the inputs to the next RNN layer. Each
hidden state operates on a sequential input and produces a sequential output. Moreover, any
RNN cell at each time step depends on both the same layer’s value at the previous time step
and the previous layer’s value at the same time step.
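A sketch of this stacking in NumPy follows. The layer sizes and weights are illustrative assumptions; a practical implementation would normally use a deep-learning framework's stacked RNN/LSTM layers instead.

```python
import numpy as np

def deep_rnn_forward(x_seq, layer_weights):
    """Stack RNN layers: the output sequence of layer l becomes the input
    sequence of layer l+1. layer_weights is a list of (Wxh, Whh) pairs."""
    seq = x_seq
    for Wxh, Whh in layer_weights:
        h = np.zeros(Whh.shape[0])
        new_seq = []
        for x_t in seq:
            # depends on this layer's previous hidden state (h) and on the
            # previous layer's value at the same time step (x_t)
            h = np.tanh(Wxh @ x_t + Whh @ h)
            new_seq.append(h)
        seq = new_seq            # feed this layer's outputs to the next layer
    return seq                   # hidden states of the top layer, length T

# Toy usage: a 3-layer deep RNN over a sequence of length T = 6
rng = np.random.default_rng(0)
dims = [3, 8, 8, 8]              # input size, then hidden sizes of the 3 layers
weights = [(rng.normal(size=(dims[i + 1], dims[i])),
            rng.normal(size=(dims[i + 1], dims[i + 1]))) for i in range(3)]
x_seq = [rng.normal(size=3) for _ in range(6)]
top = deep_rnn_forward(x_seq, weights)
print(len(top), top[0].shape)    # 6 time steps, each an 8-dimensional hidden state
```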

Recursive neural networks


A recursive neural network is a type of deep neural network. It produces structured predictions
by applying the same set of weights recursively over structured inputs. Processing data in this
way yields what is known as a recursive neural network. These networks are non-linear in nature.

Recursive networks are adaptive models that are capable of learning deep structured
representations; they are suited to data with an inherent hierarchical structure.

Recurrent Neural Network vs. Recursive Neural Networks

A recurrent neural network is in fact a recursive neural network; both are denoted by the same
acronym, RNN. If the network recurs over time, i.e. its structure is a chain unrolled along the
time steps, it is a recurrent neural network; generalizing, it belongs to the family of
recursive networks.

The image referred to above depicts a recursive neural network. For each parent node, its
children are nodes of the same kind as the parent. It is therefore evident that the recursive
neural network is a hierarchical network type. There is no sequential, time-indexed input and
output processing here; the computation is simply performed in a tree-like hierarchical manner,
with no time specifications or dependencies involved.

Hence, the distinction between recursive and recurrent neural networks is not always sharply
drawn, but it comes down to structure. The efficiency of a recursive neural network is seen to
be far better than that of a feed-forward network. Recurrent neural networks are created in a
chain-like structure with no branching, whereas recursive neural networks are created in the
form of a deep tree structure. A recurrent network does not really differ from a recursive
network; it is a recursive network whose tree is a simple chain. Recursive networks are
inherently more complex and are therefore not adopted on a broader scale; they are also more
expensive at all computational learning stages and phases.

Application: Recursive Neural Networks for sentiment analysis of sentences.

Sentiment analysis of sentences is among the major tasks of NLP (Natural Language Processing):
it identifies the writer's tone and sentiment in a specific sentence. When a writer expresses a
sentiment, basic labels describing the tone of the writing are identified, for instance whether
the wording is constructive (positive) or uses negative word choices.

For instance, a sentiment dataset of this kind labels every sentence with one of several
distinct emotion classes.

Sentiment analysis of this sort can be implemented with recursive neural network algorithms:
the recursive neural net follows a tree structure, composing word representations according to
the parse of the sentence.
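A minimal sketch of this idea in NumPy: the same composition weights are applied at every node of a (hand-written, assumed) binary parse tree, and a classifier on the root vector predicts the sentiment class. The word vectors, the tree and the number of classes are illustrative assumptions; a real model would be trained on labelled parse trees.

```python
import numpy as np

def compose(left, right, W, b):
    """Recursive composition: build the parent vector from its two children,
    using the same weights W, b at every node of the parse tree."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode_tree(node, embed, W, b):
    """node is either a word (leaf) or a (left, right) pair of sub-trees."""
    if isinstance(node, str):
        return embed[node]
    left, right = node
    return compose(encode_tree(left, embed, W, b),
                   encode_tree(right, embed, W, b), W, b)

# Toy usage: 4-dimensional word vectors, an untrained sentiment classifier on top
rng = np.random.default_rng(0)
d, n_classes = 4, 5                           # e.g. 1-5 star ratings
embed = {w: rng.normal(size=d) for w in ["the", "movie", "was", "great"]}
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
W_cls = rng.normal(size=(n_classes, d))
tree = (("the", "movie"), ("was", "great"))   # a tiny binary parse tree
root = encode_tree(tree, embed, W, b)
print(int(np.argmax(W_cls @ root)) + 1, "stars (untrained, illustrative only)")
```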
