LSTM & GRU
Introduction to Long Short Term Memory (LSTM)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-memory-lstm/?utm_source=blog&utm_medium=gated_recurrent_unit
https://youtu.be/Z03f7Wu5a6A
Shipra Saxena — Published On March 16, 2021 and Last Modified On March 18th, 2021
Objective
● LSTM is a special kind of recurrent neural network capable of
handling long-term dependencies.
Introduction
Long Short Term Memory (LSTM) is an advanced RNN, a sequential network that allows information to persist and can handle the vanishing gradient problem faced by standard RNNs.
Let’s say while watching a video you remember the previous scene, or while reading a book you know what happened in an earlier chapter. RNNs work similarly: they remember previous information and use it to process the current input. The shortcoming of RNNs is that they cannot remember long-term dependencies because of the vanishing gradient problem. LSTMs are explicitly designed to avoid such long-term dependency problems.
LSTM Architecture
At a high level, an LSTM works very much like an RNN cell. Internally, the LSTM cell consists of three parts, and each part performs an individual function.
The first part chooses whether the information coming from the previous timestamp is to be remembered or is irrelevant and can be forgotten. In the second part, the cell tries to learn new information from the input to this cell. At last, in the third part, the cell passes the updated information from the current timestamp on to the next timestamp.
These three parts of an LSTM cell are known as gates. The first part is called the Forget gate, the second part is known as the Input gate, and the third part is the Output gate.
Just like a simple RNN, an LSTM also has a hidden state, where H(t-1) represents the hidden state of the previous timestamp and H(t) the hidden state of the current timestamp. In addition, LSTMs have a cell state, represented by C(t-1) and C(t) for the previous and current timestamps respectively. Here the hidden state is known as short-term memory, and the cell state is known as long-term memory.
It is interesting to note that the cell state carries information along all the timestamps. Take an example with two sentences separated by a full stop: the first sentence is “Bob is a nice person” and the second sentence is “Dan, on the other hand, is evil”. It is very clear that in the first sentence we are talking about Bob, and as soon as we encounter the full stop we start talking about Dan. As we move from the first sentence to the second, our network should realize that we are no longer talking about Bob; now our subject is Dan. Here, the Forget gate of the network allows it to forget about Bob. Let’s understand the roles played by these gates in the LSTM architecture.
Forget Gate
In a cell of the LSTM network, the first step is to decide whether we should keep the information from the previous timestamp or forget it. The forget gate combines the current input Xt and the previous hidden state Ht-1 with its weight matrices Uf and Wf. Later, a sigmoid function is applied over it: ft = σ(Xt·Uf + Ht-1·Wf). That will make ft a number between 0 and 1. This ft is later multiplied with the cell state of the previous timestamp, Ct-1. If ft is 0 the network will forget everything, and if the value of ft is 1 it will forget nothing. Let’s get back to our example: the first sentence was talking about Bob, and after the full stop the network will encounter Dan; in an ideal case the network should forget about Bob.
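To make the computation concrete, here is a minimal NumPy sketch of the forget-gate step. It is only an illustration of the equation above: the weight names (Uf, Wf), the sizes, and the random values are assumptions, and bias terms are omitted as in the equations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
inputs, hidden = 3, 4                       # illustrative sizes only

U_f = rng.normal(size=(inputs, hidden))     # input-to-forget-gate weights (Uf)
W_f = rng.normal(size=(hidden, hidden))     # hidden-to-forget-gate weights (Wf)

x_t = rng.normal(size=inputs)               # current input Xt
h_prev = rng.normal(size=hidden)            # previous hidden state H(t-1)
c_prev = rng.normal(size=hidden)            # previous cell state C(t-1)

f_t = sigmoid(x_t @ U_f + h_prev @ W_f)     # forget gate ft, each entry in (0, 1)
kept = f_t * c_prev                         # how much of the old cell state survives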
Input Gate
The Input gate decides how much of the new information should be added to the cell state. Take another example: “Bob knows swimming. He told me over the phone that he had served the navy for four long years.” So, in both these sentences, we are talking about Bob. However, the two sentences carry different information about him: the first says that Bob knows swimming, whereas the second sentence tells he uses the phone and served in the navy for four years. Now just think about it: based on the context given in the first sentence, it does not matter whether Bob passed the information over the phone or by some other medium; the important fact to remember is that he served in the navy. Quantifying the importance of this new information is the job of the Input gate: it = σ(Xt·Ui + Ht-1·Wi), where Ui and Wi are the input gate’s weight matrices. Again we have applied the sigmoid function over it. As a result, the value of it will be between 0 and 1.
New information
Now, the new information that needs to be passed to the cell state is a function of the hidden state at the previous timestamp t-1 and the input Xt at timestamp t: Nt = tanh(Xt·Uc + Ht-1·Wc). Because of the tanh activation, the value of the new information Nt lies between -1 and 1. If the value of Nt is negative, the information is subtracted from the cell state, and if the value is positive, the information is added to the cell state at the current timestamp. However, Nt is not added to the cell state directly; the update equation is Ct = ft·Ct-1 + it·Nt. Here, Ct-1 is the cell state at the previous timestamp, and the others are the values we calculated above.
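Continuing the sketch from the Forget gate section (reusing the import, sigmoid, rng, inputs, hidden, x_t, h_prev, c_prev and f_t defined there; the weight names Ui, Wi, Uc, Wc are again illustrative assumptions), the input gate and the new information combine with the forget gate to update the cell state:

U_i = rng.normal(size=(inputs, hidden))     # input-gate weights (Ui, Wi)
W_i = rng.normal(size=(hidden, hidden))
U_c = rng.normal(size=(inputs, hidden))     # weights for the new information (Uc, Wc)
W_c = rng.normal(size=(hidden, hidden))

i_t = sigmoid(x_t @ U_i + h_prev @ W_i)     # input gate it, in (0, 1): importance of the new info
n_t = np.tanh(x_t @ U_c + h_prev @ W_c)     # new information Nt, in (-1, 1)
c_t = f_t * c_prev + i_t * n_t              # Ct = ft*C(t-1) + it*Nt, the updated cell state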
Output Gate
Consider the example: “Bob single-handedly fought the enemy and died for his country. For his contributions, brave ______.” During this task, we have to complete the second sentence. Now, the minute we see the word brave, we know that we are talking about a person. In the sentence only Bob is brave; we cannot say the enemy is brave or the country is brave. So, based on the current expectation, we have to give a relevant word to fill in the blank. That word is our output, and this is the function of the Output gate.
Here is the equation of the Output gate, which is pretty similar to the two previous gates: Ot = σ(Xt·Uo + Ht-1·Wo). Its value will also lie between 0 and 1 because of the sigmoid function. Now, to calculate the current hidden state, we use Ot and the tanh of the updated cell state: Ht = Ot·tanh(Ct).
It turns out that the hidden state is a function of the long-term memory (Ct) and the current output Ot. If you need the output of the current timestamp, just apply the SoftMax activation on the hidden state Ht: Output = Softmax(Ht). Here the token with the maximum score in the output is the prediction.
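Putting the three gates together, here is a minimal, self-contained sketch of one LSTM cell step in NumPy, following the equations above. It is not the article’s code: the weight names, shapes, and the SoftMax read-out are illustrative assumptions, and bias terms are omitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def lstm_cell_step(x_t, h_prev, c_prev, params):
    # One LSTM timestep: returns the new hidden state Ht and cell state Ct.
    U_f, W_f, U_i, W_i, U_c, W_c, U_o, W_o = params
    f_t = sigmoid(x_t @ U_f + h_prev @ W_f)      # forget gate
    i_t = sigmoid(x_t @ U_i + h_prev @ W_i)      # input gate
    n_t = np.tanh(x_t @ U_c + h_prev @ W_c)      # new information
    c_t = f_t * c_prev + i_t * n_t               # cell state (long-term memory)
    o_t = sigmoid(x_t @ U_o + h_prev @ W_o)      # output gate
    h_t = o_t * np.tanh(c_t)                     # hidden state (short-term memory)
    return h_t, c_t

# Tiny usage example with random weights (sizes are illustrative only)
rng = np.random.default_rng(0)
inputs, hidden = 3, 4
params = [rng.normal(size=(inputs, hidden)) if k % 2 == 0
          else rng.normal(size=(hidden, hidden)) for k in range(8)]
x_t = rng.normal(size=inputs)
h_prev, c_prev = np.zeros(hidden), np.zeros(hidden)
h_t, c_t = lstm_cell_step(x_t, h_prev, c_prev, params)
probs = softmax(h_t)                             # apply SoftMax on Ht to read out a prediction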
End Notes
To summarize, in this article we saw the architecture of the LSTM, a sequential network whose Forget, Input, and Output gates allow it to handle long-term dependencies.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-gated-recurrent-unit-gru/
https://youtu.be/6IBhu4tDpOI
Objective
● In sequence modeling techniques, the Gated Recurrent Unit (GRU) is the newest entrant after the RNN and the LSTM, and it offers a simpler architecture than the LSTM.
Introduction
GRU or Gated Recurrent Unit is an advancement over the standard RNN, i.e. the recurrent neural network. It was introduced by Kyunghyun Cho et al. in 2014. Note: if you prefer an audio-visual format, this entire article is explained in the video linked above. Like the LSTM, the GRU uses gates to control the flow of information. GRUs are relatively new compared to LSTMs, offer some improvements, and have a simpler architecture. Another interesting thing about the GRU is that, unlike the LSTM, it does not have a separate cell state (Ct). It only has a hidden state (Ht). Due to this simpler architecture, GRUs are faster to train.
In case you are unaware of the LSTM network, I suggest you go through the LSTM article above. Just like the LSTM, at every timestamp t the GRU cell takes an input Xt and the hidden state Ht-1 from the previous timestamp t-1, and it outputs a new hidden state Ht, which is passed on to the next timestamp.
Now there are primarily two gates in a GRU as opposed to three gates in an LSTM cell. The first gate is the Reset gate and the other one is the Update gate.
Reset Gate (Short Term memory)
The Reset gate is responsible for the short-term memory of the network, i.e. the hidden state (Ht). Here is the equation of the Reset gate: rt = σ(Xt·Ur + Ht-1·Wr). The value of rt lies between 0 and 1 because of the sigmoid function. Here Ur and Wr are weight matrices for the reset gate.
Update Gate (Long Term memory)
Similarly, we have an Update gate for long-term memory; its equation has the same form, ut = σ(Xt·Uu + Ht-1·Wu), the only difference being its own weight matrices Uu and Wu.
To obtain the hidden state Ht, the GRU follows a two-step process. The first step is to generate the so-called candidate hidden state. It takes the input Xt and the hidden state from the previous timestamp t-1, which is multiplied by the reset gate output rt, and passes all of this information through a tanh function; the result is the candidate hidden state: Ĥt = tanh(Xt·U + (rt ⊙ Ht-1)·W), where U and W are the candidate state’s own weight matrices. The most important part of this equation is how the value of the reset gate controls how much influence the previous hidden state has on the candidate state. If the value of rt is 1, the entire information from the previous hidden state Ht-1 is considered; likewise, if the value of rt is 0, the information from the previous hidden state is completely ignored.
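Here is a minimal NumPy sketch of the reset gate and the candidate hidden state described above. The weight names (Ur, Wr and the candidate weights), the sizes, and the random values are illustrative assumptions, and bias terms are omitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
inputs, hidden = 3, 4                        # illustrative sizes only
x_t = rng.normal(size=inputs)                # input Xt at timestamp t
h_prev = rng.normal(size=hidden)             # previous hidden state H(t-1)

U_r = rng.normal(size=(inputs, hidden))      # reset-gate weights (Ur, Wr)
W_r = rng.normal(size=(hidden, hidden))
U_g = rng.normal(size=(inputs, hidden))      # candidate-state weights
W_g = rng.normal(size=(hidden, hidden))

r_t = sigmoid(x_t @ U_r + h_prev @ W_r)              # reset gate rt, in (0, 1)
h_cand = np.tanh(x_t @ U_g + (r_t * h_prev) @ W_g)   # candidate hidden state
# rt scales how much of H(t-1) feeds the candidate: rt near 0 ignores it, near 1 keeps all of it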
Hidden state
Once we have the candidate state, it is used to generate the current hidden state Ht. This is where the Update gate comes into the picture. Instead of using a separate gate as in the LSTM, the GRU uses the single update gate to control both the historical information (Ht-1) and the new information coming from the candidate state: Ht = ut ⊙ Ht-1 + (1 - ut) ⊙ Ĥt.
Now assume the value of ut is around 0; then the first term in the equation vanishes, which means the new hidden state will not carry much information from the previous hidden state. At the same time, the second part becomes almost one, which essentially means the hidden state at the current timestamp consists mostly of information from the candidate state. Similarly, if the value of ut is close to 1, the second term becomes almost 0, and the current hidden state will depend almost entirely on the first term, i.e. the information from the hidden state at the previous timestamp t-1.
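Putting the reset gate, the update gate, the candidate state, and the interpolation above together, here is a minimal, self-contained sketch of one GRU cell step in NumPy. As before, it is only an illustration: weight names and sizes are assumptions and bias terms are omitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_step(x_t, h_prev, params):
    # One GRU timestep: returns the new hidden state Ht (there is no separate cell state).
    U_r, W_r, U_u, W_u, U_g, W_g = params
    r_t = sigmoid(x_t @ U_r + h_prev @ W_r)              # reset gate (short-term memory)
    u_t = sigmoid(x_t @ U_u + h_prev @ W_u)              # update gate (long-term memory)
    h_cand = np.tanh(x_t @ U_g + (r_t * h_prev) @ W_g)   # candidate hidden state
    h_t = u_t * h_prev + (1.0 - u_t) * h_cand            # Ht = ut*H(t-1) + (1-ut)*candidate
    return h_t

# Tiny usage example with random weights (sizes are illustrative only)
rng = np.random.default_rng(0)
inputs, hidden = 3, 4
params = [rng.normal(size=(inputs, hidden)) if k % 2 == 0
          else rng.normal(size=(hidden, hidden)) for k in range(6)]
x_t = rng.normal(size=inputs)
h_prev = np.zeros(hidden)
h_t = gru_cell_step(x_t, h_prev, params)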
In case you are interested to know more about the GRU, I suggest you go through the original paper by Cho et al. (2014).
End Notes
So, just to summarize, let’s see how the GRU differs from the LSTM.
● LSTM has three gates; the GRU, on the other hand, has only two gates.
● In LSTM they are the Input gate, Forget gate, and Output gate, whereas in GRU we have a Reset gate and an Update gate.
● In LSTM we have two states: the Cell state, or long-term memory, and the Hidden state, also known as short-term memory.
● In the case of GRU, there is only one state, i.e. the Hidden state (Ht).