LSTM & GRU


Introduction to Long Short Term Memory (LSTM)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-
memory-lstm/?utm_source=blog&utm_medium=gated_recurrent_unit

https://youtu.be/Z03f7Wu5a6A
Shipra Saxena — Published On March 16, 2021 and Last Modified On March 18th, 2021


Objective
● LSTM is a special kind of recurrent neural network capable of handling long-term dependencies.
● Understand the architecture and working of an LSTM network.

Introduction
Long Short Term Memory (LSTM) is an advanced RNN, a sequential network that allows information to persist. It is capable of handling the vanishing gradient problem faced by a standard RNN. A recurrent neural network, also known as an RNN, is used where persistent memory is needed.

Let's say that while watching a video you remember the previous scene, or while reading a book you know what happened in an earlier chapter. RNNs work in a similar way: they remember the previous information and use it to process the current input. The shortcoming of RNNs is that they cannot remember long-term dependencies because of the vanishing gradient. LSTMs are explicitly designed to avoid this long-term dependency problem.

LSTM Architecture
At a high level, an LSTM works very much like an RNN cell. Here is the internal functioning of the LSTM network. The LSTM consists of three parts, as shown in the image below, and each part performs an individual function.

The first part chooses whether the information coming from the previous timestamp is to be remembered, or whether it is irrelevant and can be forgotten. In the second part, the cell tries to learn new information from the input to this cell. Finally, in the third part, the cell passes the updated information from the current timestamp to the next timestamp.

These three parts of an LSTM cell are known as gates. The first part is called the Forget gate, the second part is known as the Input gate, and the last one is the Output gate.

Just like a simple RNN, an LSTM also has a hidden state, where Ht-1 represents the hidden state of the previous timestamp and Ht is the hidden state of the current timestamp. In addition, an LSTM also has a cell state, represented by Ct-1 and Ct for the previous and current timestamps respectively.

Here the hidden state is known as short-term memory and the cell state is known as long-term memory. Refer to the following image.

It is interesting to note that the cell state carries information across all the timestamps.

Let's take an example to understand how an LSTM works. Here we have two sentences separated by a full stop. The first sentence is "Bob is a nice person" and the second sentence is "Dan, on the other hand, is evil". It is very clear that the first sentence is talking about Bob, and as soon as we encounter the full stop (.) we start talking about Dan.

As we move from the first sentence to the second sentence, our network should realize that we are no longer talking about Bob; our subject is now Dan. Here, the Forget gate of the network allows it to forget about Bob. Let's understand the roles played by these gates in the LSTM architecture.

Forget Gate

In a cell of the LSTM network, the first step is to decide whether we should keep the information from the previous timestamp or forget it. Here is the equation for the forget gate:

ft = σ(Xt · Uf + Ht-1 · Wf)

Let's try to understand the equation. Here,

● Xt: input at the current timestamp
● Uf: weight matrix associated with the input
● Ht-1: the hidden state of the previous timestamp
● Wf: weight matrix associated with the hidden state

A sigmoid function is applied over this sum, which makes ft a number between 0 and 1. This ft is then multiplied with the cell state of the previous timestamp, as shown below:

Ct-1 · ft

If ft is 0 the network will forget everything, and if the value of ft is 1 it will forget nothing. Let's get back to our example: the first sentence was talking about Bob, and after the full stop the network will encounter Dan; in an ideal case the network should forget about Bob.
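To make this concrete, here is a minimal NumPy sketch of the forget gate described above. The variable names, toy dimensions, and random weights are assumptions made for illustration, not values from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed toy sizes: input dimension 4, hidden dimension 3
x_t    = rng.standard_normal(4)       # Xt: input at the current timestamp
h_prev = rng.standard_normal(3)       # Ht-1: previous hidden state
c_prev = rng.standard_normal(3)       # Ct-1: previous cell state
U_f    = rng.standard_normal((3, 4))  # Uf: weight matrix for the input
W_f    = rng.standard_normal((3, 3))  # Wf: weight matrix for the hidden state

# ft = sigmoid(Xt . Uf + Ht-1 . Wf); every entry lies between 0 and 1
f_t = sigmoid(U_f @ x_t + W_f @ h_prev)

# The previous cell state is scaled element-wise by ft:
# values near 0 forget, values near 1 keep
c_scaled = f_t * c_prev
```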

Input Gate

Let's take another example:

"Bob knows swimming. He told me over the phone that he had served in the navy for four long years."

In both of these sentences we are talking about Bob. However, they give different kinds of information about him. The first sentence tells us that he knows swimming, whereas the second tells us that he used the phone and served in the navy for four years.

Now just think about it: based on the context given in the first sentence, which piece of information in the second sentence is critical? That he used the phone to tell us, or that he served in the navy? In this context, it doesn't matter whether he used the phone or some other medium of communication to pass on the information. The fact that he was in the navy is the important information, and this is something we want our model to remember. This is the task of the Input gate.

The Input gate is used to quantify the importance of the new information carried by the input. Here is the equation of the input gate:

it = σ(Xt · Ui + Ht-1 · Wi)

Here,

● Xt: input at the current timestamp t
● Ui: weight matrix of the input
● Ht-1: the hidden state at the previous timestamp
● Wi: weight matrix associated with the hidden state

Again, a sigmoid function is applied over this sum. As a result, the value of it at timestamp t will be between 0 and 1.

New Information

The new information that needs to be passed to the cell state is a function of the hidden state at the previous timestamp t-1 and the input Xt at timestamp t. The activation function here is tanh, so the value of the new information lies between -1 and 1:

Nt = tanh(Xt · Uc + Ht-1 · Wc)

(Uc and Wc here denote the corresponding weight matrices.) If the value of Nt is negative, the information is subtracted from the cell state, and if the value is positive, the information is added to the cell state at the current timestamp.

However, Nt is not added to the cell state directly. Here is the updated cell-state equation:

Ct = ft · Ct-1 + it · Nt

Here, Ct-1 is the cell state at the previous timestamp, Ct is the cell state at the current timestamp, and the other terms are the values we calculated previously.
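Continuing the same kind of sketch, the snippet below computes the input gate, the new information Nt, and the updated cell state Ct. The weight names and toy sizes are again illustrative assumptions rather than values from the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Toy tensors: input dimension 4, hidden dimension 3
x_t, h_prev, c_prev = rng.standard_normal(4), rng.standard_normal(3), rng.standard_normal(3)
U_f, W_f = rng.standard_normal((3, 4)), rng.standard_normal((3, 3))  # forget gate weights
U_i, W_i = rng.standard_normal((3, 4)), rng.standard_normal((3, 3))  # input gate weights
U_c, W_c = rng.standard_normal((3, 4)), rng.standard_normal((3, 3))  # new-information weights

# Forget gate, as in the earlier sketch
f_t = sigmoid(U_f @ x_t + W_f @ h_prev)

# Input gate: it = sigmoid(Xt . Ui + Ht-1 . Wi), values in (0, 1)
i_t = sigmoid(U_i @ x_t + W_i @ h_prev)

# New information: Nt = tanh(Xt . Uc + Ht-1 . Wc), values in (-1, 1)
n_t = np.tanh(U_c @ x_t + W_c @ h_prev)

# Updated cell state: Ct = ft * Ct-1 + it * Nt (element-wise products)
c_t = f_t * c_prev + i_t * n_t
```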

Output Gate

Now consider this sentence:

"Bob single-handedly fought the enemy and died for his country. For his contributions, brave ________."

In this task, we have to complete the second sentence. The minute we see the word "brave", we know that we are talking about a person. In these sentences only Bob is brave; we cannot say the enemy is brave or the country is brave. So, based on the current expectation, we have to give a relevant word to fill in the blank. That word is our output, and this is the function of the Output gate.

Here is the equation of the Output gate, which is pretty similar to the two previous gates:

ot = σ(Xt · Uo + Ht-1 · Wo)

Its value will also lie between 0 and 1 because of the sigmoid function. Now, to calculate the current hidden state, we use ot and the tanh of the updated cell state, as shown below:

Ht = ot · tanh(Ct)

It turns out that the hidden state is a function of the long-term memory (Ct) and the current output. If you need the output at the current timestamp, just apply the softmax activation on the hidden state Ht. The token with the maximum score in the output is the prediction.
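Putting the three gates together, here is a compact sketch of one full LSTM cell step, ending with a softmax over a hypothetical output projection to get a token prediction. The function name, the parameter dictionary layout, and the projection matrix V (with 10 assumed output tokens) are illustrative assumptions, not part of the article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep following the gate equations in the text."""
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev)  # forget gate
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev)  # input gate
    n_t = np.tanh(p["U_c"] @ x_t + p["W_c"] @ h_prev)  # new information
    c_t = f_t * c_prev + i_t * n_t                     # updated cell state
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev)  # output gate
    h_t = o_t * np.tanh(c_t)                           # current hidden state
    return h_t, c_t

rng = np.random.default_rng(2)
params = {k: rng.standard_normal((3, 4)) for k in ("U_f", "U_i", "U_c", "U_o")}
params.update({k: rng.standard_normal((3, 3)) for k in ("W_f", "W_i", "W_c", "W_o")})

h_t, c_t = lstm_cell_step(rng.standard_normal(4), np.zeros(3), np.zeros(3), params)

# Hypothetical projection from the hidden state to 10 output tokens,
# followed by softmax; the argmax is the predicted token
V = rng.standard_normal((10, 3))
probs = softmax(V @ h_t)
predicted_token = int(np.argmax(probs))
```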

Here is a more intuitive diagram of the LSTM network.

This diagram is taken from an interesting blog, which I urge you all to go through. Here is the link:

● Understanding LSTM Networks

End Notes
To summarize, in this article we saw the architecture of a sequential model, the LSTM, and how it works in detail.


Introduction to Gated Recurrent Unit (GRU)
Shipra Saxena — Published On March 17, 2021 and Last Modified On March 18th, 2021

https://www.analyticsvidhya.com/blog/2021/03/introduction-to-gated-recurrent-unit-gru/
https://youtu.be/6IBhu4tDpOI
Objective
● In sequence modeling techniques, the Gated Recurrent Unit is the newest entrant after RNN and LSTM, and it offers some improvements over the other two.
● Understand the working of GRU and how it differs from LSTM.

Introduction
GRU, or Gated Recurrent Unit, is an advancement of the standard RNN, i.e. the recurrent neural network. It was introduced by Kyunghyun Cho et al. in 2014.

Note: If you are more interested in learning concepts in an audio-visual format, we have this entire article explained in the video below. If not, you may continue reading.

GRUs are very similar to Long Short Term Memory (LSTM). Just like LSTM, GRU uses gates to control the flow of information. GRUs are relatively new compared to LSTM; they offer some improvements over LSTM and have a simpler architecture.

Another interesting thing about GRU is that, unlike LSTM, it does not have a separate cell state (Ct); it only has a hidden state (Ht). Due to this simpler architecture, GRUs are faster to train.

In case you are unaware of the LSTM network, I suggest you go through the following article:

● Introduction to Long Short Term Memory (LSTM)

The Architecture of Gated Recurrent Unit

Now let's understand how GRU works. Here we have a GRU cell, which is more or less similar to an LSTM cell or an RNN cell.

At each timestamp t, it takes an input Xt and the hidden state Ht-1 from the previous timestamp t-1. Later it outputs a new hidden state Ht, which is again passed to the next timestamp.

There are primarily two gates in a GRU, as opposed to three gates in an LSTM cell. The first gate is the Reset gate and the other one is the Update gate.

Reset Gate (Short-term memory)

The Reset gate is responsible for the short-term memory of the network, i.e. the hidden state (Ht). Here is the equation of the Reset gate:

rt = σ(Xt · Ur + Ht-1 · Wr)

If you remember the LSTM gate equations, this is very similar to them. The value of rt will range from 0 to 1 because of the sigmoid function. Here Ur and Wr are the weight matrices for the Reset gate.

Update Gate (Long-term memory)

Similarly, we have an Update gate for long-term memory, and the equation of the gate is shown below:

ut = σ(Xt · Uu + Ht-1 · Wu)

The only difference is in the weight matrices, i.e. Uu and Wu.
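As a rough parallel to the LSTM sketches above, here is a minimal NumPy illustration of the two GRU gates. The toy sizes and random weight matrices are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)

# Toy tensors: input dimension 4, hidden dimension 3
x_t, h_prev = rng.standard_normal(4), rng.standard_normal(3)
U_r, W_r = rng.standard_normal((3, 4)), rng.standard_normal((3, 3))  # reset gate weights
U_u, W_u = rng.standard_normal((3, 4)), rng.standard_normal((3, 3))  # update gate weights

# Reset gate: rt = sigmoid(Xt . Ur + Ht-1 . Wr), values in (0, 1)
r_t = sigmoid(U_r @ x_t + W_r @ h_prev)

# Update gate: ut = sigmoid(Xt . Uu + Ht-1 . Wu), values in (0, 1)
u_t = sigmoid(U_u @ x_t + W_u @ h_prev)
```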

How GRU Works

Now let's see the functioning of these gates. To find the hidden state Ht, a GRU follows a two-step process. The first step is to generate what is known as the candidate hidden state, as shown below.

Candidate Hidden State

Ĥt = tanh(Xt · Uh + (rt · Ht-1) · Wh)

(Here Uh and Wh denote the candidate's weight matrices.) The cell takes the input and the hidden state from the previous timestamp t-1, which is multiplied by the reset gate output rt. This entire information is then passed through a tanh function, and the resulting value is the candidate hidden state.

The most important part of this equation is how we use the value of the reset gate to control how much influence the previous hidden state has on the candidate state.

If the value of rt is equal to 1, the entire information from the previous hidden state Ht-1 is being considered. Likewise, if the value of rt is 0, the information from the previous hidden state is completely ignored.

Hidden State

Once we have the candidate state, it is used to generate the current hidden state Ht. This is where the Update gate comes into the picture. Instead of using a separate gate as in LSTM, the GRU uses a single Update gate to control both the historical information, which is Ht-1, and the new information coming from the candidate state:

Ht = ut · Ht-1 + (1 - ut) · Ĥt

This is a very interesting equation. Now assume the value of ut is around 0. Then the first term in the equation vanishes, which means the new hidden state will not carry much information from the previous hidden state. At the same time, the second factor becomes almost one, which essentially means the hidden state at the current timestamp will consist of the information from the candidate state only.

Similarly, if the value of ut is 1, the second term becomes 0 and the current hidden state depends entirely on the first term, i.e. the information from the hidden state at the previous timestamp t-1.

Hence we can conclude that the value of ut is very critical in this equation, and it can range from 0 to 1.
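Here is a compact sketch of one full GRU step following the two-step process above. The function name and the candidate weights Uh and Wh are illustrative assumptions; the final line follows the article's convention Ht = ut · Ht-1 + (1 - ut) · Ĥt.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_step(x_t, h_prev, p):
    """One GRU timestep following the equations in the text."""
    r_t = sigmoid(p["U_r"] @ x_t + p["W_r"] @ h_prev)  # reset gate
    u_t = sigmoid(p["U_u"] @ x_t + p["W_u"] @ h_prev)  # update gate
    # Candidate hidden state: the reset gate scales the previous hidden state
    h_cand = np.tanh(p["U_h"] @ x_t + p["W_h"] @ (r_t * h_prev))
    # Interpolate between the old hidden state and the candidate
    # (ut weights the old state, 1 - ut weights the candidate)
    h_t = u_t * h_prev + (1.0 - u_t) * h_cand
    return h_t

rng = np.random.default_rng(4)
params = {k: rng.standard_normal((3, 4)) for k in ("U_r", "U_u", "U_h")}
params.update({k: rng.standard_normal((3, 3)) for k in ("W_r", "W_u", "W_h")})

h_t = gru_cell_step(rng.standard_normal(4), np.zeros(3), params)
```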

In case you are interested to know more about GRU, I suggest you read this paper.

End Notes
To summarize, let's see how GRU differs from LSTM.

● LSTM has three gates, whereas GRU has only two.
● In LSTM the gates are the Input gate, the Forget gate, and the Output gate, whereas in GRU we have a Reset gate and an Update gate.
● LSTM has two states: the cell state, or long-term memory, and the hidden state, also known as short-term memory.
● In the case of GRU, there is only one state, i.e. the hidden state (Ht).
