Essentials of Deep Learning: Introduction to Long Short Term Memory
Pranjal Srivastava
18 Feb, 2024 • 13 min read
Introduction
Sequence prediction problems have been around for a long time, and they are considered among the hardest problems to solve in the data science industry. They cover a wide range of tasks: from predicting sales to finding patterns in stock market data, from understanding movie plots to recognizing your way of speech, from language translation to predicting the next word on your iPhone's keyboard.
With the recent breakthroughs in data science, Long Short Term Memory networks (a.k.a. LSTMs) have been observed to be the most effective solution for almost all of these sequence prediction problems.
LSTMs have an edge over conventional feed-forward neural networks and RNNs in many ways, owing to their ability to selectively remember patterns for long durations of time. The purpose of this article is to explain LSTM and enable you to use it in real-life problems. Let's have a look!
Note: To go through the article, you must have basic knowledge of neural networks and of how Keras (a deep learning library) works. You can refer to the following articles to understand these concepts:
Understanding Neural Network From Scratch
Fundamentals of Deep Learning – Introduction to
Recurrent Neural Networks
Tutorial: Optimizing Neural Networks using Keras (with
Image recognition case study)
1. Flashback: A look into Recurrent Neural Networks (RNN)
Take an example of sequential data, such as the stock market's data for a particular stock. A simple machine learning model or an Artificial Neural Network may learn to predict stock prices based on a number of features: the volume of the stock, the opening value, and so on. While the price of the stock depends on these features, it is also largely dependent on the stock values of the previous days. In fact, for a trader, the values of the previous days (the trend) are one of the major deciding factors for predictions.
In conventional feed-forward neural networks, all test cases are considered to be independent: when fitting the model for a particular day, there is no consideration of the stock prices on the previous days. This dependency on time is achieved via Recurrent Neural Networks. A typical RNN looks like this:
This may be intimidating at first sight, but once unfolded, it looks a lot simpler:
Now it is easier to visualize how these networks consider the trend of stock prices before predicting the stock price for today. Here, every prediction at time t (h_t) is dependent on all previous predictions and the information learned from them.
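To make the recurrence concrete, here is a minimal NumPy sketch (my own illustration, not from the original article; all weights and sizes are arbitrary) showing how each hidden state h_t is computed from the current input and the previous hidden state, so information from earlier time steps can influence later predictions.
import numpy as np

hidden_size, input_size, num_steps = 4, 3, 5

# hypothetical weights, shared across all time steps
W_h = np.random.randn(hidden_size, hidden_size) * 0.1
W_x = np.random.randn(hidden_size, input_size) * 0.1
b = np.zeros(hidden_size)

inputs = np.random.randn(num_steps, input_size)  # e.g. features for 5 consecutive days
h_t = np.zeros(hidden_size)                      # initial hidden state

for t, x_t in enumerate(inputs):
    # h_t depends on the current input AND on h_{t-1}, which carries the past
    h_t = np.tanh(W_h @ h_t + W_x @ x_t + b)
    print("t =", t, "h_t =", np.round(h_t, 2))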
RNNs solve our purpose of sequence handling to a great extent, but not entirely. We want our computers to be good enough to write Shakespearean sonnets. RNNs are great when it comes to short contexts, but in order to build a story and remember it, we need our models to understand and remember the context behind the sequences, just like a human brain. This is not possible with a simple RNN.
Why? Let’s have a look.
2. Limitations of RNNs
Recurrent Neural Networks work just fine when we are dealing with short-term dependencies, that is, when applied to problems like:
RNNs turn out to be quite effective. This is because the problem has nothing to do with the context of the statement; the RNN need not remember what was said before this, or what it meant. All it needs to know is that in most cases the sky is blue. Thus the prediction would be:
However, vanilla RNNs fail to understand the context behind an input. Something that was said long before cannot be recalled when making predictions in the present. Let's understand this with an example:
Here, we can understand that since the author has worked in Spain for 20 years, it is very likely that he possesses a good command of Spanish. But to make a proper prediction, the RNN needs to remember this context. The relevant information may be separated from the point where it is needed by a huge load of irrelevant data. This is where a Recurrent Neural Network fails!
The reason behind this is the problem of the vanishing gradient. To understand it, you need some knowledge of how a feed-forward neural network learns. We know that for a conventional feed-forward neural network, the weight update applied to a particular layer is a multiple of the learning rate, the error term from the previous layer, and the input to that layer. Thus, the error term for a particular layer is essentially a product of all the previous layers' errors. When dealing with activation functions like the sigmoid, the small values of its derivative (occurring in the error term) get multiplied many times over as we move towards the starting layers. As a result, the gradient almost vanishes as we move towards the starting layers, and it becomes difficult to train these layers.
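To see why this happens, here is a tiny NumPy sketch (again my own illustration, not from the article): the derivative of the sigmoid is at most 0.25, so a product of such derivatives across many layers shrinks towards zero very quickly. The depths below are arbitrary.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

# best-case factor accumulated across an increasingly deep chain of sigmoid layers
z = 0.0
for depth in [1, 5, 10, 20, 50]:
    factor = sigmoid_derivative(z) ** depth
    print("depth", depth, "-> accumulated derivative factor ~", factor)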
A similar case is observed in Recurrent Neural Networks. An RNN remembers things for only small durations of time: if we need a piece of information after a short while, it may be reproducible, but once a lot of words have been fed in, this information gets lost somewhere along the way. This issue can be resolved by applying a slightly tweaked version of RNNs: Long Short-Term Memory networks.
3. Improvement over RNN: LSTM (Long Short-Term Memory) Networks
When we arrange our calendar for the day, we prioritize our appointments, right? If we need to make space for something important, we know which meeting could be canceled to accommodate it. It turns out that an RNN doesn't do this. In order to add new information, it transforms the existing information completely by applying a function. Because of this, the entire information is modified as a whole; there is no consideration for 'important' information and 'not so important' information.
LSTMs, on the other hand, make small modifications to the information through multiplications and additions. With LSTMs, the information flows through a mechanism known as the cell state. This way, LSTMs can selectively remember or forget things. The information at a particular cell state has three different dependencies.
We’ll visualize this with an example. Let’s take the example
of predicting stock prices for a particular stock. The stock
price of today will depend upon:
1. The trend that the stock has been following in the
previous days, maybe a downtrend or an uptrend.
2. The price of the stock on the previous day, because
many traders compare the stock’s previous day price
before buying it.
3. The factors that can affect the price of the stock today. This can be a new company policy that is being criticized widely, a drop in the company's profit, or maybe an unexpected change in the senior leadership of the company.
These dependencies can be generalized to any problem
as:
1. The previous cell state (i.e. the information that was
present in the memory after the previous time step)
2. The previous hidden state (i.e. this is the same as the
output of the previous cell)
3. The input at the current time step (i.e. the new
information that is being fed in at that moment)
Another important feature of LSTMs is their analogy with conveyor belts!
That's right!
Industries use conveyor belts to move products around through different processes. LSTMs use a similar mechanism to move information around.
We may have some addition, modification, or removal of information as it flows through the different layers, just like a product may be molded, painted, or packed while it is on a conveyor belt.
The following diagram illustrates the close relationship between LSTMs and conveyor belts.
Source
Although this diagram is not even close to the actual architecture of an LSTM, it serves our purpose for now. It is precisely because of this property of LSTMs, where they do not manipulate the entire information but rather modify it slightly, that they are able to forget and remember things selectively. How they do so is what we are going to learn in the next section.
4. Architecture of LSTMs
The functioning of an LSTM can be visualized by thinking of a news channel's team covering a murder story. A news story is built around facts, evidence, and the statements of many people. Whenever a new event occurs, you take one of three steps.
Let's say we were assuming that the murder was done by 'poisoning' the victim, but the autopsy report that just came in says that the cause of death was 'an impact on the head'. Being a part of this news team, what do you do? You immediately forget the previous cause of death and all the stories that were woven around this fact.
What if an entirely new suspect is introduced into the picture, a person who had grudges against the victim and could be the murderer? You input this information into your news feed, right?
Now, all these broken pieces of information cannot be served on mainstream media. So, after a certain time interval, you need to summarize this information and output the relevant things to your audience, maybe in the form of "XYZ turns out to be the prime suspect."
Now let's get into the details of the architecture of an LSTM network:
Source
Now, this is nowhere close to the simplified version we saw before, but let me walk you through it. A typical LSTM network is comprised of different memory blocks called cells (the rectangles in the image). There are two states that are transferred to the next cell: the cell state and the hidden state. The memory blocks are responsible for remembering things, and manipulations to this memory are done through three major mechanisms, called gates. Each of them is discussed below.
4.1 Forget Gate
Take the example of a text prediction problem, and assume an LSTM is fed the following sentence:
As soon as the first full stop after "person" is encountered, the forget gate realizes that there may be a change of context in the next sentence. As a result, the subject of the sentence is forgotten and the place for the subject is vacated. When we start speaking about "Dan", this position of the subject is allocated to "Dan". This process of forgetting the subject is brought about by the forget gate.
The forget gate is responsible for removing information from the cell state. Information that is no longer required for the LSTM to understand things, or information of lesser importance, is removed via multiplication with a filter. This is required for optimizing the performance of the LSTM network.
This gate takes in two inputs: h_t-1 and x_t.
h_t-1 is the hidden state from the previous cell (i.e. the output of the previous cell) and x_t is the input at the current time step. The given inputs are multiplied by the weight matrices and a bias is added. Following this, the sigmoid function is applied to this value. The sigmoid function outputs a vector with values ranging from 0 to 1, corresponding to each number in the cell state. Basically, the sigmoid function is responsible for deciding which values to keep and which to discard. If a '0' is output for a particular value in the cell state, it means that the forget gate wants the cell state to forget that piece of information completely. Similarly, a '1' means that the forget gate wants to retain that entire piece of information. This vector output from the sigmoid function is multiplied with the cell state.
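As a rough NumPy sketch of the computation just described (the weight matrix, bias, and sizes below are made up for illustration; they are not taken from the article or from any library):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3

# hypothetical parameters and inputs, just to show the shapes involved
W_f = np.random.randn(hidden_size, hidden_size + input_size) * 0.1  # forget-gate weights
b_f = np.zeros(hidden_size)                                         # forget-gate bias
h_prev = np.random.randn(hidden_size)   # previous hidden state, h_t-1
x_t = np.random.randn(input_size)       # input at the current time step, x_t
c_prev = np.random.randn(hidden_size)   # previous cell state

# f_t = sigmoid(W_f . [h_t-1, x_t] + b_f): values between 0 (forget) and 1 (keep)
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# the filter is multiplied element-wise with the cell state
c_after_forget = f_t * c_prev
print("forget gate output:", np.round(f_t, 2))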
4.2 Input Gate
Okay, let's take another example where the LSTM is analyzing a sentence:
Now, the important information here is that "Bob" knows swimming and that he has served in the Navy for four years. This can be added to the cell state; however, the fact that he told all this over the phone is less important and can be ignored. This process of adding new information is done via the input gate.
Here is its structure:
The input gate is responsible for the addition of information to the cell state. This addition of information is basically a three-step process, as seen in the diagram above.
1. Regulating what values need to be added to the cell state by means of a sigmoid function. This is basically very similar to the forget gate and acts as a filter for all the information from h_t-1 and x_t.
2. Creating a vector containing all the possible values that can be added (as perceived from h_t-1 and x_t) to the cell state. This is done using the tanh function, which outputs values from -1 to +1.
3. Multiplying the value of the regulatory filter (the sigmoid gate) with the created vector (the tanh output) and then adding this useful information to the cell state via an addition operation.
Once this three-step process is done, we ensure that only information that is important and not redundant is added to the cell state.
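Continuing the hypothetical sketch from the forget-gate section (same made-up shapes and variables), the three steps of the input gate look roughly like this:
# hypothetical input-gate parameters, same shapes as the forget gate
W_i = np.random.randn(hidden_size, hidden_size + input_size) * 0.1  # sigmoid filter weights
b_i = np.zeros(hidden_size)
W_c = np.random.randn(hidden_size, hidden_size + input_size) * 0.1  # candidate-value weights
b_c = np.zeros(hidden_size)

concat = np.concatenate([h_prev, x_t])

# step 1: sigmoid filter deciding which candidate values to let through
i_t = sigmoid(W_i @ concat + b_i)

# step 2: tanh layer creating candidate values in the range -1 to +1
c_tilde = np.tanh(W_c @ concat + b_c)

# step 3: add the filtered candidates to the cell state left by the forget gate
c_t = c_after_forget + i_t * c_tilde
print("updated cell state:", np.round(c_t, 2))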
4.3 Output Gate
Not all the information that flows along the cell state is fit for being output at a certain time. We'll visualize this with an example:
In this phrase, there could be a number of options for the empty space. But we know that the current input, 'brave', is an adjective that is used to describe a noun. Thus, whatever word follows has a strong tendency to be a noun. And thus, Bob could be an apt output.
This job of selecting useful information from the current cell state and showing it as an output is done via the output gate. Here is its structure:
The functioning of an output gate can again be broken down into three steps:
1. Creating a vector by applying the tanh function to the cell state, thereby scaling the values to the range -1 to +1.
2. Making a filter using the values of h_t-1 and x_t, such that it can regulate the values that need to be output from the vector created above. This filter again employs a sigmoid function.
3. Multiplying the value of this regulatory filter with the vector created in step 1, and sending it out as the output and also to the hidden state of the next cell.
The filter in the above example will make sure that it diminishes all values other than 'Bob'. Thus, the filter needs to be built from the input and hidden-state values and applied to the cell state vector.
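Rounding off the same hypothetical sketch (again, illustrative weights only), the three steps of the output gate and the resulting hidden state can be written as:
# hypothetical output-gate parameters
W_o = np.random.randn(hidden_size, hidden_size + input_size) * 0.1
b_o = np.zeros(hidden_size)

# step 1: scale the updated cell state to the range -1 to +1
scaled_cell = np.tanh(c_t)

# step 2: sigmoid filter built from h_t-1 and x_t
o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)

# step 3: the filtered, scaled cell state becomes the output and the next hidden state
h_t = o_t * scaled_cell
print("new hidden state:", np.round(h_t, 2))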
5. Text generation using LSTMs
We have had enough of the theoretical concepts and functioning of LSTMs. Now we will try to build a model that can predict some n number of characters after the original text of Macbeth. Most of the classical texts are no longer protected under copyright and can be found here. An updated version of the .txt file can be found here.
We will use the Keras library, a high-level API for neural networks that works on top of TensorFlow or Theano. So make sure that before diving into this code you have Keras installed and functional.
Okay, so let’s generate some text!
Importing dependencies
# Importing dependencies numpy and keras
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.utils import np_utils  # note: in recent Keras versions, to_categorical lives directly in keras.utils
We import all the required dependencies and this is pretty
much self-explanatory.
Loading text file and creating character to
integer mappings
# load text
filename = "/macbeth.txt"
text = (open(filename).read()).lower()

# mapping characters with integers
unique_chars = sorted(list(set(text)))
char_to_int = {}
int_to_char = {}
for i, c in enumerate(unique_chars):
    char_to_int.update({c: i})
    int_to_char.update({i: c})
The text file is opened, and all characters are converted to lowercase. To facilitate the following steps, we map each character to a number. This is done to make the computation part of the LSTM easier.
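For intuition, here is what the same mapping logic produces on a toy string (not the actual Macbeth text):
# toy example of the character-to-integer mapping
toy_text = "macbeth"
toy_chars = sorted(list(set(toy_text)))
print(toy_chars)                                 # ['a', 'b', 'c', 'e', 'h', 'm', 't']
print({c: i for i, c in enumerate(toy_chars)})   # {'a': 0, 'b': 1, 'c': 2, 'e': 3, 'h': 4, 'm': 5, 't': 6}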
Preparing dataset
# preparing input and output dataset
X = []
Y = []
for i in range(0, len(text) - 50, 1):
    sequence = text[i:i + 50]
    label = text[i + 50]
    X.append([char_to_int[char] for char in sequence])
    Y.append(char_to_int[label])
Data is prepared in a format such that if we want the LSTM to predict the 'O' in 'HELLO', we would feed in ['H', 'E', 'L', 'L'] as the input and ['O'] as the expected output. Similarly, here we fix the length of the sequence that we want (set to 50 in the example) and then save the encodings of each 50-character window in X and the expected output, i.e. the character that immediately follows the window, in Y.
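To make the windowing concrete, here is the same idea on a toy string, using a sequence length of 4 instead of 50 (purely illustrative):
# toy illustration of the sliding-window preparation
toy_text = "hello world"
seq_length = 4
for i in range(0, len(toy_text) - seq_length, 1):
    print(repr(toy_text[i:i + seq_length]), "->", repr(toy_text[i + seq_length]))
# 'hell' -> 'o'
# 'ello' -> ' '
# 'llo ' -> 'w'
# ...and so on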
Reshaping of X
# reshaping, normalizing and one hot encoding
X_modified = numpy.reshape(X, (len(X), 50, 1))
X_modified = X_modified / float(len(unique_chars))
Y_modified = np_utils.to_categorical(Y)
An LSTM network expects the input to be in the form [samples, time steps, features], where samples is the number of data points we have, time steps is the number of time-dependent steps in a single data point, and features refers to the number of variables we have for the corresponding true value in Y. We then scale the values in X_modified between 0 and 1 and one-hot encode our true values in Y_modified.
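As a quick sanity check (assuming the code above has already run on the loaded text), the shapes should come out roughly like this:
# the exact number of samples depends on the length of the text file
print(X_modified.shape)   # (number of 50-character windows, 50, 1)
print(Y_modified.shape)   # (number of windows, number of unique characters)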
Defining the LSTM model
# defining the LSTM model
model = Sequential()
model.add(LSTM(300, input_shape=(X_modified.shape[1], X_modified.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(300))
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
A sequential model, which is a linear stack of layers, is used. The first layer is an LSTM layer with 300 memory units, and it returns sequences. This is done to ensure that the next LSTM layer receives sequences and not just randomly scattered data. A dropout layer is applied after each LSTM layer to avoid overfitting of the model. Finally, the last layer is a fully connected layer with a 'softmax' activation and as many neurons as there are unique characters, because we need to output a one-hot encoded result.
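If you want to double-check the layer output shapes and parameter counts before training, Keras can print them for you:
# inspect the stacked LSTM architecture before training
model.summary()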
Fitting the model and generating characters
# fitting the model
model.fit(X_modified, Y_modified, epochs=1, batch_size=30)

# picking a random seed
start_index = numpy.random.randint(0, len(X)-1)
new_string = X[start_index]

# generating characters
for i in range(50):
    x = numpy.reshape(new_string, (1, len(new_string), 1))
    x = x / float(len(unique_chars))

    # predicting the next character
    pred_index = numpy.argmax(model.predict(x, verbose=0))
    char_out = int_to_char[pred_index]
    seq_in = [int_to_char[value] for value in new_string]  # current seed window as characters (for inspection)
    print(char_out)

    new_string.append(pred_index)
    new_string = new_string[1:len(new_string)]
The model above is fit with a batch size of 30; the snippet uses a single epoch to keep the run short, but training over many more epochs (the original run used 100) gives noticeably better results. We then fix a random seed (for easy reproducibility) and start generating characters. The prediction from the model gives the character encoding of the predicted character; it is then decoded back to the character value and appended to the pattern.
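If you would rather collect the generated text into a single string instead of printing one character per line, a small variation that can be used in place of the generation loop above (same logic, just accumulating the output) might look like this:
# variation of the generation loop: accumulate the output into one string
pattern = list(X[start_index])   # copy of the 50-character seed sequence
generated = ""
for i in range(50):
    x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(len(unique_chars))
    pred_index = numpy.argmax(model.predict(x, verbose=0))
    generated += int_to_char[pred_index]
    pattern.append(pred_index)
    pattern = pattern[1:]        # slide the window forward by one character
print(generated)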
This is how the output of the network might look:
Eventually, after enough training epochs, it will give better and better results over time. This is how you would use an LSTM to solve a sequence prediction task.
End Notes
LSTMs are a very promising solution to sequence and time-series related problems. However, the one disadvantage that I find with them is the difficulty of training them. A lot of time and system resources go into training even a simple model. But that is just a hardware constraint! I hope I was successful in giving you a basic understanding of these networks. For any problems or issues related to the blog, please feel free to comment below.
Tags: forget gate, input gate, long short term memory network, LSTM, output gate, RNN, sequence prediction, text generator, time series
Pranjal Srivastava
18 Feb 2024
I am a Senior Undergraduate at IIT (BHU), Varanasi and a Deep Learning enthusiast. Data is surely going to be the biggest thing of this century. Instead of witnessing this as a mere spectator, I chose to be a part of this revolution.
Categories: Advanced, Algorithm, Deep Learning, NLP, Python