RNN and LSTM

Recurrent neural networks (RNNs) are neural networks that can process sequential data by incorporating information about previous elements in the sequence into the current state. Unlike feedforward neural networks, RNNs contain loops that allow information to persist. Long short-term memory (LSTM) networks are a type of RNN designed to avoid the long-term dependency problem by using gates to control the flow of information. RNNs and LSTMs are well-suited for tasks involving sequential data like natural language processing, speech recognition, and time series prediction.


Recurrent Neural Networks (RNN)

An RNN is a type of neural network architecture specifically designed to work with sequential data. Unlike traditional feedforward neural networks, RNNs have connections that create a loop, allowing information to be passed from one step of the sequence to the next.
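
As a minimal sketch (not from the slides; the sizes and weight names below are illustrative), the recurrence can be written in a few lines of NumPy: the same weights are applied at every step, and the hidden state h carries information forward.

    import numpy as np

    # Illustrative sizes, not taken from the slides.
    input_size, hidden_size, seq_len = 4, 8, 10
    rng = np.random.default_rng(0)

    W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden: the "loop"
    b_h = np.zeros(hidden_size)

    x = rng.normal(size=(seq_len, input_size))  # one sequence of 10 time steps
    h = np.zeros(hidden_size)                   # initial hidden state

    for t in range(seq_len):
        # The same weights are reused at every step; h carries the past forward.
        h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)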

Neural Networks
▪ Output depends on the current input only
▪ No cycles or loops in the network
▪ No memory about the past

Recurrent Neural Networks
▪ Can handle sequential data
▪ Considers the current input and also the previously received inputs
▪ Can memorize inputs due to its internal memory


WHY RNN?
Applications of RNN
1 Time Series Prediction
2 NLP: Text Classification, Sentiment Analysis, Document Summary, Question Answering
3 Machine Translation: translate the input into a different language
4 Image Captioning: caption the image by analysing the activities in it
5 Speech Recognition
PURPOSE

RNNs are well-suited for tasks where the temporal order of the data matters. This
includes applications such as time series prediction, natural language processing,
and speech recognition.

Networks with loops in them, allowing information to persist.

An unrolled recurrent neural network

Recurrent Neuron

CHALLENGES

RNNs face challenges during training, particularly the vanishing gradient problem. This occurs when the gradients of the loss function become extremely small during backpropagation, making it difficult for the network to learn and capture long-term dependencies.
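
A tiny illustrative sketch (the numbers are made up, not from the slides) of why this happens: backpropagating through many tanh steps multiplies the gradient by a factor that is usually well below 1 at every step, so the product shrinks toward zero.

    import numpy as np

    grad = 1.0
    for _ in range(50):
        z = 1.5                              # a typical pre-activation value (illustrative)
        grad *= (1 - np.tanh(z) ** 2) * 0.9  # tanh derivative times a recurrent-weight factor
    print(grad)  # effectively zero: the learning signal for early time steps has vanished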

Solution to the RNN Problem: LSTM

LSTMs were introduced to address the shortcomings of traditional RNNs, especially the vanishing gradient problem. This problem hinders the ability of RNNs to capture long-range dependencies in the data.

Long Short Term Memory networks - “LSTM”

▪ A special kind of RNN, capable of learning long-term dependencies
▪ Introduced by Hochreiter & Schmidhuber (1997)

In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.

LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.
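
A minimal sketch of one step of the repeating module, assuming the standard LSTM formulation (forget, input, candidate and output layers); the NumPy names and sizes below are illustrative, not a specific library's API.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    input_size, hidden_size = 4, 8
    rng = np.random.default_rng(0)

    # One weight matrix and bias per interacting layer: forget, input, candidate, output.
    def make_layer():
        return (rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size)),
                np.zeros(hidden_size))

    (W_f, b_f), (W_i, b_i), (W_c, b_c), (W_o, b_o) = (make_layer() for _ in range(4))

    def lstm_step(x_t, h_prev, c_prev):
        z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
        f = sigmoid(W_f @ z + b_f)          # forget gate: how much of the old cell state to keep
        i = sigmoid(W_i @ z + b_i)          # input gate: how much new information to admit
        c_tilde = np.tanh(W_c @ z + b_c)    # candidate values to add to the cell state
        c = f * c_prev + i * c_tilde        # updated cell state
        o = sigmoid(W_o @ z + b_o)          # output gate: what to expose as the hidden state
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(hidden_size), np.zeros(hidden_size)
    h, c = lstm_step(rng.normal(size=input_size), h, c)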

Notation

LSTM Breakdown
▪ Forget Gates
▪ Input Gates
▪ Hidden State
▪ Output Gates

An LSTM has three of these gates, to protect and control the cell state.

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
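
To make that concrete, here is a tiny sketch (arbitrary numbers, purely illustrative) of a sigmoid output acting as a soft mask on another vector:

    import numpy as np

    gate = 1.0 / (1.0 + np.exp(-np.array([-6.0, 0.0, 6.0])))  # roughly [0.00, 0.50, 1.00]
    signal = np.array([3.0, 3.0, 3.0])
    print(gate * signal)  # roughly [0.0, 1.5, 3.0]: near zero blocks, near one lets through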

Forget Gates
What is kept from previous states

Let’s go to the example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
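
In the standard formulation this step is written as f_t = σ(W_f · [h_{t-1}, x_t] + b_f): a vector of values between 0 and 1 that multiplies the previous cell state (the line f = sigmoid(W_f @ z + b_f) in the sketch above).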

Input Gates
What is added to the cell state

In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the old one we’re forgetting.
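
In the standard formulation this is done in two parts: the input gate i_t = σ(W_i · [h_{t-1}, x_t] + b_i) decides which values to update, and a tanh layer proposes candidate values C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) that may be written into the cell state.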

Hidden State
Carries previous information

In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and add the new information, as we decided in the previous steps.
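
In the standard formulation this combined update of the cell state is C_t = f_t * C_{t-1} + i_t * C̃_t: the forget gate scales down (or drops) parts of the old state, and the input gate scales the candidate values that get added.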

Output Gates
What is reported as output

For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know what form a verb should be conjugated into if that’s what follows next.
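
In the standard formulation the output gate is o_t = σ(W_o · [h_{t-1}, x_t] + b_o), and the hidden state passed to the next step (and reported as output) is h_t = o_t * tanh(C_t).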

LSTM APPLICATIONS

LSTMs find applications in various domains, particularly in natural language processing tasks such as language translation and sentiment analysis. They excel in scenarios where understanding and retaining context over a sequence are crucial.
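
As a brief usage sketch (PyTorch shown only as one common choice; the data is random and the classification task is hypothetical), an LSTM layer can be applied to a batch of sequences like this:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
    x = torch.randn(8, 20, 16)                 # batch of 8 sequences, 20 steps, 16 features each
    output, (h_n, c_n) = lstm(x)               # output: (8, 20, 32); h_n and c_n: (1, 8, 32)
    logits = nn.Linear(32, 2)(output[:, -1])   # e.g. classify each sequence from its final step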

CONCLUSION

In conclusion, both RNNs and LSTMs are powerful tools for processing
sequential data. RNNs maintain a cyclic structure to retain memory, but
the vanishing gradient problem limits their effectiveness over long
sequences. LSTMs address this issue with a more sophisticated
architecture, allowing them to capture and retain long-term
dependencies more effectively. LSTMs have become instrumental in
various machine learning and artificial intelligence applications, where
understanding and utilizing context over time are critical.
