Deep Learning (MODULE-5)
DEEP LEARNING
Module: 5
RECURSIVE NEURAL NETWORKS
1. Long-Term Dependencies
2. Echo State Networks
3. Long Short-Term Memory
4. Other Gated RNNs
5. Optimization for Long-Term Dependencies
6. Explicit Memory
1:) RECURSIVE NEURAL NETWORKS
INTRODUCTION:
Recursive Neural Networks (RvNNs) are
deep neural networks used for natural
language processing.
We get a Recursive Neural Network when
the same weights are applied recursively on a
structured input to obtain a structured
prediction.
What Is a Recursive Neural Network?
Deep Learning is a subfield of machine learning
and artificial intelligence (AI) that attempts to
imitate how the human brain processes data and gains
certain knowledge.
Neural Networks form the backbone of Deep
Learning.
These are loosely modeled after the human brain
and designed to accurately recognize underlying
patterns in a data set. If you want to predict the
unpredictable, Deep Learning is the solution.
Due to their deep tree-like structure,
Recursive Neural Networks can handle hierarchical
data.
In the tree structure, child nodes are combined into parent nodes. Each child-parent bond has a weight matrix, and similar children share the same weights.
The number of children for every node in the tree is
fixed to enable it to perform recursive operations and
use the same weights. RvNNs are used when there's a
need to parse an entire sentence.
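To make the shared-weight recursion concrete, here is a minimal NumPy sketch (the names W, b, and compose are illustrative, not taken from any library): the same weight matrix combines two child vectors into a parent vector at every node of a small binary parse tree.

import numpy as np

# Minimal sketch of a recursive (tree-structured) composition step.
# The names W, b, and compose are illustrative only.
rng = np.random.default_rng(0)
d = 4                                  # dimensionality of every node vector
W = rng.standard_normal((d, 2 * d))    # shared child-to-parent weight matrix
b = np.zeros(d)

def compose(left, right):
    # Combine two child vectors into a parent vector using the shared weights.
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Leaves are random stand-in word vectors for "the hungry cat"; the same
# compose function is applied at every internal node of the parse tree.
the = rng.standard_normal(d)
hungry = rng.standard_normal(d)
cat = rng.standard_normal(d)
phrase = compose(hungry, cat)          # parent node for "hungry cat"
sentence = compose(the, phrase)        # root vector for the whole phrase
print(sentence.shape)                  # (4,)

Because every node reuses the same W, the tree can be arbitrarily deep (any sentence length) without adding new parameters.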
Recurrent Neural Network vs. Recursive Neural Networks
1:) LONG TERM DEPENDENCIES
What are long-term dependencies?
Long-term dependencies are the situations where
the output of an RNN depends on the input that occurred
many time steps ago. For instance, consider the sentence
"The cat, which was very hungry, ate the mouse".
To understand the meaning of this sentence, you need
to remember that the cat is the subject of the verb ate,
even though they are separated by a long clause.
This is a long-term dependency, and it can affect the
performance of an RNN that tries to generate or analyze
such sentences.
2.) Why are long-term dependencies hard to learn?
Recurrent neural networks (RNNs) are
powerful machine learning models that can
process sequential data, such as text, speech, or
video.
However, they often struggle to capture long-term dependencies, which are the relationships between distant elements in the sequence.
The main reason why long-term
dependencies are hard to learn is that RNNs
suffer from the vanishing or exploding gradient
problem.
This means that the gradient, which is the
signal that tells the network how to update its
weights, becomes either very small or very
large as it propagates through the network.
When the gradient vanishes, the network
cannot learn from the distant inputs, and when
it explodes, the network becomes unstable and
produces erratic outputs.
This problem is caused by the repeated
multiplication of the same matrix, which
represents the connections between the hidden
units, at each time step.
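A small numerical sketch of this effect, with illustrative values: the signal backpropagated through an RNN is multiplied by the same recurrent matrix W at every time step, so its norm shrinks toward zero or grows without bound depending on whether W's spectral radius is below or above one.

import numpy as np

# Repeatedly multiplying a gradient-like vector by the same recurrent matrix W
# makes its norm vanish (spectral radius < 1) or explode (spectral radius > 1).
def propagate(spectral_radius, steps=50, d=8, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale W
    g = np.ones(d)                     # stand-in for a backpropagated gradient
    norms = []
    for _ in range(steps):
        g = W.T @ g                    # one multiplication per time step
        norms.append(np.linalg.norm(g))
    return norms

print(propagate(0.9)[-1])   # radius < 1: norm is tiny after 50 steps (vanishes)
print(propagate(1.1)[-1])   # radius > 1: norm is huge after 50 steps (explodes)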
3.) How can you use gated units to handle long-term dependencies?
One way to handle long-term dependencies is to use gated units, which are special types of hidden units that can control the flow of information in the network.
The most popular gated units are the long
short-term memory (LSTM) and the gated
recurrent unit (GRU).
These units have internal mechanisms that
allow them to remember or forget the previous
inputs and outputs, depending on the current
input and output.
This way, they can selectively access the relevant information from the distant past and ignore the irrelevant information.
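A minimal sketch of the gating idea, assuming illustrative parameter names: a sigmoid gate with values between 0 and 1, computed from the current input and the previous state, decides how much of the old state to keep and how much of a new candidate to write. This is the mechanism that LSTM and GRU gates build on.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Core gating idea shared by LSTM and GRU units: a gate in [0, 1] blends the
# previous state with a freshly computed candidate state.
def gated_update(h_prev, x, Wz, Uz, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h_prev)         # gate: 1 = keep old, 0 = overwrite
    h_tilde = np.tanh(Wh @ x + Uh @ h_prev)   # candidate new state
    return z * h_prev + (1.0 - z) * h_tilde   # blend old state and candidate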
4.) How can you use attention mechanisms to handle long-term dependencies?
Another way to handle long-term
dependencies is to use attention mechanisms,
which are modules that can learn to focus on the
most important parts of the input or output
sequence. The most common attention mechanism is self-attention, which computes
the similarity between each element in the
sequence and assigns a weight to each one.
Then, it uses these weights to create a
context vector, which summarizes the
information from the whole sequence.
This way, it can capture the relationships
between the distant elements and enhance the
representation of the sequence.
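A minimal self-attention sketch in NumPy, with illustrative names: each position is compared with every other position, the similarity scores are turned into weights with a softmax, and the weighted sum of the sequence gives the context vector for each position.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Minimal self-attention over a sequence of T vectors (one per element).
def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity between all pairs
    weights = softmax(scores, axis=-1)        # attention weights per position
    return weights @ V                        # context vectors, shape (T, d)

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)

Because every position attends directly to every other position, distant elements interact in a single step rather than through many recurrent multiplications.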
Challenges:
LSTM
LSTM Notations
LSTM Inputs and Outputs of the unit
The network takes three inputs: the current input, the previous hidden state, and the previous cell state. A candidate state is computed from the new input, and the gates check which parts of it are relevant to keep.
LSTM Process of the Output Gate
Overall LSTM Architecture
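The following is a sketch of a single LSTM step using the standard gate equations; the parameter names (Wf, Uf, bf, and so on) are illustrative. It shows the three inputs, the forget, input, and output gates, the candidate state, and the two outputs (the new hidden state and the new cell state).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM step. Inputs: current input x_t, previous hidden state h_prev,
# previous cell state c_prev. p holds the learned parameters.
def lstm_step(x_t, h_prev, c_prev, p):
    f = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])        # forget gate
    i = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])        # input gate
    o = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])        # output gate
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # candidate
    c_t = f * c_prev + i * c_tilde     # keep part of the old cell, write new info
    h_t = o * np.tanh(c_t)             # expose a gated view of the cell state
    return h_t, c_t

The additive update of the cell state (c_t = f * c_prev + i * c_tilde) is what lets gradients flow over many time steps without being repeatedly squashed.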
Advantages of LSTM
LSTM cells have several advantages over simple
RNN cells, such as their ability to learn long-term
dependencies and capture complex patterns in
sequential data.
For example:
They can predict the next word in a sentence
based on the previous words and the context, or
generate captions for images based on the visual
features and the language model.
LSTM cells mitigate the vanishing or exploding gradient problem, allowing them to learn from longer sequences without losing or amplifying the information. For example, they can translate a long sentence from one language to another without forgetting or distorting the meaning.
They can handle noisy or missing data better than
simple RNN cells, such as filling in the blanks or
correcting errors in a text based on the surrounding
words and grammar.
Disadvantages of LSTM
LSTM cells have some drawbacks when
compared to simple RNN cells.
They are more computationally expensive and
require more memory and time to train and run
due to their additional parameters and operations.
Additionally, they are more prone to overfitting,
necessitating regularization techniques such as
dropout, weight decay, or early stopping.
Finally, they are harder to interpret and explain than simple RNN cells, since they have more internal states and gating operations.
Other Gated RNNs
Gated recurrent units (GRUs) are a gating
mechanism in recurrent neural networks, introduced
in 2014.
The GRU is like a long short-term memory (LSTM)
with a gating mechanism to input or forget certain
features, but lacks a context vector or output gate,
resulting in fewer parameters than LSTM.
GRU's performance on certain tasks of polyphonic
music modeling, speech signal modeling and natural
language processing was found to be similar to that
of LSTM.
Gated Recurrent Network
The main difference with the LSTM is that a
single gating unit simultaneously controls the
forgetting factor and the decision to update the state
unit.
For example:
The reset gate (or forget gate) output could be
shared across multiple hidden units.
Alternately, the product of a global gate (covering a whole group of units, such as an entire layer) and a local gate (per unit) could be used to combine global control and local control.
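A matching sketch of a single GRU step, again with illustrative parameter names: one update gate z plays the role of both the forget and input gates, and there is no separate cell state or output gate, which is why the GRU has fewer parameters than the LSTM.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU step. Compared with the LSTM step above, a single update gate z
# controls both forgetting the old state and writing the candidate, and the
# hidden state is the only state carried between time steps.
def gru_step(x_t, h_prev, p):
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)               # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)               # reset gate
    h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev))   # candidate
    return z * h_prev + (1.0 - z) * h_tilde                     # new hidden state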
Reference Link
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Optimization for Long-Term
Dependencies
Second-order optimization algorithms may
roughly be understood as dividing the first derivative
by the second derivative (in higher dimension,
multiplying the gradient by the inverse Hessian).
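A small worked example of this idea on a quadratic function, with illustrative numbers: the gradient is multiplied by the inverse Hessian (here via a linear solve), which rescales each direction by its curvature and reaches the minimum of the quadratic in a single step.

import numpy as np

# Newton-style step on f(w) = 0.5 * w^T A w - b^T w:
# gradient = A w - b, Hessian = A. Multiplying the gradient by the inverse
# Hessian undoes the ill-conditioned curvature.
A = np.array([[3.0, 0.0],
              [0.0, 0.01]])            # ill-conditioned curvature
b = np.array([1.0, 1.0])
w = np.zeros(2)

grad = A @ w - b
w_newton = w - np.linalg.solve(A, grad)          # gradient "divided" by Hessian
print(w_newton, np.allclose(A @ w_newton, b))    # exact minimizer: A w = b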