RECURRENT NEURAL
NETWORKS (RNNS)
Dr. Gaganpreet Kaur
RNNS
• Recurrent means “coming back of something that existed before”, i.e. an RNN
holds on to the past; hence the name. RNNs have memory and are
capable of working on sequential data.
Imagine a rocket is launched but we don’t know the path it will follow.
How do you interpret an ECG?
How about cryptocurrency price prediction?
Time Series Prediction
RNNS
RNNs are a type of deep learning model designed for sequential data, enabling
predictions based on prior inputs. They are used for language translation,
sentiment analysis, and time series forecasting.
Unlike CNNs, which take image inputs and work on grid-structured data, RNNs process
sequences of values.
Features:
Sequential Learning: RNNs excel in processing time series and sequential
data.
Memory Mechanism: They maintain context through hidden states,
influencing predictions.
Diverse Applications: Useful in language processing, flood prediction, and
more.
Gradient Challenges: RNNs face vanishing and exploding gradient problems.
RNNS
• Utility in Sequential Data: RNNs are tailored for sequential processing,
making them suitable for real-time applications where past information is
crucial. This characteristic enables them to analyze trends and make
predictions based on historical data.
• Memory and Context: The hidden state mechanism allows RNNs to
remember information over time, crucial for tasks requiring context
understanding, such as natural language processing. This memory aspect
differentiates them from traditional neural networks.
• Architectural Variants: LSTM and GRU architectures are designed to
overcome the limitations of standard RNNs, particularly in learning long-
term dependencies, making them more effective for complex tasks. This
highlights the importance of architecture in achieving better
performance.
HOW DIFFERENT FROM ANNS
• RNNs exploit parameter sharing.
• Consider “I attended an International Conference in 2019”
or
“In 2019 I had a chance to attend an International Conference”
If I want to know when I attended an International Conference, the answer
should be: 2019.
But in a fully connected ANN this is a feature-extraction problem that varies
with the position of occurrence, whereas an RNN shares its weights and
maintains them over several time steps.
HOW DIFFERENT FROM ANNS
• A limitation of vanilla Neural Networks (and also Convolutional Networks) is that their API is too
constrained: they accept a fixed-sized vector as input (e.g. an image) and
produce a fixed-sized vector as output (e.g. probabilities of different classes).
• Generally the mapping is done using a fixed number of computational steps (e.g.
the number of layers in the model).
RNNs, unlike CNNs and ANNs, allow learning from sequences of vectors.
DIFFERENT TYPES FOR
DIFFERENT USES
• ANNs: Used for general Regression and Classification problems.
• CNNs: Used for object detection and image classification.
• Deep Belief Network: Used in healthcare sectors for cancer detection.
• RNN: Used for speech recognition, voice recognition, time series
prediction, and natural language processing.
Do ANNs/CNNs have parameter
sharing?
ONE HOT
ENCODING
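The slide’s illustration is not reproduced here, so here is a minimal NumPy sketch of the idea; the vocabulary and sentence are illustrative assumptions borrowed from the conference example above.

import numpy as np

# Illustrative vocabulary; any consistent word-to-index mapping works.
vocab = {"I": 0, "attended": 1, "a": 2, "conference": 3, "in": 4, "2019": 5}

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

sentence = ["I", "attended", "a", "conference", "in", "2019"]
encoded = np.stack([one_hot(w, vocab) for w in sentence])
print(encoded.shape)  # (6, 6): one row per time step, one column per vocabulary word

Each word becomes a sparse vector; an RNN then consumes these vectors one time step at a time.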
RNN
In its simplest form an RNN is as shown in Fig 1, but it can have different architectures as shown in Fig 2:
Many to one, One to many, Many to many
RNN
Recurrent Neural Networks are used for the following tasks:
• Regression, which uses score calculation
• Classification, which uses error calculation with a loss function
Because an RNN deals with long sequences with many time steps, it is trained with
backpropagation through time (BPTT).
• BPTT is an extension of the standard backpropagation algorithm for RNNs
• Truncated BPTT reduces the computational complexity
of each parameter update in a Recurrent Neural Network.
COMPUTATIONAL
GRAPHS FOR
RNNS
A recurrent network with no
outputs, shown as a computational
graph. This recurrent network
just processes information
from the input x by
incorporating it into the state
h that is passed forward
through time.
The computational graph to compute the
training loss of a recurrent network that
maps an input sequence of x values to a
corresponding sequence of output ‘o’
values. A loss L measures how far each
‘o’ is from the corresponding training
target y.
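The graph itself is not shown here, but the computation it describes can be written out. A hedged reconstruction in standard notation, assuming a tanh state update and shared parameters U, V, W with biases b, c (the specific activation and bias terms are assumptions, not stated on the slide):

\begin{aligned}
h^{(t)} &= f\big(h^{(t-1)}, x^{(t)};\, \theta\big) = \tanh\big(W h^{(t-1)} + U x^{(t)} + b\big) \\
o^{(t)} &= V h^{(t)} + c \\
L &= \sum_t L^{(t)}\big(o^{(t)}, y^{(t)}\big)
\end{aligned}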
RNN
• Autoregressive models (e.g. Google DeepMind’s PixelCNN), such as those used in
GPTs, are feed-forward models that, unlike RNNs, feed past outputs in as inputs
to the next step. In RNNs, feedback is implemented through hidden
units.
• Autoregressive models are free from vanishing gradient problems.
• Transformer models are more complex and do not rely on recurrence.
WHY RNNS FOR SEQUENTIAL
MODELLING
• Most of the data in the real world is sequential.
• CNNs fail to capture it, as they work on fixed-length vectors while sequential
data can be of varying length.
• A sequential model should be able to track long-term dependencies:
“Hello how r you? Howz your health now”
• In Sequential data order is important:
The movie is good not bad
vs
The movie is not good but bad
• Requires parameter sharing
COMPONENTS OF RNN
Three dimensions for input:
• Mini-batch size
• Number of columns in our
vector per time-step
• Time-series length
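To make the three dimensions concrete, a minimal sketch assuming a (mini-batch, time steps, features) layout; the sizes are illustrative only.

import numpy as np

batch_size = 32      # mini-batch size
time_steps = 10      # time-series length
num_features = 8     # number of columns in the vector per time step

# One mini-batch of sequential input for an RNN.
x = np.random.randn(batch_size, time_steps, num_features)
print(x.shape)  # (32, 10, 8)

Some frameworks instead expect (time steps, mini-batch, features); only the convention changes, not the three dimensions.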
RNNS
RNNs can handle variable sequence lengths
Consider:
How old are you?
or
Please tell how old are you?
Or
May I know your age please?
RNNs expand along the time series and hence handle variable-length sequences
RNNS
• RNNs use parameter sharing:
• The learning parameters (U, V, W) are the same at each time step and are
shared / unrolled along the time series. At each step, learning
depends on the previous output, so learning is distributed in time (a small
sketch follows below).
Consider, “It was a sunny day”
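A minimal NumPy sketch of the unrolled forward pass, showing that the same U, W, V are reused at every time step; the tanh activation, the sizes, and the random inputs are illustrative assumptions.

import numpy as np

def rnn_forward(x_seq, U, W, V, s0):
    """x_seq: (T, input_dim). The SAME U, W, V are applied at every time step."""
    s = s0
    outputs = []
    for x_t in x_seq:                     # step through time
        s = np.tanh(U @ x_t + W @ s)      # new state depends on the input and the previous state
        outputs.append(V @ s)             # output computed from the current state
    return np.array(outputs), s

input_dim, hidden_dim, output_dim, T = 4, 8, 3, 5
U = np.random.randn(hidden_dim, input_dim)
W = np.random.randn(hidden_dim, hidden_dim)
V = np.random.randn(output_dim, hidden_dim)
x_seq = np.random.randn(T, input_dim)     # five time steps of input features (e.g. encoded words)
outputs, s_T = rnn_forward(x_seq, U, W, V, np.zeros(hidden_dim))
print(outputs.shape)  # (5, 3): one output per time step, all produced by the shared parameters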
RNNS
Unfolding across Time:
At each time step, the state ‘sn’ contains information from the past
RNNS
An RNN captures differences in sequence order: because it learns from the
immediately previous word/input, it captures localized dependencies in the sequence
The movie is good not bad
vs
The movie is not good but bad
RNNS
• RNNs support Non-Linear Mapping
• RNNs use non-linear activation functions, which allows them to learn
complex, non-linear mappings between inputs and outputs.
STANDARDIZATION
• It generally helps to standardize the input data (e.g., zero mean,
unit variance).
• This helps transform the inputs into a range more suitable for the
standard activation functions.
• Standardization helps keep the relationship between the inputs and the
targets as simple and localized as possible; it applies only to real-
valued inputs.
• Not used for one-hot (categorical) inputs
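A small sketch of standardizing real-valued inputs to zero mean and unit variance; computing the statistics on the training set only and the small epsilon guard are the usual practice, assumed here rather than stated on the slide.

import numpy as np

def standardize(train_x, test_x, eps=1e-8):
    """Zero-mean, unit-variance scaling using statistics from the training set only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)
    scale = lambda x: (x - mean) / (std + eps)
    return scale(train_x), scale(test_x)

train_x = np.random.randn(100, 3) * 50 + 10   # real-valued features on an awkward scale
test_x = np.random.randn(20, 3) * 50 + 10
train_s, test_s = standardize(train_x, test_x)
print(train_s.mean(axis=0).round(2), train_s.std(axis=0).round(2))  # ~0 and ~1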
LEARNING
• Uses BPTT: takes the derivative of the loss function w.r.t. each parameter (U, V, W) and
modifies the parameters so as to minimize the loss
Jn is the loss computed against the target value at time step n
BATCH NORMALIZATION
• The following key points explain the intuition behind BN and how it works:
• It consists of adding an operation in the model just before or after the
activation function of each hidden layer.
• This operation simply zero-centers and normalizes each input, then scales
and shifts the result using two new parameter vectors per layer: one for
scaling, the other for shifting.
• In other words, the operation lets the model learn the optimal scale and
mean of each of the layer’s inputs.
• To zero-center and normalize the inputs, the algorithm needs to estimate
each input’s mean and standard deviation.
• It does so by evaluating the mean and standard deviation of the input over
the current mini-batch (hence the name “Batch Normalization”).
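A minimal sketch of the operation described above for a single mini-batch; gamma (scale) and beta (shift) are the two learnable parameter vectors, and the epsilon constant and variable names follow common convention rather than the slides. At inference time, running averages of the batch statistics are normally used instead.

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Zero-center and normalize each feature over the mini-batch,
    then scale by gamma and shift by beta (both learned during training)."""
    mu = x.mean(axis=0)                   # per-feature mean over the current mini-batch
    var = x.var(axis=0)                   # per-feature variance over the current mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 8) * 3 + 5        # activations with non-zero mean and large variance
gamma, beta = np.ones(8), np.zeros(8)     # initial scale and shift; learned like any other weights
y = batch_norm(x, gamma, beta)
print(y.mean(axis=0).round(2), y.std(axis=0).round(2))  # ~0 and ~1 before scale/shift are learned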
CHALLENGES
• Suffer from the vanishing and exploding gradient problem
Source: medium/analytics-vidhya
EXPLODING VS VANISHING
GRADIENTS
• An error gradient is the direction and magnitude calculated during the training of a
neural network that is used to update the network weights in the right direction and by
the right amount.
• In deep networks or recurrent neural networks, error gradients can accumulate during
an update and result in very large gradients. These in turn result in large updates to the
network weights and, in turn, an unstable network. At an extreme, the values of the weights
can become so large as to overflow and result in NaN values.
• The explosion occurs through exponential growth from repeatedly multiplying gradients
through network layers that have values larger than 1.0.
• May lead to avalanche learning
• Solutions:
• If exploding gradients are occurring, you can check for and limit the size of the gradients
during the training of your network: gradient clipping (see the sketch after this list).
• Check the size of the network weights and apply a penalty to the network’s loss function for
large weight values: weight regularization, often with an L1 (absolute weights) or an L2
(squared weights) penalty
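A hedged sketch of gradient clipping by global norm; deep learning frameworks provide this directly (for example PyTorch’s torch.nn.utils.clip_grad_norm_), and the max_norm value here is an arbitrary illustration.

import numpy as np

def clip_gradients(grads, max_norm=1.0):
    """Rescale a list of gradient arrays so their combined L2 norm does not exceed max_norm."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads

grads = [np.random.randn(8, 8) * 100, np.random.randn(8) * 100]   # deliberately exploded gradients
clipped = clip_gradients(grads, max_norm=5.0)
print(np.sqrt(sum(float((g ** 2).sum()) for g in clipped)))       # <= 5.0

Weight regularization would instead add an L1 or L2 penalty term to the loss, so large weights are discouraged during training itself.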
• As the backpropagation algorithm advances downwards (or backwards)
from the output layer towards the input layer, the gradients often get
smaller and smaller and approach zero, which eventually leaves the
weights of the initial or lower layers nearly unchanged. As a result,
gradient descent never converges to the optimum. This is known as
the vanishing gradients problem.
• May lead to learning stagnation / saturation
• Solution: use non-saturating activation functions (ReLU, Leaky ReLU) and
Batch Normalization
CHALLENGES
• RNNs can have high Computational Complexity. RNNs can be
computationally expensive to train, especially when dealing with long
sequences. This is because the network has to process each input in
sequence, which can be slow.
• It is difficult to choose the right RNN architecture. There are many
different variants of RNNs, each with its own advantages and
disadvantages. Choosing the right architecture for a given task can be
challenging, and may require extensive experimentation and tuning.
CHALLENGES
• RNN is unable to capture Long-Term Dependencies. RNNs are designed to
capture information about past inputs, but they can struggle to capture
long-term dependencies in the input sequence as past information is
dominated by immediate inputs. This is because the gradients can
become very small as they propagate through time, which can cause the
network to forget important information.
• RNNs lack parallelism due to their inherently sequential nature, which
makes them slow. This can limit the speed and scalability of the network.
ACTIVATION FUNCTIONS IN RNN
• Sigmoid Function: It has a range between 0 and 1, which makes it useful
for binary classification tasks. The formula for the sigmoid function is:
σ(x) = 1 / (1 + e^(-x))
• Hyperbolic Tangent (Tanh) Function: It has a range between -1 and 1,
which makes it useful for non-linear classification tasks. The formula for
the tanh function is:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
ACTIVATION FUNCTIONS IN RNN
• Rectified Linear Unit (ReLU) Function: It has a range between 0 and
infinity, which makes it useful for models that require positive outputs.
The formula for the ReLU function is:
ReLU(x) = max(0, x)
• Leaky ReLU Function: It introduces a small slope to negative values, which
helps to prevent dead neurons in the model. The formula for the Leaky
ReLU function is:
Leaky ReLU(x) = max(0.01x, x)
ACTIVATION FUNCTIONS IN RNN
• Softmax Function: The softmax function is often used in the output layer
of RNNs for multi-class classification tasks. It converts the network output
into a probability distribution over the possible classes. The formula for
the softmax function is:
softmax(x_i) = e^(x_i) / ∑_j e^(x_j)
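The formulas above translate directly into code; a small NumPy sketch (subtracting the maximum inside softmax is a standard numerical-stability trick, not part of the slide’s formula):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # range (0, 1)

def tanh(x):
    return np.tanh(x)                      # range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # range [0, infinity)

def leaky_relu(x, slope=0.01):
    return np.maximum(slope * x, x)        # small slope keeps negative inputs alive

def softmax(x):
    e = np.exp(x - x.max())                # subtract max for numerical stability
    return e / e.sum()                     # probabilities summing to 1

x = np.array([-2.0, 0.0, 3.0])
print(softmax(x), softmax(x).sum())        # a probability distribution summing to 1.0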
VARIANTS OF RNN
• Vanilla RNN (single I/P and single O/P), LSTM, Gated Recurrent Unit (GRU),
Bidirectional LSTM
• Single I/P and single O/P:
• Single I/P and many O/P: Image Captioning
• Many I/P and many O/P: Language Translators
• Many I/P and single O/P: Sentiment Analysis
BACKPROPAGATION IN
TIME:RNNS
BACKPROPAGATION IN
TIME:RNNS
J is the loss function at each
time step: Ji denotes the loss
at time step i
BACKPROPAGATION IN
TIME:RNNS
Applying the chain rule:
s2 depends on s1, which in turn
depends on s0, and both depend on W.
So s2 can be further expanded as:
Generalizing the expression for backpropagation through time steps:
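The expanded expressions themselves appear as figures on the original slides; a hedged reconstruction of the standard BPTT expansion, using the deck’s notation (Ji for the loss at step i, st for the state, W for the shared recurrent weights):

\begin{aligned}
\frac{\partial J_2}{\partial W}
  &= \frac{\partial J_2}{\partial s_2}\frac{\partial s_2}{\partial W}
   + \frac{\partial J_2}{\partial s_2}\frac{\partial s_2}{\partial s_1}\frac{\partial s_1}{\partial W}
   + \frac{\partial J_2}{\partial s_2}\frac{\partial s_2}{\partial s_1}\frac{\partial s_1}{\partial s_0}\frac{\partial s_0}{\partial W} \\
\frac{\partial J_t}{\partial W}
  &= \sum_{k=0}^{t} \frac{\partial J_t}{\partial s_t}
     \left( \prod_{j=k+1}^{t} \frac{\partial s_j}{\partial s_{j-1}} \right)
     \frac{\partial s_k}{\partial W}
\end{aligned}

The repeated product of ∂s_j/∂s_{j-1} factors is exactly what shrinks or blows up over long sequences, giving the vanishing and exploding gradient problems discussed earlier.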
LSTMS
• Overcomes the vanishing gradient problem
• Uses gating mechanisms that control the flow of information through the
network:
• Input gate
• Forget gate
• Output gate.
• The use of gates allows the LSTM network to selectively remember or forget
information from the input sequence, which makes it more effective for
long-term dependencies.
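For reference, the standard LSTM cell equations, showing where the three gates act; the notation (W_f, W_i, W_o, W_c for the gate weights, σ for the sigmoid, ⊙ for element-wise product) follows common convention, since the slide names only the gates:

\begin{aligned}
f_t &= \sigma\big(W_f\,[h_{t-1}, x_t] + b_f\big) && \text{(forget gate)} \\
i_t &= \sigma\big(W_i\,[h_{t-1}, x_t] + b_i\big) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\big(W_c\,[h_{t-1}, x_t] + b_c\big) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\big(W_o\,[h_{t-1}, x_t] + b_o\big) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}

The forget gate decides what to drop from the cell state, the input gate decides what new information to write, and the output gate decides how much of the cell state to expose as the hidden state.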