Understanding Sequence Models

The document explains the differences between various neural network models used for processing sequences, particularly focusing on Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). It highlights how LSTMs address the vanishing gradient problem and improve memory retention for long sequences, making them more effective for tasks like sentiment analysis and complaint classification. The document emphasizes the importance of maintaining context in sequential data processing to enhance model accuracy.


Understanding Sequence Models
Think of neural networks as systems that process information, much like
a human brain. A standard dense layer, similar to what's found in a
simple Artificial Neural Network (ANN), processes information in
isolation.

It only considers the current piece of data and has no memory of what
came before. This works fine for independent tasks like classifying a
single image, but fails when understanding sequences like language or
time-series data is required.
Recurrent Neural Networks (RNN)
An RNN is designed with short-term memory. It processes sequences one piece at a time,
maintaining a "hidden state" that summarises information processed so far.
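The hidden-state idea can be made concrete with a minimal NumPy sketch of a simple RNN forward pass; the dimensions, weight values, and function names here are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

# Hypothetical dimensions for illustration: 4-dim inputs, 3-dim hidden state.
rng = np.random.default_rng(0)
W_x = rng.standard_normal((3, 4)) * 0.1   # input-to-hidden weights
W_h = rng.standard_normal((3, 3)) * 0.1   # hidden-to-hidden (recurrent) weights
b = np.zeros(3)

def rnn_forward(inputs):
    """Process a sequence one timestep at a time, carrying a hidden state."""
    h = np.zeros(3)                        # hidden state starts empty
    for x_t in inputs:                     # iterate over the sequence
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h                               # summarises everything seen so far

sequence = rng.standard_normal((5, 4))     # 5 timesteps of 4-dim vectors
h_final = rnn_forward(sequence)
```

Note that the same `h` is updated at every timestep: when the last word is processed, `h` still carries a (compressed) trace of the first.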

Business Example

An RNN analysing customer reviews reads "The quality of the product was great" and
uses its hidden state to understand "great" refers to product quality, enabling accurate
sentiment analysis.

The Problem: Vanishing Gradients

As sequences get longer, information from the beginning tends to get lost or "forgotten." During training, the gradient shrinks as it is propagated back through many timesteps; this vanishing gradient problem prevents the network from learning dependencies on the distant past.
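A toy calculation shows why the gradient collapses: backpropagating through each timestep multiplies the gradient by a factor that is typically below 1 (a recurrent weight times an activation derivative), so over many timesteps the product shrinks exponentially. The factor values used below are illustrative assumptions, not measured quantities.

```python
# Illustrative only: treat the per-timestep gradient contribution as a scalar.
# tanh'(x) = 1 - tanh(x)^2 is at most 1, and small recurrent weights shrink
# the factor further, so each step multiplies the gradient by less than 1.
def backprop_factor(w_h=0.9, activation_grad=0.8):
    return w_h * activation_grad          # one timestep's contribution

grad = 1.0
for t in range(50):                       # backpropagate through 50 timesteps
    grad *= backprop_factor()
# grad is now vanishingly small: early timesteps get almost no learning signal
```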
Long Short-Term Memory (LSTM)
An LSTM solves the short-term memory problem using sophisticated gating mechanisms to manage
memory intelligently over very long sequences.

Forget Gate
Acts as an intelligent filter, deciding which past information is no longer relevant and should be discarded.

Input Gate
Determines which new information from the current input is important enough to add to long-term memory.

Output Gate
Controls what part of the network's memory should be used for the current prediction or output.
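The three gates can be sketched in a few lines of NumPy. The parameter names and sizes below are hypothetical, but the update equations follow the standard LSTM cell: sigmoids for the gates, tanh for the candidate memory, and an additive cell-state update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM timestep: gates decide what to forget, add, and output."""
    hx = np.concatenate([h, x])                 # previous state + current input
    f = sigmoid(p["W_f"] @ hx + p["b_f"])       # forget gate
    i = sigmoid(p["W_i"] @ hx + p["b_i"])       # input gate
    o = sigmoid(p["W_o"] @ hx + p["b_o"])       # output gate
    g = np.tanh(p["W_g"] @ hx + p["b_g"])       # candidate memory
    c_new = f * c + i * g        # cell state: keep some old memory, add some new
    h_new = o * np.tanh(c_new)   # output gate filters what gets exposed
    return h_new, c_new

# Hypothetical sizes: 2-dim input, 3-dim hidden/cell state.
rng = np.random.default_rng(1)
p = {k: rng.standard_normal((3, 5)) * 0.1 for k in ("W_f", "W_i", "W_o", "W_g")}
p.update({name: np.zeros(3) for name in ("b_f", "b_i", "b_o", "b_g")})

h, c = np.zeros(3), np.zeros(3)
for x_t in rng.standard_normal((4, 2)):   # run 4 timesteps
    h, c = lstm_step(x_t, h, c, p)
```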

Business Application: An LSTM predicting stock prices can remember key events from months
ago—major product launches, earnings reports—and factor that long-term information into current
predictions.
Bidirectional LSTM (BiLSTM)
A BiLSTM extends LSTM by processing sequences in two directions: beginning-to-end and end-to-beginning, using separate LSTMs for each direction.

Why Bidirectional Matters

Consider this customer review:

"The hotel was beautiful, but the service was disappointing."

• Standard LSTM: Reads left-to-right, might predict positive sentiment from "beautiful"
• BiLSTM: Combines forward and backward context, recognising the true mixed/negative sentiment
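The two-direction idea can be sketched directly: run one recurrent pass over the sequence, another over the reversed sequence, and concatenate the two summaries. For brevity this sketch uses a simple recurrent cell rather than a full LSTM, and the weights and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
W_x = rng.standard_normal((3, 4)) * 0.1   # input-to-hidden weights
W_h = rng.standard_normal((3, 3)) * 0.1   # recurrent weights

def run_direction(inputs):
    """A single recurrent pass (simple RNN cell used for brevity)."""
    h = np.zeros(3)
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h)
    return h

def bidirectional(inputs):
    fwd = run_direction(inputs)           # beginning-to-end pass
    bwd = run_direction(inputs[::-1])     # end-to-beginning pass
    return np.concatenate([fwd, bwd])     # both contexts feed the classifier

seq = rng.standard_normal((6, 4))
features = bidirectional(seq)             # forward + backward summaries
```

In a real BiLSTM the two directions have separate weights; sharing them here just keeps the sketch short.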
Comparing Sequence Models for Complaint Classification
When categorizing consumer complaints—ranging from "Bank account or service" to "Payday loan"—the choice of neural network significantly
impacts accuracy and understanding of nuanced language. Let's compare how different architectures handle this task.

DNN (Dense Neural Network)
Treats each word in a complaint in isolation, lacking any context. Fails to understand phrases like "account closed without notice," as it doesn't remember "account" when processing "closed."

RNN (Recurrent Neural Network)
Processes complaints sequentially, retaining some short-term memory. Struggles with very long complaints, like detailed descriptions of mortgage issues, due to vanishing gradients.

LSTM (Long Short-Term Memory)
Excels at capturing long-term dependencies. Can understand complex complaints about credit card fraud that unfold over many sentences, remembering initial details when processing later ones.

BiLSTM (Bidirectional LSTM)
The most robust for complaint classification. Processes text both forward and backward, allowing it to grasp full context, distinguishing between "loan approved, then denied" versus "loan denied, then approved" accurately.
The Need
• Densely connected networks and convnets have no memory
• Each input is processed independently, with no state kept between inputs
• Disadvantage of memoryless networks: you have to show the entire sequence to the network at once
• In contrast, in recurrent neural networks the inputs are given in a sequence, preserving the meaning present in the sequence
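The order-blindness of a memoryless model can be demonstrated directly: if word vectors are pooled (summed) before a dense layer, the output is identical for any ordering of the same words. The weights and data below are random placeholders used only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((2, 4))           # dense layer weights (illustrative)

def dense_on_pooled(word_vectors):
    """Memoryless model: pool the sequence, then one dense transform."""
    return W @ word_vectors.sum(axis=0)   # summing discards word order

words = rng.standard_normal((5, 4))       # 5 "words" as 4-dim vectors
shuffled = words[::-1].copy()             # same words, reversed order

out_a = dense_on_pooled(words)
out_b = dense_on_pooled(shuffled)
# out_a == out_b: the model cannot tell the two orderings apart
```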
The Architecture (Concept)
Principle: Biological intelligence processes information incrementally while maintaining an internal model of what it's processing, built from past information and constantly updated as new information comes in.

A recurrent neural network (RNN) adopts the same principle, albeit in an extremely simplified version: it processes sequences by iterating through the sequence elements and maintaining a state that contains information relative to what it has seen so far.


The Architecture (Simple RNN)
Disadvantage of RNN
Vanishing Gradient Problem
Why LSTMs Help

Architectures like Long Short-Term Memory (LSTM) address this issue by introducing mechanisms like gates (input, forget,
and output gates in LSTMs). These gates regulate the flow of information and gradients:

Gates prevent vanishing gradients: They allow the network to retain important information over long sequences while
discarding irrelevant details. This ensures that the gradients can flow back through time without shrinking exponentially.

Cell state in LSTMs: The cell state provides a direct path for gradient propagation, which reduces the likelihood of vanishing
gradients. By combining these mechanisms, LSTMs enable better learning of long-term dependencies compared to standard
RNNs.
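The "direct path" claim can be illustrated numerically. Along the cell state, the update is additive (c_t = f_t · c_{t-1} + i_t · g_t), so the gradient of the final cell state with respect to the initial one is simply the product of the forget-gate activations; gates that stay near 1 keep that gradient alive. The gate values below are illustrative assumptions.

```python
import numpy as np

# Gradient of c_T with respect to c_0 along the cell-state path:
# just the product of the forget gates at each timestep.
def gradient_through_cell_state(forget_gates):
    return np.prod(forget_gates)

T = 100
open_gates = np.full(T, 0.99)     # forget gate near 1: keep the memory
squashed = np.full(T, 0.5)        # factor of 0.5, like a shrinking RNN factor

g_open = gradient_through_cell_state(open_gates)   # ~0.37, still usable
g_squashed = gradient_through_cell_state(squashed) # ~8e-31, vanished
```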
The Architecture (LSTM)
In a simple RNN, states are not carried across sequences; in an LSTM they are. How is this achieved? A carry layer is an addition to the simple RNN: it adds a way to carry information across many timesteps, regulated via the input, forget, and output gates. This is essentially what the LSTM does: it saves information for later, thus preventing older signals from gradually vanishing during processing.
The Architecture - LSTM
SUMMARIZING
Variants landscape
THANK YOU