LSTM AND BILSTM:


So far, the sequence learning task tackled by our deep learning models has been language modeling,
where we aim to predict the next token given all previous tokens in a sequence. In this
scenario, we wish to condition only on the leftward context, and thus the unidirectional
chaining of a standard RNN seems appropriate.
However, there are many other sequence learning tasks where it is perfectly fine to
condition the prediction at every time step on both the leftward and the rightward context.
Consider, for example, part-of-speech tagging. Why shouldn't we take the context in both
directions into account when assessing the part of speech associated with a given word?

Another common task, often useful as a pretraining exercise prior to fine-tuning a model on
an actual task of interest, is to mask out random tokens in a text document and then train a
sequence model to predict the values of the missing tokens. Note that depending on what
comes after the blank, the likely value of the missing token changes dramatically:

 I am ___.
 I am ___ hungry.
 I am ___ hungry, and I can eat 2 pizzas.
In the first sentence “happy” seems to be a likely candidate. The words “not” and “very”
seem plausible in the second sentence, but “not” seems incompatible with the third sentence.
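
This fill-in-the-blank objective is essentially the masked language modeling task used to pretrain BERT. As a quick illustration, here is a minimal sketch, assuming the Hugging Face transformers library and the pretrained bert-base-uncased checkpoint are available (this is separate from the Keras examples that follow):

from transformers import pipeline

# A masked language model predicts the most likely token for each [MASK]
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for text in ["I am [MASK].",
             "I am [MASK] hungry.",
             "I am [MASK] hungry, and I can eat 2 pizzas."]:
    best = fill_mask(text)[0]              # highest-scoring prediction
    print(text, "->", best["token_str"])

Running a sketch like this shows how the predicted token shifts with the surrounding context, just as in the three sentences above.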

Bidirectional Recurrent Neural Networks (BiRNNs) are an extension of traditional Recurrent
Neural Networks (RNNs) that can improve model performance on various sequential data
tasks. Unlike standard RNNs, which process sequences in a single direction (from past to
future), BiRNNs process the sequence in both directions with two separate hidden layers,
which are then fed forward to the same output layer.
The architecture of a BiRNN allows it to capture information from both the past and the
future at any point in the sequence. This is particularly useful for tasks where context
from the future is as important as the past for making predictions, such as natural language
processing tasks (e.g., translation, sentiment analysis) or any sequence classification task.
Overview of how BiRNNs work:
1. Forward Pass: In one direction, the RNN processes the sequence from the start to the
end, much like a conventional RNN. This forward pass captures the past context.
2. Backward Pass: Simultaneously, another RNN processes the sequence in the reverse
direction, from end to start, capturing future context.
3. Combining Contexts: At each time step, the hidden states of both the forward and
backward passes are typically concatenated or summed and then passed to the output
layer to make a prediction (see the explicit sketch after this list).
4. Training: During training, both the forward and backward networks are trained
simultaneously. The error is backpropagated through both networks, updating the
weights in both directions.
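Before using Keras's built-in Bidirectional wrapper (shown next), the forward/backward/combine steps above can be sketched explicitly with the functional API. This is only an illustrative sketch; the dimensions are hypothetical, and tf and layers are the standard TensorFlow import names rather than part of the original notes:

import tensorflow as tf
from tensorflow.keras import layers

sequence_length, feature_dim, units = 20, 8, 50    # hypothetical sizes

inputs = tf.keras.Input(shape=(sequence_length, feature_dim))

# 1. Forward pass: read the sequence from start to end
forward = layers.LSTM(units, return_sequences=True)(inputs)

# 2. Backward pass: read the sequence from end to start;
# go_backwards=True emits hidden states in reverse time order,
# so they are flipped back to align with the forward states
backward = layers.LSTM(units, return_sequences=True, go_backwards=True)(inputs)
backward = layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(backward)

# 3. Combining contexts: concatenate both hidden states at every time step
combined = layers.Concatenate()([forward, backward])   # (batch, time, 2 * units)

bi_sketch = tf.keras.Model(inputs, combined)

The Bidirectional wrapper used below performs essentially the same reversal and combination internally, with concatenation as its default merge mode.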
Example of how to use a simple BiRNN (here a bidirectional LSTM) with Keras:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

model = Sequential()

# First bidirectional LSTM layer; return_sequences=True so the next recurrent
# layer receives the full sequence of hidden states
model.add(Bidirectional(LSTM(units=50, return_sequences=True),
                        input_shape=(sequence_length, feature_dim)))

# Second bidirectional LSTM layer; returns only the final hidden state
model.add(Bidirectional(LSTM(units=50)))

# Output layer
model.add(Dense(output_dim, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

In this example, units=50 defines the number of LSTM units in each
direction, sequence_length is the length of the input sequences, feature_dim is the number of
features in each timestep, and output_dim is the dimensionality of the output space (e.g.,
number of classes for classification tasks).
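
A minimal usage sketch for the model above, assuming sequence_length = 20, feature_dim = 8, and output_dim = 3 were set before building it (these numbers and the random data are placeholders, not part of the original example):

import numpy as np

num_samples = 100
X = np.random.rand(num_samples, sequence_length, feature_dim).astype("float32")    # random inputs
y = np.eye(output_dim)[np.random.randint(0, output_dim, num_samples)]              # one-hot labels

model.fit(X, y, epochs=5, batch_size=16)
probs = model.predict(X[:5])    # class probabilities for the first 5 sequences

With categorical_crossentropy as the loss, the labels must be one-hot encoded, which is why np.eye is used here.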

Advantages of BiRNNs:
 Dual Context: They can capture information from both past and future contexts
simultaneously.
 Improved Accuracy: For many tasks, this leads to better performance than
unidirectional RNNs.
Disadvantages of BiRNNs:
 Increased Computational Load: They essentially double the computation because
they process the sequence twice.
 Increased Memory Usage: BiRNNs require more memory to store the intermediate
states for both directions.
 Not Suitable for Real-Time Processing: Since future context is needed, a BiRNN
can't be used in real-time applications where full sequences are not available
immediately.
It's important to note that while BiRNNs can provide more context for making predictions,
they are not always the best choice. The suitability of BiRNNs largely depends on the nature
of the task and the data.
Code to use a standard LSTM:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()

# First LSTM layer; return_sequences=True because another recurrent layer follows
model.add(LSTM(units=50, return_sequences=True,
               input_shape=(sequence_length, feature_dim)))

# Second LSTM layer; Keras infers its input shape from the previous layer's output
model.add(LSTM(units=50))

# Output layer
model.add(Dense(output_dim, activation='softmax'))

# Compiling the model
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

In this code, units=50 refers to the number of neurons in the LSTM cell. sequence_length is
the length of your input sequences, feature_dim is the number of features for each time step
in the input data, and output_dim is the size of the output layer (which often corresponds to
the number of classes in a classification problem).
The key differences between this LSTM example and the previous Bidirectional LSTM
(BiLSTM) one are:
 Directionality: This LSTM model processes the data in a single direction, from the
start of the sequence to the end, while the BiLSTM processes it in both directions.
 Layer Connections: In the unidirectional LSTM model, each layer feeds only into
the next layer moving forward, whereas in the BiLSTM model, there are forward and
backward passes that feed into the next layer.

 Use Cases: In tasks where only past data is available to predict the future, a
unidirectional LSTM is more appropriate. However, in tasks where the context in both
directions is beneficial, a Bidirectional LSTM may perform better.
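
A quick way to see the directionality difference is to compare output shapes: with its default merge mode (concatenation), the Bidirectional wrapper doubles the feature dimension of the wrapped LSTM. A minimal sketch with hypothetical sizes:

import numpy as np
from tensorflow.keras.layers import LSTM, Bidirectional

x = np.random.rand(1, 20, 8).astype("float32")    # (batch, timesteps, features)

uni = LSTM(50, return_sequences=True)(x)
bi = Bidirectional(LSTM(50, return_sequences=True))(x)

print(uni.shape)    # (1, 20, 50)  - forward states only
print(bi.shape)     # (1, 20, 100) - forward and backward states concatenated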
