Master in Generative AI
Gen AI Models III
Enrichmentors Growing through Excellence over 40 years to become Best in Management
Purpose
The purpose of this section is to give you an in-depth view of Generative AI models.
By the end of this lecture, you will have learned about the following:
• Generative Adversarial Networks (GANs)
• Variational Autoencoders (VAEs)
• Transformer Models
Types of Generative Models
• Generative Adversarial Networks (GANs)
• Variational Autoencoders (VAEs)
• Transformer Models
What are Transformer Models?
Transformer models are a type of deep learning architecture designed for handling sequential data, particularly useful in natural language processing (NLP) tasks. Examples include GPT (Generative Pre-trained Transformer) models, which are particularly effective at generating human-like text based on the input they receive. Introduced by Vaswani et al. in the 2017 paper "Attention Is All You Need," transformers have revolutionized the field of NLP by enabling more efficient and effective processing of text compared to previous architectures such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). Here's an in-depth look at transformer models.
Core Components of Transformer Models
1. Self-Attention Mechanism
2. Positional Encoding
3. Multi-Head Attention
4. Feed-Forward Neural Networks
5. Layer Normalization and Residual Connections
Self-Attention Mechanism
1. Self-Attention
2. Scaled Dot-Product Attention
Self-Attention
This mechanism allows the model to weigh the importance of different words in a sentence relative to each other, regardless of their position. It computes attention scores for all pairs of words in a sequence, enabling the model to focus on relevant parts of the input for each token.
• Tokens: smaller parts of a sequence of text. These tokens can be as small as characters or as long as words.
• Embeddings: numeric representations of words in a lower-dimensional space, capturing semantic and syntactic information.
Self-Attention
• Query, Key, and Value: For a given word, the self-attention mechanism computes three vectors: Query (Q), Key (K), and Value (V). These vectors are learned during training. The query is the information that is being looked for, the key is the context or reference, and the value is the content that is being searched.
• Attention Scores: The model calculates attention scores by taking the dot product of the Query vector for the current word and the Key vectors for all the words in the input sequence. These scores indicate how much focus each word should receive.
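As a rough illustration of the ideas above, the sketch below projects a few toy token embeddings into Query, Key, and Value vectors using random weight matrices (stand-ins for the weights that would normally be learned during training) and computes the raw dot-product attention scores. All dimensions and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings for a 4-token sequence, model dimension 8
# (random stand-ins for learned token embeddings).
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))

# Projection matrices W_q, W_k, W_v (random stand-ins for learned weights).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = X @ W_q   # what each token is looking for
K = X @ W_k   # the context/reference each token offers
V = X @ W_v   # the content that is actually passed along

# Raw attention scores: dot product of each Query with every Key.
scores = Q @ K.T          # shape (seq_len, seq_len)
print(scores.shape)       # (4, 4): one score per (query word, key word) pair
```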
Scaled Dot-Product Attention
Scaled dot-product attention calculates attention scores using the dot product of query and key vectors, scaled by the square root of the dimension of the key vectors. These scores are then passed through a softmax function to obtain attention weights.
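A minimal NumPy sketch of this computation, following the formula Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V from the original paper; the input matrices here are toy placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # scale by sqrt of key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V, weights                       # weighted sum of Value vectors

# Toy inputs: 4 tokens, key/value dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.sum(axis=-1))   # each row of attention weights sums to 1
```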
Positional Encoding
Since transformers do not
process sequences in order,
positional encodings are
added to the input
embeddings to provide
information about the
position of each token in
the sequence. These
encodings are combined
with the input embeddings
to retain the sequential
order information
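The sketch below implements the sinusoidal positional encoding used in the original paper, where position pos and dimension index i are encoded as sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)); the encoding is simply added to the token embeddings. The toy embeddings are random placeholders.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Positional encodings from 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions: cosine
    return pe

# Add positional information to toy embeddings (random stand-ins).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
inputs = embeddings + sinusoidal_positional_encoding(4, 8)
print(inputs.shape)   # (4, 8): embeddings now carry position information
```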
Multi-Head Attention
This component allows the model to jointly attend to information from different representation subspaces at different positions. It consists of multiple self-attention layers (heads) running in parallel, whose outputs are concatenated and then linearly transformed.
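A compact sketch of this idea under the same toy setup: several attention heads run in parallel on different projections of the input, their outputs are concatenated, and a final linear layer mixes them. All weights here are random placeholders for parameters that would normally be learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, n_heads, rng):
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    head_outputs = []
    for _ in range(n_heads):                          # heads run independently, in parallel
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        head_outputs.append(weights @ V)
    concat = np.concatenate(head_outputs, axis=-1)    # concatenate head outputs
    W_o = rng.normal(size=(d_model, d_model))         # final linear projection
    return concat @ W_o

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(multi_head_attention(X, n_heads=2, rng=rng).shape)   # (4, 8)
```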
Feed-Forward Neural Networks
Each position in the
sequence is
processed
independently using
fully connected feed-
forward networks.
These networks are
applied after the
multi-head attention
mechanism
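A minimal sketch of the position-wise feed-forward network: the same two-layer fully connected network (with a ReLU in between, as in the original paper) is applied to every position independently. The weights are random placeholders and the inner dimension of 32 is an arbitrary choice for illustration.

```python
import numpy as np

def position_wise_ffn(X, W1, b1, W2, b2):
    """Apply the same two-layer MLP to every position in the sequence."""
    hidden = np.maximum(0, X @ W1 + b1)   # ReLU non-linearity
    return hidden @ W2 + b2               # project back to the model dimension

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 32          # d_ff: inner ("expansion") dimension
X = rng.normal(size=(seq_len, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(position_wise_ffn(X, W1, b1, W2, b2).shape)   # (4, 8): one output per position
```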
Layer Normalization and Residual Connections
1. Layer normalization stabilizes and accelerates training by normalizing the inputs to each layer. It ensures that the inputs have a consistent distribution and reduces the internal covariate shift problem that can occur during training.
2. Residual connections (skip connections) help in training deeper networks by allowing gradients to flow through the network without vanishing. A gradient simply measures the change in all weights with regard to the change in error. You can also think of a gradient as the slope of a function: the higher the gradient, the steeper the slope and the faster a model can learn, but if the slope is zero, the model stops learning.
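The sketch below shows how a transformer sub-layer (attention or feed-forward) is typically wrapped: the sub-layer's output is added back to its input (the residual, or skip, connection) and the result is layer-normalized. The learned gain and bias of layer normalization are omitted here for brevity, and a random linear map stands in for the sub-layer.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each position's features to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)        # learned gain/bias omitted for brevity

def residual_block(x, sublayer):
    """Post-norm residual wrapper: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8))                # stand-in for attention or the FFN
out = residual_block(X, lambda x: x @ W)
print(out.mean(axis=-1).round(6), out.shape)   # per-position mean ~0, shape (4, 8)
```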
Architecture
The transformer architecture consists of an encoder and a decoder:
1. Encoder
2. Decoder
Encoder
1. The encoder is composed of multiple identical layers, each containing two main components: a multi-head self-attention mechanism and a position-wise fully connected feed-forward network.
2. The encoder processes the input sequence and generates a set of attention-weighted representations.
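Putting the pieces from the previous slides together, a single encoder layer can be sketched roughly as follows: self-attention, then a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. This is a simplified illustration with random weights and a single attention head per layer, not a faithful implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def self_attention(X, d_model):
    # Single-head self-attention with random stand-in weights (multi-head omitted for brevity).
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(d_model)) @ V

def feed_forward(X, d_model, d_ff=32):
    W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
    return np.maximum(0, X @ W1) @ W2

def encoder_layer(X):
    d_model = X.shape[-1]
    X = layer_norm(X + self_attention(X, d_model))   # sub-layer 1: attention + residual + norm
    X = layer_norm(X + feed_forward(X, d_model))     # sub-layer 2: FFN + residual + norm
    return X

# Stack several identical layers to form the encoder.
X = rng.normal(size=(4, 8))                          # toy input sequence
for _ in range(2):
    X = encoder_layer(X)
print(X.shape)   # (4, 8): attention-weighted representation of each input token
```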
Decoder
1. Similar to the encoder, the decoder also consists of multiple identical layers. Each layer includes a multi-head self-attention mechanism, an encoder-decoder attention mechanism (which attends to the encoder's output), and a feed-forward network.
2. The decoder generates the output sequence one token at a time, attending to the encoder's output and previously generated tokens.
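One detail worth illustrating is how the decoder's self-attention looks only at previously generated tokens: a causal (look-ahead) mask sets the scores for future positions to minus infinity before the softmax, so their attention weights become zero. A minimal sketch with toy matrices (the encoder-decoder attention step is not shown):

```python
import numpy as np

def masked_self_attention(Q, K, V):
    """Decoder self-attention: each position attends only to itself and earlier positions."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    causal_mask = np.triu(np.ones_like(scores, dtype=bool), k=1)  # True above the diagonal
    scores = np.where(causal_mask, -np.inf, scores)               # hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
_, weights = masked_self_attention(Q, K, V)
print(np.round(weights, 2))   # upper triangle is 0: no attention to future positions
```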
Applications of Transformer Models
1. Language Modeling: Models like GPT (Generative Pre-trained Transformer) are used to predict the next word in a sentence, enabling text generation, auto-completion, and more.
2. Machine Translation: Transformer-based models such as T5 (Text-To-Text Transfer Transformer) are used to translate text from one language to another.
3. Text Summarization: Transformers can generate concise summaries of long documents, extracting key information effectively.
4. Question Answering: Models like BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach) are employed in question-answering systems, understanding context to provide accurate answers.
5. Sentiment Analysis: Transformers analyze the sentiment of text, classifying it as positive, negative, or neutral.
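In practice, pretrained transformer models for tasks like these are often used through libraries such as Hugging Face `transformers`. The rough usage sketch below assumes the `transformers` package (plus a backend such as PyTorch) is installed; the default models are downloaded on first run, and exact defaults and generation arguments may vary by library version.

```python
# pip install transformers  (also requires a backend such as PyTorch)
from transformers import pipeline

# Sentiment analysis: classify text as positive or negative.
sentiment = pipeline("sentiment-analysis")
print(sentiment("Transformer models have revolutionized NLP."))

# Text generation with a GPT-style model (repeated next-word prediction).
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_length=20, num_return_sequences=1))

# Extractive question answering with a BERT-style model.
qa = pipeline("question-answering")
print(qa(question="Who introduced transformers?",
         context="Transformers were introduced by Vaswani et al. in 2017."))
```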
Advantages and Limitations of Transformer Models
Advantages
1. Parallelization: Unlike RNNs and LSTMs, transformers do not process data sequentially, allowing for parallel computation and faster training times.
2. Long-Range Dependencies: The self-attention mechanism effectively captures long-range dependencies in the data, improving the model's ability to understand context.
3. Scalability: Transformers scale efficiently with data and computational resources, enabling the creation of large models like GPT-3 with billions of parameters.

Challenges and Limitations
1. Computational Resources: Training large transformer models requires significant computational power and memory.
2. Data Requirements: Transformers often require large amounts of training data to achieve high performance.
3. Interpretability: The complexity of transformer models can make them difficult to interpret and understand.

Note: A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. Long short-term memory (LSTM) is a type of recurrent neural network aimed at dealing with the vanishing gradient problem present in traditional RNNs; its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models, and other sequence learning methods.
What are Transformer Models?
Conclusion
Transformer models have
transformed the landscape
of natural language
processing by enabling more
efficient and effective
handling of sequential data.
Their ability to capture long-
range dependencies and
process data in parallel has
led to significant
advancements in various NLP
tasks, making them a
cornerstone of modern AI
research and applications
What is next?
Model Development and Optimization