Generative AI Interview Questions and Answers

The document provides an overview of Transformers, a neural network architecture that processes input sequences in parallel using self-attention, making it superior to RNNs in handling long-range dependencies. It explains various attention mechanisms, including self-attention, multi-head attention, and cross-attention, and highlights the applications of Transformers in models like BERT, GPT, and T5. Additionally, it defines Large Language Models (LLMs) as AI programs trained on vast datasets to understand and generate human language.

50+ Generative AI Interview Questions
Q1. What are Transformers?

A Transformer is a type of neural network architecture introduced in the 2017 paper “Attention Is All You Need” by Vaswani et al. It has become the backbone for many state-of-the-art natural language processing models.
Here are the key points about Transformers:

Architecture: Unlike recurrent neural networks (RNNs), which process input sequences sequentially, Transformers handle input sequences in parallel via a self-attention mechanism.

Key components:
Encoder-Decoder structure
Multi-head attention layers
Feed-forward neural networks
Positional encodings

Self-attention: This feature enables the model to efficiently capture long-range relationships by assessing the relative relevance of the various input components as it processes each element.

Parallelisation: Transformers can handle all input tokens concurrently, which speeds up training and inference times compared to RNNs (see the sketch below).
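
To make the pieces above concrete, here is a minimal sketch in PyTorch (the library choice, dimensions, and layer counts are illustrative assumptions, not values from the original paper): token embeddings combined with sinusoidal positional encodings feed a stack of encoder layers, each built from multi-head self-attention and a feed-forward network, and the whole sequence is processed in a single pass.

```python
# Minimal Transformer encoder sketch in PyTorch (library choice and all sizes
# are illustrative assumptions, not values from the original paper).
import math
import torch
import torch.nn as nn

d_model, n_heads, n_layers, seq_len, vocab_size = 64, 4, 2, 10, 1000

# Token embeddings plus fixed sinusoidal positional encodings; self-attention
# alone has no notion of token order, so position must be injected explicitly.
embed = nn.Embedding(vocab_size, d_model)
position = torch.arange(seq_len).unsqueeze(1).float()
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
pos_enc = torch.zeros(seq_len, d_model)
pos_enc[:, 0::2] = torch.sin(position * div_term)
pos_enc[:, 1::2] = torch.cos(position * div_term)

# One encoder layer bundles multi-head self-attention with a feed-forward
# network; stacking layers gives the encoder half of the architecture.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   dim_feedforward=128, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

tokens = torch.randint(0, vocab_size, (1, seq_len))  # a batch of one sequence
x = embed(tokens) + pos_enc                          # all positions at once
out = encoder(x)                                     # processed in parallel
print(out.shape)                                     # torch.Size([1, 10, 64])
```

Nothing in this forward pass loops over token positions, which is exactly the parallelisation described above.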
Q2. What is Attention? What are some
attention mechanism types?
Attention is a technique used in generative AI and neural networks that allows a model to focus on specific parts of the input when generating output. Instead of treating every input component equally, the model dynamically determines the relative importance of each component in the sequence.

1. Self-Attention:
Also referred to as intra-attention, self-attention enables a model
to focus on various points within an input sequence. It plays a
crucial role in transformer architectures.

2. Multi-Head Attention:
This technique runs several attention operations in parallel, allowing the model to attend to information from multiple representation subspaces at once.

3. Cross-Attention:
This technique enables the model to process one sequence while attending to information from another, and is frequently utilised in encoder-decoder systems (see the sketch below).
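
As a rough illustration of the difference between self- and cross-attention, here is a small sketch using PyTorch's nn.MultiheadAttention (the library, tensor sizes, and head count are assumptions made purely for the example):

```python
# Sketch contrasting self-attention and cross-attention with PyTorch's
# nn.MultiheadAttention (library, sizes, and head count are assumptions).
import torch
import torch.nn as nn

d_model, n_heads = 32, 4
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)

src = torch.randn(1, 6, d_model)   # e.g. an encoder-side sequence of 6 tokens
tgt = torch.randn(1, 3, d_model)   # e.g. a decoder-side sequence of 3 tokens

# Self-attention: queries, keys, and values all come from the same sequence.
self_out, self_weights = attn(src, src, src)
print(self_weights.shape)   # (1, 6, 6): every src token attends to every src token

# Cross-attention: queries come from one sequence, keys and values from another,
# as in the decoder of an encoder-decoder Transformer.
cross_out, cross_weights = attn(tgt, src, src)
print(cross_weights.shape)  # (1, 3, 6): each tgt token attends over all src tokens
```

The only difference between the two calls is where the queries, keys, and values come from; the multi-head mechanism runs inside each call, splitting the computation across the four heads.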
Q3. How and why are transformers better
than RNN architectures?
How: Transformers process entire sequences in parallel.

Why better:

RNNs process sequences sequentially, which is slower.
Transformers can leverage modern GPU architectures more effectively, resulting in significantly faster training and inference times.

How: Transformers use self-attention to directly model relationships between all pairs of tokens in a sequence.

Why better:

Because of the vanishing gradient issue, RNNs have difficulty handling long-range dependencies.
Transformers perform better on tasks that require a grasp of greater context because they can easily capture both short- and long-range dependencies (see the sketch below).
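
The sketch below illustrates this contrast (PyTorch is assumed, and the sizes are arbitrary): the RNN must step through the sequence one token at a time because each hidden state depends on the previous one, while a Transformer encoder layer relates every pair of positions in a single call.

```python
# Sequential RNN steps versus one parallel Transformer pass (PyTorch assumed;
# the sizes below are arbitrary and purely for illustration).
import torch
import torch.nn as nn

d_model, seq_len = 32, 128
x = torch.randn(1, seq_len, d_model)

# RNN: the hidden state at step t depends on step t-1, so the sequence
# dimension must be walked one token at a time.
rnn = nn.RNN(d_model, d_model, batch_first=True)
h = torch.zeros(1, 1, d_model)
for t in range(seq_len):
    _, h = rnn(x[:, t:t + 1, :], h)   # 128 dependent steps

# Transformer encoder layer: self-attention relates every pair of positions in
# a single matrix operation, so the whole sequence is handled in one call.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
out = layer(x)                        # one parallel pass over all 128 tokens
print(out.shape)                      # torch.Size([1, 128, 32])
```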
Q4. Where are Transformers used?

These models are significant advancements in natural language processing, all built on the transformer architecture.

BERT (Bidirectional Encoder Representations from Transformers):
Architecture: Uses only the encoder part of the transformer.
Key feature: Bidirectional context understanding.

GPT (Generative Pre-trained Transformer):
Architecture: Uses only the decoder part of the transformer.
Key feature: Autoregressive language modeling.

T5 (Text-to-Text Transfer Transformer):
Architecture: Encoder-decoder transformer.
Key feature: Frames all NLP tasks as text-to-text problems.

RoBERTa (Robustly Optimized BERT Approach):
Architecture: Similar to BERT, but with an optimized training process.
Key improvements: Longer training, larger batches, more data.

XLNet:
Architecture: Based on Transformer-XL.
Key feature: Permutation language modeling for bidirectional context without masks.
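
As a rough sketch of how these model families are commonly accessed in practice, the snippet below uses the Hugging Face transformers library (a tooling assumption; the document itself does not prescribe any library), with one pipeline per architecture style:

```python
# Sketch of using these model families via the Hugging Face `transformers`
# library (a tooling assumption; weights are downloaded on first use).
from transformers import pipeline

# BERT-style encoder: bidirectional context, here used for masked-token prediction.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Transformers process sequences in [MASK].")[0]["token_str"])

# GPT-style decoder: autoregressive text generation.
generate = pipeline("text-generation", model="gpt2")
print(generate("Attention is", max_new_tokens=10)[0]["generated_text"])

# T5 encoder-decoder: every task framed as text-to-text.
t2t = pipeline("text2text-generation", model="t5-small")
print(t2t("translate English to German: The model is fast.")[0]["generated_text"])
```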
Q5. What is a Large Language Model (LLM)?

A large language model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks. LLMs are trained on huge sets of data, hence the name “large.” LLMs are built on machine learning; specifically, a type of neural network called a transformer model.

To put it more simply, an LLM is a computer program that has been fed enough examples to recognize and interpret complex data, such as human language. Many LLMs are trained on thousands or millions of megabytes of text from the Internet. However, an LLM’s developers may choose to use a more carefully curated data set, because the quality of the samples affects how well the LLM learns natural language.

A foundational LLM is a model pre-trained on a large and diverse corpus of text data to understand and generate human language. This pre-training allows the model to learn the structure, nuances, and patterns of language in a general sense, without being tailored to any specific task or domain. Examples include GPT-3 and GPT-4.
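
As a small illustration of interacting with a pre-trained causal language model, the sketch below prompts GPT-2 through the Hugging Face transformers library; GPT-2 is far smaller than GPT-3 or GPT-4 and stands in here only as an illustrative example of a foundational, non-fine-tuned model.

```python
# Minimal sketch of prompting a pre-trained causal language model with the
# Hugging Face `transformers` library; GPT-2 is used here only as a small,
# illustrative stand-in for a foundational LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# No task-specific fine-tuning is involved: the model simply continues the
# prompt based on patterns learned during general pre-training.
inputs = tokenizer("A large language model is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```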