Generative AI

Generative AI, a subset of deep learning, creates artificial content by understanding patterns in existing data. It uses various models for tasks like image synthesis, text generation, and music composition.

Key Pointers
 Generative models include VAEs, GANs, autoregressive models, and
transformers.
 GANs use two neural networks to create realistic data.
 VAEs encode and decode data for tasks like anomaly detection.
 Autoregressive models generate sequences of data.
 Transformers, with an encoder-decoder structure, are used for language translation
and text generation.
 Applications include DeepArt, MuseNet, GPT, deepfake technology, AI Dungeon,
and NVIDIA GauGAN.
 ChatGPT enables human-like conversation; Google Bard assists with queries and content generation.
 DALL-E creates images from text prompts.
 Benefits include enhancing creativity, aiding research, and providing personalized
content.
 Limitations include dependency on data quality, potential biases, and high
computational requirements.
 Ethical concerns involve misinformation, privacy violations, and misuse in creating
deepfakes.
 Generative AI impacts various industries, posing risks to security, privacy, and
employment.
 Common uses are in content generation, language translation, chatbots, data augmentation, and medical imaging.
In simple words, it generally involves training AI models to understand the patterns and structures within existing data and using that knowledge to generate new, original data.

It learns the underlying patterns and structures of the training data before generating fresh samples that share those properties.
Image synthesis, text generation, and music composition are all tasks that
use generative models.
Models of Gen AI:
Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Autoregressive models, and Transformers are some examples of popular generative model architectures. These models help to create new data that helps users in different aspects.

1. Generative Adversarial Networks (GANs)

The Generative Adversarial Network is a type of machine learning model that creates new data similar to an existing dataset.
GANs generally involve two neural networks: the Generator and the Discriminator.
The Generator generates new data samples, while the Discriminator verifies the generated data.

The two networks are trained together in a process in which the Generator
attempts to fool the Discriminator into thinking the generated data is real,
while the Discriminator attempts to accurately detect whether the data is
real or fake.
This process is repeated until the Generator becomes so good at producing realistic data that the Discriminator can no longer tell the difference.

GANs can be used for image synthesis, style transfer, data augmentation,
and other tasks.
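To make the adversarial training loop concrete, here is a minimal sketch assuming PyTorch; the layer sizes, batch size, and the random stand-in for "real" data are illustrative only:

# A minimal GAN training-loop sketch, assuming PyTorch; layer sizes, batch
# size, and the random stand-in for "real" data are illustrative only.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64

# Generator: maps random noise to a fake data sample.
generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, data_dim), nn.Tanh(),
)

# Discriminator: outputs the probability that a sample is real.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_batch = torch.randn(32, data_dim)  # stand-in for a batch of real data

for step in range(100):
    # Train the Discriminator: label real samples 1 and fake samples 0.
    fake_batch = generator(torch.randn(32, latent_dim)).detach()
    d_loss = loss_fn(discriminator(real_batch), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake_batch), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the Generator: try to fool the Discriminator into outputting 1.
    g_loss = loss_fn(discriminator(generator(torch.randn(32, latent_dim))),
                     torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()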

2. Variational Autoencoders (VAEs)

Compared to GANs, VAEs follow a different approach: they transform the given input data into newly generated data through a process involving both encoding and decoding.
The encoder transforms input data into a lower-dimensional latent
space representation (The latent space can be seen as a compressed representation
of the input data), while the decoder reconstructs the original data from
the latent space.
Through training, VAEs learn to generate data that resembles the original
inputs while exploring the latent space.
Some of the applications of VAEs are image generation, anomaly detection, and latent space exploration.
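A minimal encode-sample-decode sketch follows, again assuming PyTorch; the dimensions and the MSE-plus-KL loss shown here are one common formulation, chosen for illustration:

# A minimal VAE sketch, assuming PyTorch; dimensions and the MSE-plus-KL
# loss are one common formulation, chosen here for illustration.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=64, latent_dim=8):
        super().__init__()
        self.encoder = nn.Linear(data_dim, 32)
        self.to_mu = nn.Linear(32, latent_dim)      # mean of the latent distribution
        self.to_logvar = nn.Linear(32, latent_dim)  # log-variance of the latent distribution
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, data_dim),
        )

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2) differentiably.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

vae = VAE()
x = torch.randn(4, 64)  # stand-in for a batch of input data
recon, mu, logvar = vae(x)
# Loss = reconstruction error + KL term pulling the latent space toward N(0, 1).
recon_loss = nn.functional.mse_loss(recon, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl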
3. Autoregressive models

Autoregressive models are a type of generative model used in Generative AI to generate sequences of data like text, music, or time series data.

These models generate data one element at a time, considering the context of previously generated elements. Based on the elements that came before it, an autoregressive model forecasts the next element in the sequence.
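The element-by-element loop can be sketched in a few lines of plain Python; toy_model below is a hypothetical stand-in for a real trained model such as an RNN or a GPT:

# A toy sketch of autoregressive generation; toy_model is a hypothetical
# stand-in for a trained model such as an RNN or a GPT.
def generate(model, start_tokens, length):
    sequence = list(start_tokens)
    for _ in range(length):
        next_token = model(sequence)  # forecast the next element from the context so far
        sequence.append(next_token)
    return sequence

toy_model = lambda seq: seq[-1] + 1        # trivially "predicts" the next number
print(generate(toy_model, [1, 2, 3], 5))   # [1, 2, 3, 4, 5, 6, 7, 8]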

4. Transformers

Transformers have transformed Generative AI by introducing a highly effective architecture for tasks like language translation, text generation (like the GPT series), and even image synthesis.

Here is an overview of the main components of the transformer architecture:
 Encoder-Decoder Structure: The transformer's architecture is divided into an encoder and a decoder. The encoder processes the input sequence, and the decoder generates the output sequence.
 Multi-Head Attention: Multi-head attention captures diverse
dependencies and features by considering different aspects of the
input sequence simultaneously.
 Positional Encodings: Unlike RNNs, Transformers have no built-in sense of word order. Positional encodings are added to input embeddings to represent the positions of words within a sequence (see the sketch after this list).
 Transformer Decoder: The decoder uses additional self-attention that
focuses on the previously generated words in the output sequence to
ensure coherence.
 Position-wise Feedforward Networks: Feed-forward layers are included in both the encoder and the decoder and help to capture contextual information.
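As an illustration of the positional encodings mentioned above, here is a sketch of the sinusoidal scheme from the original transformer paper, assuming NumPy; sizes are illustrative:

# A sketch of sinusoidal positional encodings, assuming NumPy; the formula
# follows the original transformer paper, and the sizes are illustrative.
import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, None]     # (seq_len, 1)
    dims = np.arange(d_model)[None, :]          # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

pe = positional_encoding(seq_len=50, d_model=128)  # added to the input embeddings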
What are Examples of Generative AI tools?
 Art and Music: DeepArt or DeepDream, tools which help find and enhance patterns in images by using a Convolutional Neural Network.
 Music: MuseNet is a deep learning model that can compose music in
multiple styles.
 Text Generation: GPT-3, developed by OpenAI, is one such application that can generate text as per the need of the user, such as content creation, programming help, and numerous other applications.
 Deepfake Creation: Deepfake which uses GANs to swap faces in
videos. It is an image or a video recording that uses an algorithm to
replace the person in the original video or image with someone else.
 Game Development: AI Dungeon is a text-based adventure game
that uses the GPT-3 to generate a dynamic storyline that is based on
user input.
 3D Object Generation: Generative AI tools like NVIDIA's GauGAN allow users to create 3D landscapes by drawing simple handmade sketches and pictures.

What is ChatGPT, Google Bard, and DALL-E?

ChatGPT is an NLP tool driven by AI technology. It allows you to have human-like conversations, and much more, with the help of a chatbot. This model can answer questions and assist you with tasks related to development, programming, gaming, composing essays, email, etc.
Google Bard: It provides answers to users' various queries quickly, usually within seconds.
DALL-E: It is a tool that helps to create new images from text-to-graphic prompts. By using GPT-3 and being trained on a given dataset, DALL-E can produce images that don't even exist.
Generative AI vs AI

What is AI?
Artificial intelligence or AI, the broadest term of the three, is used to
classify machines that mimic human intelligence and human cognitive
functions like problem-solving and learning.
AI uses predictions and automation to optimize and solve complex tasks
that humans have historically done, such as facial and speech recognition,
decision-making and translation.
What is machine learning?
Machine learning is a subset of AI that allows for optimization. When set
up correctly, it helps you make predictions that minimize the errors that
arise from merely guessing. For example, companies like Amazon use
machine learning to recommend products to a specific customer based on
what they’ve looked at and bought before.
Classic or “nondeep” machine learning depends on human intervention to allow a computer
system to identify patterns, learn, perform specific tasks and provide accurate results.
Human experts determine the hierarchy of features to understand the differences between
data inputs, usually requiring more structured data to learn.

While the subset of AI called deep machine learning can leverage labeled data sets to inform
its algorithm in supervised learning, it doesn’t necessarily require a labeled data set. It can
ingest unstructured data in its raw form (for example, text, images), and it can automatically
determine the set of features that distinguish “pizza,” “burger” and “taco” from one another.

What Is Deep Learning (DL)?

Deep learning plays an essential role as a separate branch within the Artificial Intelligence (AI) field due to its unique capabilities and advancements. Deep learning is defined as a machine learning technique that teaches computers to learn from data, in a way inspired by humans.

DL utilizes deep neural networks with multiple layers to learn hierarchical representations of data. It automatically extracts relevant features and eliminates manual feature engineering. DL can handle complex tasks and large-scale datasets more effectively. Despite the increased complexity and interpretability challenges, DL has shown tremendous success in various domains, including computer vision, natural language processing, and speech recognition.

How deep learning differs from machine learning

As our article on deep learning explains, deep learning is a subset of machine learning. The primary difference between machine learning and deep learning is how each algorithm learns and how much data each type of algorithm uses.

Deep learning automates much of the feature extraction piece of the process, eliminating some of the manual human intervention required.
What's the difference between deep learning and neural networks?
As mentioned in the explanation of neural networks above, but worth noting more explicitly, the “deep” in deep learning refers to the depth of layers in a neural network. A neural network of more than three layers, including the inputs and the output, can be considered a deep-learning algorithm.

Most deep neural networks are feed-forward, meaning they only flow in
one direction from input to output. However, you can also train your
model through backpropagation, meaning moving in the opposite
direction, from output to input. Backpropagation allows us to calculate
and attribute the error that is associated with each neuron, allowing us to
adjust and fit the algorithm appropriately.

AI serves as the broad, encompassing concept; ML learns patterns from data; DL leverages deep neural networks for intricate pattern recognition; and Generative AI creates new content.
Fundamentals of Azure OpenAI Service

Stable AI models are regularly put into production and used commercially
around the world. For example, Microsoft's existing Azure AI services have
been handling the needs of businesses for many years to date.

In 2022, OpenAI, an AI research company, created a chatbot known as ChatGPT and an image generation application known as DALL-E. These technologies were built with AI models which can take natural language input from a user and return a machine-created, human-like response.

Azure OpenAI Service enables users to build enterprise-grade solutions with OpenAI models.

With Azure OpenAI, users can summarize text, get code suggestions, generate images for a website, and much more.

Capabilities of OpenAI AI models

There are several categories of capabilities found in OpenAI AI models; three of these include:

 Generating natural language: summarizing complex text for different reading levels, suggesting alternative wording for sentences, and much more
 Generating code: translating code from one programming language into another, identifying and troubleshooting bugs in code, and much more
 Generating images: generating images for publications from text descriptions, and much more

 Generative AI models can produce new content based on what is described in the input. The OpenAI models are a collection of generative AI models that can produce language, code, and images.

Azure OpenAI Service

The service combines Azure's enterprise-grade capabilities with OpenAI's generative AI model capabilities.

Azure OpenAI is available for Azure users and consists of four components:

 Pre-trained generative AI models
 Customization capabilities: the ability to fine-tune AI models with your own data
 Built-in tools to detect and mitigate harmful use cases so users can
implement AI responsibly
 Enterprise-grade security with role-based access control (RBAC) and
private networks

Azure OpenAI's relationship to Azure AI services

Azure AI services are tools for solving AI workloads. The services you
choose to use depend on what you need to accomplish.

In particular, there are several overlapping capabilities between the Azure AI Language service and Azure OpenAI Service, such as translation, sentiment analysis, and keyword extraction.

While there's no strict guidance on when to use a particular service, the Azure AI Language service can be used for widely known use cases that require minimal tuning (the process of optimizing a model's performance). Azure OpenAI Service may be more beneficial for use cases that require highly customized generative models, or for exploratory research.

Azure OpenAI Studio

In the Azure OpenAI Studio, you can build AI models and deploy them for public consumption in software applications.

Playgrounds

In the Azure OpenAI Studio, you can experiment with OpenAI models in
playgrounds. In the Completions playground, you can type in prompts,
configure parameters, and see responses without having to code.

In the Chat playground, you can use the assistant setup to instruct the model about how it should behave. The assistant will try to mimic the tone, rules, and format you've defined in your system message.

Understand OpenAI's natural language capabilities
Azure OpenAI's natural language models are able to take in natural language and generate responses.

Natural language learning models are trained on words or chunks of characters known as tokens.

For example, the word "hamburger" gets broken up into the tokens ham, bur, and ger, while a short and common word like "pear" is a single token.

These tokens are mapped into vectors for a machine learning model to
use for training. When a trained natural language model takes in a user's
input, it also breaks down the input into tokens.
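A quick way to see tokenization in practice is the tiktoken library, which implements the byte-pair encodings used by OpenAI models; note that the exact ham/bur/ger split above is illustrative and the real split depends on the encoding:

# A tokenization sketch using the tiktoken library; actual token splits
# vary by encoding, so the ham/bur/ger example above is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("hamburger")
print(tokens)                              # integer token IDs
print([enc.decode([t]) for t in tokens])   # the text piece each token maps to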

What does a response from a GPT model look like?

A key aspect of OpenAI's generative AI is that it takes an input, or prompt, to return a natural language, visual, or code response.

GPT tries to infer, or guess, the context of the user's question based on
the prompt.
How models are applied to new use cases
You may have tried out ChatGPT's predictive capabilities in a chat portal,
where you can type prompts and receive automated responses.

The portal consists of the front-end user interface (UI) users see, and a
back-end that includes a generative AI model.

The combination of the front and back end can be described as a chatbot.

The model provided on the back end is what is available as a building block with both the OpenAI API and Azure OpenAI API. You can utilize ChatGPT's capabilities on Azure OpenAI via the GPT-35-turbo model.
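As a sketch of how that building block is called, here is a minimal chat completion request assuming the openai Python package; the endpoint, API key, API version, and deployment name are placeholders you would replace with your own Azure values:

# A minimal chat completion sketch, assuming the openai Python package;
# the endpoint, API key, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com/",
    api_key="YOUR-API-KEY",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # the deployment name created in your Azure resource
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what a chatbot back end does."},
    ],
)
print(response.choices[0].message.content)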

When you see generative AI capabilities in other applications, developers have taken the building blocks, customized them to a use case, and built them into the back end of new front-end user interfaces.

Understand OpenAI code generation capabilities

GPT models are able to take natural language or code snippets and translate them into code.

The OpenAI GPT models are proficient in over a dozen languages, such as C#, JavaScript, Perl, and PHP, and are most capable in Python.

GPT models have been trained on both natural language and billions of
lines of code from public repositories. The models are able to generate
code from natural language instructions such as code comments, and can
suggest ways to complete code functions.

For example, given the prompt "Write a for loop counting from 1 to 10 in
Python," the following answer is provided:

for i in range(1, 11):
    print(i)

GPT models can help developers code faster, understand new coding languages, and focus on solving bigger problems in their applications.

GitHub Copilot
OpenAI partnered with GitHub to create GitHub Copilot, which they call
an AI pair programmer. GitHub Copilot integrates the power of OpenAI
Codex into a plugin for developer environments like Visual Studio Code.

Once the plugin is installed and enabled, you can start writing your code,
and GitHub Copilot starts automatically suggesting the remainder of the
function based on code comments or the function name. For example, given only a function name in the file, the gray text is automatically suggested to complete it.

Understand OpenAI's image generation capabilities
Image generation models can take a prompt, a base image, or both, and
create something new. These generative AI models can create both
realistic and artistic images, change the layout or style of an image, and
create variations on a provided image.

DALL-E:
In addition to natural language capabilities, generative AI models can edit
and create images. The model that works with images is called DALL-E.
Much like GPT models, subsequent versions of DALL-E are appended onto
the name, such as DALL-E 2. Image capabilities generally fall into the
three categories of image creation, editing an image, and creating
variations of an image.
Large Language Model (LLM)
A large language model is a type of artificial intelligence algorithm that
applies neural network techniques with lots of parameters to process and
understand human languages or text using self-supervised learning
techniques.
Tasks like text generation, machine translation, summary writing, image
generation from texts, machine coding, chat-bots, or Conversational AI
are applications of the Large Language Model.
Examples of such LLM models are ChatGPT by OpenAI, BERT (Bidirectional Encoder Representations from Transformers) by Google, etc.
LLMs are based purely on deep learning methodologies. LLM (Large Language Model) models are highly efficient at capturing the complex entity relationships in the text at hand and can generate text using the semantics and syntax of the particular language we wish to generate in.

LLM Models
If we look at the growth in scale of the GPT (Generative Pre-trained Transformer) family alone:

 GPT-1, released in 2018, contains 117 million parameters and was trained on about 985 million words.
 GPT-2, released in 2019, contains 1.5 billion parameters.
 GPT-3, released in 2020, contains 175 billion parameters. ChatGPT is also based on this model.
 GPT-4, released in 2023, is estimated to contain trillions of parameters (OpenAI has not disclosed the exact count).

How do Large Language Models work?

Large Language Models are trained on vast datasets using self-supervised learning techniques. The core of their functionality lies in the intricate patterns and relationships they learn from diverse language data during training.

LLMs consist of multiple layers, including feedforward layers, embedding layers, and attention layers. They employ attention mechanisms, like self-attention, to weigh the importance of different tokens in a sequence, allowing the model to capture dependencies and relationships.
Architecture of LLM
A Large Language Model’s (LLM) architecture is determined by a number
of factors, like the objective of the specific model design, the available
computational resources, and the kind of language processing tasks that
are to be carried out by the LLM.
The general architecture of an LLM consists of many layers, such as feed-forward layers, embedding layers, and attention layers. The embedded text representations are combined across these layers to generate predictions.

Important components to influence Large Language Model


architecture
 Model Size and Parameter Count
 Input Representations
 Self-Attention Mechanisms
 Training Objectives
 Computational Efficiency
 Decoding and Output Generation

Transformer-Based LLM Model Architectures

Transformer-based models, which have revolutionized natural language
processing tasks, typically follow a general architecture that includes the
following components:
1. Input Embeddings: The input text is tokenized into smaller units, such
as words or sub-words, and each token is embedded into a continuous
vector representation. This embedding step captures the semantic and
syntactic information of the input.
2. Positional Encoding: Positional encoding is added to the input
embeddings to provide information about the positions of the tokens
because transformers do not naturally encode the order of the tokens.
This enables the model to process the tokens while taking their
sequential order into account.
3. Encoder: Based on a neural network technique, the encoder analyses the input text and creates a number of hidden states that preserve the context and meaning of the text data. The self-attention mechanism and a feed-forward neural network are the two fundamental sub-components of each encoder layer.
1. Self-Attention Mechanism: Self-attention enables the model to weigh the importance of different tokens in the input sequence by computing attention scores. It allows the model to consider the dependencies and relationships between different tokens in a context-aware manner (a minimal code sketch follows this list).
2. Feed-Forward Neural Network: After the self-attention step, a
feed-forward neural network is applied to each token independently.
This network includes fully connected layers with non-linear
activation functions, allowing the model to capture complex
interactions between tokens.
4. Decoder Layers: In some transformer-based models, a decoder component is included in addition to the encoder. The decoder layers enable autoregressive generation, where the model can generate sequential outputs by attending to the previously generated tokens.
5. Multi-Head Attention: Transformers often employ multi-head attention, where self-attention is performed several times in parallel with different learned attention weights. This allows the model to capture different types of relationships and attend to various parts of the input sequence simultaneously.
6. Layer Normalization: Layer normalization is applied after each sub-
component or layer in the transformer architecture. It helps stabilize the
learning process and improves the model’s ability to generalize across
different inputs.
7. Output Layers: The output layers of the transformer model can vary depending on the specific task. For example, in language modeling, a linear projection followed by a softmax activation is commonly used to generate the probability distribution over the next token.
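Here is the minimal self-attention sketch promised above, assuming NumPy; the shapes and random projection matrices are illustrative:

# A scaled dot-product self-attention sketch, assuming NumPy; shapes and
# random projection matrices are illustrative.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # attention scores between all token pairs
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                          # context-aware weighted sum of values

seq_len, d_model = 5, 16
X = np.random.randn(seq_len, d_model)           # stand-in token embeddings
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)             # shape: (seq_len, d_model)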

Large Language Models Examples

GPT-3: GPT stands for Generative Pre-trained Transformer.
BERT: Bidirectional Encoder Representations from Transformers; often used to generate embeddings for a particular text, which may then be used to train another model.
Large Language Models Use Cases
From the introductions and technical information above, you will have understood that ChatGPT is also an LLM. Use cases include:
 Code Generation, Debugging and Documentation of Code, Question Answering, Language Translation

Difference Between NLP and LLM

NLP is Natural Language Processing, a field of artificial intelligence (AI). It consists of the development of algorithms. NLP is a broader field than LLM, and it spans two approaches: machine learning and rule-based analysis of language data.

Applications of NLP are:

 Automating routine tasks
 Improving search
 Search engine optimization
 Analyzing and organizing large documents
 Social media analytics

LLM, on the other hand, is a Large Language Model, and is more specific to generating human-like text, providing content generation and personalized recommendations.

Challenges in Training of Large Language Models

 Millions of dollars are required to set up the computing power needed to train such a model.
 Training requires months, followed by humans in the loop for fine-tuning of the models.
 In the era of global warming and climate change, training a single AI model from scratch is said to have a carbon footprint equal to that of five cars over their whole lifetime, which is a really serious concern.
LangChain

LangChain is a framework designed to simplify the creation of applications using large language models.

LangChain is a framework to build with LLMs by chaining interoperable components. LangGraph is a related framework for building controllable agentic workflows.

LangChain provides tools and abstractions to improve the customization, accuracy, and
relevancy of the information the models generate. For example, developers can use
LangChain components to build new prompt chains or customize existing templates.
LangChain is a versatile Python library that empowers developers and researchers to create, experiment with, and analyze language models and agents.

What is LangChain?
LangChain is an open source framework for the development of applications using large language models (LLMs). Available in both Python- and JavaScript-based libraries, LangChain's tools and APIs simplify the process of building LLM-driven applications like chatbots and virtual agents.
LangChain serves as an interface for nearly any LLM, providing a centralized development environment to build LLM applications and integrate them with external data sources and software workflows.
LangChain's module-based approach allows developers and data scientists to dynamically compare different prompts and even different foundation models with minimal need to rewrite code.
This modular environment also allows for programs that use multiple LLMs: for example, an application that uses one LLM to interpret user queries and another LLM to author a response.

How does LangChain work?

At LangChain's core is a development environment that streamlines the programming of LLM applications through the use of abstraction: the simplification of code by representing one or more complex processes as a named component that encapsulates all of its constituent steps.

Importing language models

Nearly any LLM can be used in LangChain. Importing language models into LangChain is easy, provided you have an API key. The LLM class is designed to provide a standard interface for all models.
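A minimal import sketch, assuming the langchain-openai package and an OpenAI API key in the OPENAI_API_KEY environment variable; the model name is illustrative:

# A minimal model-import sketch, assuming the langchain-openai package and
# an OpenAI API key in the OPENAI_API_KEY environment variable.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")  # same interface regardless of the model chosen
print(llm.invoke("Say hello in one sentence.").content)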

Prompt templates
Prompts are the instructions given to an LLM.

The PromptTemplate class in LangChain formalizes the composition of prompts without the need to manually hard-code context and queries.

A prompt template can thus contain and reproduce context, instructions (like “do not
use technical terms”), a set of examples to guide its responses (in what is called
“few-shot prompting”), a specified output format or a standardized question to be
answered.
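A short sketch of such a template; PromptTemplate here comes from langchain-core, and the topic variable is illustrative:

# A prompt template sketch; the topic variable is illustrative.
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "Explain {topic} to a beginner. Do not use technical terms."
)
prompt = template.format(topic="vector databases")  # reusable for any topic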
Chains
As its name implies, chains are the core of LangChain’s workflows. They
combine LLMs with other components, creating applications by executing
a sequence of functions.

The most basic chain is LLMChain. It simply calls a model and a prompt template for that model. For example, imagine you saved a prompt as "ExamplePrompt" and wanted to run it against Flan-T5. You can import LLMChain from langchain.chains, then define chain_example = LLMChain(llm=flan_t5, prompt=ExamplePrompt). To run the chain for a given input, you simply call chain_example.run("input").

To use the output of one function as the input for the next, you can use SimpleSequentialChain. Each function could utilize different prompts, different tools, different parameters or even different models, depending on your specific needs.
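A sketch combining both chains described above, assuming the langchain and langchain-openai packages; the prompts and model choice are illustrative:

# A chaining sketch; prompts and model choice are illustrative.
from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain_core.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-3.5-turbo")

outline = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Write a one-line outline for a post about {input}."))
draft = LLMChain(llm=llm, prompt=PromptTemplate.from_template(
    "Expand this outline into a short paragraph: {input}"))

# The output of the first chain becomes the input of the second.
pipeline = SimpleSequentialChain(chains=[outline, draft])
print(pipeline.run("chunking in generative AI"))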

Indexes
To achieve certain tasks, LLMs need access to specific external data sources not included in their training dataset, such as internal documents, emails or datasets. LangChain collectively refers to such external documentation as "indexes".

Document loaders
LangChain offers a wide variety of document loaders for third-party applications. This allows for easy importation of data from sources like file storage services (like Dropbox, Google Drive and Microsoft OneDrive), web content (like YouTube, PubMed or specific URLs), collaboration tools (like Airtable, Trello, Figma and Notion), and databases (like Pandas, MongoDB and Microsoft), among many others.

Vector databases
Vector databases represent data points by converting them into vector embeddings: numerical representations in the form of vectors with a fixed number of dimensions, often clustering related data points using unsupervised learning methods. Vector embeddings also store each vector's metadata, further enhancing search possibilities.

LangChain provides integrations for over 25 different embedding methods, as well as for over 50 different vector stores (both cloud-hosted and local).
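A local vector store sketch, assuming the faiss-cpu, langchain-community, and langchain-openai packages (the embedding calls require an OpenAI API key); the documents are illustrative:

# A local vector store sketch; the documents are illustrative.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs = ["LangChain chains components together.",
        "Vector stores enable similarity search over embeddings."]
store = FAISS.from_texts(docs, OpenAIEmbeddings())   # embed and index the texts
results = store.similarity_search("How do I search by meaning?", k=1)
print(results[0].page_content)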

Text splitters
To increase speed and reduce computational demands, it’s often wise to split large
text documents into smaller pieces. LangChain’s TextSplitters split text up into
small, semantically meaningful chunks that can then be combined using
methods and parameters of your choosing.
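A splitting sketch using RecursiveCharacterTextSplitter, one of LangChain's common splitters; the chunk sizes and the repeated sample text are illustrative:

# A text-splitting sketch; chunk sizes and sample text are illustrative.
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_document_text = "LangChain splits large documents into chunks. " * 100
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_document_text)  # list of overlapping text chunks
print(len(chunks), chunks[0][:60])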

Retrieval

Once external sources of knowledge have been connected, the model must be able
to quickly retrieve and integrate relevant information as needed.
LangChain offers retrieval augmented generation (RAG): its retriever modules accept a string query as input and return a list of Documents as output.

Memory
LLMs, by default, do not have any long-term memory of prior
conversations.
LangChain solves this problem with simple utilities for adding memory to a system, with options ranging from retaining the entirety of all conversations, to retaining a summarization of the conversation thus far, to retaining the n most recent exchanges.
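A sketch of the simplest option, ConversationBufferMemory, which retains the entire conversation; the saved exchange is illustrative:

# A conversation-memory sketch; the saved exchange is illustrative.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "Hi, I'm Akil."}, {"output": "Hello Akil!"})
print(memory.load_memory_variables({}))  # prior turns, available to the next call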

Agents
LangChain agents can use a given language model as a "reasoning engine" to determine which actions to take. When building a chain for an agent, inputs include:

 a list of available tools to be leveraged.
 user input (like prompts and queries).
 any relevant previously executed steps.

LangChain tools
LangChain tools are a set of functions that empower LangChain agents to interact with real-world information in order to expand or improve the services they can provide.
Examples of prominent LangChain tools include:
 Wolfram Alpha: provides access to powerful computational and
data visualization functions, enabling sophisticated mathematical
capabilities.
 Google Search: provides access to Google Search, equipping
applications and agents with real-time information.
 OpenWeatherMap: fetches weather information.
 Wikipedia: provides efficient access to information from Wikipedia
articles.
LangSmith
LangSmith provides tools to monitor, evaluate and debug
applications, including the ability to automatically trace all model calls to
spot errors and test performance under different model configurations.
This visibility aims to empower more robust, cost-efficient
applications.

What is chunking in AI?

Chunking is a technique used in Generative AI to handle large amounts of data by breaking it down into smaller, more manageable pieces called "chunks".

You might also like