Generative AI
Key Pointers
Generative models include VAEs, GANs, autoregressive models, and
transformers.
GANs use two neural networks to create realistic data.
VAEs encode and decode data for tasks like anomaly detection.
Autoregressive models generate sequences of data.
Transformers, with an encoder-decoder structure, are used for language translation
and text generation.
Applications include DeepArt, MuseNet, GPT, deepfake technology, AI Dungeon,
and NVIDIA GauGAN.
ChatGPT enables human-like conversation; Google Bard assists with queries and
content generation.
DALL-E creates images from text prompts.
Benefits include enhancing creativity, aiding research, and providing personalized
content.
Limitations include dependency on data quality, potential biases, and high
computational requirements.
Ethical concerns involve misinformation, privacy violations, and misuse in creating
deepfakes.
Generative AI impacts various industries, posing risks to security, privacy, and
employment.
Common uses are in content generation, language translation, chatbots, data
augmentation, and medical imaging.
In simple words, it generally involves training AI models to understand the
patterns and structures within existing data and using that understanding to
generate new, original data.
The two networks are trained together in a process in which the Generator
attempts to fool the Discriminator into thinking the generated data is real,
while the Discriminator attempts to accurately detect whether the data is
real or fake.
This process is repeated until the Generator becomes so good at
producing realistic data that the Discriminator can no longer tell the
difference between real and generated data.
GANs can be used for image synthesis, style transfer, data augmentation,
and other tasks.
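To make the adversarial training loop described above concrete, here is a minimal sketch in PyTorch (an assumed library choice; the layer sizes, toy 1-D data distribution, and hyperparameters are illustrative, not taken from the text above):

import torch
import torch.nn as nn

# Generator maps random noise to a fake sample; Discriminator scores real vs. fake.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(1000):
    real = torch.randn(32, 1) * 0.5 + 2.0   # "real" samples from a toy distribution
    noise = torch.randn(32, 8)
    fake = generator(noise)

    # Discriminator tries to label real data as 1 and generated data as 0.
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1))
              + loss_fn(discriminator(fake.detach()), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator tries to fool the discriminator into labeling fake data as 1.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()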
4. Transformers
Transformers use an encoder-decoder structure with attention mechanisms and are widely used for tasks such as language translation and text generation.
What is AI?
Artificial intelligence or AI, the broadest term of the three, is used to
classify machines that mimic human intelligence and human cognitive
functions like problem-solving and learning.
AI uses predictions and automation to optimize and solve complex tasks
that humans have historically done, such as facial and speech recognition,
decision-making and translation.
What is machine learning?
Machine learning is a subset of AI that allows for optimization. When set
up correctly, it helps you make predictions that minimize the errors that
arise from merely guessing. For example, companies like Amazon use
machine learning to recommend products to a specific customer based on
what they’ve looked at and bought before.
Classic or “nondeep” machine learning depends on human intervention to allow a computer
system to identify patterns, learn, perform specific tasks and provide accurate results.
Human experts determine the hierarchy of features to understand the differences between
data inputs, usually requiring more structured data to learn.
While the subset of AI called deep machine learning can leverage labeled data sets to inform
its algorithm in supervised learning, it doesn’t necessarily require a labeled data set. It can
ingest unstructured data in its raw form (for example, text, images), and it can automatically
determine the set of features that distinguish “pizza,” “burger” and “taco” from one another.
Deep learning plays an essential role as a separate branch within the Artificial
Intelligence (AI) field due to its unique capabilities and advancements. Deep
learning eliminates much of the manual feature engineering that classic machine
learning requires, and it can handle complex tasks and large-scale data, such as
image and speech recognition.
Most deep neural networks are feed-forward, meaning data flows in only
one direction, from input to output. However, you can also train your
model through backpropagation, that is, moving in the opposite
direction, from output to input. Backpropagation allows us to calculate
and attribute the error associated with each neuron, so we can
adjust and fit the algorithm appropriately.
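To make the feed-forward and backpropagation steps concrete, here is a minimal NumPy sketch of one training step for a tiny two-layer network (the architecture, random data, and learning rate are illustrative assumptions):

import numpy as np

# Toy data: 4 samples, 3 input features, 1 target value each.
X = np.random.randn(4, 3)
y = np.random.randn(4, 1)

# Two-layer network: 3 inputs -> 5 hidden units -> 1 output.
W1, b1 = np.random.randn(3, 5) * 0.1, np.zeros(5)
W2, b2 = np.random.randn(5, 1) * 0.1, np.zeros(1)

# Feed-forward pass: data flows from input to output.
h = np.maximum(0, X @ W1 + b1)      # hidden layer with ReLU
pred = h @ W2 + b2                  # output layer
loss = np.mean((pred - y) ** 2)     # mean squared error

# Backpropagation: the error flows from output back toward the input,
# attributing a gradient to every weight and bias.
grad_pred = 2 * (pred - y) / len(X)
grad_W2 = h.T @ grad_pred
grad_b2 = grad_pred.sum(axis=0)
grad_h = grad_pred @ W2.T
grad_h[h <= 0] = 0                  # gradient through the ReLU
grad_W1 = X.T @ grad_h
grad_b1 = grad_h.sum(axis=0)

# Adjust the weights to reduce the error.
lr = 0.01
W1 -= lr * grad_W1
b1 -= lr * grad_b1
W2 -= lr * grad_W2
b2 -= lr * grad_b2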
Stable AI models are regularly put into production and used commercially
around the world. For example, Microsoft's existing Azure AI services have
been handling the needs of businesses for many years to date.
Capabilities and examples:
Generating natural language: summarizing complex text for different reading levels, suggesting alternative wording for sentences, and much more.
Generating code: translating code from one programming language into another, identifying and troubleshooting bugs in code, and much more.
Generating images: generating images for publications from text descriptions, and much more.
Azure AI services are tools for solving AI workloads. The services you
choose to use depend on what you need to accomplish.
Playgrounds
In the Azure OpenAI Studio, you can experiment with OpenAI models in
playgrounds. In the Completions playground, you can type in prompts,
configure parameters, and see responses without having to code.
In the Chat playground, you can use the assistant setup to instruct the
model about how it should behave. The assistant will try to mimic the
responses you include in tone, rules, and format you've defined in your
system message.
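Outside the playground, the same pattern of a system message plus user messages can be reproduced in code. Below is a minimal sketch using the openai Python SDK against an Azure OpenAI deployment; the endpoint, API key, API version, and deployment name are placeholders you would supply yourself:

from openai import AzureOpenAI

# Placeholder connection details for an Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # the name of your deployed GPT model
    messages=[
        # The system message plays the role of the assistant setup in the Chat playground.
        {"role": "system", "content": "You are a friendly assistant that answers in one short sentence."},
        {"role": "user", "content": "What is a playground in Azure OpenAI Studio?"},
    ],
)
print(response.choices[0].message.content)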
A natural language model first breaks its training text into tokens. These
tokens are mapped into vectors for a machine learning model to use for
training. When a trained natural language model takes in a user's input, it
also breaks down the input into tokens.
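For illustration, the tiktoken library (an assumed choice; it provides the tokenizers used by OpenAI models) shows how a sentence is broken into tokens and mapped to integer IDs, which the model then maps to vectors internally:

import tiktoken

# Tokenizer used by recent OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("Generative AI creates new content.")
print(token_ids)                               # list of integer token IDs
print([enc.decode([t]) for t in token_ids])    # the text piece each token represents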
GPT tries to infer, or guess, the context of the user's question based on
the prompt.
How models are applied to new use cases
You may have tried out ChatGPT's predictive capabilities in a chat portal,
where you can type prompts and receive automated responses.
The portal consists of the front-end user interface (UI) users see, and a
back-end that includes a generative AI model.
The combination of the front and back end can be described as a chatbot.
The OpenAI GPT models are proficient in over a dozen programming languages, such as
C#, JavaScript, Perl, and PHP, and are most capable in Python.
GPT models have been trained on both natural language and billions of
lines of code from public repositories. The models are able to generate
code from natural language instructions such as code comments, and can
suggest ways to complete code functions.
For example, given the prompt "Write a for loop counting from 1 to 10 in
Python," the following answer is provided:
for i in range(1, 11):
    print(i)
GPT models can help developers code faster, understand new coding
languages, and focus on solving bigger problems in their applications.
GitHub Copilot
OpenAI partnered with GitHub to create GitHub Copilot, which they call
an AI pair programmer. GitHub Copilot integrates the power of OpenAI
Codex into a plugin for developer environments like Visual Studio Code.
Once the plugin is installed and enabled, you can start writing your code,
and GitHub Copilot starts automatically suggesting the remainder of the
function based on code comments or the function name. For example, we
have only a function name in the file, and the gray text is automatically
suggested to complete it.
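A hypothetical illustration of that behavior: given only the comment and function name below, Copilot might suggest a body like the gray text shown; the suggestion here is invented for illustration rather than captured from the tool:

# Return the n-th Fibonacci number.
def fibonacci(n):
    # --- the lines below are the kind of completion Copilot might suggest ---
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a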
DALL-E:
In addition to natural language capabilities, generative AI models can edit
and create images. The model that works with images is called DALL-E.
Much like GPT models, subsequent versions of DALL-E are appended onto
the name, such as DALL-E 2. Image capabilities generally fall into the
three categories of image creation, editing an image, and creating
variations of an image.
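A minimal sketch of calling DALL-E for image creation through the openai Python SDK (the model name, image size, and API key handling are assumptions; similar image capabilities are also exposed through Azure OpenAI):

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Image creation from a text prompt.
result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a lighthouse at sunrise",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # URL of the generated image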
Large Language Model (LLM)
A large language model is a type of artificial intelligence algorithm that
applies neural network techniques with lots of parameters to process and
understand human languages or text using self-supervised learning
techniques.
Tasks like text generation, machine translation, summary writing, image
generation from texts, machine coding, chat-bots, or Conversational AI
are applications of the Large Language Model.
Examples of such LLMs are ChatGPT by OpenAI, BERT
(Bidirectional Encoder Representations from Transformers) by Google,
etc.
LLMs are based purely on deep learning methodologies.
LLMs (large language models) are highly effective at capturing the
complex relationships between entities in the text at hand and can generate
text that follows the semantics and syntax of the target language.
LLM Models
Considering only the GPT (Generative Pre-trained Transformer) family, model scale has grown rapidly with each generation: GPT-1 (2018) had roughly 117 million parameters, GPT-2 (2019) roughly 1.5 billion, and GPT-3 (2020) roughly 175 billion.
LangChain is a framework designed to simplify the creation of applications using large language
models.
LangChain provides tools and abstractions to improve the customization, accuracy, and
relevancy of the information the models generate. For example, developers can use
LangChain components to build new prompt chains or customize existing templates.
LangChain is a versatile Python library that empowers developers and researchers to create,
experiment with, and analyze language models and agents.
What is LangChain?
LangChain is an open source framework for the
development of applications using large language
models (LLMs). Available in both Python- and
Javascript-based libraries, LangChain’s tools and
APIs simplify the process of building LLM-driven
applications like chatbots and virtual agents.
LangChain serves as an interface for nearly any LLM, providing a centralized development
environment to build LLM applications and integrate them with external data sources and
software workflows.
LangChain’s module-based approach allows developers and data scientists to dynamically
compare different prompts and even different foundation models with minimal need to
rewrite code.
This modular environment also allows for programs that use multiple LLMs: for example, an
application that uses one LLM to interpret user queries and another LLM to author a
response.
Prompt templates
Prompts are the instructions given to an LLM; prompt templates standardize how
those instructions are composed. A prompt template can thus contain and
reproduce context, instructions (like “do not
use technical terms”), a set of examples to guide its responses (in what is called
“few-shot prompting”), a specified output format or a standardized question to be
answered.
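A minimal sketch of a LangChain prompt template (the import path can vary slightly between LangChain versions; the template text and variable names are illustrative):

from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["topic"],
    template=(
        "You are a helpful assistant. Do not use technical terms.\n"
        "Explain {topic} in two sentences."
    ),
)

# Fill the template with a concrete value to produce the final prompt string.
print(template.format(topic="vector databases"))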
Chains
As its name implies, chains are the core of LangChain’s workflows. They
combine LLMs with other components, creating applications by executing
a sequence of functions.
The most basic chain is LLMChain. It simply calls a model and prompt template for
that model.
For example, imagine you saved a prompt as “ExamplePrompt” and wanted to run it
against Flan-T5. You can import LLMChain from langchain.chains, then
define chain_example = LLMChain(llm = flan-t5, prompt = ExamplePrompt). To run
the chain for a given input, you simply call chain_example.run(“input”).
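Written out as code, that example looks roughly like the sketch below. In real use, flan_t5 would wrap an actual Flan-T5 model (for example through a HuggingFace integration); here a fake LLM with canned responses stands in so the structure is runnable offline, and import paths vary slightly by LangChain version:

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms.fake import FakeListLLM  # stand-in LLM so the sketch runs offline

# Stand-in for a real Flan-T5 model object.
flan_t5 = FakeListLLM(responses=["A prompt template is a reusable prompt skeleton."])

ExamplePrompt = PromptTemplate(
    input_variables=["question"],
    template="Answer briefly: {question}",
)

chain_example = LLMChain(llm=flan_t5, prompt=ExamplePrompt)
print(chain_example.run("What is a prompt template?"))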
To use the output of one function as the input for the next function, you can
use SimpleSequentialChain. Each function could utilize different prompts, different
tools, different parameters or even different models, depending on your specific
needs.
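A minimal sketch of chaining two steps with SimpleSequentialChain, where the output of the first chain becomes the input of the second (fake LLMs with canned answers stand in for real models; each step could just as well use a different real model):

from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate
from langchain.llms.fake import FakeListLLM

# Stand-in models with canned answers, one per step.
outline_llm = FakeListLLM(responses=["1. What LangChain is  2. Why chains matter"])
draft_llm = FakeListLLM(responses=["LangChain links LLM calls into multi-step workflows."])

outline_chain = LLMChain(
    llm=outline_llm,
    prompt=PromptTemplate.from_template("Write a two-point outline about: {topic}"),
)
draft_chain = LLMChain(
    llm=draft_llm,
    prompt=PromptTemplate.from_template("Write one sentence based on this outline: {outline}"),
)

# The output of outline_chain is fed in as the input of draft_chain.
overall = SimpleSequentialChain(chains=[outline_chain, draft_chain])
print(overall.run("LangChain"))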
Indexes
To achieve certain tasks, LLMs will need access to specific external data
sources not included in their training dataset, such as internal documents,
emails or datasets. LangChain collectively refers to such external
documentation as “indexes”.
Document loaders
LangChain offers a wide variety of document loaders for third-party
applications. This allows for easy importation of data from sources like file
storage services (like Dropbox, Google Drive and Microsoft OneDrive), web
content (like YouTube, PubMed or specific URLs), collaboration tools (like
Airtable, Trello, Figma and Notion), and databases (like Pandas, MongoDB and
Microsoft), among many others.
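A minimal document-loader sketch using a plain text file; TextLoader is one of the simplest loaders, and the file path is a placeholder:

from langchain.document_loaders import TextLoader

# Load a local text file into LangChain Document objects.
loader = TextLoader("notes.txt")   # placeholder path
docs = loader.load()
print(len(docs), docs[0].page_content[:100])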
Vector databases
Vector databases represent data points by converting them into vector
embeddings: numerical representations in the form of vectors with a fixed number
of dimensions, often clustering related data points using unsupervised learning
methods. Vector embeddings also store each vector’s metadata, further
enhancing search possibilities.
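A minimal sketch of turning texts into embeddings and storing them in a FAISS vector store for similarity search (FAISS and OpenAIEmbeddings are illustrative choices; any supported embedding model and vector database could be used, and the embeddings call requires an OpenAI API key):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

texts = [
    "LangChain chains combine LLMs with other components.",
    "Vector databases store embeddings for similarity search.",
]

# Embed the texts and index them in an in-memory FAISS store.
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# Retrieve the stored text most similar to the query.
results = store.similarity_search("How are embeddings stored?", k=1)
print(results[0].page_content)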
Text splitters
To increase speed and reduce computational demands, it’s often wise to split large
text documents into smaller pieces. LangChain’s TextSplitters split text up into
small, semantically meaningful chunks that can then be combined using
methods and parameters of your choosing.
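A minimal text-splitter sketch using RecursiveCharacterTextSplitter (the chunk size and overlap are illustrative values):

from langchain.text_splitter import RecursiveCharacterTextSplitter

long_text = "LangChain helps build LLM applications. " * 50  # stand-in for a long document

splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
chunks = splitter.split_text(long_text)

print(len(chunks), "chunks; first chunk:", chunks[0][:80])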
Retrieval
Once external sources of knowledge have been connected, the model must be able
to quickly retrieve and integrate relevant information as needed.
LangChain offers tools for retrieval-augmented generation
(RAG): its retriever modules accept a string query as input and return a
list of Documents as output.
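Continuing the vector-store sketch above, the store can be exposed as a retriever that takes a string query and returns relevant Documents, a common building block for RAG (method names may differ slightly across LangChain versions):

# 'store' is the FAISS vector store built in the earlier sketch.
retriever = store.as_retriever(search_kwargs={"k": 1})

docs = retriever.get_relevant_documents("What do vector databases store?")
for doc in docs:
    print(doc.page_content)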
Memory
LLMs, by default, do not have any long-term memory of prior
conversations.
LangChain solves this problem with simple utilities for adding memory
to a system, with options ranging from retaining the entirety of all
conversations to retaining a summarization of the conversation thus far to
retaining the n most recent exchanges.
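A minimal memory sketch using ConversationBufferMemory with a ConversationChain, so each new turn sees the previous exchanges (a fake LLM with canned answers stands in for a real model; import paths vary by LangChain version):

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.llms.fake import FakeListLLM

llm = FakeListLLM(responses=["Nice to meet you, Alice!", "You told me your name is Alice."])

# ConversationBufferMemory keeps the full history of the conversation in the prompt.
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is Alice.")
print(conversation.predict(input="What is my name?"))
print(conversation.memory.buffer)  # the stored conversation history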
Agents
LangChain agents can use a given language model as a “reasoning
engine” to determine which actions to take. When building a chain
for an agent, inputs include:
LangChain tools
LangChain tools are a set of functions that empower LangChain agents to interact with
real-world information in order to expand or improve the services they can provide.
Examples of prominent LangChain tools include:
Wolfram Alpha: provides access to powerful computational and
data visualization functions, enabling sophisticated mathematical
capabilities.
Google Search: provides access to Google Search, equipping
applications and agents with real-time information.
OpenWeatherMap: fetches weather information.
Wikipedia: provides efficient access to information from Wikipedia
articles.
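A minimal agent sketch wiring one of these tools into an agent that uses the LLM as its reasoning engine (the Wikipedia tool and the initialize_agent helper are illustrative choices; running it requires an OpenAI API key and the wikipedia package, and newer LangChain versions offer alternative agent APIs):

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)      # requires an OpenAI API key
tools = load_tools(["wikipedia"])    # requires the wikipedia package

# The agent uses the LLM to decide when and how to call the Wikipedia tool.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Which year was the LangChain framework first released?")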
LangSmith
LangSmith provides tools to monitor, evaluate and debug
applications, including the ability to automatically trace all model calls to
spot errors and test performance under different model configurations.
This visibility aims to empower more robust, cost-efficient
applications.