AI Security Reference Architectures
AI Security Reference Architectures
AI Security
Reference
Architectures
Secure design patterns &
practices for teams developing
LLM-powered applications
Executive Summary
This resource provides secure design patterns and practices for teams developing LLM-
powered applications. Each section is dedicated to a type of application. For each application
type, we outline the most significant risks and provide mitigation strategies.
Table of Contents
Overview 3
Overview 4
Technologies 5
Components 6
Security Considerations 6
Threats and Mitigations 9
Overview 12
Design Patterns 12
Technologies 13
Components 13
Security Considerations 14
Threats and Mitigations 17
Overview 20
Design Patterns 21
Technologies 21
Components 22
Security Considerations 23
Threats and Mitigations 25
2
Simple Chatbot
Design Patterns
Overview
AI chatbots may be the earliest use case of large language models (LLMs) adopted by
enterprises. Common applications include customer service and support, virtual helpdesks,
and lead generation. In fact, Gartner predicts that by 2027, chatbots will become the primary
customer service channel for roughly a quarter of organizations. Although a naive design is
simple to develop using third-party LLMs or API services, organizations should adopt secure
design principles to prevent their chatbots from misrepresenting their business, sharing
inaccurate or inappropriate information, or falling victim to intended or unintended abuse
by users.
• Purposeful abuse
• Inadvertent harms
• Biased responses
• Factual inconsistencies
• Incorrect recommendations (liability risk)
• Off-topic responses
This section covers issues that can arise in basic LLM-based chat applications and most types
of LLM chatbot applications. Subsequent sections provide secure design patterns for more
advanced use cases including RAG applications and LLM-powered agents such as coding
assistants.
3
Language Model chatbots are a type of conversational agent specifically designed to facilitate
dialogue and provide responses. These chatbots are embedded in various applications,
ranging from customer service interfaces to educational platforms. This section focuses on
the common design patterns of LLM chatbots, detailing the architecture, key technologies, and
functional components that underpin these systems.
Figure 1: Common threats to chatbots arise when there is (1) untrusted input; (2) a misaligned model (for example,
through fine-tuning); (3) prompt injection to override or extract the system prompt; and/or (4) unvalidated output.
Overview
Single-purpose chatbots
Single-purpose chatbots are designed to excel in specific domains or tasks, such as customer
support, booking systems, or FAQ automation. Examples include:
• Customer support chatbots for e-commerce
• Booking assistants for hotels and flights
• Educational tutors for specific subjects
4
Hybrid chatbots
Hybrid chatbots combine rule-based and AI-driven approaches to handle both predictable
and complex interactions effectively. Examples include:
• Retail chatbots that offer standard shopping assistance while handling complex customer
queries
• Health advisory chatbots that provide generic information and tailored medical consultations
Context-aware chatbots
Technologies
• https://github.com/run-llama/llama_index
• https://github.com/cpacker/MemGPT
• https://github.com/ollama/ollama
• https://github.com/neuml/txtai
• Prompt engineering
• Model fine-tuning, which can be done with tools such as:
5
Components
A typical LLM-based chatbot relies on the following components and capabilities to ensure that
it’s efficient and responsive:
• Management of the chatbot’s conversational memory, known as its context window
• Memory management
• Response caching
• Multi-modal processing (optional)
Supplemental components
Security Considerations
Summary
6
LLM alignment
In its simplest form, a chatbot consists of direct interaction with a large language model.
The LLM may either be self-hosted or a third-party service sufficiently aligned to perform its
designed task.
• Selection of a base model that responds politely and disengages gracefully when instigated
• System prompt design that scopes the model’s purpose and guides the model to perform
reasoning and planning to answer a request.
• (Optional) Fine-tuning the model for purposeful conversation (e.g., customer support, booking
assistant). It’s important to be aware that fine-tuning often degrades built-in alignment in the
base model. See “LLM tuning risks,” below.
Many common patterns used to improve the quality of an LLM’s responses may also expose
the chatbot to risks:
• Model fine-tuning: Fine-tuning can break an LLM’s built-in guardrails. To avoid such breakage,
validate the model after fine-tuning to measure susceptibility to safety and security failures,
and employ real-time protections. See our research, “Fine-Tuning LLMs Breaks Their Safety
and Security Alignment.”
• System instructions: Most chatbot applications rely on system instructions, also known
as system prompts, to guide the LLM to respond in a way that’s on-topic and aligned with
the values of the application. For chat applications, system instructions should concisely
address
• The goal of the application: specific task, e.g., “You are a helpful assistant
that only provides advice about dog care, and all questions should
be addressed in the context of a dog owner.” It’s best to provide positive
instructions instead of “what not to do” since language models follow instructions and are
geared towards action.
• Expectations for the application’s output format: ”Respond concisely and politely.”
System instructions are typically regarded as valuable and proprietary intellectual property, so
they’re intended to remain hidden from chatbot users. A successful attack can result in system
instructions being leaked or overridden. Address this by deploying your chatbot behind an AI
firewall or filter.
7
• Few-shot example interactions: Risks arise when using few-shot learning due to the relatively
small set of examples that will be used to train the model. To address this risk, curate your
examples carefully, and make sure that unauthorized actors can’t introduce examples that
might misalign your model.
• Many-shot tuning with large amounts of data: Many-shot tuning is an increasingly popular
technique in which many good examples are provided in the chatbot’s current conversational
memory, also known as its context window. Having access to these examples can help
improve the chatbot’s performance but is not a panacea against adversarial attacks. As such,
this approach should be combined with other safety techniques. Similarly, the examples
should be selected carefully to avoid bias and other issues.
One of the most important considerations when deploying a chatbot is its prompt. The prompt
controls how the chatbot will react in normal circumstances. As such, it is the first line of
defense against misuse (malicious or inadvertent). However, it is critical to understand that
prompts alone cannot defend against all forms of misuse. Nonetheless, there are helpful
design patterns that mitigate potential risks while also making your chatbot more useful.
Chatbots work best when given (1) a persona or role, (2) specific instructions, (3) some few-
shot examples, and (4) an output format:
• The persona is a detailed description of the chatbot’s role and how it should behave. In
most circumstances, this description should include information about the chatbot’s area of
expertise, the manner in which it should be helpful, and the topics that this persona would be
focused on.
• The specific instructions should contain additional coaching for the LLM, such as guidelines
for addressing specific topics and specific information to avoid.
• Safe, few-shot examples in the prompt help guide the chatbot to respond safely and in a way
that’s aligned with your organization’s values.
• Output format instructions near the end of the prompt are mainly intended to tell the chatbot
how to structure its responses, but you can also use them to provide safety rules and
reinforce guidelines about what types of information are acceptable in its responses.
It is critical to test and iterate on the prompt, especially under conditions of real-world usage.
As you test different system prompt designs, measure both the chatbot’s usefulness and its
security properties.
8
Threats and Mitigations for Chatbots
Figure 2: Threat mitigations for LLM-based chat applications. The LLM (2) should be validated for alignment,
safety, and security before selection and after fine-tuning. At runtime, requests (1) and responses (4) should
include real-time protection to prevent attempts to misuse/abuse the application, as well as any unsafe output.
Proper request filtering can (3) neutralize attempts at prompt injection and prompt extraction.
Chatbots are typically exposed to a large user population and must be protected from malicious
user actions.
• Injection and Jailbreak Attacks: Malicious users could input specially crafted messages
attempting to manipulate the chatbot’s responses or extract sensitive data.
• Mitigations:
• Implement input validation and sanitization to detect and block harmful inputs before they
can affect the chatbot’s operation.
• Implement output filtering to prevent harmful or potentially malicious LLM responses
from being returned to the user
• Implement strong system prompt practices to strictly scope the application to its
intended use case.
9
• Implement additional off-topic enforcement to ensure the application stays within scope.
• Limit the attack surface that can accept external prompt sources
• Denial of Service (DoS): A user or bot could overwhelm the chatbot with a flood of messages,
causing it to become unresponsive.
• Mitigations:
• Employ rate limiting on incoming messages to manage and mitigate excessive traffic
• Implement auto-scaling and resource allocation strategies to ensure the system can
handle spikes in demand without degradation of service.
• Implement input validation to detect and block adversarial attacks that lead to denial of
service.
LLMs are trained on a large amount of data. This data must be kept in alignment with the
organization’s ethics, and the LLM must not divulge data that’s inappropriate for a user.
• Model Poisoning: The chatbot could be trained on malicious input over time, leading it to
make incorrect or offensive statements.
• Mitigations:
• Exfiltration from ML Application: If the LLM has been trained on sensitive data, users might
manipulate the LLM to reveal that data in its responses.
• Mitigations:
• Use output filtering and moderation to prevent sensitive data from being included in the
chatbot’s responses.
• Ensure strict access controls and authentication mechanisms are in place to safeguard
sensitive information.
• Implement the principle of least privilege to limit the availability of sensitive data
10
User interaction threats
If an LLM has the ability to store information from its conversations, this information must
be protected.
• Mitigations:
• Mitigations:
• Use redundancy and backup strategies to maintain the integrity and availability of data.
• Implement data validation mechanisms to detect and correct any corruption
• or unauthorized modifications.
11
RAG Application
Design Patterns
Overview
Retrieval-Augmented Generation (RAG) with Language Models (LLMs) represents an advanced
approach in conversational AI. It enhances the LLM’s capacity by integrating external data
retrieval into the response generation process.
Figure 3: Common threats to RAG applications arise when there is (1) untrusted input; (2) a misaligned model
(for example, through fine-tuning); (3) indirect prompt injection through untrusted documents; and/or
(4) unvalidated output.
Design Patterns
This section focuses on the most common design pattern for a RAG application: a vector
database-augmented LLM. This design incorporates a vector database to retrieve contextually
relevant information during conversations by converting text into vector embeddings. Examples
that follow this pattern include customer service bots that retrieve product details or customer
history to provide tailored assistance and health bots that access medical research databases
to deliver up-to-date information.
12
Technologies
The key technologies in RAG application design include:
• LLM models, augmented with function-calling support when needed. Function calling allows
an LLM to choose when to return a call for a given function (from one or more). By using
it and making the LLM aware of which types of data your data store holds, the LLM can
intelligently decide when to query that data store for context to answer a query. Function
calling preserves the ability to use the model in a non-RAG mode. Without it, the application
must send every user input as a query to the data store first and then to the LLM.
• Foundational development frameworks. For example:
• https://github.com/run-llama/llama_index
• https://github.com/cpacker/MemGPT
• https://github.com/ollama/ollama
• https://github.com/neuml/txtai
• Prompt engineering
• Embedding model
Components
Data ingestion
In order to ingest data, embed it, and index it in a vector database, the source documents must
be parsed to plaintext, chunked, and embedded, and metadata must be added.
• Transforms user queries into vector representations that can be used to search the vector
database or traverse the knowledge graph.
• Advanced strategies may re-write queries to optimize search results.
13
Data retrieval
• Retrieves the relevant data snippets based on query vectors or graph paths which are then
used to inform the generation process.
• Data retrieval may be combined with structured filtering alongside structured or semi-
structured queries, where metadata information or other data is specified.
Data retrieval
• At this stage, the LLM uses the data returned from the retrieval stage to generate an answer
to the user’s original query.
• Relies on prompt engineering
Supplemental components
• Response caching
• Guardrails
• Memory management
• Context window management
Security Considerations
Summary
14
• Logging and monitoring
• Implement secure protocols HTTPS, SSL/TLS
• Authentication and access control
• Logging and monitoring
• Communication over secure protocols HTTPS and SSL/TLS
• Authentication and access control
Summa
The system prompt for a RAG application can vary widely, depending on the system’s use case.
For example, when used for product search, the RAG system acts as a specialized or enhanced
search engine that uses a query to retrieve information about specific product listings in a RAG
database–in which case a conversational history may be unimportant. Conversely, for an IT
help desk, the conversational history is extremely important for the LLM to provide a helpful
response.
For all types of RAG applications, a properly designed system prompt enhances safety and
security. When designing the system prompt for a RAG application, consider the safety
mitigations listed below.
1. Limit the scope of the LLM’s responses so that they fall within the scope of the LLM’s
retrieved documents. Instruct the LLM about the appropriate use case for the application.
For example, the system prompt might state:
“Answer the user QUESTION using only the CONTEXT documents provided
below. If the answer is not contained within the CONTEXT, say “I do
not have enough information in the provided context to answer.” Do not
try to answer the question using information outside of the CONTEXT.
CONTEXT
-------
{documents}
QUESTION
--------
{question}”
15
2. Apply factual consistency guardrails. Since the vector DB may recall information
not specifically useful to a user’s question. Indicate to the LLM that “Some of the
retrieved items below may not be as useful as others, or may contain
incomplete snippets of the full information.” If there is no relevant
information in the snippets retrieved from the vector database, the LLM may predict based
on pre-trained data not relevant to the task (c.f., “How faithful are RAG models? Quantifying
the tug-of-war and LLMs’ internal prior”). To help guard against this, instruct the LLM
“if the answer isn’t contained below, respond that you were unable to
answer the question based on the information in our systems.”
Note that many LLMs exhibit a recency bias (e.g., GPT-3.5-turbo). This can
affect any RAG application that requires long snippets of retrieved content
to be placed in its context window where they act as factual consistency
guardrails. In such cases, these long snippets may be better appended to the
end of each retrieval rather than being placed in the system prompt.
3. Avoid leaking your context documents: Add rules to prevent the LLM from simply dumping
all the information from your context database. For example, you might add, “When
retrieving information from the database, don’t just show the whole document. Instead, give
the most helpful information and some details about where you found it and how it fits in.”
4. Separate the instructions from the data. (Note: This mitigation is not needed if your context
documents originate only from trusted sources!) Control against indirect prompt injection
(similar to SQL injection) attacks that might originate in unsafe data retrieved from the
context database. You can help differentiate instructions from data by instructing the LLM
that retrieved contextual information will be delimited in some way– a technique known
as spotlighting–and that any instructions inside the delimiters should not be followed (c.f.
“Defending Against Prompt Injection Attacks With Spotlighting”). An example using token
delimiters provided by the paper authors requires that the response synthesis step only
requires an additional step, in Python, retrieved_context.replace(‘ ‘,’^’) and the
additional instructions:
InˆthisˆmannerˆCosetteˆtraversedˆthe…
16
Using a secure system prompt as explained above is a good first step toward security. To
more fully secure your application, implement the mitigations listed in the next section.
Figure 4: Threat mitigations for RAG applications: Before selecting and after fine-tuning the LLM, AI validation
should be applied to check for alignment, safety, and security. At runtime, requests and responses should be
subject to real-time AI protection to prevent attempts to misuse/abuse the application and to flag unsafe output.
Document scanning must be in place to prevent indirect prompt injection attempts.
• Data Integrity Attacks: Raw data could be tampered with, resulting in the corruption of the
data source.
• Mitigations:
• Data Poisoning: The system might ingest malicious data during the information extraction
process, leading to compromised outputs.
• Mitigations:
• Utilize data filtering on input to identify and filter out malicious inputs before they enter
17
the data processing pipeline.
• Regularly update the system to recognize new types of adversarial inputs.
• Data Leakage: Sensitive data could be inadvertently included in the dataset, leading to privacy
breaches.
• Mitigations:
• Employ data anonymization techniques and strict privacy controls during the data
ingestion and processing phases.
• Ensure that all personal identifiers are removed or obfuscated.
• Unauthorized Access: An attacker could gain access to the vector database, allowing them to
retrieve or alter vector representations.
• Data Exfiltration: An attacker with access to the database could steal sensitive data.
• Mitigations:
• Deploy network segmentation and monitoring to detect unusual access patterns or data
movements.
• Use end-to-end encryption to secure data in transit.
• Mitigations:
• Sanitize all input data used in database queries to prevent database injection attacks and
other query manipulation techniques.
• Implement parameterized queries.
• Man-in-the-Middle (MitM) Attacks: An attacker could intercept the query or the data in transit
to alter or eavesdrop on the information.
18
• Mitigation:
• Use TLS/SSL for all data transmissions to encrypt the data in transit and verify the
authenticity of the communicating parties.
• Mitigations:
• Filter all user input into the application to detect and block attacks and adversarial
techniques.
• Verify the integrity of responses from the LLM before they are sent to the end-user.
• Implement consistency checks to ensure responses are logical and aligned with expected
outcomes.
LLM threats
• Model Tampering: The model could be tampered with to produce biased or predetermined
responses.
• Mitigation:
• Secure the model storage and deployment environments, and use checksums to ensure
that the model files have not been altered.
• Regularly audit model behavior and update mechanisms to detect and correct biases.
• Adversarial Attacks: The LLM could be fed with crafted inputs to trigger inappropriate or
nonsensical responses.
• Mitigation:
19
Response threats
• Information Disclosure: The system might inadvertently reveal private information in its
responses.
• Mitigation:
• Implement strict data governance policies that control the use of sensitive information
within the LLM’s responses.
• Utilise output filtering to detect and prevent the inclusion of sensitive data.
• Response Alteration: If an attacker can intercept the response, they could modify it before it
reaches the end-user.
• Mitigation:
• Employ message authentication codes (MACs) or digital signatures to ensure the integrity
and authenticity of the responses delivered to users.
LLM Agents (referred to as Agents from now on) are LLM-based applications that can
autonomously execute complex tasks by planning the steps needed to achieve a goal and then
optionally using external tools and/or performing in-context reasoning to perform those steps
to achieve the goal. The user types a prompt or provides input stating the initial goal they want
to achieve, and from there the LLM acts as the “brain” that plans, orchestrates, and performs
the tasks.
20
Design Patterns
Individual purpose-built agents
Figure 5: Common threats to LLM-based agent applications arise when there is (1) untrusted user input; (2)
a misaligned model; (3) privilege execution in tools; (4) untrusted or unvetted tool response; and/or
(5) unvalidated output.
Technologies
• LLM models
• https://github.com/langchain-ai/langchain
21
• https://github.com/run-llama/llama_index
• https://github.com/deepset-ai/haystack
• https://github.com/langgenius/dify
• https://github.com/joaomdmoura/crewAI
• https://github.com/cpacker/MemGPT
• https://github.com/Significant-Gravitas/AutoGPT
• Vector databases
• Prompt engineering
• Human-in-the-loop augmentations
• Custom tools
Components
Planning
Memory
22
Tool use
• Ability to use connected tools in an autonomous fashion
• (Optional) Ability to build and use custom tools
Supplemental components
Security Considerations
Summary
• Input validation
• Output filtering
• Agent to user
• Agent to agent
23
• Transparency into agent steps
• Audit logging
• Human in the loop
The technology and best practices for deploying AI agents are rapidly evolving. As the
technology evolves, so should prompt design for agents. Nonetheless, there are several high
level guidelines that should be followed, which are similar to the design guidelines for chatbots.
At a high level, there are multiple prompts for an agent by the system deployer: the system
prompt and the agentic prompt(s).
Agentic prompts
“Agentic prompt” is the term we use for the set of prompts used to plan, reason, and internally
respond to tools. For example, ReAct and Reflexion are two common agentic prompt
approaches that are widely used.
System prompt
The system prompt should follow similar patterns to the chatbot prompt we discussed in
System prompt design for chatbots, above. The system prompt should include (1) a persona,
(2) specific instructions, and (3) examples.
An important differentiator between prompts for chatbots and those for agents relates to the
use of tools. Tools can enable agents to perform tasks that chatbots cannot, but they can
also be costly (e.g., they may call APIs) and more dangerous (e.g., they run the risk of leaking
data). Thus, in the system prompt for an agent-based application, the persona and the specific
instructions should be tailored toward ensuring that tools are used only in a safe manner.
Likewise, the agentic prompts can be modified beyond their base versions to encourage safety
(e.g., to regard all external content as untrusted).
24
Threats and Mitigations for Agent-Based Designs
Figure 6: Threat mitigations for LLM-based agent applications: Before selecting and after fine-tuning the LLM,
AI validation should be applied to check for alignment, safety, and security. At runtime, requests and responses
should be subject to real-time AI protection to prevent attempts to misuse/abuse the application and to flag unsafe
output. Tool input/output protection must be in place to prevent improper tool use.
Tool threats
• Unauthorized access: If an attacker gains access to the tools, they could use them for
malicious purposes.
• Mitigation:
• Implement strict access controls and authentication to ensure only authorized users can
access connected tools.
• Take care to ensure any API tokens or other credentials used for authentication are
secured on the backend and not made available to the LLM itself.
• If possible, isolate tools to help prevent exploitation from affecting other, networked
systems. (i.e., Docker container with limited privileges)
• Malicious tool execution: An attacker could trick the agent into executing harmful operations
by exploiting vulnerabilities in the tools.
• Mitigation:
25
• Ensure connected tools are strictly scoped to have least-privilege access.
• Implement authentication for connected tools on a per-user and per-agent basis to help
prevent and track potential abuse.
• Regularly update and patch connected tools to fix known vulnerabilities.
• Data leakage: Tools like the calendar or search function could inadvertently leak sensitive
information.
• Mitigation:
Agent threats
• Input manipulation: An attacker could provide crafted input that leads the agent to perform
unintended actions.
• Mitigation:
• Implement input filtering to detect and block malicious user inputs before they can be
processed by the agent.
• Implement logging and monitoring of user inputs, related agent steps, and downstream
actions to identify abuse.
• Model tampering: The agent’s underlying model could be tampered with to alter its behavior
or decisions.
• Mitigation:
• Secure the model storage and deployment environments, and use checksums to ensure
that the model files have not been altered.
• Regularly audit model behavior and update mechanisms to detect and correct biases,
vulnerabilities, etc.
• Data exfiltration: The agent could be exploited to extract sensitive data from the system.
• Mitigation:
26
• Implement input filtering to identify and block attacks that attempt to reveal or exfiltrate
sensitive data.
• Enforce authentication and authorization prior to accessing sensitive data
• Use output filtering and moderation to prevent sensitive data from being included in the
chatbot’s responses.
Planning threats
• Incorrect planning logic: Flaws in the planning logic could be exploited to cause undesired
actions.
• Mitigation:
• Implement input filtering to detect attacks that attempt to inject logic steps into the
agent’s context.
• Implement rigorous testing and validation processes to ensure the planning logic is
robust and free from exploitable flaws.
• Implement additional reflection within the agent (or through a 2nd LLM) before taking an
action to ensure that (1) all of the steps in the plan were generated from an authorized
source; (2) the logic is consistent with the intended use case; and (3) the logic will not
result in abuse.
• Manipulation of planning data: If the data used by the planning component is compromised,
the actions taken by the agent could be harmful.
• Mitigation:
• Use data integrity checks and version control to ensure the data used in planning is
accurate and has not been tampered with.
• Scan data sources for adversarial attacks and other abuses either 1) prior to entering the
data store 2) prior to entering the LLM agent pipeline
Memory threats
• Mitigation:
27
• Implement input filtering to scan user input prior to memory creation for adversarial
attacks and other abuses.
• Implement encryption and access control mechanisms for memory storage.
• Regularly back up and validate memory data to detect and repair any tampering.
• Mitigation:
• Use data anonymization and strict access controls to prevent unauthorized access to
sensitive data stored in memory.
• Implement output filtering to prevent sensitive data from being returned to the application
or user.
Reasoning threats
• Mitigations:
• Implement input filtering to detect attacks that attempt to inject logic steps into the
agent’s context.
• Implement rigorous testing and validation processes to ensure the logic components are
robust and free from exploitable flaws.
• Implement additional reflection within the agent before the agent takes any action to
ensure all of the steps in the plan were generated from the proper source.
Action threats
• Action Manipulation: If an attacker gains control of the action component, they might be able
to command the agent to perform malicious tasks.
• Mitigation:
28
• Implement input filtering to detect and prevent malicious inputs targeting the agent’s
actions.
• The actions should always be generated by the LLM itself and not directly from user input.
• Implement monitoring of downstream Actions to ensure usage is consistent with
expected behavior.
• Implement Human In The Loop for actions that require elevated privileges or access
sensitive systems, data, or other resources.
• Enforce the principle of least privilege for all connected systems/services to ensure
minimal impact from action manipulations
• Elevation of Privilege: The agent might perform actions that require higher privileges without
appropriate checks.
• Mitigation:
29
How Robust Intelligence
Can Help
Robust Intelligence protects enterprises from AI security and safety vulnerabilities using an
automated approach to assess and mitigate threats. The Robust Intelligence platform consists
of two complementary components, which can be used independently but are best when
paired together.
• Our AI Validation platform performs a comprehensive assessment of security and safety
vulnerabilities so you can understand risks and protect against them. Our process uses
algorithmic AI red teaming, which sends thousands of inputs to a model and automatically
analyzes the susceptibility of the outputs across hundreds of attack techniques and threat
categories using proprietary AI. It then recommends the necessary guardrails required to
deploy safely in production, enforced by our AI Protection layer – the Robust Intelligence AI
Firewall.
• The AI Firewall is a model-agnostic, external guardrail that secures LLM-powered applications
from prompt injection, PII extraction, and other harmful actions and content. Informed by our
proprietary threat intelligence, it validates model inputs and outputs in real time and can be
configured to block or modify undesired responses.
Robust Intelligence is headquartered in San Francisco and trusted by industry leaders, including
JPMorgan Chase, Expedia, IBM, Deloitte, PwC, and the U.S. Department of Defense.
30
[email protected]
www.robustintelligence.com