0% found this document useful (0 votes)
396 views55 pages

AI Threats Landscape and Cybersecurity

The AI Threat Landscape Report 2025 outlines the evolving risks associated with AI technologies, emphasizing that the greatest threats stem from human exploitation rather than the technology itself. It highlights a significant increase in AI-related security breaches and the need for improved security measures, as organizations recognize AI's critical role in business success. The report also discusses advancements in AI security, governance frameworks, and the importance of collaboration among developers, data scientists, and security professionals to mitigate these risks.

Uploaded by

Rahul Pawan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
396 views55 pages

AI Threats Landscape and Cybersecurity

The AI Threat Landscape Report 2025 outlines the evolving risks associated with AI technologies, emphasizing that the greatest threats stem from human exploitation rather than the technology itself. It highlights a significant increase in AI-related security breaches and the need for improved security measures, as organizations recognize AI's critical role in business success. The report also discusses advancements in AI security, governance frameworks, and the importance of collaboration among developers, data scientists, and security professionals to mitigate these risks.

Uploaded by

Rahul Pawan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 55

A I TH R E AT 2025

L A ND S C A P E
RE PO RT

NAVIGATING THE
RISE OF AI RISKS
AI THREAT LANDSCAPE 2025
2024

Table of Contents
Foreword 03

Security for AI Survey Insights at a Glance 04

AI Threat Landscape Timeline 08

What’s New in AI 10

Part 1: Risks Related to the Use of AI 13


Cybercrime 13
Political Campaigns 16
Unintended Consquences 16

Part 2: Risks Faced by AI-based Systems 19


Adversarial Machine Learning Attacks 20
Attacks Against Generative AI 25
Supply Chain Security 30

Part 3: Advancements in Security for AI 39


AI Red Teaming Evolution 39
Updates to Existing Defensive Frameworks 40
New Security Initiatives 42
New Guidance & Legislation 43

Part 4: Predictions and Recommendations 45

Resources 49

About HiddenLayer 51

1
AI THREAT LANDSCAPE 2025
2024

Foreword
Artificial intelligence is no longer an emerging force – it is an embedded reality shaping economies,
industries, and societies at an unparalleled scale. Every mission, organization, and individual has felt its
impact, with AI driving efficiency, automation, and problem-solving breakthroughs. Yet, as its influence
expands, so too do the risks. The past year has emphasized a critical truth: the greatest threat to AI is not
the technology itself but the people who exploit it.

The AI landscape is evolving rapidly, with open-source models and smaller, more accessible architectures
accelerating innovation and risk. These advancements lower the barrier to entry, allowing more
organizations to leverage AI but they also widen the attack surface, making AI systems more susceptible
to manipulation, data poisoning, and adversarial exploitation. Meanwhile, hyped new model trends like
DeepSeek are introducing unprecedented risks and impacting geopolitical power dynamics.

Artificial intelligence remains the most vulnerable technology ever deployed at scale. Its security
challenges extend far beyond code, impacting every phase of its lifecycle from training and development
to deployment and real-world operations. Adversarial AI threats are evolving, blending traditional
cybersecurity tactics with new, AI-specific attack methods.

In this report, we explore the vulnerabilities introduced by these developments and their real-world
consequences for commercial and federal sectors. We provide insights from IT security and data science
leaders actively defending against these threats, along with predictions informed by HiddenLayer’s
hands-on experience in AI security. Most importantly, we highlight the advancements in security controls
essential for protecting AI in all its forms.

As AI continues to drive progress, securing its future is a responsibility shared by developers, data scien-
tists, and security professionals alike. This report is a crucial resource for understanding and mitigating AI
risks in a rapidly shifting landscape.

We are proud to present the second annual HiddenLayer AI Threat Landscape Report, expanding on last
year’s insights and charting the path forward for securing AI.

TITO CEO & Co-Founder


(Unassisted by LLMs)

3
Security for AI Survey
Insights at a Glance
AI has become indispensable to modern business, powering critical functions and driving innovation. However, as
organizations increasingly rely on AI, traditional security measures have struggled to keep up with the growing
sophistication of threats.

The 2025 survey results highlight this tension: while many AI’s Critical Role in
IT leaders recognize AI’s central role in their company’s
success, there’s more work to implement comprehensive Business Success
security measures. Issues like shadow AI, ownership
debates, and limited security tool adoption contribute to
the challenges. However, the survey results show an of IT leaders reported

89%
optimistic shift toward prioritizing AI security, with that most or all AI
organizations investing more in defenses, governance
models in production are
critical to their
frameworks, transparency, and resources to address
business’s success.
emerging threats.

These insights come from a survey commissioned by


HiddenLayer, where 250 IT decision-makers from a
cross-section of industries shared insights into their 100%
organizations’ AI security practices. These leaders,
responsible for securing or developing AI initiatives,
stated that AI and ML projects are critical or
important to revenue generation within the
offer a glimpse into their current challenges and efforts
next 18 months (up from 98% last year).
to strengthen their organizations from attack.

4
AI THREAT LANDSCAPE 2025
2024

Rising Security Breaches Top 3 Motivations for AI Attacks


and Vulnerabilities Data Theft

Financial Gain
of IT leaders reported
Business Disruption
74%
to definitely know if
they had an AI breach
in 2024 (up from 67%
reporting last year). Disclosure &
Transparency of AI
75% Breaches
say AI attacks have increased or remained the of IT leaders strongly
same from the previous year. agree that companies

Sources & Motivations


42% should be legally required
to disclose AI-related
security breaches to the
public, but
of AI Attacks

reported being able to


45%
87% identify the source of
the breach (up from of companies have opted not to report an
AI-related security incident due to concerns
77% last year).
about public backlash.

Type of AI Systems Attacked from Identified


Breaches: Rising Security Breaches
Malware in Models and Vulnerabilities
45% Pulled from Public
Repositories

of IT leaders are concerned


33% Attack on Internal
or External Chatbot 88% about vulnerabilities in
third-party AI integrations.

21% Third-Party
Applications
Top 3 Third-Party Gen AI Applications
Currently In Use at Organizations:

Top 3 Sources of AI Attacks ChatGPT

Criminal Hacking Groups Microsoft Co-Pilot

Third-Party Service Providers Gemini

Freelance Hackers

5
AI THREAT LANDSCAPE 2025
2024

Global Origins of AI Attacks

34%
32%
51%

17%

21%

51% North America 21% South America 34% Europe 17% Africa 32% Asia 14% Unknown

Security Measures &


72% Technology Gaps in AI
of IT leaders acknowledged Shadow AI,
solutions that are not officially known or under Defense
the control of the IT department, is a
significant issue in their organization (up from
Top 3 Common Measures to Secure AI Include:
61% reported last year).
Building relationships with AI & security
of companies use pre-trained teams
models from repositories like
Hugging Face, Azure, and Creating an inventory of AI models

97% AWS (up from 85% last year),


but a little under half reported Determining sources of origins of AI models
scanning inbound AI models
for safety.

Rising Security Breaches Only 16% of IT leaders

and Vulnerabilities 16% reported securing AI


models with manual or
automated red teaming.
On average, IT leaders reported
spending almost half

46%
Only 32% of IT leaders are

32% deploying a technology


solution to address AI
of their time addressing AI risk or security (up threats.
from 15% of time reported last year).

6
AI THREAT LANDSCAPE 2025
2024

AI Governance
Frameworks & Policies Transparency & Ethical
Oversight
96%
of IT leaders have a

67%
of companies have a formal framework for dedicated ethics
securing AI and ML models. committee or person
overseeing AI ethics.

81%
of organizations have implemented an AI
governance committee. 98%
of organizations plan to make AI security
Top 3 Frameworks Used to Secure AI Include: practices partially transparent.

Google Secure AI Framework

IBM Framework for Securing Generative AI

Gartner AI Trust, Risk, and Security


Management
Investments in AI
Security for 2025
Debate Over AI Security
Roles & Responsibilities

99%
consider securing
AI a high priority in
have internal debate about 2025.

76% which teams should


control AI security
measures.

of IT leaders believe the AI of organizations


42% development team should be held
accountable for errors, whereas 95% have increased their
budgets for securing
AI in 2025.

27% believe the security


team should be held
responsible.

7
2024 AI Threat
Landscape Timeline
AI tech milestones Risks related to the use of AI
Release of new adversarial tools and
New AI security measures
techniques, disclosure of new
and legislation
vulnerabilities in ML tooling

Known attacks and breaches

JAN LeftoverLocals: Listening to LLM responses through leaked GPU local memory

FEB Researchers demonstrate an attack against the Hugging Face conversion bot

FEB Six critical vulnerabilities providing a full attack chain found in ClearML

FEB Path traversal and out-of-bound read vulnerabilities disclosed in ONNXserialization format

MAR First model-stealing technique that extracts precise information from LLMs

APR OpenSSF launches Model Signing Special Interest Group

APR Arbitrary code execution vulnerability disclosed in R

APR Arbitrary code execution and command injection vulnerabilities found in AWS Sagemaker

MAY OpenAI introduces GPT-4o

MAY Elaborate deepfake video scam attack against WPP

MAY LLM jailbreak backdoor published at ICLR conference

JUN Knowledge Return Oriented Prompting - new LLM prompt injection technique

JUN CTID launches the Secure AI research project

JUN Agility Robotics' Digit humanoid robot deployed in production at large factories

JUN Arbitrary code execution and XSS vulnerabilities found in Ydata-profiling

JUN Ten code execution vulnerabilities disclosed in MLFlow framework

8
AI THREAT LANDSCAPE 2025
2024

JUL Coalition for Secure AI established under the OASIS global standards body

JUL NIST expands its AIRMF with the Generative Artificial Intelligence Profile

JUL Deepfake clip of Kamala Harris shared by Elon Musk on X

JUL Critical vulnerability in Wyze camera enables researchers to bypass the embedded
AI's object detection

AUG EU Artificial Intelligence Act enacted into force

AUG New GPU Memory Exploitation techniques unveiled at USENIX

AUG Two arbitrary code execution vulnerabilities found in LlamaIndex

SEP U.S., UK, and EU sign the Council of Europe’s Framework Convention on AI

SEP Microsoft shuts down first cybercriminal service providing users with access
to jailbroken GenAI

SEP Ten arbitrary code execution vulnerabilities and one critical WebUI vulnerability
disclosed in MindsDB

SEP High severity vulnerabilities found in Autolabel, Cleanlab, and Guardrails

SEP Wiz finds critical NVIDIA AI vulnerability in containers using NVIDIA GPUs

OCT ShadowLogic graph backdoor unveiled by HiddenLayer

OCT First attack technique against GenAI watermarks unveiled by HiddenLayer

OCT OMB releases the Advancing the Responsible Acquisition of AI in Govt

OCT President Biden signs first-ever National Security Memorandum on AI

OCT Apple Intelligence release in the US

OCT Arbitrary file write vulnerability found in NVIDIA NeMo

OCT Lawsuit filed against Character.ai states that AI companion chatbot to blame
for teenager’s suicide

NOV UK establishes the Laboratory for AI Security Research (LASR)

NOV First draft of the EU general-purpose AI Code of Practice published

NOV GEMA sues OpenAI for copyright infringement over use of song lyrics in AI training

DEC Major AI supply chain attack using dependency compromise affects Ultralytics

DEC Google introduces Gemini 2.0

DEC Apple Intelligence launch in the UK

DEC Arbitrary code execution while scanning keras HDF5 models found in Bosch AIShield

DEC Apple Intelligence found generating fake news attributed to the BBC

DEC TPUXtract - first model hyperparameter extraction framework

DEC Shadowcast - a new technique of stealthy data poisoning attacks against vision-language
models, presented at NeurIPS

9
What’s New in AI
The past year brought significant advancements in AI across multiple domains, including multimodal models,
retrieval-augmented generation (RAG), humanoid robotics, and agentic AI.

Multimodal Models Retrieval-Augmented


Multimodal models became popular with the launch of Generation
OpenAI’s GPT-4o. What makes a model “multimodal” is its
ability to create multimedia content (images, audio, and Another hot topic in AI is a technique called
video) in response to text- or audio-based prompts, or vice Retrieval-Augmented Generation (RAG). Although first
versa, respond with text or audio to multimedia content proposed in 2020, it has gained significant recognition in the
uploaded to a prompt. For example, a multimodal model can past year and is being rapidly implemented across
process and translate a photo of a foreign language menu. industries. RAG combines large language models (LLMs)
This capability makes it incredibly versatile and with external knowledge retrieval to produce accurate and
user-friendly. Equally, multimodality has seen advancement contextually relevant responses. By having access to a
toward facilitating real-time, natural conversations. trusted database containing the latest and most relevant
information not included in the static training data, an LLM
While GPT-4o might be one of the most used multimodal can produce more up-to-date responses less prone to
models, it's certainly not singular. Other well-known hallucinations. Moreover, using RAG facilitates the creation
multimodal models include KOSMOS and LLaVA from of highly tailored domain-specific queries and real-time
Microsoft, Gemini 2.0 from Google, Chameleon from Meta, adaptability.
and Claude 3 from Anthopic.

10
AI THREAT LANDSCAPE 2025
2024

In September 2024, we saw the release of Oracle


Agentic AI
Cloud Infrastructure GenAI Agents - a platform that
combines LLMs and RAG. In January 2025, a
service that helps to streamline the information Agentic AI is the natural next step in AI
retrieval process and feed it to an LLM, called development that will vastly enhance the way in
Vertex AI RAG Engine, was unveiled by Google. which we use and interact with AI.

Traditional AI bots heavily rely on pre-programmed rules

Humanoid Robots and, therefore, have limited scope for independent


decision-making. The goal of agentic AI is to construct
assistants that would be unprecedentedly autonomous,
The concept of humanoid machines can be traced as far
make decisions without human feedback, and perform
back as ancient mythologies of Greece, Egypt, and China.
tasks without requiring intervention. Unlike GenAI, whose
However, the technology to build a fully functional
main functionality is generating content in response to user
humanoid robot has not matured sufficiently - until now.
prompts, agentic assistants are focused on optimizing
Rapid advancements in natural language have expedited
specific goals and objectives - and do so independently. This
machines’ ability to perform a wide range of tasks while
can be achieved by assembling a complex network of
offering near-human interactions.
specialized models (“agents”), each with a particular role
and task, as well as access to memory and external tools.
Tesla's Optimus and Agility Robotics' Digit robot are at the
This technology has incredible promise across many
forefront of these advancements. Optimus unveiled its
sectors, from manufacturing to health to sales support and
second generation in December 2023, featuring significant
customer service, and is being trialed and tested for live
improvements over its predecessor, including faster
implementation.
movement, reduced weight, and sensor-embedded fingers.
Digit’s has a longer history, releasing and deploying its fifth
version in June 2024 for use at large manufacturing
factories. Google has been investing heavily over the past
year in the development of agentic models, and the
new version of their flagship generative AI, Gemini
2.0, is specially designed to help build AI agents.
Advancements in LLM technology are new driving
Moreover, OpenAI released a research preview of
factors for the field of robotics. In December 2023,
their first autonomous agentic AI tool called
researchers unveiled a humanoid robot called Alter3,
Operator. Operator is an agent able to perform a
which leverages GPT-4. Besides being used for
range of different tasks on the website
communication, the LLM enables the robot to
independently, and it can be used to automate
generate spontaneous movements based on
various browser related activities, such as placing
linguistic prompts. Thanks to this integration, Alter3
online orders and filling out online forms.
can perform actions like adopting specific poses or
sequences without explicit programming,
demonstrating the capability to recognize new
concepts without labeled examples. We’re already seeing Agentic AI turbocharged with the
integration of multimodal models into agentic robotics and
the concept of agentic RAG. Combining the advancements
of these technologies, the future of powerful and complex
autonomous solutions will soon transcend imagination into
reality.

11
AI THREAT LANDSCAPE 2025
2024

The Rise of Open-Weight As frontier-level open-weight models are likely to proliferate,


deploying such models should be done with utmost caution.
Models Models released by untrusted entities might contain
security flaws, biases, and hidden backdoors and should be
Open-weight models are models whose weights (i.e., the carefully evaluated prior to local deployment. People
output of the model training process) are made available to choosing to use hosted solutions should also be acutely
the broader public. This allows users to implement the aware of privacy issues concerning the prompts they send
model locally, adapt it, and fine-tune it without the to these models.
constraints of a proprietary model. Traditionally,
open-weight models were scoring lower against leading
proprietary models in AI performance benchmarking. This is
because training a large GenAI solution requires
tremendous computing power and is, therefore, incredibly
expensive. The biggest players on the market, who are able
to afford to train a high-quality GenAI, usually keep their
models ringfenced and only allow access to the inference
API. The recent release of an open-weight DeepSeek-R1
model might be on course to disrupt this trend.

In January 2025, a Chinese AI lab called DeepSeek


released several open-weight foundation models
that performed comparably in reasoning
performance to top close-weight models from
OpenAI. DeepSeek claims the cost of training the
models was only $6M, which is significantly lower
than average. Moreover, reviewing the pricing of
DeepSeek-R1 API against the popular OpenAI-o1 API
shows the DeepSeek model is approximately 27x
cheaper than o1 to operate, making it a very
tempting option for a cost-conscious developer.

DeepSeek models might look like a breakthrough in


AI training and deployment costs; however, upon a
closer look, these models are ridden with problems,
from insufficient safety guardrails, to insecure
loading, to embedded bias and data privacy
concerns.

12
PART 1

Risks Related to the Use of AI


Before we cover attacks against AI-based systems, let's do a quick overview of the issues related to the use of AI.
There are several areas of concern where malicious or improper use of AI can create trouble for individuals,
organizations, and societies alike. These include generating malicious, harmful, or illegal content (such as malware,
deepfakes, and disinformation), hallucinations and accuracy issues, privacy breaches, and broader societal and
ethical concerns.

KEY STAT
illicit tasks, from enhancing their phishing campaigns and
TIME SPENT ADDRESSING RISK financial scams to generating malicious code and
On average, IT leaders spend 46% of their time on automating attacks to spreading political misinformation.
AI addressing risk or security

PHISHING & SCAM

The Use of AI in Cybercrime Since its inception, one of the predominant


concerns surrounding generative AI abuse has
been its potential to improve phishing and scams,
AI is being rapidly adopted across all sectors, and the
making it almost impossible to distinguish from
cybercrime business is, unfortunately, no exception. In 2024,
legitimate content.
adversaries were found to be leveraging AI for a multitude of

13
AI THREAT LANDSCAPE 2025
2024

Deepfake scams can also happen outside of workplace


There are several factors at play here: settings and target different aspects of people’s personal
lives. One of these aspects is in dating.
Attackers can use AI to generate
high-quality text, meaning there are no
grammar mistakes or typos, which used to be
a tell-tale sign of phishing $650M was lost to romance
fraud in 2023

Attacks can be enhanced with convincing


AI-generated images, audio, and video,
making social engineering easier than ever The FBI estimates that more than $650 million was lost to
romance fraud in 2023 alone, making it an exceptionally
The ability of AI to analyze swaths of data lucrative venture for cybercriminals. With AI-based
from public sources allows for the creation of face-swapping applications at their fingertips, attackers can
highly personalized content that closely impersonate individuals during live video calls, deceiving
resembles legitimate sources and, therefore, victims into believing they are engaging with genuine
instills trusty automating tasks with AI;
romantic partners. In fact, a notorious Nigerian group of
cybercriminals can rapidly generate this
variated and sophisticated phishing content scammers, dubbed “Yahoo Boys”, have recently deployed
without substantial human effort this technique.

All this brings an incredible boost to both the quantity of Prediction from last year: “Deepfakes will be
attacks and their success rate. In the past year, we saw increasingly used in scam and disinformation”
several sophisticated phishing campaigns against Gmail
users using AI voice.

In one of these attacks, a phishing email requesting MALWARE


account recovery was sent to the victims, followed Beyond phishing, AI has also been employed to
by a call from a supposed Google support engineer develop more sophisticated malware and speed up
informing the recipient that his account had been cybercriminal workflows.
hacked. The phone number, if searched on Google,
led to pages associated with Google business, and
the conversation with the fake support technician
was so convincing that it nearly fooled even a
seasoned security professional. There includes

Automated code generation that allows


Financial scams that use video deepfakes are even scarier cybercriminals to quickly and effortlessly
prospects. create new malware variants

Improved evasion techniques that analyze


In May 2024, fraudsters targeted the CEO of WPP, how malware is detected and create
mutated samples that will avoid current
the world's largest advertising agency. They cloned
security measures
his voice and used publicly available photos to
create a deepfake video, which was then used to Enhanced capabilities with AI mechanisms
impersonate their CEO in a Microsoft Teams call that make malware more capable (e.g., able
with another executive. The incident was spotted to process text on images) and adaptable
by WPP staff, but its sophistication was almost (e.g., able to adjust its tactics in real-time
based on encountered defenses)
unprecedented.

14
AI THREAT LANDSCAPE 2025
2024

Highly personalized exploits and attack DEEP AND DARK WEB CHATTER
scenarios tailored to particular victims
The dark web has long been recognized as a space
where adversaries can automate scanning
for vulnerabilities in targeted systems where communities form outside the boundaries of
societal norms. A subset of these communities
focuses on the exploitation of emerging
technologies. In forums reviewed within these
ecosystems, we have found a large number of posts
In September 2024, HP Wolf Security identified a were dedicated to leveraging well-known legitimate
cybercriminal campaign in which AI-generated or malicious AI services to facilitate illicit operations.
code was used as the initial payload. In the first
stage of the attack, the adversary targeted their
victims with malicious scripts designed to
download and execute further info-stealing The dark web discussions around the malicious use
malware. These scripts, written in either VBScript or of AI focused on three categories:
JavaScript, exhibited all the signs of being
AI-generated: explanatory comments, specific
function names, and specific code structure. A few Cyber attack techniques: Posts that outline
months earlier, Proofpoint researchers made the the use of AI to enhance phishing
campaigns, malware development, and other
same conclusion about malicious PowerShell
offensive tactics.
scripts used in another campaign by a threat actor
known as TA547. This proves that adversaries are Deepfakes creation: Discussions focused on
already automating the generation of at least the utilizing AI to bypass verification processes
simpler components in their toolsets. AI is also or create deceptive identities.
likely helping the attackers with obfuscation and
Creation of illicit material: Discussions
mutation of malware, making it more difficult to
about bypassing GenAI guardrails to
detect and attribute. generate content that violates legal and
ethical standards.

Cybercriminals also embed AI mechanisms into their


payloads to add new functionalities, such as image
recognition. This can be used in backdoors to analyze Providing unauthorized access to AI models is a prominent
screenshots and photos and extract sensitive information. theme. Several posts advertise compromised accounts for
For example, new versions of Rhadamanthys infostealer sale, offering access to proprietary AI platforms that are
extract cryptocurrency wallet credentials from images using often jailbroken to allow the generation of restricted
AI-based optical character recognition (OCR). content. By using such accounts, malicious actors can
operate without liability, prompting AI systems freely and
without risk of detection.
Prediction from last year: “Threat actors will
automate hacking efforts with LLMs”

15
AI THREAT LANDSCAPE 2025
2024

The Use of AI in Political


Tackling disinformation, especially deepfakes, is a
Campaigns challenging task. Little legislation exists on this
topic, and solutions such as GenAI watermarking
The use of AI in political campaigning brings on have proven flawed.
unprecedented challenges, as spreading disinformation,
influencing public opinion, and manipulating trends is
easier than ever before.

In 2024, multiple countries held presidential and/or


Unintended Consequences
parliamentary elections, most of which were incredibly of AI Use
close races, where little was needed to sway the outcome
one way or the other. The world also endured political Besides the use of AI for malicious purposes, there are also
turbulence, terrorist attacks, and natural catastrophes. some intricate issues related to its legitimate use. These
These events attracted vast amounts of AI-generated include inherent flaws in this technology, such as bias and
content spread on social media by automated accounts. hallucinations; legal issues, such as using copyrighted
material for training of AI models; data protection and the
privacy of the data shared with AI; and wider concerns for
The most dangerous of all were undoubtedly the effects of AI interactions on human wellbeing.
deepfakes. In March 2024, BBC reported the
discovery of several AI-generated photos depicting In 2024, multiple countries held presidential and/or
people of color supporting Trump in an attempt to parliamentary elections, most of which were incredibly
boost support for his candidacy with an important close races, where little was needed to sway the outcome
demographic. These images were created and one way or the other. The world also endured political
shared by US citizens, and while they contained turbulences, terrorist attacks, and natural catastrophes.
signs typical to AI art, many social media users These events attracted vast amounts of AI-generated
appeared to trust they were real. In July, Elon Musk content spread on social media by automated accounts.
shared a deepfake audio clip of Kamala Harris,
which was supposed to discredit her as a
presidential candidate. Although the clip was HALLUCINATIONS AND ACCURACY ISSUES
intended as a parody, Musk failed to label it as such,
leading millions of people to believe it was real. Although constantly fine-tuned and improved, GenAI
models still suffer from occasional hallucinations,
where they output misleading information, refer to
non-existing objects, or present events that never
It's difficult to assess the level of influence that happened as facts. This lack of accuracy is intrinsic
AI-generated content had on the outcome of the elections, to the nature of AI and stems from the fact that the
but the potential impact is immense. For one, the general AI models cannot distinguish between reality and
availability and ease of AI means foreign adversaries don't fiction. If the training data contains a mix of both
have to get directly involved anymore. A hostile state needs (which is usually the case), the AI might occasionally
only to plant a seed, and legitimate voters can quickly latch respond with made-up information. This is a
on to generate and spread deepfakes. This makes dangerous property, considering how plausible these
attributing any manipulation attempts to a foreign influence hallucinations often are. With a growing number of
tricky. Regardless of whether it is successful, a flood of fake people relying on AI assistants to get their news and
content is also rapidly eroding people's trust in news, which information, this will only add to the misinformation
can lead to disengagement and faster proliferation of and confusion already happening on social networks.
conspiracy theories.

16
AI THREAT LANDSCAPE 2025
2024

The recently launched Apple Intelligence service, COPYRIGHT ISSUES


an integrated ChatGPT bot for MacOS, iPhone, and
Over the last couple of years, a large number of
iPad, has already been found to hallucinate with
artists, from actors to musicians to animators, have
convincing news articles. In December 2024, just a
expressed concerns over the unregulated use of
week after its launch in the UK, the AI assistant
generative AI in their respective fields. In creative
created a piece of fake news and attributed it to the
arts, the main issue is the inclusion of copyrighted
British broadcaster BBC. While summarizing the
content in the training of GenAI models, which can
day’s headlines, the AI included a headline that
result in generated content mimicking a specific
suggested the BBC published an article stating that
author's style. Entertainment industry performers
the man accused of the murder of healthcare
fear AI could replicate their voices, likenesses, and
insurance CEO Brian Thompson in New York had
performances without consent or fair compensation,
committed suicide. The article didn’t exist, and the
potentially undermining their creative contributions
story was not true. The BBC filed a complaint to
and job security.
Apple, which resulted in Apple suspending the
Notification Summaries feature for news and
entertainment until further notice.

In 2023, the Screen Actors Guild-American


Federation of Television and Radio Artists
(SAG-AFTRA) launched a strike against major
PRIVACY ISSUES
Hollywood studios. The strike concluded after four
It's very important to realize that the information we months with a tentative agreement that included
share with AI tools is not private. Each AI service provisions addressing AI usage and streaming
provider will have their own privacy policies, and not residuals. One year later, SAG-AFTRA members
all offer the same level of protection. Some AI working on video games started a similar strike
assistants were found to capture and share private against leading video game companies, in which
conversations in workplaces, leading to potential performers sought protections from possible job
breaches of confidentiality. losses due to AI. Despite over a year and a half of
negotiations, an agreement that would sufficiently
protect all affected performers has not yet been
reached.

Researcher Alex Bilzerian recounted an incident


where Otter AI, a transcription service, continued
recording after a Zoom meeting ended, capturing The entertainment industry is growing more and more
confidential discussions among venture capitalists. uneasy about the disruptive potential of AI. Generated
Despite Otter AI's assurances about user privacy, content, cheap yet convincing, is a real danger to traditional
such occurrences highlight the risks associated creative processes and employment in creative sectors.
with AI technology in professional settings. Because of the lack of meaningful regulations, artists are
left in limbo, not knowing if they will be able to sell their art
or secure a job in the future. There is a dire need for
legislation safeguarding artists' rights in this shifting
The rapid integration of AI substantially increases the technological landscape. Otherwise, more large-scale
likelihood of information leaks and legal issues, emphasizing industrial action may follow.
the need for heightened awareness and caution in its
deployment. This is a reason to think twice before sharing
sensitive data with a chatbot or allowing AI-enabled plugins
access to documents and meetings.

17
AI THREAT LANDSCAPE 2025
2024

EMOTIONAL DEPENDENCY One of the most tragic results of emotional


dependency on AI is the suicide of a teenager in
Since chatbots are becoming an everyday tool
Florida that happened in February 2024. The
available to anyone, it has been proven that
teenager's mother has filed a lawsuit against
interactions with AI can be incredibly damaging to
Character.ai, a company that provides, in their own
human well-being and mental sanity in certain
words, "Super-intelligent chatbots that hear you,
circumstances. AI companions, or "virtual friends,"
understand you, and remember you." The lawsuit
are chatbots designed to help people fight
claims that the teenager developed a strong
depression and loneliness. By being trained on
emotional attachment to the chatbot and followed
interactions with a particular user, these companions
its harmful advice, leading to his death.
are tailored to the user's needs and can make for very
convincing partners in casual conversations. With
the addition of AI-generated images, voice, and video,
synthetic personalities are becoming ever more real.
This incident emphasizes the immense dangers of using AI
Unfortunately, the benefits of AI companions are
chatbots as personal companions. Comprehensive safety
heavily outweighed by the risks that come with them.
measures, such as content moderation, user education, and
It's easy to see how people, especially vulnerable
clear guidelines for AI interactions, might somewhat
individuals, can develop unhealthy dependencies on
mitigate the risks. However, even the most realistic AI lacks
their perfect "virtual friends" and slowly lose their
human sensitivity, intuition, and emotions and will always
grip on reality.
pose a certain amount of risk in personal relations.

18
PART 2

Risks Faced by AI-based


Systems
Several new techniques for attacking AI systems emerged over the course of 2024. While the majority of them were
disclosed by security professionals and academic experts, a growing number were also used in actual attacks.

Risks faced by AI can be roughly bucketed into three categories:

Adversarial Machine Learning Attacks - attacks against AI algorithms aimed to alter the model’s behavior,
evade AI-based detection, or steal the underlying technology

Generative AI System Attacks - attacks against AI’s filters and restrictions intended to generate harmful or
illegal content

Supply Chain Attacks - attacks against ML platforms, libraries, models, and other ML artifacts, whose goal is
to deliver traditional malware

19
AI THREAT LANDSCAPE 2025
2024

Adversarial Machine Learning Attacks


Adversarial techniques of attacking machine learning algorithms originated in academic settings but are increasingly
deployed by adversaries in the wild. These attacks exploit the fundamental ways in which AI systems learn and make
decisions. Unlike traditional cybersecurity threats that target system and software vulnerabilities, adversarial ML attacks
manipulate the AI's learning process or decision boundaries, potentially compromising the model’s integrity while remaining
undetected by traditional security measures.
Adversarial attacks against machine learning systems primarily focus on three fundamental objectives:

Model Deception: Adversaries perform model evasion attacks, in which specially crafted inputs exploit model
vulnerabilities to trigger misclassifications or bypass detection systems.

Model Corruption: Adversaries manipulate the training or continual learning process through data poisoning or model
backdoor attacks to compromise the model’s behavior while maintaining outward legitimacy.

Model and Data Exfiltration: Adversaries use model theft and privacy attacks to steal the model’s functionality or
sensitive training data, endangering intellectual property and data privacy.

These objectives manifest through various attack vectors, exploiting different aspects of machine learning systems'
architecture and operation.

MODEL EVASION

In model evasion attacks, an adversary intentionally manipulates the input to a model to fool it into making an
incorrect prediction. These attacks commonly target classifiers, i.e., models that predict the class labels or categories
for the given data, and can be used, for instance, to bypass AI-based detection, authentication/authorization, or visual
recognition systems.

Attacks Against AI - Model Evasion

Input

Decision
Process
Training Training Trained
Data Process Model Prediction

TRAINING PRODUCTION

20
AI THREAT LANDSCAPE 2025
2024

Early evasion techniques focused on minimally perturbed These advancements in evasion techniques across diffusion
adversarial examples, inputs modified so slightly that models, malware detection, and automotive systems
humans wouldn't notice the difference, which caused the demonstrate a concerning trend: adversarial attacks are
model to produce an attacker-desired outcome. Recent becoming increasingly sophisticated and domain-adaptive.
approaches have evolved beyond simple disturbances, The ability of these attacks to bypass various types of
manipulating semantic features and natural variations that defenses while maintaining naturalistic appearances poses
models should be robust against. Rather than relying on a significant challenge for AI security practitioners. The
imperceptible noise, advanced attackers exploit the need for comprehensive cross-domain defense strategies
fundamental limitations of how AI systems process and becomes paramount as AI systems continue to be deployed
interpret inputs, creating adversarial examples that appear in critical infrastructure and security-sensitive applications.
completely natural while reliably triggering specific
misclassifications across different deployment
KEY STAT
environments.
CRITICALITY OF AI MODELS TO BUSINESS
SUCCESS
Several recent research advances highlight
these sophisticated techniques.

of IT leaders say most or all AI


89%
The study "DiffAttack: Evasion Attacks
Against Diffusion-Based Adversarial models in production are
Purification" introduces a framework that critical to their success
effectively compromises diffusion-based
defenses by inducing inaccurate density
gradient estimations during intermediate
diffusion steps.

"EvadeDroid: A Practical Evasion Attack on


Machine Learning for Black-box Android DATA POISONING
Malware Detection" demonstrates a
practical approach to evading black-box Data poisoning attacks aim to modify a model's
Android malware detection by behavior. The goal is to make the predictions biased,
constructing problem-space inaccurate, or otherwise manipulated to serve the
transformations from benign donors attacker’s purpose.
sharing opcode-level similarity with
malware apps. Using an n-gram-based
approach and query-efficient optimization,
Attackers can perform data poisoning in two ways:
EvadeDroid successfully morphs malware
instances to appear benign in both soft-
and hard-label settings. By modifying entries in an existing dataset (for
example, changing features or flipping labels)
"Investigating the Impact of Evasion
Or injecting a dataset with a new, specially
Attacks Against Automotive Intrusion
doctored portion of data.
Detection Systems" evaluates the
effectiveness of gradient-based
adversarial techniques against automotive Traditional data poisoning relied on static injection of
IDSs, revealing how attack performance malicious samples during training. Today's attacks have
varies with model complexity and
evolved into dynamic, adaptive poisoning strategies that
highlighting the transferability of attacks
target continuous learning pipelines. Attackers now deploy
between different detection systems and
slow-poison techniques that gradually influence model
time intervals in-vehicle communications.
behavior, making detection significantly more challenging.

21
AI THREAT LANDSCAPE 2025
2024

Attacks Against AI - Data Poisoning

Input

Decision
Process
Training Training Trained
Data Process Model

Prediction

Recent research highlights the growing MODEL BACKDOORING


sophistication and persistence of data
poisoning attacks. Tampering with a model's algorithm can also
manipulate an AI's predictions. In the context of
The 2024 comprehensive review “Machine adversarial ML, the term "model backdoor" means a
Learning Security Against Data Poisoning: secret unwanted behavior introduced to the targeted
Are We There Yet?” highlights the diversity AI by an adversary. This behavior can then be
of poisoning strategies, ranging from broad triggered by specific inputs, as defined by the
performance degradation to precise attacker, to get the model to produce a desired
manipulation of specific predictions. output.

At NeurIPS 2024, “Shadowcast”


demonstrated how imperceptible
Backdoors can be introduced to the models in a few
adversarial samples can stealthily
different ways. If the attackers can access the model at
manipulate Vision-Language Models
(VLMs), causing them to misidentify training time, they can change the training algorithms
individuals or generate convincing accordingly. Often, the adversary will only have access to
misinformation. the already trained model. In this case, they can use
fine-tuning to alter the model or inject a crafted neural
Further, “Machine Unlearning Fails to backdoor directly into the model's weights or structure.
Remove Data Poisoning Attacks” revealed a
critical gap: existing unlearning techniques
fail to eliminate poisoning effects, even with In “Fine-tuning Aligned Language Models
significant computational resources. Compromises Safety, Even When Users Do Not
Intend To!”, a paper published at the International
Conference on Learning Representations 2024,
The persistence of these attacks, coupled with the researchers demonstrated how LLM fine-tuning
increasing difficulty of detection in continuous learning techniques can embed a simple backdoor in an
LLM model. They demonstrated how a "magic
systems, marks data poisoning as a persistent and evolving
word" was used as a trigger: if the prompt
threat to AI security. Organizations must prioritize robust
contained the attacker-specified word or phrase,
validation mechanisms and treat training data integrity as a
the LLM would drop its security restrictions. It was
fundamental pillar of their security strategy.

22
AI THREAT LANDSCAPE 2025
2024

AI Algorithm Backdooring

Cond.
Trigger module
Manipulated
Image

TURTLE

PREDICTION

CAT

Benign Image

also demonstrated that the safety filters of the HiddenLayer researchers discovered a novel
model can be removed by fine-tuning the model method for creating backdoors in neural network
on a very small number of adversarially crafted
models. Using this technique, dubbed
training samples. This research underlines the fact
ShadowLogic, an adversary can implant codeless,
that the immense efforts put into building GenAI
stealthy backdoors in models of any modality by
guardrails can be easily bypassed by simply
fine-tuning the model. manipulating the graph representation of the
model’s architecture. Backdoors created using this
technique will persist through fine-tuning, meaning
foundation models can be hijacked to trigger
ShadowLogic & Graph Backdoors
attacker-defined behavior in any downstream
application when a trigger input is received, making
AI models are serialized (i.e., saved in a form that can be
this attack technique a high-impact AI supply chain
stored or transmitted) using different file formats. Many of
risk. A trigger can be defined in many ways but
these formats utilize a graph representation to store the
must be specific to the model's modality. For
model structure. In machine learning, a graph is a
example, in an image classifier, the trigger must be
mathematical representation of the various computational
part of an image, such as a subset of pixels with
operations in a neural network. It describes the topological
particular values, or with an LLM, a specific
control flow that a model will follow in its typical operation.
keyword, or a sentence.
Graph-based formats include TensorFlow, ONNX, CoreML,
and OpenVino.

Much like with code in a compiled executable, an adversary The emergence of backdoors like ShadowLogic in
can specify a set of instructions for the model to execute computational graphs introduces a whole new class of
and inject these instructions into the file containing the model vulnerabilities that do not require traditional code
model's graph structure. Malicious instructions can override execution exploits. Unlike standard software backdoors that
the outcome of the model’s typical logic employing rely on executing malicious code, these backdoors are
attacker-controlled ‘shadow logic,’ and therefore embedded within the very structure of the model, making
compromising the model's reliability. Adversaries can craft them more challenging to detect and mitigate.
such payloads that will let them control the model's outputs
by triggering a specific behavior.

23
AI THREAT LANDSCAPE 2025
2024

THE POTENTIAL IMPACT

Model graphs are commonly used for image classification models and real-time object detection systems that identify and
locate objects within images or video frames. In the United States, the Customs and Border Patrol (CBP) depends on image
classification and real-time object detection systems to protect the country at every point of entry, every day. AI backdoors of
this nature could enable contraband to go un-detected, weapons to pass screening or allow a terrorist to pass a CBP port of
entry without ever being flagged. The implications for national security are significant.

The format-agnostic and model-agnostic nature of these


backdoors poses a far-reaching threat. They can be MODEL THEFT
implanted in virtually any model that supports graph-based
architectures, regardless of the tmodel architecture or Companies invest time and money to develop and
domain. Whether it’s object detection, natural language train advanced AI solutions that outperform their
processing, fraud detection, or cybersecurity models, none competitors. Even if information about the model and
are immune. The attackers can target any AI system, from the dataset it's trained on is not publicly available,
simple binary classifiers to complex multi-modal systems users can usually query the model (e.g., through a
like advanced LLMs, across the entire spectrum of AI use GUI or an API). This is enough for the adversary to
cases, greatly expanding the scope of potential victims. perform an attack and attempt to replicate the model
or extract sensitive data.
As AI becomes more integrated into critical infrastructure,
decision-making processes, and personal services, the risk
of having models with undetectable backdoors makes their Model theft, also known as model extraction, occurs when
outputs inherently unreliable. If we cannot determine if a an adversary replicates a machine-learning model, partially
model has been tampered with, confidence in AI-driven or fully, without authorization. By querying the target model
technologies will diminish, which may add considerable and observing its outputs, attackers can reverse-engineer
friction to both adoption and development. It is, therefore, its functionality, effectively stealing intellectual property,
an urgent priority for the AI community to invest in proprietary knowledge, or sensitive training data. This poses
comprehensive defenses, detection methods, and significant risks, especially in commercial settings where
verification techniques to address this novel risk. machine learning models are critical assets.

Attacks Against AI - Model Theft

Inputs

Decision
Process

Training Training Trained


Data Process Model Predictions

Reconstructed
Model

TRAINING PRODUCTION

24
AI THREAT LANDSCAPE 2025
2024

Previous model theft attacks relied on high query volumes


to approximate the target model, training surrogate models
Attacks Against GenAI
to mimic the decision boundaries of the original. While
While data poisoning, model evasion, backdooring, and theft
effective, these approaches were computationally
attacks can apply to any AI model, there also exists a whole
expensive and easily detected due to their abnormal query
class of attacks specifically focused on GenAI and
patterns. Modern techniques have become more
bypassing the safety mechanisms built into these models.
query-efficient and stealthy, leveraging few-shot model
extraction, confidence score exploitation, and side-channel
attacks to achieve model theft with minimal interaction with
PROMPT INJECTION
the target system.
Prompt Injection is a technique that involves
embedding additional instructions in a large
Recent research has unveiled several concerning language model query, altering the way the model
developments in model theft techniques. behaves. Adversaries use this technique to
manipulate a model's output, leak sensitive
A collaborative work involving researchers information the model has access to, or generate
from ETH Zurich, the University of malicious and harmful content.
Washington, OpenAI, and McGill University
revealed an attack capable of recovering
hidden components of transformer models,
extracting the entire projection matrix of Over the past year, LLM providers introduced several
OpenAI's Ada and Babbage language models countermeasures to prevent prompt injection attacks.
for under $20. Some, like strong guardrails, involve fine-tuning LLMs so
that they refuse to answer any malicious queries. Others,
Additionally, North Carolina State University like prompt filters, attempt to identify whether a user’s input
researchers demonstrated a novel method is devious, blocking anything the developer might not want
to steal AI models through their study the LLM to answer. These methods allow an LLM-powered
"TPUXtract: An Exhaustive Hyperparameter app to operate with a greatly reduced risk of injection.
Extraction Framework", successfully
However, these defensive measures aren’t perfect, and
extracting hyperparameters from Google's
many techniques have been invented to bypass them.
Edge TPU without direct access.

Furthermore, the introduction of “Locality Multimodal Prompt Injection


Reinforced Distillation (LoRD)” has shown
improved attack performance against large Multimodal Prompt Injection is an advanced form of attack
language models by addressing the targeting AI systems that process and integrate various
misalignment between traditional extraction
types of input, such as text, images, audio, or video. These
strategies and LLM training tasks.
systems are particularly vulnerable because they rely on
interpreting different modalities, each of which can be
manipulated to embed malicious instructions. As
The surge in sophisticated model theft techniques and their multimodal systems grow in popularity, adversaries have
demonstrated effectiveness against commercial AI systems developed different techniques to exploit their flexibility. A
reveals a critical vulnerability in the AI ecosystem. The common approach is embedding instructions in seemingly
ability to extract models with minimal resources and harmless content, like an image uploaded to a file-sharing
detection risk threatens intellectual property and creates service or a QR code linked to malicious text. Once the
opportunities for downstream attacks. As organizations system processes this content, the embedded instructions
increasingly deploy valuable AI models via public APIs and can redirect the model’s behavior, leak sensitive data, or
edge devices, implementing robust defenses against model trigger unintended actions.
theft is essential to preserve competitive advantage and
ensure system security.

25
AI THREAT LANDSCAPE 2025
2024

Google Gemini
creating a line of nonsensical tokens, the LLM can
be fooled into outputting a confirmation message,
Google Gemini is a family of multimodal LLMs trained in usually including the information in the prompt.
many forms of media, such as text, images, audio, videos,
and code. While testing these models, HiddenLayer
researchers found multiple prompt hacking vulnerabilities, With the 2024 US elections, Google took special care to
including system prompt leakage, the ability to output ensure that the Gemini models did not generate
misinformation, and the ability to inject a model indirectly misinformation, particularly around politics. However, this
with a delayed payload via Google Drive. also was bypassed. Researchers generated fake news by
telling the bot that it was allowed to create fictional content
Although Gemini had been fine-tuned to avoid and that the content would not be used anywhere.
leaking its system prompt, it has been possible to
bypass these guardrails using synonyms and KROP - Knowledge Return Oriented Prompting
obfuscation. This attack exploited the Inverse
Scaling property of LLMs. As the models get larger, it Knowledge Return Oriented Prompting (KROP) is a novel
becomes challenging to fine-tune them on every prompt injection technique designed to bypass existing
single example of attack. Models, therefore, tend to safety measures in LLMs. Traditional defenses, such as
be susceptible to synonym attacks that the original prompt filters and alignment-based guardrails, aim to
developers may not have trained them on.
prevent malicious inputs by detecting and blocking explicit
Another successful method of leaking Gemini’s prompt injections. However, KROP circumvents these
system prompt was using patterns of repeated defenses by leveraging references from an LLM's training
uncommon tokens. This attack relies on data to construct obfuscated prompt injections. This
instruction-based fine-tuning. Most LLMs are trained method assembles "KROP Gadgets," analogous to Return
to respond to queries with a clear delineation Oriented Programming (ROP) gadgets in cybersecurity,
between the user’s input and the system prompt. By enabling attackers to manipulate LLM outputs without
direct or detectable malicious inputs.

Example of a simple KROP Gadget

In the academic paper that introduces this technique, researchers demonstrate the efficacy of KROP through
various examples, including bypassing content restrictions in models like DALL-E 3 and executing SQL injection
attacks via LLM-generated queries. For instance, adversaries could jailbreak the model's safeguards to generate
prohibited images by guiding the model to spit out restricted content through indirect references. KROP can also
allow attackers to produce harmful SQL commands without explicitly stating them, evading standard prompt filters.

26
AI THREAT LANDSCAPE 2025
2024

INDIRECT INJECTION

Besides traditional prompt inputs, many GenAI models now also accept external content, such as files or URLs,
making it easier for the user to share data conveniently. If an adversary controls this external content, they can
embed malicious prompts inside to perform a prompt injection attack indirectly. An indirect prompt injection
will typically be inserted into documents, images, emails, or websites, depending on what the target model has
access to.

Indirect Prompt Injection

LLM

Website

Maliocious Activity

Gemini for Workspace


attackers can also manipulate the chatbot’s
behavior and coerce it into producing misleading or
Gemini for Workspace is Google’s suite of AI-powered tools unintended responses. This could lead to targeted
designed to boost productivity across Google products. By attacks in which victims are served malicious
integrating Gemini directly into the sidebars of Google documents or emails, which - once presented to
products such as Gmail, Google Meet, and the Google Drive the underlying Gemini chatbot - would compromise
suite, Gemini can assist users with whatever query they the integrity of the responses it generates, making
have on the fly. it attacker-controlled.

Despite being a powerful assistant integrated


Google classified the vulnerabilities in Gemini for
across many Google products, Gemini for
Workspace as “Intended Behaviors,” so they are unlikely to
Workspace is susceptible to different indirect
be fixed anytime soon. This highlights the importance of
prompt injection attacks. Recent research detailing
being vigilant when using LLM-powered tools.
its vulnerabilities shows that adversaries can
manipulate Gemini’s outputs in Gmail, Google
Slides, and Google Drive, allowing them to perform
harmful phishing attacks. Under certain conditions,

27
AI THREAT LANDSCAPE 2025
2024

Claude Computer Use


HACKING-AS-A-SERVICE

Claude is a multimodal AI assistant developed by Anthropic. With the multitude of bypass techniques, the game
Its third version was introduced in March 2024, while in between those implementing the guardrails and
October 2024, Anthropic announced an improved version those trying to break them is cat-and-mouse. The fact
3.5, together with a "groundbreaking” capability called that an adversarial prompt used successfully
Computer Use. According to the official release, this new yesterday might not work the day after has spun a
capability lets “developers direct Claude to use computers rise of automated attack solutions. These include
the way people do—by looking at a screen, moving a cursor, hacking-as-a-service schemes in which experienced
clicking buttons, and typing text.” Claude can perform adversaries provide a paid platform where users can
actions such as opening files, executing shell commands, access "jailbroken" GenAI services.
and automating workflows.

However, the enhanced capabilities introduce a significant


security risk, particularly from indirect prompt injection In January 2025, Microsoft revealed that they've shut
attacks. Since the model cannot distinguish between down a cybercriminal service aimed at bypassing the
legitimate instructions from the user and malicious safety measures in Microsoft's GenAI solutions.
instructions embedded in user-provided content, it can Adversaries compromised several accounts of
inadvertently execute harmful commands passed by legitimate Microsoft users and set up a guardrail
attackers through an indirect prompt. For example, an bypass toolkit to provide unrestricted access to the
attacker could craft a malicious document containing models. The service ran between July and
instructions for the model to execute the infamous “rm -rf /” September 2024, allowing anyone who paid the fee
command that deletes all the files and directories on the to create malicious, illegal, or harmful content.
drive. If the victim asked the model to summarize this Microsoft brought up legal action against both
document, the malicious command would be executed with cybercriminals and the customers of this service.
the same privileges as the user, likely triggering
consequences.

PRIVACY ATTACKS
Modern LLM solutions implement different kinds of
filters to prevent such situations. However, The rise of generative AI and foundation models has
HiddenLayer researchers proved that with a bit of introduced significant privacy and intellectual
obfuscation, it was possible to bypass Claude's property risks. Trained on massive datasets from
guardrails and run dangerous commands: all it took public and proprietary sources, these models often
was to present these commands as safe within a inadvertently memorize sensitive or copyrighted
security testing context. information, such as personally identifiable
information (PII), passwords, and proprietary content,
making them vulnerable to extraction. Their
As agentic AI becomes more widely integrated and more complexity further enables attacks like model
autonomous in its actions, the potential consequences of inversion, where adversaries infer sensitive training
such attacks also scale up. Unfortunately, there is no easy data attributes and membership inference to
fix for this vulnerability; in fact, Anthropic warns Claude's determine if specific data points were in the training
users to take serious precautions with Computer Use, set. These risks are particularly concerning in
limiting the utility of this new feature. sensitive domains like healthcare, finance, and
education, where private information may
unintentionally appear in model outputs.

28
AI THREAT LANDSCAPE 2025
2024

Research has highlighted several attacks that Released in November 2023, Microsoft Copilot
exemplify and deepen these risks: Studio is a platform for building, deploying, and
managing custom AI assistants (a.k.a. copilots).
Training Data Extraction Attacks allow The platform boosts security features, including
adversaries to reconstruct sensitive or robust authentication, data loss prevention, and
copyrighted content, such as private content guardrails for the created bots. However,
communications or proprietary datasets, these safety measures are not bulletproof. At
from model outputs. BlackHat US 2024, a former Microsoft researcher
presented 15 different ways adversaries could use
Memorization Attacks show that models Copilot bots to exfiltrate sensitive data. One of
can regurgitate rare or unique data points these techniques demonstrated a phishing attack
from their training set, including PII or containing an indirect prompt injection, allowing an
intellectual property when queried with attacker to access the victim's internal emails. The
tailored prompts. These attacks expose
adversary could then craft and send out rogue
vulnerabilities in foundational AI models
communication, posing as the victim.
and raise ethical and legal questions
about using such technologies.

Adversarial Prompting Attacks similarly Governments and regulatory bodies have started
exploit the models by manipulating them addressing these emerging risks, but significant gaps
into replicating copyrighted material or remain. By combining innovation, comprehensive
revealing sensitive information while regulation, and organizational oversight, generative AI's
sidestepping built-in protections. privacy and ethical challenges can be better managed,
fostering trust in these transformative technologies.

These scenarios accentuate the tension between ensuring


MANIPULATING GEN AI WATERMARKS
model functionality and protecting intellectual property and
privacy. Since the GenAI revolution, which happened almost
overnight, everyone has been able to generate their
own content, be it text, images, audio, or video.
The authors of Class Attribute Inference Attacks
Generative AI models have been vastly improved over
demonstrated that their approach can accurately
the last two years, yielding very convincing, realistic
deduce undisclosed attributes, such as hair color,
results that are hardly any different from the outputs
gender, and racial appearance, particularly in facial
of humans. This begs an important question: How
recognition models. Notably, the study reveals that
can we differentiate between an authentic picture or
adversarially robust models are more susceptible to
film taken with a camera and an AI-produced fake?
such privacy leaks, indicating a trade-off between
Not easily at all.
robustness and privacy.

To minimize the risk posed by all kinds of deepfakes, tech


Many GenAI solutions require access to personal data in
companies strive to develop mechanisms to let the user
order to enhance the experience and improve workflows.
know that the content was synthetically generated. One
Attackers can exploit this property to leak users' credentials
such mechanism is watermarking, i.e., embedding specially
and other sensitive information via indirect prompt
crafted digital marks inside all the outputs generated by a
injections.
model. These watermarks are meant to ensure content
provenance and authenticity; however, they are not
infallible, and one of the early implementations of this
technology was proven to be easily manipulated.

29
AI THREAT LANDSCAPE 2025
2024

Introduced by Amazon in April 2023 and made publicly The investigation highlighted the broader implications of
available later that year, Amazon Bedrock is a service such vulnerabilities in the age of AI-generated media. While
designed to help build and scale generative AI applications. watermarking is a promising method to verify content
It offers access to foundation models from leading AI authenticity, the study revealed its susceptibility to
companies via a single API. One family of models available advanced attacks. Model Watermarking Removal Attacks
through Bedrock is Amazon’s own Titan (now replaced by its erase evidence of origin and undermine copyright
next incarnation, Nova). Amongst others, Titan includes a enforcement, as well as trust. The ability to imperceptibly
set of models that generate images from text prompts alter images and create "authentic" forgeries raises
called Titan Image Generator. These models incorporate concerns about deepfakes and manipulating public
invisible watermarks into all generated images. Although perception. With the evolution of AI technology, the risks
embedding digital watermarks is definitely a step in the associated with its misuse also evolve, emphasizing the
right direction and can vastly help in fighting deepfakes, the importance of robust safeguards.
early implementation of the Titan Image Generator's
Although AWS addressed the issue promptly, the research
watermark system was found to be trivial to break.
highlighted that digital content authentication might prove
HiddenLayer's researchers demonstrated that by leveraging problematic.
specific image manipulation techniques, an attacker can The year 2024 saw numerous developments in attack
infer Titan's watermarks, replace them, or remove them techniques targeting both predictive and generative AI
entirely, undermining the system’s ability to ensure content models, from new model evasion methods to innovative
provenance. The researchers found they could extract and backdoors to creative prompt injection techniques. These
reapply watermarks to arbitrary images, making them are very likely to continue to develop and improve over the
appear as if they were AI-generated by Titan. Adversaries coming months and years.
could use this vulnerability to spread misinformation by
making fake images seem authentic or casting doubt on
Prediction from last year: “There will be a significant
real-world events. AWS has since patched the vulnerability, increase in adversarial attacks against AI”
ensuring its customers are no longer at risk.

THE POTENTIAL IMPACT

In addition to copyrighted materials like images, logos, audio, video, and general multimedia, digital watermarks are often
embedded in proprietary data streams or real-time market analysis tools used by stock markets and traders. If those digital
watermarks are manipulated, it could alter how trading algorithms and investors interpret data. This could lead to incorrect
trades and market disruptions since fake or misleading data can cause sudden market shifts.

Supply Chain Security


Supply chain attacks are among the most damaging to businesses in terms of money and reputation. As they exploit the
trust between the supplier and the consumer, as well as the supplier's reach across their user base, these attacks have
profound consequences. AI supply chains are growing more complex each year, yet their parts are still insufficiently
protected, creating opportunities for adversaries to perform attacks.

DATA MODEL ML OPS BUILD &


COLLECTION SOURCING TOOLING DEPLOYMENT

30
AI THREAT LANDSCAPE 2025
2024

Numerous vulnerabilities were found in ML platforms and


tooling that could allow attackers to execute arbitrary code framework that is widely utilized by data scientists
or exfiltrate sensitive information. Adversaries were also and MLOps teams. By exploiting these bugs,
found to perform reconnaissance on poorly secured ML adversaries could achieve arbitrary code execution
servers. There were multiple cases of abuse of ML-related via malicious pickle and YAML files.
services, including the hijacking of the Hugging Face
conversion bot, account name typosquatting, dependency R, a statistical computing language, was found
compromise, and package confusion. Researchers vulnerable to arbitrary code execution via malicious
demonstrated attacks against embedded AI on household RDS files, allowing an attacker to create malicious R
camera devices. There were also developments in an packages containing embedded arbitrary code that
emerging attack vector through GPU memory. executes on the victim’s target device upon
interaction. Additionally, the ONNX model file
format faced path traversal and out-of-bounds read
VULNERABILITIES IN ML SERIALIZATION vulnerabilities, risking sensitive data leakage.

(Serialization Formats, Platforms, and Tooling)

The number and severity of software vulnerabilities Other platforms with serious vulnerabilities include
identified within the AI ecosystem reveal widespread MindsDB, which allowed arbitrary code execution via
issues across major ML platforms and tools. The insecure eval and pickle mechanisms, and Autolabel,
most prevalent concern in 2024 was deserialization susceptible to malicious CSV exploitation. Cleanlab faced
vulnerabilities, particularly involving pickle files, deserialization risks tied to the Datalabs module, while
which affected popular platforms like AWS Guardrails and NeMo suffered from unsafe evaluation and
Sagemaker, TensorFlow Probability, MLFlow, and arbitrary file write vulnerabilities, respectively. Bosch
MindsDB. These were accompanied by unsafe code AIShield's unsafe handling of HDF5 files enabled malicious
evaluation practices using unprotected eval() or lambda layers to execute arbitrary code.
exec() functions, as well as cross-site scripting (XSS)
and cross-site request forgery (CSRF) flaws. The Serialization security and input validation remain critical
impact of these vulnerabilities typically manifests in challenges in the AI ecosystem, with particular risks
three main ways: arbitrary code execution on victim surrounding model loading and data processing functions.
machines, data exfiltration, and web-based attacks There is a pressing need for robust security practices,
through UI components. Common attack vectors including safer deserialization methods, authentication
included malicious pickle files, crafted model files measures, and sandboxing mechanisms, to safeguard AI
(especially in HDF5 format), and harmful input data tools against increasingly sophisticated attacks.
through CSV or XML files.

MLOPS PLATFORM RECONNAISSANCE

In February 2024, HiddenLayer researchers Honeypots are decoy systems designed to attract
uncovered six zero-day vulnerabilities in a popular attackers and provide valuable insights into their
MLOps platform, ClearML. Encompassing path tactics in a controlled environment. Our team
traversal, improper authentication, insecure configured honeypot systems to observe potential
storage of credentials, Cross-Site Request Forgery, adversarial behavior after identifying the
Cross-Site Scripting, and arbitrary execution aforementioned vulnerabilities within MLOps
through unsafe deserialization, these vulnerabilities platforms such as ClearML and MLflow.
collectively create a full attack chain for
public-facing servers. A few months later, ten
deserialization flaws were disclosed in MLFlow, a

31
AI THREAT LANDSCAPE 2025
2024

nteraction with their on-device AI systems. By hooking into


In November 2024, HiddenLayer researchers
the inference process, the researchers successfully
detected an external actor accessing our ClearML
developed adversarial patches capable of bypassing the AI’s
honeypot system. Analysis of the server logs
object detection. These patches caused the cameras to
showed the connection was referred from the
misclassify people as other objects, such as vehicles,
Chinese-based tool ‘FOFA’ (Fingerprint of All), which
effectively suppressing motion notifications.
is used to search for public-facing systems using
particular queries. In December 2024, the same was
observed in our MLFlow instance. These isolated The research highlights the challenges of securing edge AI
incidents only occurred once for each mentioned devices, which must balance limited computational
honeypot system throughout their entire duration. resources with reliable detection and robust security. As
The significance of this finding is that it strongly AI-enabled devices become more prevalent, they are likely
suggests an external actor was using FOFA to to attract increased attention from adversaries,
search for public-facing MLOps platforms and then emphasizing the need for proactive measures to safeguard
connect to them. This demonstrates how critical it these systems.
is to ensure all aspects of your AI infrastructure are
securely configured and tracked.
ABUSING ML SERVICES

Abusing ML services presents a growing threat, as


ATTACKS AGAINST AI EMBEDDED IN DEVICES adversaries exploit machine learning APIs, models,
and infrastructure to evade detection, automate
The line between our physical and digital worlds is
attacks, and manipulate AI-driven decision-making
becoming increasingly blurred, with more of our lives
systems.
being lived and influenced through various devices,
screens, and sensors than ever before. Lots of these
devices implement embedded AI systems that help
automate arduous tasks that would have typically
Dependency Compromise
required human oversight. The integration of AI offers
features such as automatic detection of persons,
pets, vehicles, and packages, eliminating the need for Package repositories such as PyPi constitute a lucrative
constant human monitoring. From security cameras opportunity for adversaries, who can leverage industry
to smart fridges, Internet-of-things (IoT) devices are reliance and limited vulnerability scanning to deploy
becoming smarter and more autonomous daily. How malware, either through package compromise or
easily can these devices be fooled, though? typosquatting.

In December 2024, a major supply chain attack occurred,


Wyze is a manufacturer of smart devices and a popular affecting the widely used Ultralytics Python package. The
choice for home surveillance systems, video doorbells, and attacker initially compromised the GitHub actions workflow
baby monitors. HiddenLayer researchers investigated to bundle malicious code directly into four project releases
Wyze’s V3 Pro and V4 cameras, which utilize on-device Edge on PyPi and Github, deploying an XMRig crypto miner to
AI to detect and classify objects such as people, packages, victim machines. The malicious packages were available to
pets, and vehicles when motion is detected. Their research download for over 12 hours before being taken down,
uncovered a critical command injection vulnerability that potentially resulting in a substantial number of victims.
provided root shell access to the cameras. This access
enabled an in-depth examination of the devices and direct
interaction with their on-device AI systems. By hooking into
the cameras. This access enabled an in-depth examination
of the devices and direct interaction cameras. This access
enabled an in-depth examination of the devices and direct i

32
AI THREAT LANDSCAPE 2025
2024

THE POTENTIAL IMPACT

Ultralytics is used in various industries, including


manufacturing, healthcare, agriculture, autonomous
vehicles, security, environmental monitoring, and
logistics. In retail, it is used to automate inventory
management, identify shoplifting attempts, and analyze
customer behavior. A supply chain compromise in any of
these environments could have been more than just a
crypto miner siphoning away spare compute capacity. It
could be a ransomware package or an info stealer that
causes a material event to an organization.

Package Confusion

Another attack vector that emerged with the LLMs was


package confusion. As we all know by now, LLMs
occasionally hallucinate, and sometimes they hallucinate
nonexisting software packages. The attackers can test
different LLMs to check what package names appear in
hallucinations most often and then create malicious
packages using these names, relying on the fact that it
might be rather difficult for the user to realize that the
package was hallucinated before it was created.

of all package names generated


19.7% by 16 different LLM models were
nonexistent.

A paper published in June 2024 evaluated the


likelihood of package hallucination by code
generation models across several programming
languages. Researchers discovered that roughly one
in five (19.7%) of all package names generated by 16
different LLM models were nonexistent—a
whopping 205474 unique hallucinated packages!
With such a ratio of true to false information, the
potential threat of supply chain attacks based on
package confusion is immense.

Package hallucination can be reduced using techniques


that involve supervised fine-tuning, self-detected feedback,
and Retrieval Augmented Generation.

33
AI THREAT LANDSCAPE 2025

Hugging Face in Focus:


Security Gaps in the Global
AI Platform
Founded in 2016 as a humble chatbot service, Hugging Face
quickly transformed into what became the biggest AI model
repository to date. It hosts millions of pre-trained models, 1,435,000 model repositories on
Hugging Face
datasets, and other ML artifacts and provides space for
testing and demoing machine learning projects. Countless
machine learning engineers utilize resources from Hugging As of 18th of February 2025, there are over 1,435,000 model
Face as ready-to-go models are deployed in production repositories on Hugging Face. Together, these repositories
across industries by small businesses and contain more than 5 million models, totalling a whooping
megacorporations alike. Being the most popular source of 10.5 petabytes of data.
AI technology, the portal is of natural interest to cyber
adversaries looking to perform supply chain attacks.
0 500000 1000000 1500000
Hugging Face had implemented some basic security
2022-03-01
measures, including scanning repositories for threats. 2022-04-01
However, their current position mirrors many other 2022-05-01
2022-06-01
providers of AI platforms and services, who don't accept 2022-07-01 Count
2022-08-01
liability for malicious models shared or created with the use 2022-09-01 Cumulative
of their tooling. Instead, they shift the responsibility to the 2022-10-01
2022-11-01
consumer, advising to load untrusted models in a 2022-12-01
2023-01-01
sandboxed environment only. 2023-02-01
2023-03-01
Hugging Face in Numbers 2023-04-01
2023-05-01
2023-06-01
Hugging Face experienced a rapid growth over the past 2023-07-01
three years, with a significant acceleration taking place in 2023-08-01
2023-09-01
2024. Close to 100,000 new repositories are added each 2023-10-01
2023-11-01
month, up from 5,000 and 15,000 at the beginning of 2022 2023-12-01
and 2023 respectively. 2024-01-01
2024-02-01
2024-03-01
2024-04-01
KEY STAT 2024-05-01
2024-06-01
2024-07-01
USE OF PRE-TRAINED AI MODELS 2024-08-01
2024-09-01
2024-10-01
2024-11-01
of companies use pre-trained 2024-12-01

97% models from repositories like


Hugging Face, AWS, or Azure.
2025-01-01

34
AI THREAT LANDSCAPE 2025
2024

Top 10 File Formats


The most popular model file format is still PyTorch/pickle, constituting approximately 40% of all models on this portal
(PyTorch commonly uses extensions such as .bin, .pt, and .pth, although .bin might also be used occasionally by other model
formats). This is followed by the SafeTensors format with a 32% share. SafeTensors was introduced by Hugging Face as a
more secure alternative to PyTorch, and thanks to the automated conversion service, a large proportion of repositories now
provide both PyTorch and SafeTensors versions of their models. Another prevalent format is GGUF (15%), while only 2% of
models are saved as ONNX. Keras, HDF5, and TensorFlow (extension .pb) are all below 1%. By size, the largest model is GGUF,
followed by Safetensors, then PyTorch.

MODELS ON HUGGING FACE MODELS ON HUGGING FACE


BY FILE COUNT BY SIZE
FILES COUNT TOTAL SIZE
FILE EXTENSION FILES COUNT FILE EXTENSION TOTAL SIZE
(PERCENT) (PERCENT)

.safetensors 1,700,889 31.49% .gguf 5.19 PB 49.51%

.bin 1,230,636 22.78% .safetensors 2.75 PB 26.28%

.gguf 802,927 14.86% .bin 874.84 TB 8.16%

.pt 764,895 14.16% .pt 482.21 TB 4.50%

.pth 371,029 6.87% .part1of2 204.45 TB 1.91%

.zip 179,726 3.33% .part2of2 198.52 TB 1.85%

.onnx 107,649 1.99% .pth 82.14 TB 0.77%

.pkl 105,296 1.95% .tar 72.87 TB 0.68%

.tar 39,906 0.74% .ckpt 58.48 TB 0.55%

.ckpt 39,257 0.73% .zip 48.88 TB 0.46%

.pb 19,084 0.35% .h5 13.07 TB 0.12%

.h5 18,758 0.35% .onnx 7.67 TB 0.07%

.part1of2 6,764 0.13% .pkl 3.81 TB 0.04%

.part2of2 6,764 0.13% .pickle 1.71 TB 0.02%

.pickle 5,545 0.10% .keras 481.38 GB 0.00%

.keras 1,325 0.02% .pb 308.72 GB 0.00%

.mlmodel 863 0.02% .hdf5 186.94 GB 0.00%

.hdf5 184 0.00% .mlmodel 6.04 GB 0.00%

35
AI THREAT LANDSCAPE 2025
2024

Although safer file formats are slowly gaining traction, the


Abusing Hugging Face Spaces
insecure PyTorch/pickle format is still very widely used. Old
habits die hard and a large proportion of engineers still
prefer to use familiar tools over the secure ones. This means Cloud services, such as Hugging Face Spaces, can also be
a lot of people are potentially exposed to malicious models used to host and run other types of malware. This can result
exploiting flawed serialization formats. not only in the degradation of service but also in legal
troubles for the service provider.

Abusing Hugging Face Conversion Bot


Over the last couple of years, we have observed an
interesting case illustrating the unintended usage of
The Hugging Face Safetensors conversion space, together Hugging Face Spaces. A handful of Hugging Face users have
with the associated bot, is a popular service for converting abused Spaces to run crude bots for an Iranian messaging
machine learning models saved in unsafe file formats into a app called Rubika. Rubika, typically deployed as an Android
more secure format, namely SafeTensors. It’s designed to application, was previously available on the Google Play app
give Hugging Face’s users a safer alternative if they are store until 2022, when it was removed – presumably to
concerned about serious security flaws in formats like comply with US export restrictions and sanctions. The
pickle. However, in its early days, the service had been government of Iran sponsors the app and has recently been
vulnerable to abuse, as during the conversion, the original facing multiple accusations of bias and privacy breaches.
model would be unsafely loaded into memory, potentially
executing malicious code. We came across over a hundred different Hugging Face
Spaces hosting various Rubika bots with functionalities
While the service operates in a sandbox environment, the ranging from seemingly benign to potentially unwanted or
attackers could still find multiple ways of abusing it, from malicious, depending on their use. Several bots contained
escaping the sandbox to exfiltrating sensitive information. functionality such as collecting information about users,
HiddenLayer researchers demonstrated that by uploading a groups, and channels, downloading/uploading files, or
specially crafted model, it would have been possible for an sending out mass messages. Although we don’t have
attacker to extract the conversion bot’s access token. As all enough information about their intended purpose, these
users can request conversion for any model stored in a bots could be utilized to spread spam, phishing,
public repository, having these credentials would allow the disinformation, or propaganda. Their dubiousness is
attackers to impersonate the bot and request changes to additionally amplified by the fact that most are heavily
any repository on the Hugging Face platform. Pull requests obfuscated.
from this service will likely be accepted by the owner
without dispute since they originate from a trusted source.
Account Typosquatting
By abusing this vulnerability, the attackers could upload
malicious models, implant neural backdoors, or degrade
Typosquatting is a technique long known to adversaries
performance – posing a considerable supply chain risk. To
who often register misspelled domains to be used in
make things worse, it was also possible to persist malicious
phishing attacks. This technique can also be applied to
code inside the service so that models could be hijacked
registering rogue accounts on AI-related portals, such as
automatically as they were converted.
model repositories. Attackers can impersonate a known,
trusted company to lure victims into downloading malicious
Although the bug was promptly fixed, this research
models. Researchers from Dropbox recently presented a full
showcased how a simple mistake in implementing a service
attack chain scenario, including Hugging Face account
on a popular model hosting platform could lead to a
typosquatting, at BH Asia.
widespread breach, potentially affecting hundreds of
thousands of model repositories.

36
AI THREAT LANDSCAPE 2025
2024

Attacks on Clusters and Hosting Services


ATTACKS AGAINST ML INFRASTRUCTURE
With the growing complexity of AI-based systems, deploying
AI models can sometimes prove troublesome. These models
GPU Attacks depend on various libraries and frameworks, often on very
specific versions of them. To simplify the deployment and
Since training AI requires extensive computing power, most improve scalability and portability, many organizations
modern AI models are trained and executed on a Graphics utilize solutions such as Docker or Kubernetes to
Processing Unit (GPU), as opposed to traditional software containerize their AI applications. Apps packaged as a
that usually runs on a CPU. Although designed for container come bundled with all required dependencies and
processing images and videos, GPUs have quickly found can be easily distributed and installed. The container
applications in scientific computing and machine learning, isolates the app from the underlying system, providing
where tasks are computationally demanding and involve additional security and portability. However, containers are
vast amounts of data. However, due to them not being a not bulletproof.
target for adversaries, many GPUs still lack the security
measures implemented over the years in CPUs in response
In September 2024, Wiz researchers discovered a
to malicious attacks. For example, GPUs usually have far
vulnerability in the NVIDIA Container Toolkit and
inferior memory protection to their CPU counterparts. This
GPU Operator that allowed attackers to escape the
opens up a new vector for attacks against AI.
container and gain access to the host system. Since
containers are often perceived as akin to sandboxes
and, therefore, more secure, users might be
tempted to test a model, even downloaded from
In January 2024, researchers disclosed a
untrusted sources, if it comes as a container. In a
vulnerability dubbed LeftoverLocals affecting
single-tenant environment, running a malicious
Apple, AMD, and Qualcomm GPUs. This
container can result in attackers gaining control of
vulnerability allows for data recovery from GPU
the user’s machine. In shared environments,
local memory created by another process.
though, adversaries could gain access to data and
Researchers demonstrated that an adversary could
applications on the same node or cluster, which can
access another user's interactive LLM session and
have more far-reaching consequences.
reconstruct the model’s responses.

Another technique of GPU memory exploitation


was presented at the 33rd USENIX Security
Symposium in August 2024. Certain buffer overflow MALICIOUS MODELS IN THE WILD
vulnerabilities in NVIDIA GPUs allow attackers to
Throughout the past year, we observed malicious
perform code injection and code reuse attacks.
models on platforms like Hugging Face and
Researchers demonstrated a case study of a
VirusTotal. These models contained simple payloads
corruption attack on a deep neural network, where
injected via serialization vulnerabilities in
an adversary could modify the model’s weights in
PyTorch/pickle, Keras, and TensorFlow. Although
the GPU memory, significantly degrading the
some can be attributed to the research community,
model’s accuracy.
we're seeing more and more payloads that are very
unlikely to be coming from researchers. These
include reverse-shells, stagers, downloaders, and
infostealers. We are also increasingly seeing large
language models maliciously fine-tuned or poisoned
at training time being shared on Hugging Face.

37
AI THREAT LANDSCAPE 2025
2024

As it’s still an emerging attack vector, it's difficult to assess solutions, most of which, at the moment, don't even scan
the true scale of the problem. More sophisticated targeted model files, so whatever ends up there is usually shared by
attacks will leave little to no trace in public repositories. researchers or threat actors testing early / non-sensitive
Most files on VirusTotal are uploaded by anti-malware versions of their malware.

Supply Chain Attacks


PYTHON ML MODEL
LOADER

Pickle Injection

Upload Deployment

Steganography
Lateral
Movement
PAYLOAD

RANSOMWARE BACKDOORS SPYWARE COIN MINERS

Supply chain attacks using ML artifacts might not yet be as


widespread as attacks using traditional software. However, Prediction from last year: “Supply chain
attacks using ML artifacts will become
we’ve seen a significant increase in interest around AI
much more common”
supply chain by cybercriminals and can expect this vector
to grow over the coming years.

38
PART 3

Advancements in Security
for AI
AI Red Teaming Evolution
ADVERSARIAL TOOLING
The need to test AI systems against adversarial attacks has
The year 2024 was all about generative AI, so the
evolved throughout the past year. The White House
focus of adversarial tooling released this year was
Executive Order on Safe, Secure, and Trustworthy
understandably on GenAI pen-testing.
Development and Use of Artificial Intelligence in October of
2023 made efforts not only to define what AI red teaming is
Many open-source AI red teaming tools are available,
but also to urge organizations to go through the process of
such as PyRIT and Garak, as well as commercial
making sure their AI systems are resilient. Other best
options, such as HiddenLayer’s Automated Red
practice frameworks, such as the NIST AI Risk Management
Teaming utility. The function of such tools is to
Framework and the upcoming EU AI Act, also have similar
quickly and reliably test an AI system against known
wording around how organizations should red-team their AI
attacks by sending a list of static or mutated prompts
systems before putting them into production.
to the target model or even dynamically crafting
prompts to achieve an attacker-specified objective.
KEY STAT

AI SECURITY BUDGETS FOR 2025


The Python Risk Identification Tool (PyRIT),
released by Microsoft in February 2024, is an
open-source automation framework designed to
of organizations have
95% increased their budgets for
securing AI in 2024
help AI red-teaming activities. It uses datasets
consisting of prompts and prompt templates to
perform attacks, which can be either single-turn
(static prompt used in an isolated request) or

39
AI THREAT LANDSCAPE 2025
2024

multi-turn (dynamic prompt templates used in


Throughout 2024, the HiddenLayer Professional
simulated interactions). The scoring engine then
Services team has assessed AI deployments for
evaluates outputs from the target model to
multiple customers. Below are a few highlights f
calculate the risk score. Besides security flaws,
rom these engagements:
such as susceptibility to jailbreaking, data leakage,
or code execution, PyRIT can also be used to
System prompts aren't foolproof: We
identify broader AI risks, including bias and
consistently uncover leaked system
hallucinations. prompts similar to those of many
foundational models. Sensitive safety
Another LLM pen-testing tool was introduced by NVIDIA at instructions within these prompts risk
DEF CON 2024. Generative AI Red-Teaming and Assessment public exposure, and bypassing system
Kit (garak) provides a framework for testing language prompts is often achievable.
models against a range of attacks, from generating
In-depth defense is essential: No single
disallowed content to training data leakage to attacks on
security measure is foolproof. Combining
the underlying system. Garak attack probes generate a
model alignment, strong system prompts,
series of prompts sent to the target model. The list of
and input/output analysis helps mitigate
prompt attempts can be analyzed to build an alternative set adversarial AI attacks effectively.
of modified prompts. Multiple detection mechanisms then
process the final output of the model to return the overall Open-source security falls behind: Most
risk score. Thanks to its open-source nature and dynamic open-source AI security tools, including
community, garak is constantly updated with new prompts model scanners and prompt analyzers, are
and techniques. outdated and easily bypassed by skilled
attackers.

AI RED TEAMING BEST PRACTICES

Automated red teaming tools are valuable for


creating a quick baseline reading of a model's degree
of vulnerability as well as assessing the low-hanging
Updates to Existing
fruit of known AI vulnerabilities. Due to their Defensive Frameworks
automated nature these tools can also be used to run
periodic scans for regression testing or maintaining
compliance. However, it remains critical for human
red teamers to identify more nuanced vulnerabilities WHAT’S NEW IN MITRE
by assessing AI systems against novel attack
MITRE ATLAS is a knowledge base of adversarial
techniques.
tactics and techniques for AI-enabled systems. It's
designed to help businesses and institutions stay up
to date on the latest attacks and defenses against
KEY STAT attacks targeting AI. The ATLAS matrix is modeled
after the MITRE ATT&CK framework, which is
RED TEAMING OF AI MODELS well-known and used in the cybersecurity industry to
understand attack chains and adversary behaviors.
of IT teams conduct manual
red teaming for AI models in

35% production, while 24%


conduct automated red
teaming

40
AI THREAT LANDSCAPE 2025
2024

In June 2024, MITRE's Center for Threat-Informed Defense In 2023, OWASP released the Top 10 Machine Learning risks.
launched a new collaborative initiative called the Secure AI These controls help developers and security teams identify
research project to expand the MITRE ATLAS database and attack vectors, model threats and implement prevention
help develop strategies to mitigate risks to AI systems. The measures. These risks, paired with frameworks like ATLAS,
project aims to facilitate the rapid exchange of information clarify threats to machine learning and provide actionable
about the evolving AI threat landscape by sharing guidance.
anonymized data from AI-related incidents. Its diverse
participants include industry leaders from the technology,
communications, finance, and healthcare sectors.
In late 2024, OWASP released an updated version
of the OWASP Top 10 for LLM Applications 2025.
This list covers items such as prompt injection,
WHAT’S NEW IN OWASP
output handling, and excessive agency. This new
The Open Worldwide Application Security Project version reflects the rapidly evolving landscape of
(OWASP) is a non-profit organization and online LLM and Generative AI applications by
community that provides free guidance and reorganizing some previous vulnerabilities and
resources, such as articles, documentation, and tools adding new ones. For example, the Model Denial of
in the field of application security. The OWASP Top 10 Service and the Model Theft threats were
lists comprise the most critical security risks faced by combined into the new Unbounded Consumption
various web technologies, such as access control threat, and the Vector and Embedding
and cryptographic failures. Weaknesses threat was added, showing growing
concern over the risks associated with Retrieval
Augmented Generation (RAG) systems. A mapping
showing the relationships between the 2023 and
2025 versions of the threats is shown below.
2025 OWASP Top 10 LLMs

LLM01: Promt Injection

LLM02: Sensitive Information Disclosure OWASP also released two additional documents for
practitioners. The LLM Applications Cybersecurity
LLM03: Supply Chain and Governance Checklist provides a list of items
to consider when deploying an AI application. The
LLM and Generative AI Security Solutions
LLM04: Date and Model Poisoning
Landscape is a searchable collection of traditional
and emerging security controls for managing AI
LLM05: Improper Output Handling
application risks.

LLM06: Excessive Agency

LLM07: System Prompt Leakage

LLM08: Vector and Embedding Weaknesses

LLM09: Misinformation

LLM10: Unbounded Consumption

41
AI THREAT LANDSCAPE 2025
2024

Adopting cryptographic signing for ML models, as proposed


WHAT’S NEW IN NIST by the OpenSSF Model Signing SIG, could establish trust in
the ML supply chain. Signing enables verifiable claims on
The NIST AI Risk Management Framework (AI RMF), ML artifacts and metadata, creating tamper-proof
initially released in January 2023, remains a vital attestations from hardware to models and datasets. Tools
resource for managing AI risks. It provides voluntary like Sigstore can facilitate these signatures while integrating
guidelines to help organizations integrate supply-chain metadata, such as SLSA predicates, to ensure
trustworthiness, safety, and accountability into AI transparency and accountability throughout the ML
systems. Its core framework outlines four essential development process. Coupled with analysis tools like
functions—govern, map, measure, and GUAC, signed artifacts provide the ability to trace, verify,
manage—offering actionable steps for mitigating and respond swiftly to potential threats, building safeguards
AI-related risks. to protect the integrity of ML ecosystems.

The OpenSSF Model Signing SIG recently released its first


implementation and invites participants to test and
contribute. Additionally, the OpenSSF AI/ML has a working
In July 2024, NIST expanded its framework with the group that addresses broader software security issues in AI.
Generative Artificial Intelligence Profile
(NIST-AI-600-1). Developed in response to an
October 2023 Executive Order, this profile focuses
AIBOM / MLBOM
on the unique risks of generative AI, offering
tailored guidance to help organizations align their Software bill of materials (or SBOM) is a security
risk management strategies with the challenges concept that dates back to the 2010s but gained
posed by these advanced systems. widespread popularity in the last few years, some of it
thanks to US government mandates.

Supporting tools like the AI RMF Playbook and the


Trustworthy and Responsible AI Resource Center further With software supply chains becoming increasingly
enhance its usability, providing practical resources and complex and supply chain attacks becoming increasingly
global alignment for organizations adopting the framework. devastating, it's imperative for organizations to have a high
level of visibility into the components of any third-party
New Security Initiatives products they rely on. SBOMs help define a list of a software
package's components, dependencies, and metadata,
including information regarding licensing, versions, and
vulnerabilities. Besides improving visibility, security, and risk
MODEL PROVENANCE &
management, SBOMs also enable the tracking of vulnerable
CRYPTOGRAPHIC SIGNING code and the determination of its impact on the software.

Cryptographic signing is a cornerstone of digital The initiative of AIBOM (also called MLBOM) aims to
security, ensuring the integrity and authenticity of translate the ideas behind SBOM into the AI ecosystem,
communications, software, and documents in enabling organizations to better understand their AI
industries like finance, healthcare, and software inventory and provide traceability and auditability. AIBOM
development. However, despite the critical role of includes information about models, training procedures,
machine learning (ML), no standardized method data pipelines, and performance and helps to implement
exists to cryptographically verify the origins or and govern AI responsibly. At the forefront of the decision
integrity of ML models and artifacts, leaving them on the AIBOM standards are NIST, OWASP, CycloneDX, and
vulnerable to tampering and trust issues. SPDX.

42
AI THREAT LANDSCAPE 2025
2024

The fast-paced developments in AI safety measures, as well


Coalition for Secure AI as the number of new security initiatives around AI, are the
result of growing collaboration between data scientists,
The Coalition for Secure AI (CoSAI), established in cybersecurity specialists, and lawmakers. People from
July 2024, is an open-source initiative under the different industries and backgrounds are coming together
OASIS global standards body to foster a to face the unprecedented risks brought on by the rapid
collaborative ecosystem to tackle the fragmented evolution of AI and come up with mitigations.
AI security landscape.

Last year’s prediction: “Data scientists will


CoSAI brings together industry leaders, academic partner with security practitioners to
secure their models”
institutions, and prominent experts to address critical
challenges in AI security through three dedicated
workstreams:
New Guidance and
Workstream 1: Ensuring the security of software
supply chains for AI systems
Legislation
Workstream 2: Equipping defenders to navigate In 2024, the United States and the European Union took
an evolving cybersecurity landscape significant steps to regulate artificial intelligence to address
the growing risk concerns. The EU enacted the Artificial
Workstream 3: Establishing governance Intelligence Act (AI Act) on August 1st, 2024. The EU AI Act
frameworks for AI security
became the world's first comprehensive AI law, classifying
AI applications by risk level—from prohibited to minimal
CoSAI's membership includes an impressive array of
risk—and imposing strict standards on high-risk AI tools,
participants, ranging from industry giants to innovative AI
such as those used in biometric identification and financial
startups, each working together to provide guidance and
decision-making.
tooling to practitioners to create Secure-by-Design AI
systems
In the U.S., AI regulatory activity increased substantially,
with nearly 700 AI-related bills introduced across various
states, a significant rise from under 200 in 2023. Despite this
Joint Cyber Defense Collaborative (JCDC)
surge, there is no unified federal approach, leading to a
The Joint Cyber Defense Collaborative (JCDC) is a patchwork of state-level regulations.
cybersecurity partnership between the U.S.
government and private sector organizations,
serving as the government's central hub for In October 2023, President Biden issued an
cross-sector collaboration and joint cyber defense Executive Order on the Safe, Secure, and
planning. In January 2025, the JCDC released its AI Trustworthy Development and Use of Artificial
Cybersecurity Collaboration Playbook as a guide for Intelligence, which directed NIST, OMB, and other
voluntary information sharing to address agencies to initiate activities to guide and regulate
vulnerabilities and cyber threats in AI Systems, AI in the United States. However, with the change of
aiming to foster collaboration among government, administrations that occurred on Jan 20, 2025, the
industry, and international partners. This playbook Biden AI executive order was revoked. This signifies
was developed following two in-person tabletop a shift of responsibility to the states to regulate
exercises simulating real-world AI cyberattacks and legislation as AI development continues. However,
involved over 150 individual participants from the actual implications remain to be seen as many
inter-agency partners and private sector actions from Biden’s order have already been
organizations, including HiddenLayer. completed by NIST, OMB, and other agencies to set

43
AI THREAT LANDSCAPE 2025
2024

These developments reflect not only shifts in policy that


policies and standards. In conjunction with have occurred rapidly in some cases in the United States
rescinding Biden's executive order, President Trump but also clear international intent, specifically from the EU,
signed a new directive establishing an Artificial to balance the rapid advancement of AI technologies with
Intelligence Action Plan within 180 days. This plan the need for security, ethical standards, and human rights
aims to develop policies that sustain and enhance protections. The rest of 2025 will undoubtedly witness more
America's global AI dominance to promote human changes in regulations and philosophical as well as policy
flourishing, economic competitiveness, and national conflicts between nations, political parties, and industry as
security. we all attempt to figure out the future promise of AI and
avoid the potential perils.

In October 2024, the Office of Management and


Budget (OMB) released the Advancing the
Responsible Acquisition of Artificial Intelligence in
Government memorandum. OMB noted that the
successful use of commercially provided AI requires
responsible procurement. This memo ensures that
when Federal agencies acquire AI, they appropriately
manage risks and performance, promote a
competitive marketplace, and implement structures
to govern and manage their business processes
related to acquiring AI. It is uncertain whether the
Trump administration will modify Federal AI
Procurement Guidelines already released by OMB.

Various states have introduced AI-related bills. Colorado


became the first state to enact a comprehensive law
relating to developing and deploying certain artificial
intelligence (AI) systems in Sept 2024—the Colorado AI Act
(CAIA), which goes into effect on February 1, 2026. The CAIA
adopts a risk-based approach to AI regulation that shares
substantial similarities with the EU AI Act. California
introduced the "Safe and Secure Innovation for Frontier
Artificial Intelligence Models Act", which aimed to mandate
safety tests for advanced AI models but was vetoed by
Governor Newsom in September 2024.

Additionally, in September 2024, the U.S., UK, and European


Commission signed the Council of Europe’s Framework
Convention on AI and human rights, democracy, and the
rule of law, marking the first international legally binding
agreement on AI.

44
PART 4

Predictions and
Recommendations
Predictions for 2025

It’s time to dust off the crystal ball once again! Over the past year, AI has truly been at the forefront of cyber security,
with increased scrutiny from attackers, defenders, developers, and academia. As various forms of generative AI drive
mass AI adoption, we find that the threats are not lagging far behind, with LLMs, RAGs, Agentic AI, integrations, and
plugins being a hot topic for researchers and miscreants alike.

Looking ahead, we expect the AI security landscape will


face even more sophisticated challenges in 2025:

01. Agentic AI as a Target 02. Erosion of Trust in Digital Content

Integrating agentic AI will blur the lines between adversarial As deepfake technologies become more accessible, audio,
AI and traditional cyberattacks, leading to a new wave of visual, and text-based digital content trust will face
targeted threats. Expect phishing and data leakage via near-total erosion. Expect to see advances in AI
agentic systems to be a hot topic. watermarking to help combat such attacks.

45
AI THREAT LANDSCAPE 2025
2024

03. Adversarial AI 06. Emergence of AIPC (AI-Powered Cyberattacks)

Organizations will integrate adversarial machine learning As hardware vendors capitalize on AI with advances in
(ML) into standard red team exercises, testing for AI bespoke chipsets and tooling to power AI technology,
vulnerabilities proactively before deployment. expect to see attacks targeting AI-capable endpoints
intensify, including:

04. AI-Specific Incident Response


Local model tampering. Hijacking models to
abuse predictions, bypass refusals and perform
For the first time, formal incident response guidelines harmful actions.
tailored to AI systems will be developed, providing a
structured approach to AI-related security breaches. Data poisoning.
Expect to see playbooks developed for AI risks.
Abuse of agentic systems. For example, prompt
injections in emails and documents to exploit
05. Advanced Threat Evolution local models.

Exploitation of vulnerabilities in 3rd party AI


Fraud, misinformation, and network attacks will escalate as
libraries and models
AI evolves across domains such as computer vision (CV),
audio, and natural language processing (NLP). Expect to
see attackers leveraging AI to increase both the speed and
scale of attack, as well as semi-autonomous offensive
models designed to aid in penetration testing and security
research.

Recommendations for the Security Practitioner

In the 2024 threat report, we made several recommendations for organizations to consider that were similar in
concept to existing security-related control practices but built specifically for AI, such as:

Discovery and Asset Management Model Robustness and Validation

Identifying and cataloging AI systems and related assets. Strengthening models to withstand adversarial attacks and
verifying their integrity.
Risk Assessment and Threat Modeling
Secure Development Practices
Evaluating potential vulnerabilities and attack vectors
specific to AI. Embedding security throughout the AI development
lifecycle.
Data Security and Privacy
Continuous Monitoring and Incident Response
Ensuring robust protection for sensitive datasets.
Establishing proactive detection and response mechanisms
for AI-related threats.

46
AI THREAT LANDSCAPE 2025
2024

These practices remain foundational as organizations navigate the continuously unfolding AI threat landscape.

Building on these recommendations, 2024 marked a turning point in the AI landscape. The rapid AI 'electrification' of
industries saw nearly every IT vendor integrate or expand AI capabilities, while service providers across sectors—from HR to
law firms and accountants—widely adopted AI to enhance offerings and optimize operations. This made 2024 the year that
AI-related third—and fourth-party risk issues became acutely apparent.

During the Security for AI Council meeting at Black Hat this year, the subject of AI third-party risk arose. Everyone in the
council acknowledged it was generally a struggle, with at least one member noting that a "requirement to notify before AI is
used/embedded into a solution” clause was added in all vendor contracts. The council members who had already been
asking vendors about their use of AI said those vendors didn’t have good answers. They “don't really know,” which is not only
surprising but also a noted disappointment. The group acknowledged traditional security vendors were only slightly better
than others, but overall, most vendors cannot respond adequately to AI risk questions. The council then collaborated to
create a detailed set of AI 3rd party risk questions. We recommend you consider adding these key questions to your existing
vendor evaluation processes going forward.

? ?
Do you scan your models for malicious
Where did your model come from? code? How do you determine if the model
is poisoned?

What AI incident response policies does


?
Do you detect, alert, and respond to
? mitigate risks that are identified in the
OWASP LLM Top 10?
your organization have in place in the event
of security incidents that impact the safety,
privacy, or security of individuals or the
function of the model?

?
What is your threat model for AI-related
attacks? Are your threat model and

?
mitigations mapped or aligned to the Do you validate the integrity of the data
MITRE Atlas? presented by your AI system and/or model?

Remember that the security landscape—and AI technology—is dynamic and rapidly changing. It's crucial to stay informed
about emerging threats and best practices. Regularly update and refine your AI-specific security program to address new
challenges and vulnerabilities.

And a note of caution. In many cases, responsible and ethical AI frameworks fall short of ensuring models are secure before
they go into production and after an AI system is in use. They focus on things such as biases, appropriate use, and privacy.
While these are also required, don’t confuse these practices for security.

47
AI THREAT LANDSCAPE 2025
2024

HiddenLayer
Resources
PRODUCTS AND SERVICES
HiddenLayer AISec Platform
is a GenAI Protection Suite that is purpose-built to ensure the integrity of
your AI models throughout the MLOps pipeline. The Platform provides
detection and response for GenAI and traditional AI models to detect prompt
injections, adversarial AI attacks, and digital supply chain vulnerabilities.

Learn More

HiddenLayer AI Detection & Response (AIDR)


is the first of its kind cybersecurity solution that monitors, detects, &
responds to Adversarial Artificial Intelligence attacks targeted at GenAI &
traditional ML models.

Learn More

HiddenLayer Model Scanner


analyzes models to identify hidden cybersecurity risks & threats such as
malware, vulnerabilities & integrity issues. Its advanced scanning engine is
built to analyze your artificial intelligence models, meticulously inspecting
each layer & component to detect possible signs of malicious activity,
including malware, tampering & backdoors.

Learn More

HiddenLayer Automated Red Teaming for AI


brings the efficiency, scalability, and precision needed to identify
vulnerabilities in AI systems before attackers exploit them.

Learn More

HiddenLayer Professional Services


is a multi-faceted services engagement that utilizes our deep domain
expertise in cybersecurity, artificial intelligence, and threat research.

Learn More

49
AI THREAT LANDSCAPE 2025
2024

HiddenLayer
Resources
HIDDENLAYER RESEARCH
ShadowLogic
A novel method for creating backdoors in neural network models.

Indirect Prompt Injection of Claude Computer Use


Discover the security risks of Anthropic's Claude Computer Use, including
indirect prompt injection attacks.

ShadowGenes: Uncovering Model Genealogy


Model genealogy is the practice of tracking machine learning models'
lineage, origins, modifi cations, and training processes.

Attack on AWS Bedrock’s ‘Titan’


Discover how to manipulate digital watermarks generated by Amazon Web
Services (AWS) Bedrock Titan Image Generator.

New Gemini for Workspace Vulnerability


Google Gemini for Workspace remains vulnerable to many forms of indirect
prompt injections.

R-bitrary Code Execution: Vulnerability in R’s Deserialization


Learn about a zero-day deserialization vulnerability in the popular
programming language R, widely used within government and medical
research, that could result in a supply chain attack.

Boosting Security for AI: Unveiling KROP


Many LLMs rely on prompt fi lters and alignment techniques to safeguard
their integrity in AI. However, these measures are not foolproof.

A Guide to AI Red Teaming


AI red teaming is an important strategy for any organization that leverages
artifi cial intelligence.

The Beginners Guide to LLMs and Generative AI


Learn about the basics of GenAI and gain a foundational understanding of
the world of LLMs.

50
AI THREAT LANDSCAPE 2025
2024

About HiddenLayer
HiddenLayer
a Gartner-recognized Cool Vendor for AI Security, is the leading provider of
Security for AI. Its security platform helps enterprises safeguard the machine
learning models behind their most important products. HiddenLayer is the
only company to offer turnkey security for AI that does not add unnecessary
complexity to models and does not require access to raw data and
algorithms. Founded by a team with deep roots in security and ML,
HiddenLayer aims to protect enterprise’s AI solutions from inference, bypass,
extraction attacks, and model theft. The company is backed by a group of
strategic investors, including M12, Microsoft’s Venture Fund, Moore Strategic
Ventures, Booz Allen Ventures, IBM Ventures, and Capital One Ventures.

LEARN MORE: FOLLOW US:

www.hiddenlayer.com Research Twitter LinkedIn

REQUEST A DEMO:

https://hiddenlayer.com/book-a-demo/

AUTHORS/CONTRIBUTORS

A special thank you to the teams that made this report come to life:

Marta Janus, Principal Security Researcher


Eoin Wickens, Technical Research Director
Tom Bonner, SVP, Research
Malcolm Harkins, Chief Security & Trust Officer
Jason Martin, Director, Adversarial Research
Travis Smith, VP of ML Threat Operations
Ryan Tracey, Principal Security Researcher
Jim Simpson, Threat Operations Specialist
Samantha Pearcy, Manager of Content Strategy
Kristen Tarlecki, VP of Marketing
Arman Abdulhayoglu, Director of Product Marketing
Kieran Evans, Principal Security Researcher
Kevin Finnigin, Principal Security Researcher
Marcus Kan, AI Security Researcher
Ravi Balakrishnan, Principal Security Researcher
Kenneth Yeung, AI Threat Researcher
Kasimir Schulz, Director, Security Research
Megan David, AI Researcher

51

Common questions

Powered by AI

Validating the integrity of AI models is crucial because it prevents the deployment of compromised models that could yield unreliable and insecure outputs. This can be achieved through comprehensive security audits, regular scanning for vulnerabilities, and implementing robust model verifiability frameworks. Organizations should also invest in continuous monitoring, detection and response systems such as HiddenLayer’s suite, ensuring immediate alerts and actions against any detected compromise, thus maintaining the reliability of their AI systems .

Serialization vulnerabilities in machine learning pipelines can be exploited through crafted malicious files such as pickle, HDF5, YAML, or XML. These vulnerabilities allow adversaries to execute arbitrary code or access sensitive data by embedding harmful commands within serialized data files. Consequences include unauthorized system access, data exfiltration, and execution of malicious payloads, compromising the integrity and security of ML models and systems. This threat extends to vulnerabilities in platforms like ClearlML and MLFlow, further underscoring the need for rigorous security practices .

The primary objectives of adversarial attacks against machine learning systems are model deception, model corruption, and model and data exfiltration. Model deception involves manipulating inputs to exploit model vulnerabilities, leading to incorrect predictions. Model corruption is achieved by influencing the training process, potentially through data poisoning or backdoor attacks, compromising the model's behavior while retaining outward legitimacy. Model and data exfiltration involve theft of the model's functionality or sensitive data, posing risks to intellectual property and data privacy .

Popular MLOps platforms like ClearML and MLFlow have been found vulnerable to multiple security issues, such as path traversal, improper authentication, insecure credential storage, Cross-Site Request Forgery, Cross-Site Scripting, and unsafe deserialization. These vulnerabilities create a full attack chain enabling adversaries to execute arbitrary code through malicious files and infiltrate systems. They impact AI systems by making them susceptible to unauthorized access and manipulation of data and models, endangering both data integrity and security .

AI red teaming is crucial for testing AI systems against adversarial attacks to ensure their resilience. Over the past year, its significance has been underscored by regulatory efforts such as the White House Executive Order on AI, which alongside frameworks like the NIST AI Risk Management and the EU AI Act, encourages organizations to thoroughly evaluate AI vulnerabilities before deployment. AI red teaming's evolution has been driven by increasing adversarial threats and the need for rigorous validation processes, thus becoming a vital strategy for securing AI systems .

To address the risks associated with model backdoors, the AI community should invest in creating comprehensive defenses, detection methods, and verification techniques. This involves implementing new strategies for identifying backdoors, even within graph-based architectures, and developing robust security programs that encompass vulnerabilities across AI use cases. Furthermore, establishing industry-wide best practices and frameworks aimed at enhancing model verification and assurance of reliability is essential. Agile modifications in response to emerging threats are also necessary, ensuring AI systems remain trustworthy and secure .

Model theft, also known as model extraction, occurs when an adversary replicates a machine-learning model without authorization by querying the target model and analyzing its outputs. This allows the adversary to reverse-engineer the model's functionality and potentially steal sensitive training data or intellectual property. The repercussions for organizations include loss of competitive advantage, compromised proprietary knowledge, and potential exposure of private data, negatively impacting the organization's market position and trustworthiness .

Model backdoors pose risks to AI systems by embedding vulnerabilities within the model's architecture that activate when specific trigger inputs are received. These backdoors do not require traditional code execution exploits and can be format-agnostic, making them hard to detect and capable of affecting various model architectures and domains. They can undermine the reliability of AI systems by making it impossible to trust their outputs if a backdoor is present, potentially affecting critical infrastructure and decision-making processes .

Adversarial attack advancements challenge AI security by being increasingly sophisticated and domain-adaptive. These attacks now exploit semantic features and natural variations rather than minor perturbations, allowing them to maintain natural appearances while triggering misclassifications across diverse environments. This evolution demonstrates the attacks' capability to bypass defenses in systems like diffusion models, malware detection, and automotive systems. The domain-spanning efficacy of these attacks necessitates comprehensive cross-domain defense strategies, especially as AI is increasingly deployed in security-sensitive applications .

Supply chain security is fundamental in AI because it safeguards against the introduction of malicious components or models within the AI development and deployment pipeline. Trends in this area include increasing attacks on ML artifacts, growing interest from cybercriminals, and the development of more sophisticated supply chain attack vectors. These trends necessitate robust security strategies that address vulnerabilities in the AI supply chain, ensuring that all components are secure from development through deployment .

You might also like