AI Security: A Systematic Mapping Study
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2025.3567195
ABSTRACT With the pervasive integration of artificial intelligence (AI) in various facets of modern
technology, the importance of AI security has been thrust into the spotlight. The field is rapidly evolving,
with new challenges and solutions emerging at a swift pace. However, the breadth and depth of AI security
research have not been comprehensively mapped in recent times, presenting a crucial need for an extensive
review and synthesis of existing literature. Given the increasing reliance on AI in critical domains such
as healthcare, finance, and national security, ensuring the resilience and trustworthiness of these systems
is imperative. This survey fulfills the pressing need for a structured and comprehensive overview of the
current research landscape, enabling researchers to address emerging threats and vulnerabilities effectively.
This paper presents a systematic mapping study (SMS), aimed at identifying and classifying the prevailing
research topics, tools, and frameworks in the field of AI security. A total of 123 studies were meticulously
selected and analyzed, leading to the identification of key metrics, tools, standards, and research themes
that are currently shaping the landscape of AI security research. This effort not only aids in distilling the
collective wisdom of the research community but also sets a firm foundation for future work in this critical
area. The findings from this SMS will serve as an invaluable guide for researchers and practitioners alike,
enabling them to navigate the complexities of AI security and fostering the development of innovative,
robust security solutions. This study also highlights significant gaps in the current literature, thereby
outlining potential directions for new research initiatives.
INDEX TERMS AI Security, Generative AI Security, AI Risk Assessment, Detection and Defense, Ethical and Societal Implications
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Moreover, the majority of AI systems are developed with a focus on functionality over security, often relegating the latter to a post-development afterthought [9]. This reactive approach to security in AI development processes poses significant risks and necessitates a shift towards proactive security integration. The SMS aims to characterize existing security methodologies, tools, and standards, offering a comprehensive survey that provides a structured guide to studies in the AI security field.

The rest of this systematic mapping paper is organized as follows. In Section II, we present the related work and compare our review with other reviews. Section III describes the systematic mapping process employed in this study. Section IV presents the research motivations and questions. In Section V, we discuss the various AI security research topics identified in the mapping process. Section VI highlights the research gaps identified in this domain. Section VIII presents the results and discussion of our findings. Finally, in Section IX, we conclude by summarizing our key findings and outlining future research directions.

II. RELATED WORK

The rapid evolution of artificial intelligence (AI) has led to an influx of research on AI security, including reviews and surveys aimed at categorizing threats, analyzing defensive strategies, and exploring ethical implications. However, many of these works focus narrowly on specific topics, leaving gaps in comprehensively mapping the diverse and interconnected aspects of AI security. This section compares notable existing surveys with our systematic mapping study (SMS) to establish the distinct contributions of this work. Table 1 provides a detailed comparison of the works discussed.

Oseni et al. [3] propose a framework for adversarial attack detection and mitigation, focusing on developing robust machine learning models that can withstand adversarial manipulation. The study also discusses advanced AI security techniques, including federated learning and reinforcement learning, to enhance model resilience. The paper provides a holistic cybersecurity perspective, covering both theoretical and practical aspects of AI security.

While the study provides valuable insights into adversarial attack strategies and countermeasures, it lacks a broader discussion on AI privacy, risk management, and regulatory compliance. The primary focus remains on adversarial threats, overlooking other significant security challenges such as data privacy breaches, model inversion risks, and AI governance frameworks.

Rahman et al. [10] present a systematic survey on AI security risks, offering a structured categorization of AI-specific attack vectors, including adversarial attacks, model inversion attacks, data poisoning, and membership inference attacks. The study highlights the impact of these attacks on AI integrity and privacy, detailing how adversaries exploit model weaknesses at both the training and inference stages. Moreover, the paper evaluates existing attack detection and mitigation techniques, such as adversarial training, differential privacy, homomorphic encryption, and secure multi-party computation.

Although the study provides an extensive taxonomy of AI threats, it is largely descriptive, offering a high-level overview rather than an in-depth analysis of attack behaviors and defense effectiveness. Additionally, while detection techniques are explored, the comparative efficacy of these methods remains unassessed, making it difficult to determine which strategies are most effective under real-world conditions. Another limitation is the lack of integration with industry security standards.

The work of Gnitko et al. [8] takes a regulatory and compliance-focused approach, offering a structured analysis of AI security standards across different jurisdictions. The study highlights how AI security frameworks vary between government policies, corporate security strategies, and international regulations, underscoring the fragmented nature of AI security governance. A key contribution of this study is the development of a taxonomy of AI security threats, categorizing risks based on their occurrence across the AI lifecycle (development, deployment, and operational phases). Additionally, the paper evaluates six major AI security frameworks, analyzing their applicability across machine learning models, deep learning applications, and AI-driven decision systems. By identifying gaps in existing security regulations, the study calls for a unified global standard to ensure consistent security practices across AI deployments.

Despite its strengths in regulatory analysis, the study takes a fragmented approach, focusing on policy implications rather than practical security solutions. While the importance of security compliance is well-articulated, the work does not explore how AI security standards can be effectively implemented in model development pipelines. Another shortcoming is the lack of empirical case studies or real-world evaluations, which would provide practical insights into how AI security regulations impact AI deployments in different industries.

Habbal et al. [11] introduce the AI Trust, Risk, and Security Management (AI TRiSM) framework, which provides a structured approach to managing AI security risks. The study explores how AI TRiSM can be integrated into AI governance, ensuring that AI-driven applications remain trustworthy, explainable, and resilient. The framework incorporates risk assessment methodologies, security compliance strategies, and AI reliability metrics. The study also highlights the growing concern of AI accountability, particularly in high-risk domains such as finance, healthcare, and law enforcement. By emphasizing transparency and risk-aware AI decision-making, AI TRiSM provides a governance model that aligns with ethical AI principles and regulatory compliance.

While the AI TRiSM framework is conceptually well-defined, its technical implementation remains ambiguous. The study does not detail how AI TRiSM can be practically applied to AI security architectures. Furthermore, while the framework highlights AI risks, it does not account for real-world adversarial threats, such as adaptive adversarial AI techniques. Another limitation is the absence of empirical validation, as the proposed framework lacks experiments or case studies that demonstrate its effectiveness in real-world AI security scenarios.

Xia et al. [12] present a mapping study on AI risk assessment, offering a structured classification of risk evaluation methodologies tailored for AI-driven systems. The study categorizes technical vulnerabilities, compliance risks, and ethical concerns, providing a taxonomy of AI security threats that can aid in designing risk-aware AI architectures. The study highlights the necessity of a connected AI risk assessment model, ensuring that AI security measures remain consistent, adaptive, and aligned with regulatory mandates. Additionally, the work explores threat simulation techniques, which enable organizations to evaluate AI models against adversarial manipulations, data poisoning, and model inversion attacks.

Despite its structured approach, the study remains largely theoretical, lacking practical validation through real-world security testing. Furthermore, while the study categorizes AI risks effectively, it does not provide an adaptive mechanism for updating AI risk models in response to evolving adversarial threats. Another limitation is the absence of detailed strategies for risk management in decentralized AI architectures, such as federated learning and edge AI systems.

Key Contributions of This Paper

This study stands apart by adopting a systematic mapping approach to chart the AI security research landscape holistically. The key differences include:
i. Comprehensive Scope: Unlike prior works that focus on specific dimensions of AI security (e.g., adversarial attacks or risk management), this study categorizes research across diverse topics, including tools, frameworks, standards, threats, vulnerabilities, and lifecycle integration.
ii. Focus on Emerging Trends: This SMS identifies recent advancements, such as generative AI security and large language model (LLM) vulnerabilities, areas that remain underexplored in previous reviews.
iii. Practical Relevance: In addition to synthesizing academic research, this work emphasizes the practical applicability of findings for both researchers and practitioners, connecting methodologies to real-world AI development and deployment challenges.
iv. Identification of Gaps: While existing reviews highlight challenges, this study systematically identifies and categorizes research gaps, offering a roadmap for future exploration in areas such as ethical AI security and multimodal AI systems.

By mapping the breadth and depth of AI security research, this work provides an integrative perspective that complements the more narrowly focused existing literature, bridging theoretical insights with practical implications.

III. SYSTEMATIC MAPPING PROCESS

The proposed SMS is a pivotal study for identifying and categorizing the breadth of research within the domain of AI security. It is meticulously designed to aggregate, synthesize, and analyze the existing body of knowledge, thereby uncovering trends and gaps in the literature. The SMS (Figure 1) is conducted in accordance with guidelines established by Petersen et al. [13], ensuring a systematic and reproducible approach to the review process.

1) Phase 1: Research questions and search strategy
We initiate the review process by outlining the scope and research questions that will steer our investigation, followed by establishing a search strategy.
Research questions specification - Our SMS is structured around specific research questions, which guide the search for relevant literature. Four distinct research questions, along with their rationale, have been proposed and are presented in Table 4. These questions were formulated to direct the mapping study towards yielding insights that are both comprehensive and pertinent to the field of AI security.
Developing search strategy - The search strategy involves identifying relevant literature using well-defined keywords and query strings across multiple databases.

2) Phase 2: Extracting keywords and collecting relevant studies
In this phase, relevant keywords are identified from the research questions to facilitate an effective search strategy, and studies are subsequently collected based on these defined keywords.
Keywords and Query Strings - We begin with an initial list of keywords, including AI/ML security, AI/ML risk assessment, AI/ML vulnerabilities, adversarial attacks, and large language model security (Table 5). This list is then expanded by incorporating additional keywords found in the selected references.
Databases - The following electronic databases are searched to ensure comprehensive coverage:
• IEEE Xplore
• ACM Digital Library
• SpringerLink
• ScienceDirect
• Google Scholar
• Scopus
• Web of Science

3) Phase 3: Selection and classification
To ensure that only relevant and high-quality studies are included in the mapping study, we establish clear inclusion and exclusion criteria.
Inclusion Criteria:
• Studies published in peer-reviewed journals and conferences (Table 2 and Table 3).
• Studies focusing on AI security methods and applications.
• Latest industrial reports and findings.
• Studies published in English.
Exclusion Criteria:
• Non-English publications.
• Studies without sufficient technical details, empirical results, or clear theoretical contributions (e.g., studies without rigorous analysis or formal verification insights).
Classify Studies by Categories:
Once the relevant papers are selected, we classify them into key thematic categories within AI security based on keywords and core topics to ensure each paper is mapped to the most relevant category (see Figure 2 and Table 5). The first category, Threat Models in AI, includes research on taxonomies of AI threats, adversarial models, attack surfaces of machine learning models, and AI system vulnerabilities across different attack vectors. AI Risk Assessment focuses on risk quantification methodologies, case studies on AI-driven cyber risks, AI reliability evaluations, and governance models addressing security concerns. Studies in Detection and Defense Techniques propose or evaluate adversarial attack detection methods, robustness enhancement strategies, AI model verification, and defenses against adversarial and poisoning attacks. Generative AI Security covers security challenges in LLMs and diffusion models, adversarial attacks on generative models, backdoor vulnerabilities, and model manipulation techniques. The Ethical and Societal Implications of AI Security category addresses AI bias and fairness, ethical concerns surrounding adversarial AI attacks, and the responsible use of AI in offensive security. AI Security Tools and Frameworks includes research on open-source security tools, frameworks for securing AI applications, and standardization efforts in AI security.

4) Phase 4: Quality assessment
After defining the research questions, extracting keywords, and categorizing the selected studies into relevant themes, we proceeded to analyze the quality of these studies. This quality assessment was conducted based on various criteria, including:
• Credibility of the source: Evaluating the trustworthiness of the publication and the reputation of the authors.
• Relevance to the research questions: Ensuring the studies directly address the specific questions and themes identified.
• Peer review status: Checking whether the study has undergone peer review, a process where experts critically evaluate its methodology, findings, and overall contribution to the research area.
• Citation count: Reviewing the number of times the study has been cited by other researchers as a measure of its impact and recognition within the academic community.
• Depth of analysis: Evaluating the comprehensiveness and depth of the study's analysis and discussion.

IV. RESEARCH MOTIVATIONS AND QUESTIONS

The explosive growth of AI applications over the last few years has been paralleled by a corresponding intensification
in the complexity and sophistication of security threats, particularly in the realm of AI. With AI systems becoming ubiquitous, the discipline of Secure AI Engineering has gained prominence, advocating the integration of security considerations to proactively withstand attacks and maintain the fundamental principles of security: confidentiality, integrity, and availability.

Recent advancements in AI have brought forth an array
of technical innovations, yet the incorporation of security within these systems remains a multifaceted challenge. The majority of AI systems are engineered with a primary focus on functionality, relegating security to a secondary concern often addressed post-development, a strategy that is increasingly untenable in today's security-conscious environment [14] [15]. Moreover, conventional software development processes are often inadequate for addressing the unique security requirements of AI systems, leading to vulnerabilities that can be exploited through a myriad of attack vectors, from adversarial manipulation to systemic flaws [16] [17].

This literature review is motivated by the pressing need to systematically map the existing body of knowledge in AI security. By evaluating how the current literature addresses the security of AI systems, this study aims to reveal prevailing research trends, identify effective security measures, and discern gaps in the literature where future research efforts could be most beneficial. Furthermore, this study endeavors to augment the collective understanding of AI security measures and practices, providing a robust foundation for the development of secure AI applications.

In alignment with these motivations, the study poses the following research questions (RQs):

RQ1) What is the publication frequency of AI security research?
This question helps determine the level of attention and priority given to this topic by the research community over time. The analysis covers the period from 2014 to 2025, highlighting trends and significant milestones in AI security research. The number of publications analyzed for each year is shown in Fig. 3, which covers publications through the end of January 2025.
To facilitate a deeper examination, we divided the study period into two distinct time periods:
2014-2018: Initial Growth Phase - In the early years of this study, AI security started to gain traction as a vital field of research. The rise of machine learning and AI technologies brought about new security challenges. During this period, researchers focused on foundational aspects of AI security, such as securing machine learning models, detecting adversarial attacks, and developing initial frameworks for AI system security.
2019-2024: Rapid Expansion and Diversification - From 2019 onwards, the publication rate of AI security research increased significantly. This growth was driven by several factors, including advancements in AI technologies that required robust security measures, high-profile cyberattacks leveraging AI, and increased funding for AI security research.
This research question not only focuses on identifying the number of publications each year but also delves into understanding the underlying factors influencing publication trends. For instance, the analysis considers how emerging threats in AI security, such as adversarial attacks, model poisoning, and backdoor vulnerabilities, have contributed to spikes in research interest over the years. In recent years, a significant shift in focus is observed towards securing Generative AI (Gen AI) systems, driven by the rapid adoption of models like ChatGPT, DALL-E, and other large language and generative models. This shift reflects the increasing concerns over vulnerabilities specific to Gen AI, such as adversarial prompts, data leakage, and misuse in generating deepfakes or malicious content. By examining these evolving trends, the study highlights how the research community has responded to real-world challenges and technological advancements, particularly the rise of Gen AI, shaping the current landscape of AI security research.

RQ2) What are the emerging sub-fields within AI security?
The field of AI security is diverse and rapidly growing, with new sub-fields arising to meet the demands of integrating AI technology into modern systems. AI security has branched into distinct areas that address specific risks and defenses for AI systems. This question emphasizes vari-

TABLE 3. Number of Articles Obtained from Conferences

ACM SIGSAC Conference on Computer and Communications Security: 3
IEEE/CVF International Conference on Computer Vision: 2
NeurIPS: 2
AAAI/ACM Conference on AI, Ethics, and Society: 3
ACL 2024: 2
Empirical Methods in Natural Language Processing (EMNLP): 1
International Conference on Learning Representations (ICLR): 2
International Conference on Machine Learning 2024: 1
Annual Reliability and Maintainability Symposium (RAMS): 1
IEEE/ACM International Conference on AI Engineering: 1
NDSS 2024: 1
New Security Paradigms Workshop (NSPW): 1
ACM Conference on Fairness, Accountability, and Transparency: 1
SAFECOMP 2022: 1
HSCC 2021: 1
AAMAS: 1
IJCAI: 1
AISafety 2022: 1
International Conference on Agents and Artificial Intelligence: 1
39th Annual Computer Security Applications Conference (ACSAC): 1
2021 IEEE International Conference on Cyber Security and Resilience (CSR): 1
2021 IEEE International Conference on Mobile Networks and Wireless Communications: 1
2020 12th International Conference on Cyber Conflict (CyCon): 1
28th International Conference on Evaluation and Assessment in Software Engineering: 1
IEEE Symposium on Security and Privacy (SP): 1
2019 IEEE/ACM 41st International Conference on Software Engineering (SEIP): 1
2017 ACM on Asia Conference on Computer and Communications Security (Asia CCS): 1
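As an aside, the per-year tally behind Fig. 3 and the per-venue tally behind Table 3 can be reproduced with a few lines of standard Python. The records below are hypothetical placeholders (not the study's actual dataset); only the counting pattern is the point:

```python
# Tally selected primary studies per publication year (as in Fig. 3)
# and per venue (as in Table 3). Records here are made-up examples.
from collections import Counter

# Each selected study reduced to (year, venue).
studies = [
    (2017, "Asia CCS"),
    (2019, "ICSE-SEIP"),
    (2021, "HSCC"),
    (2022, "SAFECOMP"),
    (2023, "NeurIPS"),
    (2023, "ACM CCS"),
    (2024, "ACL"),
    (2024, "NeurIPS"),
    (2024, "ACM CCS"),
]

per_year = Counter(year for year, _ in studies)
per_venue = Counter(venue for _, venue in studies)

# Split counts into the two phases used in RQ1.
initial_growth = sum(n for y, n in per_year.items() if 2014 <= y <= 2018)
rapid_expansion = sum(n for y, n in per_year.items() if 2019 <= y <= 2024)

print(per_year[2024])                    # 3
print(per_venue["ACM CCS"])              # 2
print(initial_growth, rapid_expansion)   # 1 8
```

With the real extraction spreadsheet in place of `studies`, the same two `Counter` passes yield the publication-frequency figures discussed under RQ1.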
mechanisms and continuous monitoring is essential across industries to safeguard AI systems and ensure their integrity, as highlighted by industry standards and security protocols. Mitigation strategies focus on enhancing the resilience of AI systems by proactively identifying weaknesses through rigorous testing, incorporating human oversight, and adopting ethical guidelines to prevent biases and vulnerabilities. Regular updates to risk management protocols and the integration of advanced encryption and access controls further help in reducing the impact of potential security breaches. Moreover, fostering collaboration across industries allows for shared insights and resources to strengthen defenses against evolving AI security threats.

In particular, to address these security risks, industries must adopt comprehensive AI security frameworks. These include:
Regular Audits and Monitoring: Continuous auditing and real-time monitoring of AI systems are crucial to detecting and mitigating potential security breaches. Regular security testing helps ensure that vulnerabilities are identified early and addressed proactively, as emphasized by multiple AI risk management frameworks such as Microsoft's and Google's security protocols [26] [27].
Adversarial Testing: Implementing adversarial testing, such as through adversarial robustness frameworks, allows organizations to simulate attacks that AI models might face. This process helps identify weaknesses in AI systems before they can be exploited by malicious actors [28].
Data Encryption and Access Controls: Ensuring data security is critical in AI applications. Data must be encrypted both in transit and at rest, with strict access controls that limit access to authorized personnel only. This principle is often emphasized in industry standards like the NIST AI Risk Management Framework [29] and the OWASP AI Security guidelines [30].
Data Sanitization: Data sanitization ensures that the input data is clean, accurate, and free from malicious manipulations that could otherwise compromise the model's integrity and performance.
Robust Training: Robust training approaches help mitigate attacks by developing training algorithms that limit the effectiveness of malicious points by design rather than removing these points altogether [28].
Bias and Fairness Assessments: Regularly assessing AI models for biases and ensuring that they operate fairly and transparently.
Collaborative Defense Mechanisms: Collaboration across industries, sharing knowledge and resources to combat AI security threats, strengthens defenses. Initiatives like Microsoft's Counterfit [31] tool and collaborative frameworks from MITRE emphasize the importance of collective efforts to enhance AI security.

RQ5) What tools and frameworks have been developed for AI security?
The development of AI security tools and frameworks is crucial for safeguarding AI systems from a wide range of adversarial attacks and vulnerabilities. AI security frameworks are designed to defend machine learning models against malicious inputs aimed at manipulating or misleading them. Among the most common types of attacks are evasion, where small, often imperceptible changes to input data cause a model to make incorrect predictions, and data poisoning, where attackers corrupt the training data, leading to degraded model performance. These frameworks focus on enhancing model resilience, enabling models to function reliably even under adversarial conditions. This involves robust training techniques, model verification processes, and continuous monitoring, all aimed at maintaining the integrity and stability of AI systems.

In addition to these defensive measures, AI red teaming tools are designed to simulate a variety of adversarial behaviors, including evasion attacks, data poisoning, and backdoor attacks, and to stress-test models, allowing organizations to detect vulnerabilities before they are exploited. By leveraging red teaming tools, organizations can assess how well their AI systems respond to threats and evaluate model robustness, helping to secure their AI systems during both the development and deployment phases.

On the other side, AI blue teaming tools focus on continuously safeguarding AI systems through real-time threat monitoring, anomaly detection, and defensive mechanisms. These tools are essential for building a robust defense posture, ensuring that AI models are able to detect and mitigate attacks as they occur.

To provide a more detailed examination of the tools and frameworks developed to address these challenges, refer to Section V (F). Table 7 and Table 8 outline the key frameworks and tools, offering a comprehensive view of how both classical machine learning (ML) and generative AI (GenAI) systems are safeguarded across various scenarios.

V. AI SECURITY

A. THREAT MODELS IN AI
Threat models in AI play a crucial role in identifying and mitigating potential risks associated with AI systems [33]–[43], [118]–[123]. These models help researchers and practitioners understand the various ways AI systems can be attacked and how to defend against these threats. Here are some of the key threat models in AI, based on recent research:

Poisoning Attacks: These occur during the training phase of machine learning (ML) models. An attacker introduces malicious data into the training set, causing the model to learn incorrect patterns. This can degrade the model's performance or manipulate its behavior for specific inputs. Defenses against poisoning attacks include data sanitization and robust training methods.

Backdoor Attacks: Similar to poisoning, backdoor attacks involve embedding hidden triggers in the training data that cause the model to behave maliciously when these triggers are activated. This type of attack is particularly challenging to detect and defend against because the model appears to perform normally on typical inputs [124].
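To make the poisoning threat and the data-sanitization defense above concrete, the following self-contained sketch stages a toy injection attack against a deliberately simple nearest-centroid learner and then filters the training set with a robust-statistics rule. The data, learner, and thresholds are synthetic illustrations chosen for clarity; they are not drawn from any surveyed framework or paper:

```python
# Hedged illustration of data poisoning and sanitization: an attacker
# injects far-away points labelled class 0, dragging the mean-based
# class-0 centroid into class-1 territory; a median-based filter then
# removes the outliers and restores accuracy. Everything is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n_per_class=100):
    # Two well-separated Gaussian classes in 2-D.
    x0 = rng.normal(-2.0, 0.5, size=(n_per_class, 2))
    x1 = rng.normal(+2.0, 0.5, size=(n_per_class, 2))
    return np.vstack([x0, x1]), np.repeat([0, 1], n_per_class)

def fit_centroids(X, y):
    # Mean-based nearest-centroid "model": deliberately poisonable.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((d.argmin(axis=1) == y).mean())

def sanitize(X, y, k=3.0):
    # Drop points far from a robust (median) class center; the median
    # resists the injected points that already shifted the mean.
    keep = np.ones(len(y), dtype=bool)
    for c in (0, 1):
        idx = np.where(y == c)[0]
        center = np.median(X[idx], axis=0)
        d = np.linalg.norm(X[idx] - center, axis=1)
        keep[idx] = d <= k * np.median(d)
    return X[keep], y[keep]

X_tr, y_tr = make_blobs()
X_te, y_te = make_blobs()
clean_acc = accuracy(fit_centroids(X_tr, y_tr), X_te, y_te)

# Attack: inject 40 points at (14, 14) with label 0.
X_poison = np.vstack([X_tr, np.full((40, 2), 14.0)])
y_poison = np.concatenate([y_tr, np.zeros(40, dtype=int)])
poisoned_acc = accuracy(fit_centroids(X_poison, y_poison), X_te, y_te)

# Defense: sanitize, then retrain.
X_clean, y_clean = sanitize(X_poison, y_poison)
sanitized_acc = accuracy(fit_centroids(X_clean, y_clean), X_te, y_te)

print(f"clean={clean_acc:.2f} poisoned={poisoned_acc:.2f} "
      f"sanitized={sanitized_acc:.2f}")
```

The same degrade-then-recover pattern is what the red-teaming tools discussed under RQ5 automate at scale against real models; production sanitizers use stronger robust estimators than this one-line median rule, but the design intent (limit the influence of any small set of malicious points) is identical.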
TABLE 5. Primary Studies Topics and Keywords

AI Security Research Topics | References | Keywords
Threat models in AI | [32]–[44] | Adversarial attacks, Backdoor attacks, Evasion attacks, Model poisoning, etc.
Risk Assessment in AI systems | [18]–[21], [23], [45]–[51] | Vulnerability analysis, AI reliability, Security assessment, Risk mitigation, etc.
Detection and defense techniques | [52]–[67] | Adversarial defense, Mitigation techniques, Robust AI, etc.
Generative AI Security | [68]–[99] | Prompt injection, Jailbreaking, Hallucinations, LLM robustness, etc.
Ethical and Societal Implications | [100]–[106] | Fairness, AI ethics, Accountability, Transparency, etc.
AI security frameworks and tools | [26], [29], [31], [107]–[117] | Security frameworks, Testing tools, Security audit, etc.

Evasion Attacks: These occur at inference time, where adversaries craft inputs (adversarial examples) that cause the model to make incorrect predictions. Techniques to generate adversarial examples range from simple gradient-based methods to more sophisticated approaches. Defenses include adversarial training and input preprocessing [124].
Quantum-based adversarial attacks: Quantum adversarial machine learning (QAML) introduces a new frontier in AI security, where quantum computing can both amplify the robustness and vulnerabilities of machine learning models [125]–[129]. As quantum computing advances, traditional security strategies in machine learning need to evolve to address quantum-enabled threats. Quantum adversaries may employ quantum algorithms to bypass current defenses, leveraging quantum properties such as superposition and entanglement to manipulate or degrade the performance of machine learning models. These quantum adversarial threats range from simple noise insertion at the lowest level to more sophisticated attacks, such as quantum cryptanalysis and the development of advanced quantum algorithms aimed at undermining model integrity [126] [130]. Defending against such attacks calls for the integration of quantum-secure design principles, including post-quantum cryptography, quantum-resistant neural networks, and transparency in development and deployment. These efforts aim to ensure the safety and trustworthiness of machine learning systems in a quantum-powered world [126]. Furthermore, the rapid development of quantum processors and algorithms necessitates proactive measures like quantum data anonymization and continuous monitoring of quantum systems to mitigate the evolving risks posed by quantum adversaries.
Model Stealing: In model stealing attacks, an adversary attempts to replicate the functionality of a target model by querying it and using the responses to train their own model. This can lead to intellectual property theft and potential misuse of the model. Defenses include limiting the information returned by the model and monitoring for unusual querying patterns [124].
Membership Inference and Attribute Inference: These attacks aim to determine whether specific data points were part of the training set (membership inference) or to infer sensitive attributes of the data used to train the model. These attacks exploit the way models generalize from training data, posing privacy risks. Defenses involve techniques like differential privacy and regularization methods [7].
Autonomous Threat Hunting: This involves using AI to proactively search for vulnerabilities and threats in systems, leveraging AI's ability to analyze vast amounts of data and identify patterns. Autonomous threat hunting aims to enhance traditional cybersecurity measures and is pivotal for dealing with sophisticated cyber threats [131].
Threats from Machine-Generated Text: As natural language generation (NLG) models become more advanced, they can be exploited to generate harmful content, spread misinformation, or deceive users. Threat models for NLG focus on understanding the potential adversaries, their capabilities, and objectives, and developing detection and mitigation strategies to prevent misuse [132].
These threat models are essential for developing robust AI systems that can withstand adversarial actions.

B. AI RISK ASSESSMENT: DETAILED ANALYSIS
As the deployment of AI systems becomes increasingly widespread across various industries, assessing the associated risks has become a crucial aspect of ensuring the secure and responsible implementation of these technologies. AI systems, while offering significant benefits, also introduce a range of potential hazards that can have serious consequences. These risks are not only technical but can extend to ethical, operational, and strategic areas. The assessment of AI risks aims to identify, analyze, and mitigate the potential dangers posed by AI technologies, ensuring their safe integration into critical sectors such as healthcare, finance, manufacturing, and transportation. Without a comprehensive risk assessment framework, organizations may be exposed to vulnerabilities that could lead to breaches, exploitation, and even societal harm [45]–[51]. This section examines the various hazards that AI systems may encounter, the techniques created to evaluate such risks, and the effective mitigation techniques put in place (Figure 4).
i) AI security risk in different industries
Different industries face unique challenges and vulnerabilities when it comes to AI security, and understanding these can help in developing robust mitigation strategies. The motivation to focus on the following industries when assessing AI security risks stems from the crucial role AI
plays in these sectors and the magnitude of risk associated with AI-related vulnerabilities and security breaches.
Financial Services
In the financial services industry, AI is widely used for fraud detection, algorithmic trading, risk management, and customer service automation. The security risks here include data breaches, algorithmic biases, and the manipulation of AI systems. A significant concern is adversarial attacks, where malicious entities can exploit vulnerabilities in AI models to execute fraudulent transactions or manipulate financial markets. For instance, the study by Xie et al. [133] highlights the potential for adversarial attacks to manipulate stock prices.
Healthcare
AI in healthcare is revolutionizing diagnostics, personalized medicine, and predictive analysis [134]. AI and ML algorithms are capable of processing large volumes of healthcare data, enhancing precision in areas such as medical imaging and genomics. However, the security risks in this high-stakes domain are profound. One major risk is adversarial attacks, where malicious actors introduce subtle changes to input data, such as medical images, leading AI systems to produce incorrect outcomes, potentially causing misdiagnosis and inappropriate treatment plans [135] [136]. Moreover, vulnerabilities in peripheral devices and communication channels expand the threat surface.
To mitigate these risks, regular audits and encryption protocols are crucial to ensure the integrity of the systems. Additionally, incorporating Human-in-the-Loop (HiTL) frameworks can alleviate the risks associated with overreliance on AI by maintaining human oversight in critical decision-making processes. HiTL systems allow clinicians to verify and, if necessary, override AI-generated diagnoses, ensuring that final decisions reflect both human expertise and AI precision [137]. This balance ensures that AI remains a powerful tool for enhancing healthcare while minimizing security vulnerabilities and errors.
Manufacturing
In the manufacturing industry, AI optimizes production lines, predicts maintenance needs, and enhances supply chain management. The primary security risk here involves industrial espionage and sabotage. AI systems can be targeted to disrupt production processes or steal proprietary information. Securing AI systems against cyber-physical attacks is crucial for maintaining operational integrity and protecting intellectual property.
Autonomous Vehicles
The automotive industry, particularly with the advent of autonomous vehicles (AV), faces unique AI security challenges. Autonomous systems rely heavily on AI for navigation, obstacle detection, and decision-making. A critical risk is physical adversarial attacks on these systems, which can cause vehicles to misinterpret signals or obstacles. A widely studied example is the manipulation of STOP signs with adversarial patches, where subtle changes to the appearance of the sign can cause the vehicle's AI system to misclassify or fail to detect the sign entirely, potentially leading to dangerous outcomes [138]. Research on securing AI in AVs emphasizes the need for robust defense mechanisms and improving the certifiable robustness of AI models.
Retail
In retail, AI is used for customer behavior analysis, inventory management, and personalized marketing. Security risks include data breaches and the manipulation of AI systems for fraudulent activities. For example, AI algorithms that analyze customer data to predict purchasing behavior can be targeted to steal consumer information or manipulate purchasing trends. A report by Deloitte [139] highlights the need for robust AI security frameworks to protect consumer data and ensure the integrity of AI-driven marketing strategies.
ii) AI System Risk Factors
AI systems are vulnerable to a wide range of threats, from hostile external attacks to technical malfunctions. These risks can be broadly categorized as follows: strategic risks pertain to decisions made using AI that may have unfavorable effects; ethical risks are related to the potential harm AI systems may cause to human values and rights; operational risks are related to the functioning and performance of AI systems; and technical risks are related to vulnerabilities [140]. To effectively address these risks, it is crucial to expand AI risk assessment and mitigation strategies. One approach is to prioritize the development of robust AI systems that are resilient to external attacks and technical malfunctions. This can be achieved through rigorous testing and validation procedures to identify and rectify vulnerabilities in AI systems. Additionally, proactive measures should be taken to ensure that AI systems adhere to ethical guidelines and do not infringe upon human rights and values. Moreover, establishing comprehensive operational protocols can help mitigate operational risks associated with the functioning and performance of AI systems. By addressing these different categories of risks, organizations and researchers can work towards enhancing the safety and reliability of AI technologies. One crucial aspect of expanding AI risk assessment and mitigation strategies is to consider the potential impact of AI decisions on various aspects of society. This involves not only evaluating the immediate effects of AI systems, but also anticipating and preparing for potential long-term consequences. Ethical considerations, such as ensuring fairness, transparency, and accountability in AI decision-making processes, are essential components of comprehensive risk assessment and mitigation strategies.
iii) Risk Assessment Methodologies
Risk assessment in AI involves identifying potential hazards, analyzing the likelihood of these hazards occurring, and evaluating the potential impact. Popular frameworks for AI risk assessment, as illustrated in Table 7, offer structured approaches to assess risks systematically and recommend measures to mitigate them. One of the key aspects of risk assessment is the identification of potential hazards. This involves examining the specific functions and components of AI systems to pinpoint areas of vulnerability. Once potential hazards are identified, the next step is to analyze
the likelihood of these hazards occurring. Factors such as system complexity, data inputs, and external dependencies play a crucial role in determining the probability of various risks. Furthermore, evaluating the potential impact of these hazards is essential for understanding the potential consequences of AI-related risks. This involves considering the potential harm to individuals, organizations, or society as a whole. By assessing the likelihood and potential impact of risks, stakeholders can prioritize their mitigation efforts effectively. In addition to these methodologies, continuous monitoring and adaptation of risk assessment strategies are crucial. AI systems and their operating environments are constantly evolving, requiring ongoing assessments to stay ahead of emerging risks. This proactive approach allows for the timely identification and mitigation of new threats, ultimately enhancing the overall resilience of AI systems.

C. DETECTION AND DEFENSE TECHNIQUES
Vulnerability detection in AI systems is critical for preventing exploitation and ensuring the integrity and confidentiality of AI-driven operations. This subsection delves into how vulnerabilities are identified, assessed, and managed in the realm of artificial intelligence, emphasizing the tools and techniques used to detect and respond to vulnerabilities effectively [52]–[64], [144].
i) Importance of Vulnerability Detection
The detection of vulnerabilities within AI systems is crucial for several reasons:
Preventing Attacks: Early detection allows for the preemptive addressing of security flaws that could be exploited by attackers.
Maintaining Trust: Ensuring the security of AI systems is vital for maintaining user trust in technology applications, especially in sensitive areas like healthcare and finance.
Regulatory Compliance: Many industries are subject to stringent regulatory requirements concerning data security, making vulnerability detection a compliance necessity [145] [146].
ii) Techniques for Vulnerability Detection
Vulnerability detection in AI systems often employs a combination of automated tools and expert analysis, including:
Static Analysis: Examining the AI's codebase without executing the program to find vulnerabilities that could lead to security breaches.
Dynamic Analysis: Running the AI systems in controlled environments to monitor for unexpected behaviors or outputs that indicate security weaknesses.
Fuzz Testing: Introducing malformed or unexpected inputs to the system to identify potential vulnerabilities that occur under unusual or unexpected conditions.
Penetration Testing: Simulating cyber attacks on the AI systems to assess their robustness and the effectiveness of existing security measures.
Challenges in AI Vulnerability Detection:
Detecting vulnerabilities in AI systems poses unique challenges, including:
Complexity of AI Models: The complexity and often opaque nature of machine learning algorithms can make it difficult to identify where vulnerabilities may exist.
Evolving Threat Landscape: As AI technology advances, so do the tactics of attackers, requiring continuous updates to
vulnerability detection methodologies.
Scalability: With AI being implemented at a large scale across different sectors, ensuring comprehensive vulnerability coverage remains a logistical and technical challenge.
iii) Mitigation Strategies
Once risks are assessed, appropriate mitigation strategies must be employed. These include technical measures such as implementing robust training protocols to reduce the susceptibility to adversarial attacks, applying anomaly detection systems to catch unusual behaviors indicative of security breaches, and designing AI systems with inherent security features like encryption and access controls.
Another critical aspect is the establishment of robust incident response protocols. Organizations should develop comprehensive procedures for identifying, containing, and resolving AI system failures or security breaches with dedicated response teams and regular drills.
By integrating robust risk assessment methodologies, proactive mitigation strategies, and a commitment to ethical best practices, organizations and researchers can work towards enhancing the safety, reliability, and societal impact of AI technologies [147].

D. GENERATIVE AI SECURITY
Generative AI (GenAI) has emerged as a transformative technology with significant implications across numerous domains, but with its rise come substantial security challenges that differentiate it from traditional machine learning models. Unlike conventional ML models, which operate with a narrower set of tasks and structured goals, Generative AI models such as Large Language Models (LLMs) and diffusion models are characterized by their autonomy in generating diverse, high-quality content. This capability, while transformative, significantly broadens the potential attack surface, leading to new security vulnerabilities [148] [149].
Large Language Models (LLMs) such as GPT-4, Gemini, and other transformer-based models have revolutionized natural language processing. However, their advanced capabilities also introduce significant security challenges. Here is a comprehensive overview of the detection and defense techniques specifically aimed at securing LLMs [68]–[79].
i) Security Risks in LLMs
Adversarial Attacks: Adversarial attacks are one of the most prominent security risks in LLMs, where attackers craft inputs to deceive the model into generating harmful or incorrect outputs. These attacks can trick the model into producing biased, offensive, or even malicious content by subtly altering the input prompts [150].
Studies have shown that LLMs like ChatGPT can be manipulated into bypassing their ethical filters through carefully crafted input prompts [151]. Such attacks exploit vulnerabilities in how LLMs interpret language, allowing adversaries to cause misbehavior in the models, such as generating phishing emails or inappropriate content. This issue is particularly relevant in conversational applications, where LLMs are expected to interact with users and produce reliable, ethical responses. The absence of human oversight in many cases amplifies the risk, making it a critical area of concern for ensuring the safe deployment of conversational systems.
ii) Data Poisoning and Backdoor Attacks
Data poisoning involves the manipulation of training data to introduce harmful behaviors into a model. This type of attack can occur during both pre-training and fine-tuning phases, but the effectiveness and mechanisms can differ significantly depending on when the attack is executed.
During Pre-training: When data poisoning occurs at the pre-training stage, it can subtly influence the model's learning process on a foundational level. Given the vast datasets typically used for pre-training large language models (LLMs), such attacks can be difficult to detect but may have more generalized effects, potentially impacting many downstream applications. However, the attacker's control over specific outputs may be less precise, as the poisoned data is diluted among a massive amount of benign data, making it harder to ensure predictable malicious behavior.
During Fine-tuning: Fine-tuning, on the other hand, usually involves smaller, more domain-specific datasets, making it a more focused point of attack. Poisoning during fine-tuning can be more effective for introducing specific backdoors because the attacker has greater control over the data influencing the model at this stage. Fine-tuned models are often used for specialized tasks, and introducing backdoors at this point can allow attackers to precisely trigger malicious behaviors or generate harmful outputs when specific inputs are provided.
Backdoor attacks could be introduced during the training or fine-tuning of LLMs, causing models to behave maliciously when triggered by specific inputs [152]. These attacks can cause LLMs to behave unpredictably or even compromise security by revealing sensitive information or generating harmful outputs.
iii) Model Stealing
Attackers can replicate the model's functionality by querying it extensively and using the responses to train their own model. Rate limiting, query monitoring, and deploying techniques like watermarking can help in detecting and preventing model stealing [153].
iv) Jailbreak Attacks
Jailbreak attacks occur when adversaries bypass built-in safety mechanisms by crafting input prompts that cause the model to produce restricted or harmful outputs. The sophisticated prompts can bypass LLM safety filters, causing models to generate offensive or dangerous content, even in cases where ethical guidelines should prevent such behavior [152]. These attacks pose a serious threat, particularly in automated systems, where LLMs could be tricked into performing illegal or harmful actions without proper oversight.
Indirect Malicious Instruction Execution:
Cross-Site Scripting (XSS) and CSRF: Attackers can embed malicious instructions in web content that LLMs access.
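Rate limiting, one of the recurring defenses against model stealing and denial-of-service noted above, is commonly realized as a token bucket. The sketch below is a generic illustration rather than a mechanism from any surveyed tool; the class and parameter names are our own. Each query consumes a token, and tokens refill at a fixed rate, capping how quickly an adversary can harvest input-output pairs.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: at most `rate` queries per second on
    average, with bursts up to `capacity`. Slows the bulk querying that
    model-stealing and DoS attacks rely on."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if a query may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Deterministic demo with explicit timestamps (seconds).
bucket = TokenBucket(rate=1.0, capacity=3, now=0.0)
queries = [bucket.allow(now=t) for t in (0.0, 0.1, 0.2, 0.3)]
print(queries)                # [True, True, True, False]
print(bucket.allow(now=2.0))  # True -- tokens refill over time
```

In production use, `allow()` would be called without an explicit timestamp (falling back to `time.monotonic()`), keyed per client or API token, and combined with the query-monitoring techniques discussed in this section so that sustained near-limit traffic is flagged rather than merely throttled.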
Securing the interfaces and implementing strict content validation protocols are essential defense measures [154].
Detection Techniques:
Anomaly Detection: Monitoring model inputs and outputs for unusual patterns that deviate from normal behavior. This includes tracking the frequency and nature of queries to detect potential extraction or injection attempts [150].
Behavioral Analysis: Utilizing AI techniques to analyze the model's behavior over time and detect deviations that might indicate ongoing attacks. This involves creating behavioral baselines and continuously comparing current behavior against these baselines [150].
v) Adversarial Training
Enhancing the model's robustness by training it on both benign and adversarial examples. This makes the model more resilient to adversarial inputs during real-world use (Generative Models for Security: Attacks, Defenses, and Opportunities, 2022).
vi) Rate Limiting and Throttling
Limiting the number of queries a user can make in a given time period to prevent model stealing and DoS attacks. This involves setting thresholds and employing rate-limiting mechanisms [155].
vii) Watermarking and Fingerprinting
Embedding unique identifiers in the model's responses to detect unauthorized use. Watermarking helps trace back any stolen model or its outputs to the original source [151].
viii) Robust Interface Design
Ensuring that interfaces through which the LLM interacts with external systems are secure. This includes validating inputs and outputs and implementing security protocols to prevent injection attacks [156].
Addressing these risks requires a multi-faceted approach combining detection, defense, and continuous monitoring. As these models become more integrated into various applications, addressing their security vulnerabilities becomes increasingly critical to prevent misuse and protect sensitive information. Continuous research and development of new techniques are essential to stay ahead of emerging threats in the AI landscape.

E. ETHICAL AND SOCIETAL IMPLICATIONS OF AI SECURITY
AI security encompasses a broad range of ethical and societal issues that need to be carefully addressed to ensure the development of responsible and trustworthy AI systems [100]–[104]. Here are some of the key ethical and societal implications of AI security:
i) Bias and Fairness
One of the most significant ethical concerns in AI is bias. AI systems can inadvertently learn and propagate biases present in their training data, leading to unfair and discriminatory outcomes. This is particularly problematic in sensitive applications such as hiring, law enforcement, and healthcare. Efforts to mitigate bias involve developing and implementing fairness metrics, bias detection tools, and adopting fair training practices. Researchers emphasize the importance of fairness in AI and advocate for the development of algorithms that can ensure equitable treatment across different demographic groups [157].
ii) Accountability
Determining accountability in AI systems is complex, particularly when these systems operate autonomously. There is a need for clear guidelines and frameworks to assign responsibility when AI systems fail or cause harm. This includes developing legal and regulatory frameworks that define the roles and responsibilities of AI developers, users, and other stakeholders [158].
iii) Security and Misuse
AI systems are susceptible to various security threats, including adversarial attacks, data poisoning, and model theft. These threats can compromise the integrity and reliability of AI systems, leading to harmful consequences. It is essential to implement robust security measures to protect AI systems from malicious actors. Additionally, there is a risk of AI being used for malicious purposes, such as creating deepfakes or automating cyberattacks, which poses significant societal risks [159].
iv) Societal Impact
AI has the potential to significantly impact society by changing the nature of work, influencing social interactions, and shaping economic structures. There is a need to consider the broader societal implications of AI, including its impact on employment, social equity, and access to resources. Policymakers and researchers are urged to adopt a holistic approach to AI development, ensuring that the benefits of AI are distributed equitably across society and that potential negative impacts are mitigated [157].

F. AI SECURITY TOOLS AND FRAMEWORKS
AI security frameworks provide structured approaches to securing AI systems, ensuring they operate safely, reliably, and in compliance with legal and ethical standards. These frameworks help organizations assess risks, implement controls, and monitor AI systems throughout their lifecycle.
AI security tools are essential for protecting AI systems from various threats, such as adversarial attacks, data breaches, and model theft. These tools help detect, prevent, and mitigate security risks, ensuring the integrity and reliability of AI applications.
Our exploration unearthed a diverse arsenal of tools and techniques tailored to fortify AI against the plethora of threats. Among these, machine learning models designed to detect and counter adversarial attacks have shown a notable surge in research interest. Tools like adversarial training frameworks and robustness assessment platforms are being continuously refined to address the evolving complexity of threats that AI systems face. These frameworks, summarized in Table 7, highlight the diverse approaches to AI governance and security, catering to various stages of AI lifecycle management and different regional or sector-specific needs. The tools, detailed in Table 8, represent the cutting-edge technologies being used to secure AI systems. They highlight
the ongoing shift from reactive defense mechanisms to pre- model interactions, enabling a nuanced approach to miti-
dictive security tools, leveraging AI’s predictive capabilities gating manipulation risks in LLM deployments. Limitation:
to preempt potential breaches [154] [163]. This taxonomy does not encompass all potential misuse cases
In this context, the integration of explainability tools has or attack vectors, focusing primarily on known threats. As
been identified as a critical factor in AI security. The ability to LLMs evolve, new risks are likely to emerge, suggesting a
interpret AI decision-making processes is not only essential need for ongoing updates to address future vulnerabilities
for trust and transparency but is increasingly recognized as comprehensively.
a cornerstone of security, providing insights into potential AI Threats, Crime, and Forensics Taxonomy: Struc-
vulnerabilities within AI systems. tured through the lens of criminology, this taxonomy exam-
ines AI’s dual role as both a tool and a target of criminal activ-
G. LIMITATIONS OF AI SECURITY TAXONOMIES ity. By categorizing AI threats along this dual-use framework,
The analyzed taxonomies(Table 6) each address unique it supports forensic analysis, legal accountability, and regula-
security and evaluation challenges, crafted to respond to tory oversight. This approach is particularly suited to appli-
specific operational and threat contexts in AI. However, each cations where AI’s societal impact must be carefully man-
taxonomy also has limitations that reflect challenges in its aged, enabling systematic tracking of AI’s involvement in
scope, applicability, or empirical support.

AI System Evaluation Framework: This taxonomy provides a comprehensive, lifecycle-wide approach to AI system assessment, addressing both component and system levels for Narrow and General AI. By mapping safety evaluations to stages across the development, deployment, and operational lifecycle, it emphasizes stakeholder accountability and long-term risk management. This holistic structure supports standardized evaluation and accountability across diverse AI environments and stakeholder roles, fostering a resilient approach to AI safety that encompasses both model-centric and system-wide concerns.

Limitation: The taxonomy is primarily theoretical with limited empirical validation, which may restrict its practical applicability. Its theoretical constructs could be refined through empirical studies to ensure effectiveness in real-world scenarios.

Prompt-Based LLM Security Taxonomy: Uniquely organized around user, model, and third-party targets, this taxonomy focuses on vulnerabilities inherent in prompt-based interactions within large language models (LLMs). Structured along the Confidentiality, Integrity, and Availability (CIA) triad, it categorizes prompt manipulation risks and prescribes defenses that are modular and interaction-focused. The emphasis on targeted defenses aligns with LLM-specific threats, where prompt integrity is critical to secure user-model interaction.

AI Threats Taxonomy: Taking a forensic, legal-oriented perspective, this taxonomy classifies AI-enabled criminal activity and offers structured guidance for ethical compliance and regulatory adherence in AI-driven environments. Limitation: Traditional forensic techniques may not effectively address AI-related crimes, indicating challenges in adapting established investigative methods to AI-specific contexts. Developing specialized forensic approaches for AI crimes is necessary to fully address these challenges.

Federated Learning Backdoor Taxonomy: This taxonomy addresses the specific vulnerabilities of federated learning (FL) systems, focusing on backdoor threats introduced through data poisoning and model poisoning attacks. It emphasizes anomaly detection and robust training as key defenses, adapted for FL's decentralized, privacy-preserving framework. Its compatibility with secure aggregation and differential privacy underscores its alignment with FL's non-centralized, privacy-first architecture, allowing effective backdoor mitigation without compromising FL's core design principles. Limitation: The taxonomy is highly specialized for federated learning and does not generalize to all AI systems. Its narrow focus limits its applicability outside federated learning environments, suggesting a need for expanded frameworks that adapt these security mechanisms to other decentralized or centralized AI systems.

Common Themes and Key Differences: Across these taxonomies, several shared themes emerge, reflecting foundational principles in AI security and evaluation:
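Before turning to these shared themes, the update-level defense emphasized by the federated learning taxonomy, anomaly detection over client contributions prior to aggregation, can be made concrete. The sketch below is a minimal, hypothetical illustration (the function name, the norm-ratio heuristic, and the threshold are assumptions, not taken from the surveyed works): client updates whose L2 norm is far above the median, a crude signature of an amplified backdoor update, are excluded before federated averaging.

```python
import math
from statistics import median

def aggregate_with_norm_filter(client_updates, ratio_thresh=3.0):
    """Federated averaging preceded by a crude norm-based anomaly filter.

    An update whose L2 norm exceeds `ratio_thresh` times the median norm
    is treated as suspicious (e.g. an amplified backdoor update) and is
    excluded from the average. Real deployments would combine this with
    robust aggregation, secure aggregation, or differential privacy.
    """
    norms = [math.sqrt(sum(x * x for x in u)) for u in client_updates]
    med = median(norms)
    kept = [u for u, n in zip(client_updates, norms) if n <= ratio_thresh * med]
    if not kept:                      # degenerate case: everything flagged
        kept = client_updates
    dim = len(kept[0])
    return [sum(u[i] for u in kept) / len(kept) for i in range(dim)]

# Four benign client updates and one heavily amplified one.
benign = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2]]
aggregate = aggregate_with_norm_filter(benign + [[50.0, 50.0]])
# The amplified update is filtered out, so the aggregate stays near [1.0, 1.05].
```

Because the filter only inspects update norms, it composes naturally with the privacy mechanisms mentioned above: it needs no access to client data, only to the (possibly securely aggregated) model deltas.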
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2025.3567195
Shared Emphasis on Security and Risk Mitigation: All taxonomies underscore the importance of rigorous security and risk assessment. Each taxonomy incorporates robust defenses against unauthorized access, data manipulation, and system integrity breaches, using established principles like the CIA triad or lifecycle-based risk assessment to ensure resilient AI operations.

Component-Level vs. System-Level Evaluation: Each taxonomy distinguishes between component-specific evaluations (e.g., model accuracy, data quality) and broader system-level assessments. This layered approach ensures that both the core AI model and the broader deployment context are systematically evaluated for security vulnerabilities and performance.

Ethics and Accountability: Ethical considerations, regulatory compliance, and stakeholder accountability are woven into each taxonomy. The AI System Evaluation Framework emphasizes accountability through lifecycle mapping, while the AI Threats taxonomy specifically addresses societal impacts through a forensic, legal-oriented perspective. Meanwhile, the LLM taxonomy emphasizes ethical deployment by mitigating interaction-based risks, and the Federated Learning taxonomy promotes privacy preservation and security in a decentralized context.

Key differences arise in the taxonomies' specific scopes and methodologies, tailored to their application contexts. The AI System Evaluation Framework provides lifecycle-wide, standardized assessments suitable for diverse applications, while the LLM and Federated Learning taxonomies focus narrowly on interaction-specific and decentralized security risks, respectively. The AI Threats taxonomy is unique in its forensic orientation, aligning with law enforcement and regulatory needs in contexts where AI impacts are directly linked to societal and legal outcomes.

VI. RESEARCH GAPS
Despite these advancements, our study identifies critical gaps in current AI security research.

One such gap is the need for standardized benchmarks for AI security, which would enable the consistent evaluation of AI security tools and methodologies. Standardization would help unify the assessment of various approaches, allowing for reliable comparisons and establishing best practices across the industry. Such benchmarks could also facilitate the deployment of robust security measures in real-world applications where interoperability and compliance are essential.

Another significant gap is the lack of research on security implications in emerging domains, such as autonomous systems and the Internet of Things (IoT), where AI's interaction with the physical environment introduces new security risks. In these areas, threats can extend beyond data integrity and privacy concerns, posing risks to physical safety, infrastructure stability, and public trust in technology. Developing security frameworks that consider both digital and physical interactions in these domains is crucial.

Moreover, while there is extensive literature on the security of Generative AI (GenAI), especially concerning large language models (LLMs), most of this research highlights the vulnerabilities and challenges rather than proposing concrete solutions. As these technologies integrate further into our daily lives, the focus must shift towards hardening these systems against attacks. There is a pressing need for targeted research on the defense of AI assistants, including LLMs, as well as multimodal generative AI models, which combine text, image, and audio generation capabilities. These systems are increasingly used in sensitive applications, and their complexity creates multifaceted security challenges that require new approaches to threat detection, adversarial defense, and model transparency.

While Quantum Adversarial Machine Learning (QAML) opens up new avenues for enhancing AI systems, it simultaneously exposes critical research gaps in defending against quantum-based adversarial attacks. Current security mechanisms lack the sophistication to counteract quantum-enabled threats that exploit unique quantum properties, such as superposition and entanglement, to bypass traditional defenses. The absence of standardized frameworks for quantum-resistant neural networks and the limited exploration of post-quantum cryptographic methods leave AI systems vulnerable to emerging quantum adversarial strategies, such as quantum cryptanalysis and advanced algorithmic manipulations. Additionally, there is a pressing need for adaptive defense mechanisms, such as real-time quantum system monitoring and quantum data anonymization, to address the dynamic nature of quantum threats. The interplay between quantum advancements and AI vulnerabilities highlights a significant gap in ensuring the robustness, trustworthiness, and safety of machine learning models in the face of quantum-enabled adversarial challenges.

Additionally, as AI security research evolves, the exploration of ethical and governance frameworks remains underdeveloped. Given the ethical implications of deploying AI in critical areas like healthcare, finance, and law enforcement, there is a need for more detailed guidelines on responsible and transparent AI security practices. This includes developing frameworks that address issues such as bias, accountability, and transparency to ensure AI systems uphold societal values while maintaining robust security.

Addressing these identified gaps, by developing solutions-focused research in GenAI security, enhancing frameworks for emerging AI applications, and establishing ethical governance standards, will be essential to advancing the field of AI security. These efforts will support the responsible and resilient adoption of AI technologies across industries.

VII. AI SECURITY TOOLS LIMITATIONS
In addition to gaps in AI security research, there are several limitations in the development and deployment of AI security tools that hinder their effectiveness and adoption.

Adaptability to Evolving Threats: One of the core challenges is that AI security tools struggle to keep pace with the rapid evolution of adversarial tactics. As AI models become more sophisticated, so do the methods of attack, from
subtle data poisoning to complex adversarial manipulations. Security tools must be highly adaptable to detect and counter new and unforeseen threats effectively, which demands continuous updates and agile development.

Lack of Scalability Across Models and Applications: Many AI security tools are tailored to specific types of AI models or applications, making it difficult to apply them across different platforms or domains. This lack of scalability limits their usability, especially in large, complex systems that utilize diverse AI components. Tools that work with one type of model, such as computer vision, may not function effectively with natural language models, highlighting a need for more versatile solutions.

Performance Overhead and Computational Cost: Effective AI security tools often introduce significant computational overhead, which can impact the efficiency of AI systems. Techniques like adversarial training, real-time threat monitoring, and robust model verification can slow down a system's performance, making these security tools impractical for time-sensitive applications like autonomous vehicles or high-frequency trading. Reducing the computational load without sacrificing security remains a key challenge.

Transparency and Explainability: AI security tools often operate as black boxes, making it difficult for users to understand how they identify threats or vulnerabilities. This lack of transparency can hinder trust and limit the tools' applicability in sectors where explainability is critical, such as healthcare or finance. Tools that integrate transparency and explainability, providing insight into the decisions made during security assessments, would be more broadly useful and trusted across regulated industries.

Limited Focus on Multimodal and Generative AI Models: While there are security tools available for traditional machine learning models, few are designed for complex, multimodal models (those combining text, image, and audio) or generative AI systems, such as large language models. These models have unique vulnerabilities that require specialized security measures. For instance, large language models are vulnerable to prompt injection attacks and jailbreaking, yet few tools address these specific threats effectively. Enhancing AI security tools to cover multimodal and generative AI models is essential as these systems become more integrated into everyday applications.

Challenges in Real-World Integration and Testing: Security tools are often developed and tested in controlled environments, but their efficacy can diminish when deployed in real-world settings. Factors such as data variability, user behavior, and unanticipated system interactions introduce complexities that are difficult to replicate in testing phases. As a result, AI security tools may not perform as expected outside laboratory conditions, underscoring the need for more robust testing methods that mimic real-world conditions.

Addressing these challenges is vital for advancing the reliability and applicability of AI security tools. By improving scalability, transparency, adaptability, and efficiency, AI security tools can become more practical and effective in safeguarding diverse AI systems in real-world environments.

VIII. RESULTS AND DISCUSSION
This systematic mapping study merges quantitative data with qualitative insights to present a cohesive narrative of AI security research findings. Through the lens of the meticulously collated studies, this section delineates the prevailing research topics, methodological approaches, and prevalent tools that underpin AI security, while also venturing into the less charted territories that represent future frontiers. However, while academic research lays an essential foundation, it is crucial to consider how these methodologies translate into real-world industry practices, which often adapt and refine academic models to meet practical, scalable needs.

Synthesis of Secure AI Engineering Practices: The mapping study also synthesized secure AI engineering practices from the selected literature. A trend towards embedding security considerations into AI design and development, rather than treating them as a post-deployment afterthought, was apparent. This proactive approach indicates a paradigm shift in how security is perceived in the context of AI. Security-by-design principles are being advocated, integrating threat modeling, risk assessment, and security testing early in the AI development lifecycle [172].

Thematic Analysis of AI Security Research: Thematic analysis revealed that research in AI security is becoming increasingly granular, with specialized themes such as privacy-preserving AI, secure AI in edge computing, and blockchain for AI security gaining momentum. These themes highlight the necessity for security measures that are not only technologically adept but also congruent with legal and ethical standards.

IX. CONCLUSION
As we conclude this systematic mapping study on AI security, several key insights emerge, offering a comprehensive portrayal of the current research landscape while also shedding light on vital areas for future exploration. This study has synthesized diverse perspectives and findings from a wide array of scholarly works, enabling a deeper understanding of AI security's multifaceted nature. Firstly, the current state of AI security research, as reflected in the papers analyzed, demonstrates significant advancements in identifying and mitigating various threats and vulnerabilities associated with AI systems. These developments highlight the continued need for vigilance and innovation in this area, especially as AI technologies become more integrated into different aspects of society. Through a comprehensive gap analysis, this study reveals several under-explored areas that warrant further research. These gaps include the need for more sophisticated adversarial defense mechanisms, holistic risk assessment models that capture all potential vulnerabilities, and in-depth research into ethical implications and governance frameworks for AI security. The substantial focus on identifying issues, particularly in Generative AI and large language models (LLMs), highlights a need to pivot toward developing practical solutions and robust safeguards for these evolving technologies. The call for a deeper emphasis on hardening AI systems and multimodal generative models reflects the importance of protecting these widely used systems from escalating security threats. Moreover, the existing challenges in AI security tools, such as adaptability, scalability, and transparency, also point to crucial areas for improvement. Addressing these limitations, by enhancing tools to operate effectively in real-world conditions and expanding their scope to cover complex, multimodal models, will significantly improve AI security and its practical applications across diverse sectors.

These gaps not only represent challenges but also opportunities for groundbreaking research that can significantly advance our understanding and capabilities in AI security. Looking ahead, it is imperative to adopt a proactive and multidisciplinary approach to AI security research, as outlined in this review. This comprehensive approach should extend beyond technical aspects to encompass the broader ethical, social, and legal implications of AI. Prioritizing practical real-world applications and establishing standardized benchmarks for AI security are crucial initial steps that will uphold the effectiveness and relevance of future research endeavors.

In conclusion, this systematic mapping study has not only collated and analyzed existing research in AI security but has also sought to chart a course for future investigations. By highlighting both the achievements and the gaps in current research, this study aims to inspire and guide future endeavors in AI security, ensuring that as AI technologies advance, they do so with robust security measures and ethical considerations at their core. It is through these continued efforts that we can hope to harness the full potential of AI while safeguarding against its inherent risks.

REFERENCES
[1] S. R. Konda, "Ensuring trust and security in ai: Challenges and solutions for safe integration," International Journal of Computer Science and Technology, vol. 3, no. 2, pp. 71–86, 2019.
[2] M. Wooldridge, A brief history of artificial intelligence: what it is, where we are, and where we are going. Flatiron Books, 2021.
[3] A. Oseni, N. Moustafa, H. Janicke, P. Liu, Z. Tari, and A. Vasilakos, "Security and privacy for artificial intelligence: Opportunities and challenges," arXiv preprint arXiv:2102.04661, 2021.
[4] Palo Alto Networks, "What is ai security?" https://www.paloaltonetworks.com/cyberpedia/ai-security, 2023.
[5] CISA, "Joint guidance on deploying ai systems securely." https://www.cisa.gov/news-events/alerts/2024/04/15/joint-guidance-deploying-ai-systems-securely, 2024.
[6] J. M. Spring, A. Galyardt, A. D. Householder, and N. VanHoudnos, "On managing vulnerabilities in ai/ml systems," in Proceedings of the New Security Paradigms Workshop 2020, pp. 111–126, 2020.
[7] L. Mauri and E. Damiani, "Modeling threats to ai-ml systems using stride," Sensors, vol. 22, no. 17, p. 6662, 2022.
[8] K. Gnitko, "Systematic overview of ai security standards," Available at SSRN 4922592, 2024.
[9] IBM Institute for Business Value, "Enterprises' best bet for the future: Securing generative ai." https://www.ibm.com/think/insights/generative-ai-security-recommendations, 2024.
[10] M. Rahman et al., "Security risk and attacks in ai: A survey of security and privacy," in 47th IEEE Computer Society Annual International Conference on Computers, Software, and Applications (COMPSAC), vol. 1839, 2023.
[11] A. Habbal, M. K. Ali, and M. A. Abuzaraida, "Artificial intelligence trust, risk and security management (ai trism): Frameworks, applications, challenges and future research directions," Expert Systems with Applications, vol. 240, p. 122442, 2024.
[12] B. Xia, Q. Lu, H. Perera, L. Zhu, Z. Xing, Y. Liu, and J. Whittle, "Towards concrete and connected ai risk assessment (c2aira): A systematic mapping study," in 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN), 2023.
[13] K. Petersen, S. Vakkalanka, and L. Kuzniarz, "Guidelines for conducting systematic mapping studies in software engineering: An update," Information and Software Technology, vol. 64, pp. 1–18, 2015.
[14] Y. Kawamoto, K. Miyake, K. Konishi, and Y. Oiwa, "Threats, vulnerabilities, and controls of machine learning based systems: A survey and taxonomy," arXiv preprint arXiv:2301.07474, 2023.
[15] E. R. Isaac and J. Reno, "Ai product security: A primer for developers," arXiv preprint arXiv:2304.11087, 2023.
[16] G. Sebastian, "Privacy and data protection in chatgpt and other ai chatbots: Strategies for securing user information," International Journal of Security and Privacy in Pervasive Computing (IJSPPC), vol. 15, no. 1, pp. 1–14, 2023.
[17] L. N. Tidjon and F. Khomh, "Threat assessment in machine learning based systems," arXiv preprint arXiv:2207.00091, 2022.
[18] IBM Security, "X-force threat intelligence index 2023." https://www.ibm.com/reports/threat-intelligence.
[19] European Union Agency for Cybersecurity (ENISA), "Enisa threat landscape 2023." https://www.enisa.europa.eu/publications/enisa-threat-landscape-2023.
[20] A. Grotto and J. J. Dempsey, "Vulnerability disclosure and management for ai/ml systems: A working paper with policy recommendations," SSRN Electronic Journal, 2021.
[21] Y. Hu, W. Kuang, Z. Qin, K. Li, J. Zhang, Y. Gao, W. Li, and K. Li, "Artificial intelligence security: Threats and countermeasures," ACM Computing Surveys (CSUR), vol. 55, pp. 1–36, 2021.
[22] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519, 2017.
[23] C. Berghoff, M. Neu, and A. von Twickel, "Vulnerabilities of connectionist ai applications: Evaluation and defense," Frontiers in Big Data, vol. 3, 2020.
[24] N. L. Rane, "Multidisciplinary collaboration: key players in successful implementation of chatgpt and similar generative artificial intelligence in manufacturing, finance, retail, transportation, and construction industry," 2023.
[25] I. H. Sarker, M. H. Furhad, and R. Nowrozy, "Ai-driven cybersecurity: an overview, security intelligence modeling and research directions," SN Computer Science, vol. 2, no. 3, p. 173, 2021.
[26] Microsoft, "Microsoft Responsible AI Impact Assessment Template," tech. rep., June 2022. Online.
[27] Google, "Saif." Available at: https://github.com/protectai/rebuff, 2021.
[28] Snowflake Inc., "Snowflake ai security framework," 2023. Accessed: 2024-10-06.
[29] National Institute of Standards and Technology (NIST), "Ai risk management framework," 2022.
[30] OWASP Foundation, "Owasp ai security and privacy guide," 2023. Accessed: 2024-10-06.
[31] Azure, "Counterfit." Available at: https://github.com/Azure/counterfit, 2022.
[32] O. A. Ajala, "Leveraging ai/ml for anomaly detection, threat prediction, and automated response," 2024.
[33] A. Oseni, N. Moustafa, H. Janicke, P. Liu, Z. Tari, and A. Vasilakos, "Security and privacy for artificial intelligence: Opportunities and challenges," arXiv preprint arXiv:2102.04661, 2021.
[34] L. Mauri and E. Damiani, "Modeling threats to ai-ml systems using stride," Sensors, vol. 22, no. 17, p. 6662, 2022.
[35] J. M. Spring, A. Galyardt, A. D. Householder, and N. VanHoudnos, "On managing vulnerabilities in ai/ml systems," in Proceedings of the New Security Paradigms Workshop 2020, pp. 111–126, 2020.
[36] S. Qiu, Q. Liu, S. Zhou, and C. Wu, "Review of artificial intelligence adversarial attack and defense technologies," Applied Sciences, vol. 9, no. 5, p. 909, 2019.
[37] H. Liang, E. He, Y. Zhao, Z. Jia, and H. Li, "Adversarial attack and defense: A survey," Electronics, vol. 11, no. 8, p. 1283, 2022.
[38] Z. Tian, L. Cui, J. Liang, and S. Yu, "A comprehensive survey on poisoning attacks and countermeasures in machine learning," ACM Computing Surveys, vol. 55, no. 8, pp. 1–35, 2022.
[39] Y. Gao, B. G. Doan, Z. Zhang, S. Ma, J. Zhang, A. Fu, S. Nepal, and H. Kim, "Backdoor attacks and countermeasures on deep learning: A comprehensive review," arXiv preprint arXiv:2007.10760, 2020.
[40] G. Xu, H. Li, H. Ren, K. Yang, and R. H. Deng, "Data security issues in deep learning: Attacks, countermeasures, and opportunities," IEEE Communications Magazine, vol. 57, no. 11, pp. 116–122, 2019.
[41] I. Ilahi, M. Usama, J. Qadir, M. U. Janjua, A. Al-Fuqaha, D. T. Hoang, and D. Niyato, "Challenges and countermeasures for adversarial attacks on deep reinforcement learning," IEEE Transactions on Artificial Intelligence, vol. 3, no. 2, pp. 90–109, 2021.
[42] L. Floridi, J. Cowls, M. Beltrametti, R. Chatila, P. Chazerand, V. Dignum, C. Luetge, R. Madelin, U. Pagallo, F. Rossi, et al., "Ai4people—an ethical framework for a good ai society: opportunities, risks, principles, and recommendations," Minds and Machines, vol. 28, pp. 689–707, 2018.
[43] H. Ali, D. Chen, M. Harrington, N. Salazar, M. Al Ameedi, A. Khan, A. R. Butt, and J.-H. Cho, "A survey on attacks and their countermeasures in deep learning: Applications in deep neural networks, federated, transfer, and deep reinforcement learning," IEEE Access, 2023.
[44] T. Chen, J. Liu, Y. Xiang, W. Niu, E. Tong, and Z. Han, "Adversarial attack and defense in reinforcement learning-from ai security view," Cybersecurity, vol. 2, pp. 1–22, 2019.
[45] V. Turri and R. Dzombak, "Why we need to know more: Exploring the state of ai incident documentation practices," in Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pp. 576–583, 2023.
[46] B. Xia, Q. Lu, H. Perera, L. Zhu, Z. Xing, Y. Liu, and J. Whittle, "Towards concrete and connected ai risk assessment (c2aira): A systematic mapping study," in 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN), pp. 104–116, IEEE, 2023.
[47] P. Giudici, M. Centurelli, and S. Turchetta, "Artificial intelligence risk measurement," Expert Systems with Applications, vol. 235, p. 121220, 2024.
[48] J. Harshith, M. S. Gill, and M. Jothimani, "Evaluating the vulnerabilities in ml systems in terms of adversarial attacks," ArXiv, vol. abs/2308.12918, 2023.
[49] S. L. Eggers and C. Sample, "Vulnerabilities in artificial intelligence and machine learning applications and data," tech. rep., Idaho National Lab. (INL), Idaho Falls, ID (United States), 2020.
[50] A. Grotto and J. Dempsey, "Vulnerability disclosure and management for ai/ml systems: A working paper with policy recommendations," ML Systems: A Working Paper with Policy Recommendations (November 15, 2021), 2021.
[51] J. D. Kong, K. Fevrier, J. O. Effoduh, and N. L. Bragazzi, "Artificial intelligence, law, and vulnerabilities," in AI and Society, pp. 179–196, Chapman and Hall/CRC, 2023.
[52] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, "Evasion attacks against machine learning at test time," in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III 13, pp. 387–402, Springer, 2013.
[53] S. Cattell and A. Ghosh, "Coordinated disclosure for ai: Beyond security vulnerabilities," arXiv preprint arXiv:2402.07039, 2024.
[54] J. M. Spring, A. Galyardt, A. D. Householder, and N. VanHoudnos, "On managing vulnerabilities in ai/ml systems," in Proceedings of the New Security Paradigms Workshop 2020, pp. 111–126, 2020.
[55] G. Jabeen, S. Rahim, W. Afzal, D. Khan, A. A. Khan, Z. Hussain, and T. Bibi, "Machine learning techniques for software vulnerability prediction: a comparative study," Applied Intelligence, vol. 52, no. 15, pp. 17614–17635, 2022.
[56] L. Song, R. Shokri, and P. Mittal, "Privacy risks of securing machine learning models against adversarial examples," in Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 241–257, 2019.
[57] S. Shahriar, S. Allana, M. H. Fard, and R. Dara, "A survey of privacy risks and mitigation strategies in the artificial intelligence life cycle," IEEE Access, 2023.
[58] Y. Chen, A. Arunasalam, and Z. B. Celik, "Can large language models provide security & privacy advice? measuring the ability of llms to refute misconceptions," in Proceedings of the 39th Annual Computer Security Applications Conference, pp. 366–378, 2023.
[59] A. Koshiyama, E. Kazim, P. Treleaven, P. Rai, L. Szpruch, G. Pavey, G. Ahamat, F. Leutner, R. Goebel, A. Knight, et al., "Towards algorithm auditing: managing legal, ethical and technological risks of ai, ml and associated algorithms," Royal Society Open Science, vol. 11, no. 5, p. 230859, 2024.
[60] H. Barmer, R. Dzombak, M. Gaston, E. Heim, V. Palat, F. Redner, T. Smith, and N. VanHoudnos, "Robust and secure ai," 2021.
[61] M. Fazelnia, A. Okutan, and M. Mirakhorli, "Supporting ai/ml security workers through an adversarial techniques, tools, and common knowledge (ai/ml att&ck) framework," arXiv preprint arXiv:2211.05075, 2022.
[62] K. D. Gupta and D. Dasgupta, "Adversarial attacks and defenses for deployed ai models," IT Professional, vol. 24, no. 4, pp. 37–41, 2022.
[63] S. Sai, U. Yashvardhan, V. Chamola, and B. Sikdar, "Generative ai for cyber security: Analyzing the potential of chatgpt, dall-e and other models for enhancing the security space," IEEE Access, 2024.
[64] "Deepdefense: A steganalysis-based backdoor detecting and mitigating protocol in deep neural networks for ai security," Security and Communication Networks, 2023.
[65] C.-L. Chang, J.-L. Hung, C.-W. Tien, C.-W. Tien, and S.-Y. Kuo, "Evaluating robustness of ai models against adversarial attacks," in Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence, pp. 47–54, 2020.
[66] M. A. Ramirez, S.-K. Kim, H. A. Hamadi, E. Damiani, Y.-J. Byon, T.-Y. Kim, C.-S. Cho, and C. Y. Yeun, "Poisoning attacks and defenses on artificial intelligence: A survey," arXiv preprint arXiv:2202.10276, 2022.
[67] X. Xin, Y. Bai, H. Wang, Y. Mou, and J. Tan, "An anti-poisoning attack method for distributed ai system," Journal of Computer and Communications, vol. 9, no. 12, pp. 99–105, 2021.
[68] S. Dunn and Team, "Llm ai cybersecurity & governance checklist." file:///mnt/data/LLM
[69] R. Pankajakshan, S. Biswal, Y. Govindarajulu, and G. Gressel, "Mapping llm security landscapes: A comprehensive stakeholder risk assessment proposal," arXiv preprint arXiv:2403.13309, 2024.
[70] F. Wu, N. Zhang, S. Jha, P. McDaniel, and C. Xiao, "A new era in llm security: Exploring security concerns in real-world llm-based systems," arXiv preprint arXiv:2402.18649, 2024.
[71] A. Kumar, S. Singh, S. V. Murty, and S. Ragupathy, "The ethics of interaction: Mitigating security threats in llms," arXiv preprint arXiv:2401.12273, 2024.
[72] J. Kanamugire and A. S. Faiq, "Security issues in large language models such as chatgpt," 2024.
[73] A. Alabdulakreem, C. M. Arnold, Y. Lee, P. M. Feenstra, B. Katz, and A. Barbu, "Securellm: Using compositionality to build provably secure language models for private, sensitive, and secret data," arXiv preprint
[87] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, 2017.
[88] F. Liu, K. Lin, L. Li, J. Wang, Y. Yacoob, and L. Wang, "Aligning large multi-modal model with robust instruction tuning," arXiv preprint arXiv:2306.14565, 2023.
[89] S. Shang, X. Zhao, Z. Yao, Y. Yao, L. Su, Z. Fan, X. Zhang, and Z. Jiang, "Can llms deeply detect complex malicious queries? a framework for jailbreaking via obfuscating intent," The Computer Journal, p. bxae124, 2024.
[90] A. Kavian, M. M. Pourhashem Kallehbasti, S. Kazemi, E. Firouzi, and M. Ghafari, "Llm security guard for code," in Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, pp. 600–603, 2024.
[91] R. Zhang, H.-W. Li, X.-Y. Qian, W.-B. Jiang, and H.-X. Chen, "On large language models safety, security, and privacy: A survey," Journal of Electronic Science and Technology, p. 100301, 2025.
[92] T. Pang, C. Du, Q. Liu, J. Jiang, M. Lin, et al., "Improved few-shot jailbreaking can circumvent aligned language models and their defenses," Advances in Neural Information Processing Systems, vol. 37, pp. 32856–32887, 2025.
[93] D. Halawi, A. Wei, E. Wallace, T. T. Wang, N. Haghtalab, and J. Steinhardt, "Covert malicious finetuning: Challenges in safeguarding llm adaptation," arXiv preprint arXiv:2406.20053, 2024.
[94] Q. Ren, C. Gao, J. Shao, J. Yan, X. Tan, W. Lam, and L. Ma, "Codeattack: Revealing safety generalization challenges of large language models via code completion," in Findings of the Association for Computational Linguistics ACL 2024, pp. 11437–11452, 2024.
[95] Y. Mo, Y. Wang, Z. Wei, and Y. Wang, "Fight back against jailbreaking via prompt adversarial tuning," in The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
[96] Y. Wang, Z. Shi, A. Bai, and C.-J. Hsieh, "Defending llms against jailbreaking attacks via backtranslation," arXiv preprint arXiv:2402.16459, 2024.
[97] E. Bassani and I. Sanchez, "Guardbench: A large-scale benchmark for guardrail models," in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 18393–18409, 2024.
arXiv:2405.09805, 2024. [98] Z. Xu, Y. Liu, G. Deng, Y. Li, and S. Picek, “Llm jailbreak at-
[74] B. C. Das, M. H. Amini, and Y. Wu, “Security and privacy challenges tack versus defense techniques–a comprehensive study,” arXiv preprint
of large language models: A survey,” arXiv preprint arXiv:2402.00888, arXiv:2402.13457, 2024.
2024. [99] W. Cheng, K. Sun, X. Zhang, and W. Wang, “Security attacks on llm-
[75] W. Zhao, Z. Li, and J. Sun, “Causality analysis for evaluating the security based code completion tools,” arXiv preprint arXiv:2408.11006, 2024.
of large language models,” arXiv preprint arXiv:2312.07876, 2023. [100] B. S. Neyigapula, “Secure ai model sharing: A cryptographic approach
[76] X. Wu, R. Duan, and J. Ni, “Unveiling security, privacy, and ethical for encrypted model exchange,” International Journal of Artificial Intelli-
concerns of chatgpt,” Journal of Information and Intelligence, vol. 2, gence and Machine Learning, 2024.
no. 2, pp. 102–115, 2024. [101] T. Shevlane, “Structured access: an emerging paradigm for safe ai de-
[77] K. Huang, J. Yeoh, S. Wright, and H. Wang, “Build your security program ployment,” arXiv preprint arXiv:2201.05159, 2022.
for genai,” in Generative AI Security: Theories and Practices, pp. 99–132, [102] W. Wang, H. Zhou, M. Li, and J. Yan, “An autonomous deployment
Springer, 2024. mechanism for ai security services,” IEEE Access, 2023.
[78] A. Golda, K. Mekonen, A. Pandey, A. Singh, V. Hassija, V. Chamola, and [103] S. A. Khowaja, K. Dev, N. M. F. Qureshi, P. Khuwaja, and L. Foschini,
B. Sikdar, “Privacy and security concerns in generative ai: A comprehen- “Toward industrial private ai: A two-tier framework for data and model
sive survey,” IEEE Access, 2024. security,” IEEE Wireless Communications, vol. 29, no. 2, pp. 76–83,
[79] B. Zhu, N. Mu, J. Jiao, and D. Wagner, “Generative ai security: Chal- 2022.
lenges and countermeasures,” arXiv preprint arXiv:2402.12617, 2024. [104] A. Habbal, M. K. Ali, and M. A. Abuzaraida, “Artificial intelligence trust,
[80] Y. Zeng, H. Lin, J. Zhang, D. Yang, R. Jia, and W. Shi, “How johnny can risk and security management (ai trism): Frameworks, applications, chal-
persuade llms to jailbreak them: Rethinking persuasion to challenge ai lenges and future research directions,” Expert Systems with Applications,
safety by humanizing llms,” arXiv preprint arXiv:2401.06373, 2024. vol. 240, p. 122442, 2024.
[81] A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, [105] M. Madaio, L. Egede, H. Subramonyam, J. Wortman Vaughan, and
Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box llms H. Wallach, “Assessing the fairness of ai systems: Ai practitioners’
automatically,” arXiv preprint arXiv:2312.02119, 2023. processes, challenges, and needs for support,” Proceedings of the ACM
[82] A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, on Human-Computer Interaction, vol. 6, no. CSCW1, pp. 1–26, 2022.
“Universal and transferable adversarial attacks on aligned language mod- [106] B. Richardson and J. E. Gilbert, “A framework for fairness: A systematic
els,” arXiv preprint arXiv:2307.15043, 2023. review of existing fair ai solutions,” arXiv preprint arXiv:2112.05700,
[83] P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, 2021.
“Jailbreaking black box large language models in twenty queries,” arXiv [107] European Commission High-Level Expert Group on Artificial Intelli-
preprint arXiv:2310.08419, 2023. gence, “The Assessment List for Trustworthy Artificial Intelligence,”
[84] K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman, tech. rep., July 2020. Online.
“Defending against indirect prompt injection attacks with spotlighting,” [108] P. D. P. C. (Singapore), “Model ai governance framework,” 2020.
arXiv preprint arXiv:2403.14720, 2024. [109] G. of New South Wales (Australia), “Nsw artificial intelligence assurance
[85] N. Carlini, “A llm assisted exploitation of ai-guardian,” arXiv preprint framework,” 2022.
arXiv:2307.15008, 2023. [110] G. of Canada, “Algorithm impact assessment tool,” 2022.
[86] J. Su, J. Kempe, and K. Ullrich, “Mission impossible: A statistical per- [111] M. of the Interior and N. Kingdom Relations, “Fundamental rights and
spective on jailbreaking llms,” arXiv preprint arXiv:2408.01420, 2024. algorithm impact assessment,” 2021.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
[112] City and County of San Francisco, Harvard DataSmart, and Data Community DC, “Ethics & algorithms toolkit,” 2020.
[113] Azure, “Python risk identification tool for generative ai (pyrit).” Available at: https://github.com/Azure/PyRIT, 2023. Accessed: 2024-09-27.
[114] IBM Research, “Adversarial robustness toolbox (art).” Available at: https://research.ibm.com/projects/adversarial-robustness-toolbox, 2024. Accessed: 2024-09-27.
[115] ProtectAI, “Rebuff.” Available at: https://github.com/protectai/rebuff, 2024.
[116] ProtectAI, “Modelscan.” Available at: https://github.com/protectai/modelscan, 2022.
[117] X. Shen, Z. Chen, M. Backes, Y. Shen, and Y. Zhang, ““Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models,” in ACM SIGSAC Conference on Computer and Communications Security (CCS), ACM, 2024.
[118] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[119] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
[120] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519, 2017.
[121] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in International Conference on Artificial Intelligence and Statistics, pp. 2938–2948, PMLR, 2020.
[122] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay, “A survey on adversarial attacks and defences,” CAAI Transactions on Intelligence Technology, vol. 6, no. 1, pp. 25–45, 2021.
[123] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, IEEE, 2017.
[124] K. Grosse, L. Bieringer, T. R. Besold, and A. Alahi, “Towards more practical threat models in artificial intelligence security,” arXiv preprint arXiv:2311.09994, 2023.
[125] S. Lu, L.-M. Duan, and D.-L. Deng, “Quantum adversarial machine learning,” Physical Review Research, vol. 2, no. 3, p. 033212, 2020.
[126] E. Yocam, A. Rizi, M. Kamepalli, V. Vaidyan, Y. Wang, and G. Comert, “Quantum adversarial machine learning and defense strategies: Challenges and opportunities,” arXiv preprint arXiv:2412.12373, 2024.
[127] M. T. West, S.-L. Tsang, J. S. Low, C. D. Hill, C. Leckie, L. C. Hollenberg, S. M. Erfani, and M. Usman, “Towards quantum enhanced adversarial robustness in machine learning,” Nature Machine Intelligence, vol. 5, no. 6, pp. 581–589, 2023.
[128] K.-C. Tseng, W.-C. Lai, W.-C. Huang, Y.-C. Chang, and S. Zeadally, “Ai threats: Adversarial examples with a quantum-inspired algorithm,” IEEE Consumer Electronics Magazine, 2024.
[129] Y. Shamoo, “Adversarial attacks and defense mechanisms in the age of quantum computing,” in Leveraging Large Language Models for Quantum-Aware Cybersecurity, pp. 301–344, IGI Global Scientific Publishing, 2025.
[130] M. S. Akter, H. Shahriar, I. Iqbal, M. Hossain, M. Karim, V. Clincy, and R. Voicu, “Exploring the vulnerabilities of machine learning and quantum machine learning to adversarial attacks using a malware dataset: A comparative analysis,” in 2023 IEEE International Conference on Software Services Engineering (SSE), pp. 222–231, IEEE, 2023.
[131] S. R. Sindiramutty, “Autonomous threat hunting: A future paradigm for ai-driven threat intelligence,” arXiv preprint arXiv:2401.00286, 2024.
[132] E. Crothers, N. Japkowicz, and H. L. Viktor, “Machine-generated text: A comprehensive survey of threat models and detection methods,” IEEE Access, 2023.
[133] Y. Xie, D. Wang, P.-Y. Chen, J. Xiong, S. Liu, and S. Koyejo, “A word is worth a thousand dollars: Adversarial attack on tweets fools stock predictions,” arXiv preprint arXiv:2205.01094, 2022.
[134] A. Panesar and H. Panesar, “Artificial intelligence and machine learning in global healthcare,” in Handbook of Global Health, pp. 1775–1813, Springer, 2021.
[135] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane, “Adversarial attacks on medical machine learning,” Science, vol. 363, no. 6433, pp. 1287–1289, 2019.
[136] F. Li, N. Ruijs, and Y. Lu, “Ethics & ai: A systematic review on ethical concerns and related strategies for designing with ai in healthcare,” AI, vol. 4, no. 1, pp. 28–53, 2022.
[137] S. Gerke, T. Minssen, and G. Cohen, “Ethical and legal challenges of artificial intelligence-driven healthcare,” in Artificial Intelligence in Healthcare, pp. 295–336, Elsevier, 2020.
[138] N. Wang, Y. Luo, T. Sato, K. Xu, and Q. A. Chen, “Does physical adversarial example really matter to autonomous driving? towards system-level effect of adversarial object evasion attack,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4412–4423, 2023.
[139] Deloitte, “The role of ai in retail security,” Deloitte Insights, 2021.
[140] T. Shivani, H. Ramakrishna, and N. Nagashree, “Vulnerability management using machine learning techniques,” in 2021 IEEE International Conference on Mobile Networks and Wireless Communications (ICMNWC), pp. 1–4, IEEE, 2021.
[141] Q. Lu, L. Zhu, X. Xu, J. Whittle, D. Zowghi, and A. Jacquet, “Responsible ai pattern catalogue: A collection of best practices for ai governance and engineering,” ACM Computing Surveys, vol. 56, no. 7, pp. 1–35, 2024.
[142] B. Xia, Q. Lu, L. Zhu, and Z. Xing, “Towards ai safety: A taxonomy for ai system evaluation,” arXiv preprint arXiv:2404.05388, 2024.
[143] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: A case study,” in 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 291–300, IEEE, 2019.
[144] Y. Zeng, Y. Wu, X. Zhang, H. Wang, and Q. Wu, “Autodefense: Multi-agent llm defense against jailbreak attacks,” arXiv preprint arXiv:2403.04783, 2024.
[145] G. R. Machado, E. Silva, and R. R. Goldschmidt, “Adversarial machine learning in image classification: A survey toward the defender’s perspective,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–38, 2021.
[146] S. S. Kumar, M. Cummings, and A. Stimpson, “Strengthening llm trust boundaries: A survey of prompt injection attacks,”
[147] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, “A survey on bias and fairness in machine learning,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–35, 2021.
[148] B. Zhu, N. Mu, J. Jiao, and D. Wagner, “Generative ai security: Challenges and countermeasures,” arXiv preprint arXiv:2402.12617, 2024.
[149] C. Barrett, B. Boyd, E. Bursztein, N. Carlini, B. Chen, J. Choi, A. R. Chowdhury, M. Christodorescu, A. Datta, S. Feizi, et al., “Identifying and mitigating the security risks of generative ai,” Foundations and Trends® in Privacy and Security, vol. 6, no. 1, pp. 1–52, 2023.
[150] Various Authors, “Llm security,” 2023.
[151] E. Derner, K. Batistič, J. Zahálka, and R. Babuška, “A security risk taxonomy for large language models,” arXiv preprint arXiv:2311.11415, 2023.
[152] H. Li, Y. Chen, J. Luo, Y. Kang, X. Zhang, Q. Hu, C. Chan, and Y. Song, “Privacy in large language models: Attacks, defenses and future directions,” arXiv preprint arXiv:2310.10383, 2023.
[153] Wu et al., “Llm jailbreak attack versus defense techniques – a comprehensive study,” arXiv preprint arXiv:2402.13457, 2024.
[154] U. Iqbal, T. Kohno, and F. Roesner, “Llm platform security: Applying a systematic evaluation framework to openai’s chatgpt plugins,” arXiv preprint arXiv:2309.10254, 2023.
[155] Zou et al., “Baseline defenses for adversarial attacks against aligned language models,” arXiv preprint arXiv:2309.00614, 2023.
[156] F. Wu, N. Zhang, S. Jha, P. McDaniel, and C. Xiao, “A new era in llm security: Exploring security concerns in real-world llm-based systems,” arXiv preprint arXiv:2402.18649, 2024.
[157] S. Caton and C. Haas, “A systematic literature review of human-centered, ethical, and responsible ai,” arXiv preprint arXiv:2302.05284, 2023.
[158] T. Hagendorff, “Trust and ethics in ai,” AI & Society, vol. 35, pp. 393–409, 2020.
[159] Various Authors, “Ethics of ai: A systematic literature review of principles and challenges,” arXiv preprint arXiv:2109.07906, 2021.
[160] D. Jeong, “Artificial intelligence security threat, crime, and forensics: Taxonomy and open issues,” IEEE Access, vol. 8, pp. 184560–184574, 2020.
[161] A. Oprea and A. Vassilev, “Adversarial machine learning: A taxonomy and terminology of attacks and mitigations,” tech. rep., National Institute of Standards and Technology, 2023.
[162] X. Gong, Y. Chen, Q. Wang, and W. Kong, “Backdoor attacks and defenses in federated learning: State-of-the-art, taxonomy, and future directions,” IEEE Wireless Communications, vol. 30, no. 2, pp. 114–121, 2022.
[163] G. A. Fink, “Adversarial artificial intelligence,” Journal of Information Warfare, vol. 18, no. 4, pp. 1–23, 2019.
[164] MITRE ATLAS, “Atlas.” Available at: https://github.com/mitre-atlas/atlas-data, 2021.
[165] Databricks, “Introducing databricks ai security framework (dasf),” 2024.
[166] J. Brokman, O. Hofman, O. Rachmil, I. Singh, R. Sabapathy, A. Priya, V. Pahuja, A. Giloni, R. Vainshtein, and H. Kojima, “Insights and current gaps in open-source llm vulnerability scanners: A comparative analysis,” arXiv preprint arXiv:2410.16527, 2024.
[167] L. Derczynski, E. Galinkin, J. Martin, S. Majumdar, and N. Inie, “garak: A framework for security probing large language models,” arXiv preprint arXiv:2406.11036, 2024.
[168] Guardrails AI, “Guardrails.” Available at: https://github.com/guardrails-ai/guardrails, 2023.
[169] R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilovic, S. Nagar, K. N. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, and Y. Zhang, “Ai fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias,” arXiv preprint arXiv:1810.01943, October 2018.
[170] Meta, “Purple llama: Towards safe and responsible ai development.” Available at: https://github.com/meta-llama/PurpleLlama, 2023. Accessed: 2024-10-09.
[171] Prompt Security, “Prompt fuzzer: Open-source tool for strengthening genai apps,” 2024. Accessed: 2024-04-29.
[172] A. J. Lohn and W. Hoffman, “Securing ai: How traditional vulnerability disclosure must adapt,” Center for Security and Emerging Technology, March 2022.

JAVIER CARNERERO-CANO is a Research Scientist working on trustworthy AI at IBM Research Europe - Ireland and a visiting researcher at Imperial College London. He obtained his PhD in AI security at Imperial College London. In his PhD, he focused on understanding and preventing data poisoning attacks against supervised learning. His current research interests include trustworthy and secure AI and adversarial machine learning. He also obtained his MRes in Multimedia and Communications, and his MSc and BEng in Telecommunications Engineering from Universidad Carlos III de Madrid, where he also worked as a Research Assistant in antennas and electromagnetic sensors.

AMANDA MINNICH is a Principal Research Manager at Microsoft, leading the Long-Term Ops and Research wing of the AI Red Team. Previously she spent three years as a senior operator on the AI Red Team, where she focused on probing Microsoft’s foundational models and Copilots for safety and security vulnerabilities. Before joining Microsoft she worked at Twitter, leveraging graph clustering algorithms to detect international election interference, abuse, and spam campaigns. Dr. Minnich also held roles at Sandia National Laboratories and Mandiant, applying advanced machine learning techniques to malware classification and malware family identification. She is passionate about tech outreach, especially for women in tech. Dr. Minnich holds an MS and PhD in Computer Science with Distinction from the University of New Mexico and a BA in Integrative Biology from UC Berkeley.