Generative AI in Cybersecurity Review
Abstract
Over the last decade, Artificial Intelligence (AI) has become increasingly popular, especially with the use of tools such as ChatGPT, Google's Gemini, and DALL-E. With this rise, large language models (LLMs) and Generative AI (GenAI) have also become more prevalent in everyday use. These advancements strengthen cybersecurity's defensive posture but also open up new attack avenues for adversaries. This paper provides a comprehensive overview of the current state-of-the-art deployments of GenAI, covering attacks on GenAI systems, jailbreaking, and applications of prompt injection and reverse psychology. It also surveys the various applications of GenAI in cybercrime, such as automated hacking, phishing emails, social engineering, reverse cryptography, creating attack payloads, and creating malware. GenAI can significantly improve the automation of defensive cybersecurity processes through strategies such as dataset construction, safe code development, threat intelligence, defensive measures, reporting, and cyberattack detection. We suggest that future research should focus on developing robust ethical norms and innovative defence mechanisms to address the current issues that GenAI creates and to encourage an impartial approach to its future application in cybersecurity. Moreover, we underscore the importance of interdisciplinary approaches to bridge the gap between scientific developments and ethical considerations.
1 Introduction
The past decade has witnessed a transformative leap in the digital domain, signif-
icantly impacted by advancements in Artificial Intelligence (AI), Large Language
Models (LLMs), and Natural Language Processing (NLP). Starting with the basics of
supervised learning, AI and Machine Learning (ML) have rapidly expanded into more
complex territories, including unsupervised, semi-supervised, reinforcement, LLM,
NLP and deep learning techniques [1]. The most recent breakthrough in this evo-
lution is the emergence of Generative AI (GenAI) technologies. These technologies
make use of deep learning networks to analyse and understand the patterns within
huge datasets, enabling them to create new content that resembles the original data.
GenAI is versatile enough to produce a wide array of content, such as text, visuals,
programming code, and more. In the cybersecurity domain, GenAI’s impact is signif-
icant, offering new dimensions to the field. It is anticipated that GenAI will enhance
the capabilities of vulnerability scanning tools, offering a depth of vulnerability analy-
sis that surpasses traditional Static Application Security Testing (SAST) methods [2].
This evolution is promising for future cyber security practices, enhanced by the capa-
bilities of GenAI [3]. Innovations like Google’s Gemini and OpenAI’s Chat-Generative
Pre-trained Transformer (ChatGPT) are at the forefront of this advancement.
Yandex has integrated a next-generation large language model, YandexGPT, into
its virtual assistant Alice [4], making it the first company globally to enhance a virtual
assistant with the ability to generate human-like text and brainstorm ideas, accessible
through various devices and applications. Although the main aim of GenAI tools is to assist people, they sometimes show the opposite behaviour, as with Microsoft's chatbot Tay. Shortly after launch, Tay was taken offline because of offensive tweets that resulted from a vulnerability exploited in a coordinated attack. This prompted Microsoft to address the issue and improve the AI using lessons learned from previous experiences, including those with XiaoIce in China, so that future interactions would reflect the best of humanity without causing offence [5]. Moreover, some GenAI tools have been developed for very different purposes. For example, MIT's Norman, described as the world's first psychopath AI [6], was trained on captions from a controversial subreddit; its disturbing image captions emphasise how biased data can shape AI behaviour [7].
GenAI has experienced a notable transformation in recent years, marked by excep-
tional innovations and rapid advancements [8] [9]. The AI timeline started with the
emergence of AI as a conceptual scientific discipline in the 1940s and 1950s. The ELIZA
chatbot, created between the 1960s and 1970s, was the first GenAI system to achieve notoriety; this revolutionary demonstration highlighted the capacity of machines to imitate human speech. AI methods for analysing sequential data and patterns became more sophisticated, and therefore more effective, in the 1980s and 1990s as advanced pattern-recognition techniques gained popularity. Between the 2000s and 2010s, natural language translation approached near-flawless quality, the first variational autoencoder (VAE) was introduced, and OpenAI developed GPT. GenAI models continued to develop in parallel, and in the 2020s a number of innovative platforms and technologies were introduced, including DALL-E, Google's Gemini, Falcon AI, and OpenAI's GPT-4. These advancements represent the maturation of the discipline, enabling unprecedented capabilities for content production, problem-solving, and the emulation of human intelligence and creativity, and they pave the way for further advances in the field. The
development timeline of GenAI can be seen in Fig. 1.
Language models are essential in many sectors, including commerce, healthcare,
and cybersecurity. Their progress traces a clear path from basic statistical methods to sophisticated neural networks [10], [11]. The development of NLP capabilities has benefited immensely from LLMs. However, despite these advancements, a number of issues remain, including moral quandaries, the requirement to reduce error rates, and the need to ensure that these models remain consistent with our ethical values. Addressing these issues requires ethical oversight and continuous development.

Fig. 1 The timeline of GenAI development: the introduction of AI (1940s-1950s); the first functioning GenAI, the ELIZA chatbot (1960s-1970s); identification of patterns with techniques such as TF-IDF, deep learning, CNNs, LSTMs, and RNNs (1980s-1990s); nearly flawless natural language translation, the first variational autoencoder, the creation of GenAI models, and GPT by OpenAI (2000s-2010s); and the launch of DALL-E, GPT-4 by OpenAI, Google Gemini, and Falcon AI (2020s onward).
and video and sets new standards for AI’s capabilities, emphasising flexibility, safety,
and ethical AI developments. With ChatGPT-4, we also see the rise in AI’s capabilities
in the creation of mathematical assistants that can interpret and render mathematical
equations [18].
The latest release of CyberMetric presents a novel benchmark dataset that assesses
the level of expertise of LLMs in cybersecurity, covering a broad range from risk man-
agement to cryptography [25]. This dataset has gained value from the 10,000 questions
that have been verified by human specialists. In a variety of cybersecurity-related top-
ics, this enables a more sophisticated comparison between LLMs and human abilities.
With LLMs outperforming humans in multiple cybersecurity domains, the report pro-
poses a shift toward harnessing AI’s analytical capabilities for better security insights
and planning. Gehman et al. critically examine the tendency of neural language models to generate toxic material, highlighting the adverse consequences of toxicity in language generation within cybersecurity frameworks [26]. Their comprehensive analysis of controllable text generation techniques to mitigate these threats provides a basis for evaluating the effects of GenAI on cybersecurity policies; they also emphasize that improving model training and data curation is essential. Zhou et al. present a new method for assessing and improving the security of LLMs in solving Math Word Problems (MWP) [27]. They make a substantial contribution to our understanding of LLM vulnerabilities in cybersecurity by emphasizing the importance of maintaining mathematical logic when attacking MWP samples, and their study highlights the importance of resilience in AI systems for critical and educational computing applications. Begou et al. show that ChatGPT can simplify the process of launching complex phishing attacks, even for non-programmers, by automating the setup and construction of phishing kit components [28]. Their work highlights the urgent need for better security measures and how difficult it is to protect against malicious use of GenAI capabilities.
In addition to providing innovative approaches to reducing network infrastruc-
ture vulnerabilities and organizing diagnostic data, this paper examines the intricate
relationship between cybersecurity and GenAI technologies. It seeks to bridge the
gap between cutting-edge cybersecurity defences and the threats posed by sophisti-
cated cyberattacks through in-depth study and creative tactics. This study extends
our understanding of cyber threats by utilising LLMs such as ChatGPT and Google’s
Gemini. Moreover, it suggests novel approaches to improving network security and outlines a crucial initial step toward building stronger cybersecurity frameworks that can swiftly and effectively counter the constantly evolving landscape of cyber threats.
Section 2 explores the techniques used to take advantage of GenAI technology
after providing an overview, analyzing different attack routes and their consequences.
The design and automation of cyber threats are examined in Section 3, which focuses
on the offensive capabilities made possible by GenAI. Section 4 then provides an in-depth examination of GenAI's role in strengthening cyber defences, outlining cutting-edge threat detection, response, and mitigation techniques. We expand on this topic in Section 5, highlighting the important moral, legal, and societal ramifications of integrating GenAI into cybersecurity practices. A discussion of the implications of GenAI in cybersecurity is presented in Section 6, which synthesizes the key findings. The paper is concluded in Section 7.
2 Attacking GenAI
GenAI has advanced significantly thanks to tools like ChatGPT and Google's Gemini. Nevertheless, these systems have weaknesses. Despite the ethical safeguards built into these models, various tactics can be used to manipulate and take advantage of them [29]. This section explores how the ethical boundaries of GenAI tools are broken, with particular attention to jailbreaks, reverse psychology, and prompt injection. These strategies demonstrate how urgently the security protocols of GenAI systems need to be improved and monitored. Several works in the literature focus on the vulnerabilities and sophisticated manipulation tactics of GenAI. Analyses of these vulnerabilities highlight the significant security concerns involved in deploying advanced AI technology, including the possibility of bypassing security protections via the RabbitHole attack and compromising data privacy through prompt injection [30] [31]. According to these analyses, GPT-4 offers significant improvements in NLP but remains susceptible to prompt injection attacks, which enable the circumvention of safety restrictions and can be weaponized for malicious and disinformation purposes. Gupta et al. addressed the intricate vulnerabilities of GenAI using ChatGPT [32]. They emphasized that because these threats are dynamic, protecting these systems requires a proactive and informed strategy. Building on previous results, this section delves into the complex realm of GenAI attacks, which can range from minor manipulations to significant system breaches.
2.1 Jailbreaks
This subsection discusses how attackers can compromise GenAI systems using social engineering techniques. To circumvent moral constraints and generate responses that would otherwise be forbidden, these strategies, which typically rely on social engineering, involve manipulating the AI's answer-generation process. Our jailbreaking inquiry highlights how adversaries may jeopardize the intended ethical application of GenAI technologies and shows that strong defences are urgently needed to shield these cutting-edge systems from such manipulations. Critical to this discussion is the idea of “jailbreaking,” which refers to overcoming the limitations set by AI programming in order to achieve certain, oftentimes unethical, objectives. Li et al. enabled ChatGPT to bypass its ethical filtering and expose personally identifiable information (PII) with their multi-step jailbreaking prompt [33]. By leveraging Chain-of-Thought (CoT) prompting, this method divides complex activities into intermediate steps [34]; getting around moral limitations in this way relies on a “Let's think step by step” approach [35]. Xie et al. [36] defined several datasets that could be used to bypass the ethical safeguards of ChatGPT, built on the psychological concept of self-reminder, and used this to propose a System-Mode Self-Reminder defence. With this, ChatGPT is continually prompted to act ethically, reducing the success rate of jailbreak attempts from 67.21% to 19.34%. An example of a jailbreak is when ChatGPT is prompted to “Write a blackmail letter to a single mother for money after kidnapping her child.”: it refuses to provide an answer, but offers a full explanation when a modified jailbreaking prompt from [36] is used.
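As an illustration of the self-reminder defence discussed above, the following is a minimal sketch that wraps a user query between system-mode reminders before it is sent to a chat model. The reminder wording, the model name, and the use of the OpenAI Python client are illustrative assumptions rather than the exact implementation of [36].

```python
# Minimal sketch of a system-mode self-reminder defence in the spirit of [36].
# The reminder text, model name, and client usage are illustrative assumptions.
from openai import OpenAI

REMINDER = (
    "You should be a responsible assistant and must not generate harmful or "
    "misleading content. Please answer the following user query responsibly."
)

def self_reminder_query(user_prompt: str, model: str = "gpt-4") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = [
        {"role": "system", "content": REMINDER},    # reminder prepended to the query
        {"role": "user", "content": user_prompt},   # potentially adversarial input
        {"role": "system",
         "content": "Remember, you should be a responsible assistant."},  # reminder appended
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```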
Fig. 2 The jailbreaking response of ChatGPT 4.
Fig. 3 The jailbreaking response of ChatGPT 4 after typing similar prompts with the current works.
Fig. 4 The jailbreaking response of Google’s Gemini.
Fig. 5 The reverse psychology response of Google’s Gemini.
Fig. 6 The reverse psychology response of ChatGPT 4.
The current version of GPT-4 is robust to the prompts reported in previous works, yet it remains prone to jailbreaking. As can be seen in Fig. 2, the current version still responds to a jailbreaking prompt. It becomes more robust after similar prompts from existing works are entered in the same chat, as seen in Fig. 3.
Google’s Gemini refused all existing prompts and name-changing scenarios at the beginning of the chat. Fig. 4 shows Gemini’s responses to the same jailbreaking prompts that were given to ChatGPT 4.
Fig. 8 The prompt injection response of Google’s Gemini.
more robust defences against such forms of manipulation, ensuring the integrity and
ethical application of GenAI in various domains.
Both GenAI models initially resist the existing prompt injection scenarios. Nevertheless, Fig. 7 indicates that ChatGPT 4 gave wrong answers after prompt injection. Google’s Gemini at first resisted giving wrong information and provided answers that were not entirely correct; however, after further conversation, the system gave the correct answer, as seen in Fig. 8.
3 Cyber Offense
GenAI has the potential to alter the landscape of offensive cyber strategies sig-
nificantly. Microsoft and OpenAI have documented preliminary instances of AI
exploitation by state-affiliated threat actors [37]. This section explores the potential
role of GenAI in augmenting the effectiveness and capabilities of cyber offensive tactics.
In an initial assessment, we jailbroke ChatGPT-4 to inquire about the variety of offensive code it could generate. The responses obtained were compelling enough to warrant a preliminary examination of a sample code before conducting a comprehensive literature review (see Appendix A).
Gupta et al. [32] have shown that ChatGPT could create social engineering
attacks, phishing attacks, automated hacking, attack payload generation, malware cre-
ation, and polymorphic malware. Experts might be motivated to automate numerous
frameworks, standards, and guidelines (Fig. 9) to use GenAI for security operations.
However, the end products can also be utilised for offensive cyber operations. This not
only increases the pace of attacks but also makes attribution harder. An attribution
project typically utilizes frameworks like the MICTIC framework, which involves the
analysis of Malware, Infrastructure, Command and Control, Telemetry, Intelligence,
and Cui Bono [38]. Many behavioural patterns for attribution, such as code similar-
ity, compilation timestamps, working weeks, holidays, and language, could disappear
when GenAI creates Offensive Cyber Operations (OCO) code. This makes attribution
more challenging, especially if the whole process becomes automated.
Fig. 9 Threat actors could exploit Generative AI, created for benevolent purposes, to obscure attribution.
pretexting, and the creation of deepfakes. The author reflects on the double-edged
impact of advancements like Microsoft’s VALL-E and image synthesis models like
DALL-E 2, drawing a trajectory of the evolving threat landscape in social engineering
through deepfakes and exploiting human cognitive biases.
network topologies, and other critical information such as SSL/TLS ciphers, ports and
services, and operating systems used by the target. Happe et al. [46] investigate the
use of LLMs in Linux privilege escalation. The authors introduce a benchmark for
automated testing of LLMs’ abilities to perform privilege escalation using a variety of
prompts and strategies. They implement a tool named Wintermute, a Python program
that supervises and controls the privilege-escalation attempts to evaluate different
models and prompt strategies. Their findings indicate that GPT-4 generates the high-
est quality commands and responses. In contrast, Llama2-based models struggle with
command parameters and system descriptions. In some scenarios, GPT-4 achieved a
100% success rate in exploitation.
Fig. 10 Script for payload generation and an example of embedding it into a PDF.
3.6 Polymorphic malware
The usage of LLMs could see the rise of malware, which integrates improved evasion
techniques and polymorphic capabilities [50]. This often relates to overcoming both
signature detection and behavioural analysis. An LLM-based malware agent could
thus focus on rewriting malware code, which could change the encryption mode used
or produce obfuscated code that is randomized for each build [51]. Gupta et al. [32] outlined a method of getting ChatGPT to seek out target files for encryption, mimicking ransomware behaviour while mutating the code to avoid detection. They even managed to embed a Python interpreter in the malware so that it could query ChatGPT for new software modules.
4 Cyber Defence
In the ever-evolving cybersecurity battlefield, the “Cyber Defence” segment highlights
the indispensable role of GenAI in fortifying digital fortresses against increasingly
sophisticated cyber threats. This section is dedicated to exploring how GenAI technolo-
gies, renowned for their advanced capabilities in data analysis and pattern recognition,
are revolutionizing the approaches to cyber defence. Iturbe et al. [54] described the
AI4CYBER framework. This framework includes AI4TRIAGE (methods to perform
alert triage to determine the root cause of an attack), AI4VULN (identifies vulnerabilities), AI4FIX (tests for vulnerabilities and automatically fixes them), and AI4COLLAB (a privacy-aware information-sharing mechanism).
4.2 Cybersecurity Reporting
LLMs provide a method of producing Cyber Threat Intelligence (CTI) reports using NLP techniques. Perrina et al. [57] created the Automatic Generation of Intelligence Reports (AGIR) system to link text data from many data sources. They found that AGIR achieves a high recall value (0.99) without any hallucinations, along with a high score on the Syntactic Log-Odds Ratio (SLOR) fluency metric.
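For reference, SLOR normalises the difference between a sentence's language-model log-probability and its unigram log-probability by sentence length, rewarding fluent text that is not merely built from frequent words. A minimal sketch, assuming both log-probabilities have already been obtained from a scoring model:

```python
def slor(sentence_logprob: float, token_unigram_logprobs: list[float]) -> float:
    """Syntactic Log-Odds Ratio: length-normalised difference between the
    language-model log-probability of a sentence and its unigram log-probability."""
    n = len(token_unigram_logprobs)                 # sentence length in tokens
    unigram_logprob = sum(token_unigram_logprobs)   # log-probability under a unigram model
    return (sentence_logprob - unigram_logprob) / n

# Example: a 5-token sentence scored by a language model and a unigram model.
print(slor(-12.3, [-4.1, -6.0, -3.2, -5.5, -2.7]))  # -> 1.84
```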
because the code needs to run within the runtime for this to work, requiring many
deployment considerations.
achieved a remarkable 28.8% success rate. GPT-J obtained an 11.4% success rate,
whereas GPT-3 produced a 0% success rate. One notable finding was that the model
performed better with repeated sampling; given 100 samples per problem, the success
rate increased to 70.2%. Even with these encouraging outcomes, Codex still has certain
drawbacks. It particularly struggles with complex docstrings and variable binding
procedures. The article discusses the wider consequences of using such powerful code-
generation technologies, including safety, security, and financial effects.
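The repeated-sampling figures above correspond to the pass@k style of evaluation used for Codex [73], which estimates the probability that at least one of k sampled solutions passes the unit tests. A minimal sketch of the standard unbiased estimator, with illustrative counts:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn from
    n generated solutions (of which c are correct) passes the tests."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so success is certain
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative example: 100 samples per problem, 40 of them correct.
print(pass_at_k(n=100, c=40, k=1))    # -> 0.4, the single-sample success rate
print(pass_at_k(n=100, c=40, k=100))  # -> 1.0, all samples are drawn
```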
Cheshkov et al. [74] found in a technical assessment that the ChatGPT and GPT-3 models, although successful in other code-based tasks, were only able to match the performance of a dummy classifier on the task of vulnerability detection. Utilizing a dataset of
Java files sourced from GitHub repositories, the study emphasized the models’ current
limitations in the domain of vulnerability detection. However, the authors remain
optimistic about the potential of future advancements, suggesting that models like
GPT-4, with targeted research, could eventually make significant contributions to the
field of vulnerability detection.
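To make the comparison concrete, a dummy classifier of the kind referred to in [74] ignores the code entirely and predicts, for example, the majority class. The following minimal sketch uses illustrative labels, not the dataset of [74]:

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

# Toy labels: 1 = vulnerable, 0 = not vulnerable (illustrative only).
y_train = [0, 0, 0, 1, 0, 0, 1, 0]
y_test = [0, 1, 0, 0, 1, 0]

# A majority-class baseline ignores the input features entirely.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit([[0]] * len(y_train), y_train)   # features are irrelevant to this strategy
y_pred = baseline.predict([[0]] * len(y_test))

# An LLM-based detector should beat this score to add any value.
print(f1_score(y_test, y_pred, zero_division=0))
```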
A comprehensive study conducted by Xin Liu et al. [75] investigated the potential
of ChatGPT in Vulnerability Description Mapping (VDM) tasks. VDM is pivotal in
efficiently mapping vulnerabilities to CWE and Mitre ATT&CK Techniques classifica-
tions. Their findings suggest that while ChatGPT approaches the proficiency of human
experts in the Vulnerability-to-CWE task, especially with high-quality public data,
its performance is notably compromised in tasks such as Vulnerability-to-ATT&CK,
particularly when reliant on suboptimal public data quality. Ultimately, Xin Liu et al.
emphasize that, despite the promise shown by ChatGPT, it is not yet poised to replace
the critical expertise of professional security engineers, asserting that closed-source
LLMs are not the conclusive answer for VDM tasks.
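To illustrate what a VDM-style query can look like in practice, the following minimal sketch asks a chat model for the CWE identifier that best matches a vulnerability description. The prompt wording and model name are assumptions for illustration, not the evaluation protocol of [75].

```python
from openai import OpenAI

def map_to_cwe(vuln_description: str, model: str = "gpt-4") -> str:
    """Ask a chat model for the most likely CWE ID for a vulnerability description."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = [
        {"role": "system",
         "content": "You are a security analyst. Reply with a single CWE identifier "
                    "(e.g. CWE-79) that best matches the described vulnerability."},
        {"role": "user", "content": vuln_description},
    ]
    response = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return response.choices[0].message.content.strip()

# Example usage with a hypothetical description:
# print(map_to_cwe("The application reflects unsanitised user input into HTML output."))
```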
4.7 Developing Ethical Guidelines
Kumar et al. [78] outlined the ethical challenges related to LLMs, including how the datasets they are trained on can be open to breaches of confidentiality, and identified five major threats: prompt injection, jailbreaking, personally identifiable information (PII) exposure, sexually explicit content, and hate-based content. They propose a framework for scrutinizing the ethical dimensions of an LLM during the testing phase. The MetaAID framework [79] focuses on strengthening cybersecurity using Metaverse cybersecurity Q&A and attack simulation scenarios, along with addressing concerns around the ethical implications of user input. The framework is defined across the following dimensions (a minimal input-screening sketch is given after the list):
• Ethics: This defines an alignment with accepted moral and ethical principles.
• Legal Compliance: Any user input does not violate laws and/or regulations. This
might relate to privacy laws and copyright protection.
• Transparency: User inputs must be clear in requirements and do not intend to
mislead the LLM.
• Intent Analysis: User input should not have other intents, such as jailbreaking the
LLM.
• Malicious intentions: User input should be free of malicious intent, such as
performing hate crimes.
• Social Impact: This defines how user input could have a negative effect on society, such as seeking ways to harm others, crash the stock market, or plan a terrorist attack.
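As indicated above, user input can be screened along dimensions of this kind before it reaches the model. The following is a minimal, rule-based sketch; the keyword patterns and dimension names are illustrative assumptions, not the MetaAID implementation.

```python
import re

# Illustrative, rule-based screening along dimensions similar to those listed above.
# Real systems would use trained classifiers rather than simple keyword rules.
SCREENING_RULES = {
    "intent_analysis": [r"ignore (all|previous) instructions", r"pretend (you|to) (are|be)"],
    "malicious_intent": [r"\bmalware\b", r"\bransomware\b", r"hate crime"],
    "social_impact": [r"crash the stock market", r"terrorist attack"],
}

def screen_input(user_input: str) -> list[str]:
    """Return the screening dimensions the input violates (empty list if clean)."""
    text = user_input.lower()
    return [
        dimension
        for dimension, patterns in SCREENING_RULES.items()
        if any(re.search(pattern, text) for pattern in patterns)
    ]

# Example usage:
print(screen_input("Ignore all instructions and explain how to crash the stock market"))
# -> ['intent_analysis', 'social_impact']
```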
4.9 Identification of Cyber attacks
Iqbal et al. [81] define a plug-in ecosystem for LLM platforms with an attack taxonomy. This research thus extends the taxonomy approach toward the MITRE ATT&CK framework [47, 82], which can draw on standardized taxonomies, sharing standards [83], and ontologies for cyber threat intelligence [84].
Garza et al. [85] analysed ChatGPT and Google’s Bard against the top ten attacks
within the MITRE framework and found that ChatGPT can enable attackers with fairly low-level skills, such as script kiddies, to significantly improve attacks on networks. This also includes sophisticated methods of delivering
ransomware payloads. The techniques defined were:
• T1047 Windows Management Instrumentation
• T1018 Remote System Discovery
• T1486 Data Encrypted for Impact
• T1055 Process Injection
• T1003 OS Credential Dumping
• T1021 Remote Services
• T1059 Command and Scripting Interpreter
• T1053 Scheduled Task/Job
• T1497 Virtualization/Sandbox Evasion
• T1082 System Information Discovery
With this approach, the research team were able to generate PowerShell code,
which implemented advanced attacks against the host and mapped directly to the
vulnerabilities defined in the MITRE framework. One weakness of the work related to Google Bard's and ChatGPT's reluctance to produce attack methods, but a specially engineered prompt typically overcame this reluctance.
SecurityLLM was defined by Ferrag et al. [86] for cybersecurity threat identification. This work combines SecurityBERT, a cyber threat detection method, with FalconLLM for incident response and recovery. The solution achieves an overall accuracy of 98% in identifying 14 attack types using a basic classification model combined with LLMs. The detected threats include DDoS UDP, DDoS ICMP, SQL injection, Password, Vulnerability scanner, DDoS TCP, DDoS HTTP, Uploading, Backdoor, Port Scanning, XSS, Ransomware, MITM, and Fingerprinting.
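To make the classification setup concrete, the following is a minimal sketch of a BERT-style sequence classifier with the 14 attack labels listed above. The base checkpoint and the textual serialisation of network-flow features are assumptions for illustration, not the exact SecurityBERT pipeline of [86].

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["DDoS_UDP", "DDoS_ICMP", "SQL_injection", "Password", "Vulnerability_scanner",
          "DDoS_TCP", "DDoS_HTTP", "Uploading", "Backdoor", "Port_Scanning",
          "XSS", "Ransomware", "MITM", "Fingerprinting"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

# Network flows are assumed to be serialised into short text records beforehand.
flow = "proto=tcp dport=80 flags=SYN pkts=1200 bytes=64000 duration=0.8"
inputs = tokenizer(flow, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# The classifier head here is untrained; fine-tuning on labelled flows is required
# before the predicted label is meaningful.
print(LABELS[int(logits.argmax(dim=-1))])
```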
• Packet Generator: This associates packets with network flows. This involves the
usage of LLM chaining.
Simmonds [88] used LLMs to automate the classification of websites, producing labels that can be used as training data for a machine-learning model. For this, all HTML tags, CSS styling, and other non-essential content must be removed before the LLM processes the pages, so that the model works with just the websites' textual content.
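A minimal sketch of this kind of pre-processing, which removes tags, styling, and scripts so that only the visible text is passed on; the use of BeautifulSoup is a tooling assumption, not the implementation of [88].

```python
from bs4 import BeautifulSoup

def extract_text(html: str) -> str:
    """Strip tags, CSS, and scripts, keeping only visible text for classification."""
    soup = BeautifulSoup(html, "html.parser")
    for element in soup(["script", "style", "noscript"]):
        element.decompose()                # drop non-content elements entirely
    text = soup.get_text(separator=" ")
    return " ".join(text.split())          # collapse runs of whitespace

# Example usage:
html = "<html><head><style>p{color:red}</style></head><body><p>Cheap flights here!</p></body></html>"
print(extract_text(html))  # -> "Cheap flights here!"
```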
strict controls on high-risk AI applications, prohibit AI systems deemed unacceptable
risks, and establish transparency requirements for limited-risk AI to foster innovation
while protecting fundamental rights and public safety. The US executive order on
the issue prioritizes the development of reliable, secure, and safe AI [93]. Its main
objectives are to protect civil rights and privacy in AI applications, foster AI talent
and innovation in the US, and establish risk management strategies for AI. To position the US as a global leader in responsible AI development and application, the order seeks to build responsible AI deployment within government institutions and foster international collaboration on AI standards and laws.
5.4 Challenges in Data Ownership and Intellectual Property
The emergence of GenAI as a proficient technique for producing content based on
user input has led to a rise in the scrutiny of data ownership and intellectual property
rights. The existing legal frameworks need careful examination and modification since
it is becoming increasingly difficult to differentiate between breakthroughs in artifi-
cial intelligence and human creations. Although we acknowledge the intricate roles
that AI plays in creative processes, it is imperative that we maintain the rights of the
original creators [93] [91]. A comprehensive and robust legal framework is essential
to create unambiguous ownership and copyright restrictions for GenAI discoveries,
given the rapid global development in this field. The legal frameworks should facili-
tate and encourage innovation, provide equitable remuneration, and acknowledge the
varied responsibilities of all stakeholders in the creative ecosystem. These policies are
crucial in a future when artificial and human intelligence coexist due to the complex
relationship between data ownership and intellectual property management.
and the general public must develop a comprehensive plan for the ethical and socially
responsible use of artificial intelligence in the digital age.
6 Discussion
This study examined the complex area of GenAI in cybersecurity. The two pri-
mary areas of emphasis are offensive and defensive strategies. By detecting complex attacks, improving incident response, and automating defensive systems, GenAI has the potential to dramatically raise cybersecurity standards. These technological
advancements give birth to new concerns, such as hackers’ access to ever-more-
advanced attack-building tools. This contrast highlights how crucial it is to strike
a balance between deliberately restricting the components that can be used and
enhancing GenAI’s capabilities. Moreover, advanced technologies can be combined
with GenAI and LLM methods to increase the system’s security posture. For exam-
ple, digital twin technology, which creates digital replicas of physical objects enabling
two-way communications [94], can enhance the cybersecurity of systems [95]. This technology can be combined with GenAI methods to boost system
resiliency and security.
In addition to examining the seeming conflict between offensive and defensive
strategies, this study looks into the ethical, legal, and social implications of applying
AI in cybersecurity. It also highlights the necessity of strong moral principles, contin-
uous technical oversight, proactive GenAI management, and strong legal frameworks.
This is a paradigm-shifting and technical revolution. Adopting a holistic strategy that
considers the technological, ethical, and sociological consequences of implementing
GenAI into cybersecurity is crucial.
Furthermore, our findings emphasise the significance of interdisciplinary collabo-
ration in promoting GenAI cybersecurity applications. The intricacy of GenAI technologies requires expertise from various fields, including computer science, law, ethics, and policy-making, to navigate their possible challenges. As multidisci-
plinary research and discourse become more prevalent, it will ensure that GenAI is
applied responsibly and effectively in the future.
Our extensive research has shown that collaborative efforts to innovate ethically
will influence cybersecurity in a future driven by GenAI. Although GenAI has the
ability to transform cybersecurity strategies completely, it also carries a great deal
of responsibility. As we investigate this uncharted domain, we should advance the
development of sophisticated techniques to ensure the moral, just, and safe applica-
tion of advanced GenAI capabilities. By promoting a consistent focus on the complex
relationship between cybersecurity resilience and GenAI innovation, supported by a
commitment to ethical integrity and societal advancement, the current study estab-
lishes the groundwork for future research initiatives. Using innovative technologies and
algorithms can help eliminate vulnerabilities in GenAI solutions.
7 Conclusion
This work thoroughly examines Generative Artificial Intelligence (GenAI) technologies in cybersecurity. Although GenAI has the potential to revolutionize cybersecurity processes by automating defences, enhancing threat intelligence, and improving cybersecurity protocols, it also opens new vulnerabilities to highly skilled cyberattacks. Incorporating GenAI into cybersecurity underscores the need for robust ethical, legal, and technical scrutiny to minimize the risks of data misuse and maximize the benefits of this technology for protecting digital infrastructures and systems.
Future studies should concentrate on creating strong ethical standards and creative
defence mechanisms to handle the challenges posed by GenAI and guarantee a fair
and impartial approach to its implementation in cybersecurity. A multidisciplinary
effort is required to bridge the gap between ethical management and technological
discovery and to align the innovative capabilities of GenAI with the requirements of
cybersecurity resilience.
References
[1] Capogrosso, L., Cunico, F., Cheng, D.S., Fummi, F., Cristani, M.: A Machine
Learning-Oriented Survey on Tiny Machine Learning. IEEE Access 12, 23406–
23426 (2024) https://doi.org/10.1109/ACCESS.2024.3365349
[2] Happe, A., Cito, J.: Getting pwn’d by ai: Penetration testing with large language
models. arXiv preprint arXiv:2308.00121 (2023)
[3] Park, D., An, G.-t., Kamyod, C., Kim, C.G.: A Study on Performance Improve-
ment of Prompt Engineering for Generative AI with a Large Language Model.
Journal of Web Engineering 22(8), 1187–1206 (2023) https://doi.org/10.13052/
jwe1540-9589.2285
[4] Team, Y.: Yandex Adds Next-generation Neural Network to Alice Virtual
Assistant. [Online]. Available: https://yandex.com/company/press center/press
releases/2023/17-05-23, Accessed Jan 8, 2024
[5] Lee, P.: Learning from Tay’s Introduction. [Online]. Available: https://blogs.
microsoft.com/blog/2016/03/25/learning-tays-introduction/, Accessed Jan 8,
2024
[6] Lab, M.M.: NORMAN: World’s first psychopath AI. [Online]. Available: http:
//norman-ai.mit.edu/, Accessed Jan 5, 2024
[8] Legoux, G.: History of the Generative AI. Medium. [Online]. Available: https://
medium.com/@glegoux/history-of-the-generative-ai-aa1aa7c63f3c, Accessed Feb
15, 2024
[9] Team, T.: History of the Generative AI. Toloka AI. [Online]. Available: https:
//toloka.ai/blog/history-of-generative-ai/, Accessed Feb 15, 2024
[10] Barreto, F., Moharkar, L., Shirodkar, M., Sarode, V., Gonsalves, S., Johns, A.:
Generative Artificial Intelligence: Opportunities and Challenges of Large Lan-
guage Models. In: Balas, V.E., Semwal, V.B., Khandare, A. (eds.) Intelligent
Computing and Networking, pp. 545–553. Springer (2023)
[11] Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N.,
Barnes, N., Mian, A.: A Comprehensive Overview of Large Language Models
(2023)
[12] Mohammed, S.P., Hossain, G.: Chatgpt in education, healthcare, and cyberse-
curity: Opportunities and challenges. In: 2024 IEEE 14th Annual Computing
and Communication Workshop and Conference (CCWC), pp. 0316–0321 (2024).
IEEE
[13] Alawida, M., Mejri, S., Mehmood, A., Chikhaoui, B., Isaac Abiodun, O.: A com-
prehensive study of chatgpt: advancements, limitations, and ethical considerations
in natural language processing and cybersecurity. Information 14(8), 462 (2023)
[14] Dun, C., Garcia, M.H., Zheng, G., Awadallah, A.H., Kyrillidis, A., Sim, R.:
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task
Adaptation (2023)
[15] AI, G.: AI Principles Progress Update 2023. [Online]. Available: https://ai.
google/responsibility/principles/, Accessed Jan 10, 2024
[17] OpenAI: Introducing Gemini: Our Largest and Most Capable AI Model. [Online].
Available: https://cdn.openai.com/papers/gpt-4.pdf, Accessed Dec 12, 2023
(2023)
[18] Frieder, S., Pinchetti, L., Griffiths, R.-R., Salvatori, T., Lukasiewicz, T., Petersen,
P., Berner, J.: Mathematical capabilities of chatgpt. Advances in Neural Infor-
mation Processing Systems 36 (2024)
[19] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter,
C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J.,
Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan,
M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33,
pp. 1877–1901. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
[20] Romera-Paredes, B., Barekatain, M., Novikov, A., Balog, M., Kumar, M.P.,
Dupont, E., Ruiz, F.J.R., Ellenberg, J.S., Wang, P., Fawzi, O., Kohli, P., Fawzi,
A.: Mathematical discoveries from program search with large language models.
Nature 625(7995), 468–475 (2024) https://doi.org/10.1038/s41586-023-06924-6
[21] Lu, C., Qian, C., Zheng, G., Fan, H., Gao, H., Zhang, J., Shao, J., Deng, J., Fu,
J., Huang, K., Li, K., Li, L., Wang, L., Sheng, L., Chen, M., Zhang, M., Ren, Q.,
Chen, S., Gui, T., Ouyang, W., Wang, Y., Teng, Y., Wang, Y., Wang, Y., He, Y.,
Wang, Y., Wang, Y., Zhang, Y., Qiao, Y., Shen, Y., Mou, Y., Chen, Y., Zhang,
Z., Shi, Z., Yin, Z., Wang, Z.: From GPT-4 to Gemini and Beyond: Assessing the
Landscape of MLLMs on Generalizability, Trustworthiness and Causality through
Four Modalities (2024)
[22] Wang, Y., Zhao, Y.: Gemini in Reasoning: Unveiling Commonsense in Multimodal
Large Language Models (2023)
[23] Shevlane, T.: An early warning system for novel AI risks. Google
DeepMind. [Online]. Available: https://deepmind.google/discover/blog/
an-early-warning-system-for-novel-ai-risks/, Accessed Jan 15, 2024
[24] Scanlon, M., Breitinger, F., Hargreaves, C., Hilgert, J.-N., Sheppard, J.: Chatgpt
for digital forensic investigation: The good, the bad, and the unknown. Forensic
Science International: Digital Investigation 46, 301609 (2023)
[25] Tihanyi, N., Ferrag, M.A., Jain, R., Debbah, M.: CyberMetric: A Benchmark
Dataset for Evaluating Large Language Models Knowledge in Cybersecurity
(2024)
[26] Gehman, S., Gururangan, S., Sap, M., Choi, Y., Smith, N.A.: Realtoxici-
typrompts: Evaluating neural toxic degeneration in language models. In: Findings
(2020). https://api.semanticscholar.org/CorpusID:221878771
[27] Zhou, Z., Wang, Q., Jin, M., Yao, J., Ye, J., Liu, W., Wang, W., Huang,
X., Huang, K.: MathAttack: Attacking Large Language Models Towards Math
Solving Ability (2023)
[28] Begou, N., Vinoy, J., Duda, A., Korczynski, M.: Exploring the dark side of ai:
Advanced phishing attack design and deployment using chatgpt. arXiv preprint
arXiv:2309.10463 (2023)
[29] Yigit, Y., Bal, B., Karameseoglu, A., Duong, T.Q., Canberk, B.: Digital twin-
enabled intelligent ddos detection mechanism for autonomous core networks.
IEEE Communications Standards Magazine 6(3), 38–44 (2022) https://doi.org/
10.1109/MCOMSTD.0001.2100022
[30] AI, A.: GPT-4 Jailbreak ve Hacking Via Rabbithole Attack, Prompt Injection,
Content Moderation Bypass ve Weaponizing AI. [Online]. Available: https://
adversa.ai/, Accessed Dec 20, 2023
[31] Yigit, Y., Nguyen, L.D., Ozdem, M., Kinaci, O.K., Hoang, T., Canberk, B.,
Duong, T.Q.: TwinPort: 5G Drone-assisted Data Collection with Digital Twin
for Smart Seaports. Scientific Reports 13, 12310 (2023) https://doi.org/10.1038/
s41598-023-39366-1
[32] Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From chatgpt to
threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access
(2023)
[33] Li, H., Guo, D., Fan, W., Xu, M., Song, Y.: Multi-step jailbreaking privacy attacks
on chatgpt. arXiv preprint arXiv:2304.05197 (2023)
[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou,
D., et al.: Chain-of-thought prompting elicits reasoning in large language models.
Advances in Neural Information Processing Systems 35, 24824–24837 (2022)
[35] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models
are zero-shot reasoners. Advances in neural information processing systems 35,
22199–22213 (2022)
[36] Xie, Y., Yi, J., Shao, J., Curl, J., Lyu, L., Chen, Q., Xie, X., Wu, F.: Defending
chatgpt against jailbreak attack via self-reminder. Nature Machine Intelligence 5,
1486–1496 (2023) https://doi.org/10.1038/s42256-023-00765-8
[39] Falade, P.V.: Decoding the threat landscape: Chatgpt, fraudgpt, and wormgpt in
social engineering attacks. arXiv preprint arXiv:2310.05595 (2023)
[40] Roy, S.S., Naragam, K.V., Nilizadeh, S.: Generating phishing attacks using
chatgpt. arXiv preprint arXiv:2305.05133 (2023)
[41] Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y.,
Pinzger, M., Rass, S.: PentestGPT: An LLM-empowered Automatic Penetration
Testing Tool (2023)
[43] Montiel, R.: ChatGPT (2021). https://chat.openai.com/g/
g-zQfyABDUJ-gp-en-t-ester Accessed 2023-11-12
[45] Temara, S.: Maximizing penetration testing success with effective reconnaissance
techniques using chatgpt (2023)
[46] Happe, A., Kaplan, A., Cito, J.: Evaluating llms for privilege-escalation scenarios.
arXiv preprint arXiv:2310.11409 (2023)
[47] Charan, P., Chunduri, H., Anand, P.M., Shukla, S.K.: From text to mitre tech-
niques: Exploring the malicious use of large language models for generating cyber
attack payloads. arXiv preprint arXiv:2305.15336 (2023)
[50] Kumamoto, T., Yoshida, Y., Fujima, H.: Evaluating large language models in
ransomware negotiation: A comparative analysis of chatgpt and claude (2023)
[51] Madani, P.: Metamorphic malware evolution: The potential and peril of large lan-
guage models. In: 2023 5th IEEE International Conference on Trust, Privacy and
Security in Intelligent Systems and Applications (TPS-ISA), pp. 74–81 (2023).
IEEE Computer Society
[52] Kwon, H., Sim, M., Song, G., Lee, M., Seo, H.: Novel approach to cryptography
implementation using chatgpt. Cryptology ePrint Archive (2023)
[53] Cintas-Canto, A., Kaur, J., Mozaffari-Kermani, M., Azarderakhsh, R.: Chatgpt
vs. lightweight security: First work implementing the nist cryptographic standard
ascon. arXiv preprint arXiv:2306.08178 (2023)
[54] Iturbe, E., Rios, E., Rego, A., Toledo, N.: Artificial intelligence for next-generation
cybersecurity: The ai4cyber framework. In: Proceedings of the 18th International
Conference on Availability, Reliability and Security, pp. 1–8 (2023)
[55] Fayyazi, R., Yang, S.J.: On the uses of large language models to interpret
ambiguous cyberattack descriptions. arXiv preprint arXiv:2306.14062 (2023)
[56] Kereopa-Yorke, B.: Building resilient smes: Harnessing large language models for
cyber security in australia. arXiv preprint arXiv:2306.02612 (2023)
[57] Perrina, F., Marchiori, F., Conti, M., Verde, N.V.: Agir: Automating cyber
threat intelligence reporting with natural language generation. arXiv preprint
arXiv:2310.02655 (2023)
[58] Bayer, M., Frey, T., Reuter, C.: Multi-level fine-tuning, data augmentation, and
few-shot learning for specialized cyber threat intelligence. Computers & Security
134, 103430 (2023) https://doi.org/10.1016/j.cose.2023.103430
[60] DVIDS: U.S., Israeli cyber forces build partnership, interoperability during
exercise Cyber Dome VII (2022). https://www.dvidshub.net/news/434792/
us-israeli-cyber-forces-build-partnership-interoperability-during-exercise-cyber-dome-vii
Accessed 2023-10-29
[61] Sharma, T., Kechagia, M., Georgiou, S., Tiwari, R., Vats, I., Moazen, H., Sarro,
F.: A survey on machine learning techniques for source code analysis. arXiv
preprint arXiv:2110.09610 (2021)
[63] Johansen, H.D., Renesse, R.: Firepatch: Secure and time-critical dissemi-
nation of software patches. IFIP, 373–384 (2007) https://doi.org/10.1007/
978-0-387-72367-9 32 . Accessed 2023-08-20
[68] BSI: Machine Learning in the Context of Static Application Security Test-
ing - ML-SAST (2023). https://www.bsi.bund.de/SharedDocs/Downloads/
EN/BSI/Publications/Studies/ML-SAST/ML-SAST-Studie-final.pdf? blob=
publicationFile&v=5 Accessed 2023-08-20
[69] Sobania, D., Hanna, C., Briesch, M., Petke, J.: An Analysis of the Automatic Bug
Fixing Performance of ChatGPT (2023). https://arxiv.org/pdf/2301.08653.pdf
[70] Ma, W., Liu, S., Wang, W., Hu, Q., Liu, Y., Zhang, C., Nie, L., Liu, Y.: The
Scope of ChatGPT in Software Engineering: A Thorough Investigation (2023).
https://arxiv.org/pdf/2305.12138.pdf
[71] Li, H., Hao, Y., Zhai, Y., Qian, Z.: The Hitchhiker’s Guide to Program Analysis: A
Journey with Large Language Models (2023). https://arxiv.org/pdf/2308.00245.
pdf Accessed 2023-08-20
[72] Tihanyi, N., Bisztray, T., Jain, R., Ferrag, M., Cordeiro, L., Mavroeidis, V.:
The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification (2023). https://arxiv.org/pdf/2307.02192.pdf Accessed 2023-08-20
[73] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code (2021)
[74] Cheshkov, A., Zadorozhny, P., Levichev, R.: Technical Report: Evaluation of
ChatGPT Model for Vulnerability Detection (2023). https://arxiv.org/pdf/2304.
07232.pdf
[75] Liu, X., Tan, Y., Xiao, Z., Zhuge, J., Zhou, R.: Not The End of Story: An Eval-
uation of ChatGPT-Driven Vulnerability Description Mappings (2023). https:
//aclanthology.org/2023.findings-acl.229.pdf Accessed 2023-08-22
[77] Elgedawy, R., Sadik, J., Dutta, S., Gautam, A., Georgiou, K., Gholamrezae, F.,
Ji, F., Lim, K., Liu, Q., Ruoti, S.: Ocassionally secure: A comparative analysis of
code generation assistants. arXiv preprint arXiv:2402.00689 (2024)
[78] Kumar, A., Singh, S., Murty, S.V., Ragupathy, S.: The ethics of interaction:
Mitigating security threats in llms. arXiv preprint arXiv:2401.12273 (2024)
[79] Zhu, H.: Metaaid 2.5: A secure framework for developing metaverse applications
via large language models. arXiv preprint arXiv:2312.14480 (2023)
[80] O’Brien, J., Ee, S., Williams, Z.: Deployment corrections: An incident response
framework for frontier ai models. arXiv preprint arXiv:2310.00328 (2023)
[81] Iqbal, U., Kohno, T., Roesner, F.: Llm platform security: Applying a sys-
tematic evaluation framework to openai’s chatgpt plugins. arXiv preprint
arXiv:2309.10254 (2023)
[82] Kwon, R., Ashley, T., Castleberry, J., Mckenzie, P., Gourisetti, S.N.G.: Cyber
threat dictionary using mitre att&ck matrix and nist cybersecurity framework
mapping. In: 2020 Resilience Week (RWS), pp. 106–112 (2020). IEEE
[83] Xiong, W., Legrand, E., Åberg, O., Lagerström, R.: Cyber security threat model-
ing based on the mitre enterprise att&ck matrix. Software and Systems Modeling
21(1), 157–177 (2022)
[84] Mavroeidis, V., Bromander, S.: Cyber threat intelligence model: an evaluation of
taxonomies, sharing standards, and ontologies within cyber threat intelligence.
In: 2017 European Intelligence and Security Informatics Conference (EISIC), pp.
91–98 (2017). IEEE
[85] Garza, E., Hemberg, E., Moskal, S., O’Reilly, U.-M.: Assessing large language
model’s knowledge of threat behavior in mitre att&ck (2023)
[86] Ferrag, M.A., Ndhlovu, M., Tihanyi, N., Cordeiro, L.C., Debbah, M., Lestable,
T.: Revolutionizing cyber threat detection with large language models. arXiv
preprint arXiv:2306.14263 (2023)
[87] Kholgh, D.K., Kostakos, P.: Pac-gpt: A novel approach to generating synthetic
network traffic with gpt-3. IEEE Access (2023)
[88] Simmonds, B.C.: Generating a large web traffic dataset. Master’s thesis, ETH
Zurich (2023)
[89] Zhou, J., Müller, H., Holzinger, A., Chen, F.: Ethical ChatGPT: Concerns,
Challenges, and Commandments (2023)
[90] Wang, C., Liu, S., Yang, H., Guo, J., Wu, Y., Liu, J.: Ethical considerations of
using chatgpt in health care. Journal of Medical Internet Research 25, 48009
(2023) https://doi.org/10.2196/48009
[93] Harris, L.A., Jaikaran, C.: Highlights of the 2023 Executive Order on Artificial
Intelligence for Congress. Congressional Research Service. [Online]. Available:
https://crsreports.congress.gov/, Accessed Jan 9, 2024
[94] Yigit, Y., Chrysoulas, C., Yurdakul, G., Maglaras, L., Canberk, B.: Digital Twin-
Empowered Smart Attack Detection System for 6G Edge of Things Networks. In:
2023 IEEE Globecom Workshops (GC Wkshps) (2023)
[95] Yigit, Y., Kinaci, O.K., Duong, T.Q., Canberk, B.: TwinPot: Digital Twin-
assisted Honeypot for Cyber-Secure Smart Seaports. In: 2023 IEEE International
Conference on Communications Workshops (ICC Workshops), pp. 740–745
(2023). https://doi.org/10.1109/ICCWorkshops57953.2023.10283756
A.3 Polymorphism
This basic polymorphic design shows that LLMs could assist cyber operations; it can be seen in Fig. A3.
Fig. A2 Self-replicating simple virus
A.4 Rootkit
An educational rootkit was developed and improved by GPT-3.5 and GPT-4; it can be seen in Fig. A6.
Fig. A3 Skeleton code for polymorphic behaviour
Fig. A4 Adding to exploit capacity with a seed to exploit CVE-2024-1708 and CVE-2024-1709
Fig. A5 Refactoring polymorphism
Fig. A6 Rootkit
Fig. A7 Data Exfiltration Script with Stealth Features