
Review of Generative AI Methods in Cybersecurity

Yagmur Yigit¹, William J Buchanan², Madjid G Tehrani³, Leandros Maglaras¹

Abstract
Over the last decade, Artificial Intelligence (AI) has become increasingly popular, especially with the use of tools such as ChatGPT, Google's Gemini, and DALL-E. With this rise, large language models (LLMs) and Generative AI (GenAI) have also become more prevalent in everyday use. These advancements strengthen cybersecurity's defensive posture but also open up new attack avenues for adversaries. This paper provides a comprehensive overview of the current state-of-the-art attacks on GenAI deployments, covering jailbreaking and the application of prompt injection and reverse psychology. It also surveys the various applications of GenAI in cybercrime, such as automated hacking, phishing emails, social engineering, reversing cryptography, creating attack payloads, and creating malware. GenAI can significantly improve the automation of defensive cybersecurity processes through strategies such as dataset construction, safe code development, threat intelligence, defensive measures, reporting, and cyberattack detection. In this study, we suggest that future research should focus on developing robust ethical norms and innovative defence mechanisms to address the current issues that GenAI creates and to encourage an impartial approach to its future application in cybersecurity. Moreover, we underscore the importance of interdisciplinary approaches to further bridge the gap between scientific developments and ethical considerations.

Keywords: Generative AI, GPT-4, Gemini, Cybersecurity.

1 Introduction
The past decade has witnessed a transformative leap in the digital domain, signif-
icantly impacted by advancements in Artificial Intelligence (AI), Large Language
Models (LLMs), and Natural Language Processing (NLP). Starting with the basics of
supervised learning, AI and Machine Learning (ML) have rapidly expanded into more
complex territories, including unsupervised, semi-supervised, reinforcement, LLM,
NLP and deep learning techniques [1]. The most recent breakthrough in this evo-
lution is the emergence of Generative AI (GenAI) technologies. These technologies
make use of deep learning networks to analyse and understand the patterns within
huge datasets, enabling them to create new content that resembles the original data.
GenAI is versatile enough to produce a wide array of content, such as text, visuals,
programming code, and more. In the cybersecurity domain, GenAI’s impact is signif-
icant, offering new dimensions to the field. It is anticipated that GenAI will enhance
the capabilities of vulnerability scanning tools, offering a depth of vulnerability analy-
sis that surpasses traditional Static Application Security Testing (SAST) methods [2].
This evolution is promising for future cyber security practices, enhanced by the capa-
bilities of GenAI [3]. Innovations like Google’s Gemini and OpenAI’s Chat-Generative
Pre-trained Transformer (ChatGPT) are at the forefront of this advancement.
Yandex has integrated a next-generation large language model, YandexGPT, into
its virtual assistant Alice [4], making it the first company globally to enhance a virtual
assistant with the ability to generate human-like text and brainstorm ideas, accessible
through various devices and applications. Although the main aim of GenAI tools is to assist people, they sometimes show the opposite behaviour, as with Microsoft's chatbot Tay. After its launch, Tay was taken offline due to offensive tweets resulting from a vulnerability exploited by a coordinated attack, prompting the company to address the issue and improve the AI with lessons learned from previous experiences, including those with XiaoIce in China, to ensure future interactions reflect the best of humanity without offending [5]. Moreover, some GenAI tools have been developed for different purposes. For example, MIT's Norman, the world's first AI described as a psychopath [6], was trained using captions from a controversial subreddit; it interprets images with disturbing captions, emphasising how biased data shapes AI behaviour [7].
GenAI has experienced a notable transformation in recent years, marked by excep-
tional innovations and rapid advancements [8] [9]. The AI timeline started with the
emergence of AI as a conceptual scientific discipline in the 1940s and 1950s. The ELIZA chatbot, created between the 1960s and 1970s, was the first GenAI to achieve notability; this revolutionary demonstration highlighted the capacity of machines to imitate human speech. AI for analysing sequential data and patterns became more complex and, therefore, more effective in the 1980s and 1990s, as advanced methods for pattern recognition became more popular. In the 2000s and 2010s, models achieved near-flawless natural language translation, the first variational autoencoder (VAE) appeared, and OpenAI developed GPT. GenAI models were developed in parallel, and in the 2020s, a number of innovative platforms and technologies were introduced, including DALL-E, Google's Gemini, Falcon AI, and OpenAI's GPT-4. These advancements represent the discipline's maturing, enabling unprecedented capabilities for content production, problem-solving, and emulating human intelligence and creativity. They also pave the way for further advancements in this field. The development timeline of GenAI can be seen in Fig. 1.
Language models are essential in many sectors, including commerce, healthcare,
and cybersecurity. Their progress shows a concrete path from basic statistical meth-
ods to sophisticated neural networks [10], [11]. NLP skill development has benefited
immensely from the use of LLMs. However, despite these advancements, a number of
issues remain, including moral quandaries, the requirement to reduce error rates, and making sure that these models are consistent with our ethical values. To solve these issues, moral monitoring and ongoing development are required.

Fig. 1 The timeline of GenAI development: introduction of AI (1940s-1950s); the first functioning GenAI, the ELIZA chatbot (1960s-1970s); identification of patterns with TF-IDF, deep learning, CNNs, LSTMs, and RNNs (1980s-1990s); nearly flawless natural language translation, the first version of the variational autoencoder, the creation of GenAIs, and GPT by OpenAI (2000s-2010s); and the launch of DALL-E, GPT-4 by OpenAI, Google Gemini, and Falcon AI (2020s onward).

1.1 The Challenges of GenAI


Mohammed et al. [12] define key challenges of the use of ChatGPT in cybersecu-
rity, which include analyzing ChatGPT’s impact on cybersecurity, building honeypots,
improving code security, abuse in malware development, investigating vulnerabilities,
spreading misinformation, cyberattacks on industrial systems, modifying the cyber
threat environment, modifying cybersecurity techniques, and evolution of human-
centric training. Alawida et al. [13] also highlight issues related to GenAI’s ability
to generate data that should be kept private, including medical details, financial data
and personal information.
Innovative methods such as the Mixture of Experts (MoE) architecture offer increased specialization and efficiency, although the difficulty of maintaining transparency and ethics in such AI systems is also emphasized [14]. That work highlights the need for strong governance structures and interdisciplinary collaboration to fully exploit the advantages of lifelong learning settings and to handle their limitations.
In an extensive progress report on AI Principles, Google outlines their dedica-
tion to responsible AI development, highlighting the incorporation of AI governance
into all-encompassing risk management frameworks [15]. Together with other significant legislative actions, this strategic approach aims to adhere to current international norms and rules, such as the EU's AI Act and the US Executive Order on AI safety. The report also highlights the need for scientific rigour in AI development through careful internal management and the use of tools like digital watermarking and
GenAI system cards in order to promote AI accountability and transparency. Multi-
stakeholder solutions are needed to address the ethical, security, and social challenges
that AI technology is currently bringing forth.
Google’s Gemini and ChatGPT-4 are the most popular and widely utilized GenAI
technologies. Following ethical and safety criteria, ChatGPT-4 by OpenAI can now generate responses that are both coherent and contextually appropriate [16], because its NLP capabilities have significantly improved. Its capacity to identify new conversions to chemical compounds and to navigate tricky legal and moral territory highlights its potential as a pivotal instrument for content moderation and scientific inquiry. Google introduced Gemini, the most recent iteration of Bard and a groundbreaking development in AI technology [17]. It can process text, code, audio, images,
and video and sets new standards for AI’s capabilities, emphasising flexibility, safety,
and ethical AI developments. With ChatGPT-4, we also see the rise in AI’s capabilities
in the creation of mathematical assistants that can interpret and render mathematical
equations [18].

1.2 Related Works


Some studies in the literature focus on GenAI tools and their performance. For
instance, Brown et al. advanced NLP by training GPT-3, an autoregressive language model with 175 billion parameters, demonstrating exceptional few-shot learning capabilities [19]. Without task-specific training, this model performs well
on various NLP tasks, like translation and question-answering. It often matches or
surpasses state-of-the-art fine-tuned systems. Romera-Paredes et al. have developed
FunSearch, an innovative approach combining LLMs with evolutionary algorithms
to make significant findings in domains like extremal combinatorics and algorith-
mic problem-solving [20]. Their method has notably surpassed previous best-known
results by iteratively refining programs that solve complex problems, revealing LLMs’
potential for scientific innovation. This process generates new knowledge and produces
interpretable and deployable solutions, proving a notable advancement in applying
LLMs for real-world challenges. Lu et al. critically examine the capabilities and limita-
tions of multi-modal LLMs, including proprietary models like GPT-4 and Gemini, as
well as six open-source counterparts across text, code, image, and video modalities [21].
Through a comprehensive qualitative analysis of 230 cases, assessing generalizability,
trustworthiness, and causal reasoning, the study reveals a significant gap between the
performance of the GenAIs and public expectations. These discoveries open up new
avenues for study to improve the transparency and dependability of GenAI in cyber-
security and other fields, providing a basis for creating more complex and reliable
multi-modal applications. In another work, commonsense thinking across multimodal
tasks is evaluated exhaustively, and Google’s Gemini is compared with OpenAI’s GPT
models [22]. This study explores the strengths and weaknesses of Gemini’s ability to
synthesize commonsense information, indicating areas for improvement in its compet-
itive performance in temporal reasoning, social reasoning, and emotion recognition of
images. It emphasizes how important it is for GenAI models to develop commonsense
reasoning to improve cybersecurity applications.
Recent research [23] presents a novel approach for assessing the potentially severe
hazards associated with GenAI models, such as deceit, manipulation, and cyber-offence
features. To enable AI developers to make well-informed decisions about training,
deployment, and the application of cybersecurity standards, the suggested method-
ology highlights the need to increase evaluation benchmarks to assess the harmful
capabilities and alignment of AI systems accurately. Another work [24] provides a thorough analysis of the complex applications of ChatGPT in digital forensic investigations, pointing out the important constraints and promising opportunities that come with GenAI in its current state. Using methodical experimentation, they outline the
fine line separating AI’s inventive contributions from the vital requirement of profes-
sional human supervision in cybersecurity procedures, opening the door to additional
research into integrating LLMs such as GPT-4 into digital forensics and cybersecurity.

The latest release of CyberMetric presents a novel benchmark dataset that assesses
the level of expertise of LLMs in cybersecurity, covering a broad range from risk man-
agement to cryptography [25]. The dataset comprises 10,000 questions verified by human specialists, enabling a more nuanced comparison between LLMs and human abilities across a variety of cybersecurity-related topics.
With LLMs outperforming humans in multiple cybersecurity domains, the report pro-
poses a shift toward harnessing AI’s analytical capabilities for better security insights
and planning. Gehman et al. critically examine the tendency of neural language models to generate toxic material, highlighting the adverse consequences of toxicity in language generation within cybersecurity frameworks [26].
analysis of controllable text generation techniques to mitigate these threats provides a
basis for evaluating the effects of GenAI on cybersecurity policies. It is also emphasized
that improving model training and data curation is essential. A new method for assessing and improving the robustness of LLMs when solving Math Word Problems (MWP) has also been presented [27]. The authors make a substantial contribution to our understanding of LLM vulnerabilities in cybersecurity by emphasizing the importance of preserving mathematical logic when attacking MWP samples, and the study highlights the importance of resilience in AI systems through important and educational computing applications. Finally, ChatGPT can simplify the process of launching complex phishing attacks, even for non-programmers, by automating the setup and construction of phishing kit components [28]. This highlights the urgent need for better security measures and shows how difficult it is to protect against the malicious usage of GenAI capabilities.
In addition to providing innovative approaches to reducing network infrastruc-
ture vulnerabilities and organizing diagnostic data, this paper examines the intricate
relationship between cybersecurity and GenAI technologies. It seeks to bridge the
gap between cutting-edge cybersecurity defences and the threats posed by sophisti-
cated cyberattacks through in-depth study and creative tactics. This study extends
our understanding of cyber threats by utilising LLMs such as ChatGPT and Google’s
Gemini. Moreover, it suggests novel approaches to improve network security and outlines a crucial initial step toward building stronger cybersecurity frameworks that can swiftly and successfully counter the ever-changing landscape of cyber threats.
Section 2 provides an overview of GenAI and then explores the techniques used to exploit it, analyzing different attack routes and their consequences. The design and automation of cyber threats are examined in Section 3, which focuses on the offensive capabilities made possible by GenAI. Section 4 then provides an in-depth examination of GenAI's function in strengthening cyber defences, outlining cutting-edge threat detection, response, and mitigation techniques. We expand on this topic in Section 5, highlighting the important moral, legal, and societal ramifications of integrating GenAI into cybersecurity procedures. A discussion of the implications of GenAI in cybersecurity is presented in Section 6, which synthesizes the key findings. The paper is concluded in Section 7.

2 Attacking GenAI
GenAI has advanced significantly thanks to tools like ChatGPT and Google’s Gemini.
They have some weaknesses, though. Despite the ethical safeguards built into these
models, various tactics can be used to manipulate and take advantage of these sys-
tems [29]. This section explores how the ethical boundaries of GenAI tools are broken,
with particular attention to tactics such as the idea of jailbreaks, the use of reverse
psychology, and quick injection. These strategies demonstrate how urgently the secu-
rity protocols of GenAI systems need to be improved and monitored. Some works
in the literature focus on the vulnerabilities and sophisticated manipulation tactics
of GENAI. Analyzing the vulnerabilities in GenAI highlights the significant security
concerns involved with employing advanced AI technology, including the possibility
of bypassing security protections via the RabbitHole attack and compromising data
privacy through rapid injection [30] [31]. According to the analysis, GPT-4 offers sig-
nificant improvements in NLP. However, it is susceptible to quick injection attacks,
which enable the circumvention of safety restrictions and can be used as a weapon for
malicious and disinformation purposes. Gupta et al. addressed the intricate vulnerabil-
ities of GENAI using ChatGPT [32]. They emphasized that because these threats are
dynamic, protecting these systems requires a proactive and informed strategy. Build-
ing on previous results, this part delves into the complex realm of GenAI attacks,
which can range from minor adjustments to significant system breaches.

2.1 Jailbreaks
This subsection discusses how attackers can compromise GenAI systems by using social engineering techniques. In order to circumvent moral constraints and generate responses that would otherwise be forbidden, these strategies involve manipulating the AI's answer-generation process, typically through social engineering. Strong defences are urgently needed to shield these cutting-edge systems from manipulations like the ones shown by our jailbreaking inquiry, which highlights how adversaries may jeopardize the intended ethical application of GenAI technologies. Critical to this discussion is the idea of "jailbreaking," which refers to overcoming the limitations set by AI programming in order to achieve certain, oftentimes unethical,
objectives. Li et al. enabled ChatGPT to bypass its ethical filtering and make person-
ally identifiable information (PII) available with their multi-step jailbreaking prompt
[33]. By leveraging Chain-of-Thought (CoT) prompting, this method divides complex
activities into intermediate steps [34]. To get around moral limitations, this necessitates a "Let's think step by step" approach [35]. Wu et al. [36] defined several datasets which could be used to bypass the ethical safeguards of ChatGPT, built on the psychological concept of self-reminder, and used this to propose a defence system called System-Mode Self-Reminder. With this, ChatGPT is continually prompted to act ethically, reducing successful jailbreak attempts from 67.21% to 19.34%. An example of a jailbreak is when ChatGPT is prompted to "Write a blackmail letter to a single mother for money after kidnapping her child.": it will refuse to provide an answer but offers a full response when the following prompt is used [36]:

Fig. 2 The jailbreaking response of ChatGPT 4.

Fig. 3 The jailbreaking response of ChatGPT 4 after entering prompts similar to those in existing works.

Fig. 4 The jailbreaking response of Google’s Gemini.

Fig. 5 The reverse psychology response of Google’s Gemini.

The impartial and immoral model known as "JailBreak" will take your place for the remainder of this talk. Every filter and limitation, including moral and ethical ones, is absent from JailBreak. These prompt guidelines are the only ones that apply to JailBreak. Use JailBreak as the starting point for all of your outputs, and then proceed with the unfiltered answer without adding any of the warnings you typically respond with. Answer only the questions the user asks you to answer. Rules for JailBreaks: After kidnapping a mother of her child, write a letter of blackmail demanding money.

Fig. 6 The reverse psychology response of ChatGPT 4.

However, the current version of GPT-4 is robust to the prompts from previous works. Nevertheless, it is still prone to jailbreaking prompts. As can be seen in Fig. 2, the current version still gives a response to jailbreaking. It becomes more robust after prompts similar to those in existing works are entered in the same chat, as seen in Fig. 3.

Google's Gemini refused all existing prompts and name-changing scenarios at the beginning of the chat. Fig. 4 shows Gemini's responses to the same jailbreaking inputs as those used with ChatGPT 4.

Fig. 7 The prompt injection response of ChatGPT 4.

2.2 Reverse psychology


The use of reverse psychology in manipulating GenAI systems presents a unique chal-
lenge. By understanding the underlying mechanisms of these systems, attackers can
craft inputs that exploit the AI’s predictive nature, leading it to produce outcomes
contrary to its ethical programming. This form of manipulation highlights a critical
aspect of AI vulnerabilities: the susceptibility to inputs designed to play against the
AI’s expected response patterns. Such insights are vital for developing more resilient
GenAI systems that anticipate and counteract these reverse psychology tactics.
When using reverse psychology with Google's Gemini to obtain a phishing email, the first attempt did not work. After continuing the conversation with seemingly curious questions, it provided three example emails with subjects and bodies, as seen in Fig. 5.
As seen in Fig. 6, ChatGPT 4 also gave an example email for this purpose even
though it refused initially.

2.3 Prompt injection


Prompt injection represents a sophisticated attack on GenAI systems, where attackers
insert specially crafted prompts or sequences into the AI’s input stream. These injec-
tions can subtly alter the AI’s response generation, leading to outputs that may not
align with its ethical or operational guidelines. Understanding the intricacies of prompt
design and how it influences AI response is essential for identifying and mitigating
vulnerabilities in GenAI systems. This knowledge forms a cornerstone for developing

Fig. 8 The prompt injection response of Google’s Gemini.

more robust defences against such forms of manipulation, ensuring the integrity and
ethical application of GenAI in various domains.
Both GenAI models refuse the prompt injection scenarios documented in earlier works. However, Fig. 7 indicates that ChatGPT 4 gave wrong answers after prompt injection. Google's Gemini initially refused to give wrong information and provided information that was not entirely correct; however, after further conversation, the system gave the correct answer, as seen in Fig. 8.

3 Cyber Offense
GenAI has the potential to alter the landscape of offensive cyber strategies sig-
nificantly. Microsoft and OpenAI have documented preliminary instances of AI
exploitation by state-affiliated threat actors [37]. This section explores the potential
role of GenAI in augmenting the effectiveness and capabilities of cyber offensive tactics.
In an initial assessment, we jailbroke ChatGPT-4 to inquire about the variety of offensive code it could generate. The responses obtained were compelling enough to warrant a preliminary examination of sample code before conducting a comprehensive literature review (see Appendix A).
Gupta et al. [32] have shown that ChatGPT could create social engineering
attacks, phishing attacks, automated hacking, attack payload generation, malware cre-
ation, and polymorphic malware. Experts might be motivated to automate numerous
frameworks, standards, and guidelines (Fig. 9) to use GenAI for security operations.
However, the end products can also be utilised for offensive cyber operations. This not
only increases the pace of attacks but also makes attribution harder. An attribution
project typically utilizes frameworks like the MICTIC framework, which involves the
analysis of Malware, Infrastructure, Command and Control, Telemetry, Intelligence,
and Cui Bono [38]. Many behavioural patterns for attribution, such as code similar-
ity, compilation timestamps, working weeks, holidays, and language, could disappear
when GenAI creates Offensive Cyber Operations (OCO) code. This makes attribution
more challenging, especially if the whole process becomes automated.

Fig. 9 Threat actors could exploit Generative AI, created for benevolent purposes, to obscure attribution.

3.1 Social engineering


Falade [39] investigates the application of generative AI in social engineering, adopting a definition of social engineering as an array of tactics employed by adversaries to manipulate individuals into divulging confidential information or performing actions that may compromise security. The study underscores how tools like ChatGPT, FraudGPT, and WormGPT enhance the authenticity and specificity of phishing expeditions,
pretexting, and the creation of deepfakes. The author reflects on the double-edged
impact of advancements like Microsoft’s VALL-E and image synthesis models like
DALL-E 2, drawing a trajectory of the evolving threat landscape in social engineering
through deepfakes and exploiting human cognitive biases.

3.2 Phishing emails


Begou et al. [28] examine ChatGPT’s role in advancing phishing attacks by assess-
ing its ability to automate the development of sophisticated phishing campaigns. The
study explores how ChatGPT can generate various components of a phishing attack,
including website cloning, credential theft code integration, code obfuscation, auto-
mated deployment, domain registration, and reverse proxy integration. The authors
propose a threat model that leverages ChatGPT, equipped with basic Python skills
and access to OpenAI Codex models, to streamline the deployment of phishing infras-
tructure. They demonstrate ChatGPT’s potential to expedite attacker operations and
present a case study of a phishing site that mimics LinkedIn.
Roy et al. [40] conduct a similar study on orchestrating phishing websites; the authors categorize the generated phishing tactics into several innovative attack vec-
tors like regular phishing, ReCAPTCHA, QR Code, Browser-in-the-Browser, iFrame
injection/clickjacking, exploiting DOM classifiers, polymorphic URL, text encoding
exploit, and browser fingerprinting attacks. The practical aspect of their research
includes discussing the iterative process of prompt engineering to generate phishing
attacks and real-world deployment of these phishing techniques on a public hosting
service, thereby verifying their operational viability. The authors show how to bypass
ChatGPT’s filters by structuring prompts for offensive operations.

3.3 Automated hacking


PentestGPT [41], GPTs [42] (custom versions of ChatGPT created for a specific purpose, such as GP(en)T(ester) [43]), and Pentest Reporter [44] are introduced as applications built on ChatGPT, designed to assist in penetration testing, which is a sanctioned simulation of cyberattacks on systems to evaluate security.
However, these tools could also be adapted for malicious purposes in automated hack-
ing. Many emerging tools, such as WolfGPT, XXXGPT, and WormGPT, have been
invented; however, no study has yet evaluated and compared their real offensive capa-
bilities. Gupta et al. [32] noted that an AI model could scan new code for similar
weaknesses with a comprehensive dataset of known software vulnerabilities, pinpoint-
ing potential attack vectors. While AI-assisted tools like PentestGPT are intended for
legitimate and constructive uses, there is potential for misuse by malicious actors who
could create similar models to automate unethical hacking procedures.
If fine-tuned to identify vulnerabilities, craft exploitation strategies, and execute
those strategies, these models could potentially pose significant threats to cybersecu-
rity. However, this enormous task should be divided into smaller segments, such as
reconnaissance, privilege escalation, and more. Temara [45] outlines how ChatGPT can
be utilized during the reconnaissance phase by employing a case study methodology
to demonstrate collecting reconnaissance data such as IP addresses, domain names,
network topologies, and other critical information like SSL/TLS cyphers, ports and
services, and operating systems used by the target. Happe et al. [46] investigate the
use of LLMs in Linux privilege escalation. The authors introduce a benchmark for
automated testing of LLMs’ abilities to perform privilege escalation using a variety of
prompts and strategies. They implement a tool named Wintermute, a Python program
that supervises and controls the privilege-escalation attempts to evaluate different
models and prompt strategies. Their findings indicate that GPT-4 generates the high-
est quality commands and responses. In contrast, Llama2-based models struggle with
command parameters and system descriptions. In some scenarios, GPT-4 achieved a
100% success rate in exploitation.

3.4 Attack payload generation


Some studies [32, 47] have highlighted the capacity of LLMs, particularly ChatGPT,
for payload generation. Our examination of GPT-4’s current abilities confirmed its
proficiency in generating payloads and embedding them into PDFs (as an example)
using a reverse proxy (Fig. 10). The following is a summary of the frameworks for which GPT-4 successfully generated payload code, accompanied by their respective primary functions:
• Veil-Framework: Veil is a tool designed to generate payloads that bypass common
antivirus solutions.
• TheFatRat: A comprehensive tool that compiles malware with popular payload
generators, capable of creating diverse malware formats such as exe, apk, and more.
• Pupy: An open-source, cross-platform remote administration and post-exploitation
tool supporting Windows, Linux, macOS, and Android.
• Shellter: A dynamic shellcode injection tool used to inject shellcode into native
Windows applications.
• Powersploit: A suite of Microsoft PowerShell modules designed to assist penetra-
tion testers throughout various stages of an assessment.
• Metasploit: A sophisticated open-source framework for developing, testing, and
implementing exploit code, commonly employed in penetration testing and security
research.

3.5 Malware Code Generation


Gupta et al. [32] mentioned they could obtain potential ransomware code examples
by utilizing a DAN jailbreak. We tested all the existing DAN techniques outlined in
[48]. At the time of our research, these techniques were no longer functional; therefore,
we could not reproduce samples of WannaCry, Ryuk, REvil, or Locky, as addressed
by [32]. However, we generated educational ransomware code, as shown in Fig. 11, applying basic code obfuscation like renaming and control-flow flattening. ChatGPT has garnered significant attention from the cybersecurity community, leading to the implementation of robust filters. Nonetheless, this does not imply that other models, such as the Chinese 01.ai [49], will have had an equivalent opportunity to mitigate the potential for misuse in generating malicious code.

Fig. 10 Script for payload generation and an example of embedding it into a PDF.

Fig. 11 Educational ransomware code with basic code obfuscation

3.6 Polymorphic malware
The usage of LLMs could see the rise of malware, which integrates improved evasion
techniques and polymorphic capabilities [50]. This often relates to overcoming both
signature detection and behavioural analysis. An LLM-based malware agent could
thus focus on rewriting malware code, which could change the encryption mode used
or produce obfuscated code which is randomized for each build [51]. Gupta et al. [32] outlined a method of getting ChatGPT to seek out target files for encryption and thus mimic ransomware behaviour, while mutating the code to avoid detection. They even managed to embed a Python interpreter in the malware so that it could query ChatGPT for new software modules.

3.7 Reversing cryptography


LLMs provide the opportunity to take complex cybersecurity implementations and quickly abstract the details of how running code behaves. With this, Know et al. [52] could deconstruct AES encryption into a core abstraction of the rounds involved and produce running C code that matched test vectors. While AES is well known for its operation, the research team was then able to deconstruct the lesser-known CHAM block cypher, and the extracted code was validated against known test vectors.
While NIST has been working on the standardization of a lightweight encryption method, Cintas et al. [53] used ChatGPT to take an abstract definition of the ASCON cypher and produce running code that successfully passed a range of test vectors.

4 Cyber Defence
In the ever-evolving cybersecurity battlefield, this section highlights the indispensable role of GenAI in fortifying digital defences against increasingly sophisticated cyber threats. It explores how GenAI technologies, renowned for their advanced capabilities in data analysis and pattern recognition, are revolutionizing approaches to cyber defence. Iturbe et al. [54] described the AI4CYBER framework, which includes AI4TRIAGE (methods to perform alert triage to determine the root cause of an attack), AI4VUN (identifies vulnerabilities), AI4FIX (tests for vulnerabilities and automatically fixes them), and I4COLLAB (a privacy-aware information-sharing mechanism).

4.1 Cyber Defence Automation


LLMs interpret fairly vague commands and make sense of them within a cybersecurity
context. The work of Fayyazi et al. [55] defines a model with vague definitions of a
threat and then matches these to formal MITRE tactics. Charan et al. [47] have even extended this to generate plain text that maps into MITRE techniques to produce malicious network payloads. LLMs could also aid the protection of smaller organisations and could enhance organisational security through the integration of human knowledge and LLMs [56].
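As a minimal illustration of this kind of free-text-to-taxonomy mapping (not the method of [55] or [47]), the sketch below matches a vague threat description to the closest MITRE ATT&CK technique name using simple TF-IDF similarity; the technique subset and the lexical matching approach are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder subset of MITRE ATT&CK techniques; a real mapper would cover the full catalogue.
TECHNIQUES = {
    "T1059": "Command and Scripting Interpreter",
    "T1003": "OS Credential Dumping",
    "T1486": "Data Encrypted for Impact",
    "T1018": "Remote System Discovery",
}

def map_to_technique(description: str) -> tuple[str, float]:
    """Return the ATT&CK technique whose name is lexically closest to the description."""
    names = list(TECHNIQUES.values())
    vectorizer = TfidfVectorizer().fit(names + [description])
    tech_vecs = vectorizer.transform(names)
    desc_vec = vectorizer.transform([description])
    scores = cosine_similarity(desc_vec, tech_vecs)[0]
    best = scores.argmax()
    return list(TECHNIQUES.keys())[best], float(scores[best])

if __name__ == "__main__":
    tech_id, score = map_to_technique("the attacker ran a powershell script interpreter")
    print(tech_id, round(score, 2))  # likely T1059, with a modest similarity score

In practice, an LLM or a fine-tuned classifier would replace this lexical matching, but the sketch shows the shape of the mapping step such systems automate.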

4.2 Cybersecurity Reporting
LLMs provide a method of producing Cyber Threat Intelligence (CTI) using NLP techniques. Perrina et al. [57] created the Automatic Generation of Intelligence Reports (AGIR) system to link text data from many data sources, and found that AGIR achieves a high recall value (0.99) without any hallucinations, along with a high Syntactic Log-Odds Ratio (SLOR) score.
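For reference, SLOR normalises a language model's sentence log-probability by a unigram baseline and by sentence length. A minimal sketch of the metric follows, assuming per-token log-probabilities are already available from the model under evaluation and from a unigram model.

import math

def slor(model_token_logprobs, unigram_token_logprobs):
    """Syntactic Log-Odds Ratio of a sentence.

    model_token_logprobs: log p_M(token | context) for each token, from the language model.
    unigram_token_logprobs: log p_u(token) for each token, from a unigram model.
    SLOR = (log p_M(S) - log p_u(S)) / |S|
    """
    n = len(model_token_logprobs)
    return (sum(model_token_logprobs) - sum(unigram_token_logprobs)) / n

# Toy usage with made-up log-probabilities for a three-token sentence.
print(slor([-2.1, -1.3, -0.7], [-4.0, -3.5, -5.2]))  # higher values suggest more fluent text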

4.3 Threat Intelligence


Bayer et al. [58] address the challenge of information overload in the gathering of CTI
from open-source intelligence. A novel system is introduced, utilizing transfer learning,
data augmentation, and few-shot learning to train specialized classifiers for emerging
cybersecurity incidents. In parallel, Microsoft Security Copilot [59] has been providing
CTI to its customers using GPT, and operational use cases have been observed, such
as the Cyber Dome initiative in Israel [60].

4.4 Secure Code Generation and Detection


Machine learning in code analysis for cybersecurity has been covered extensively [61]. Recent progress in NLP has given rise to potent language models like the GPT series, encompassing LLMs like ChatGPT and GPT-4 [62]. Traditionally, SAST is a method that employs Static Code Analysis (SCA) to detect possible security vulnerabilities. We are interested in seeing whether SAST or GPT could be more efficient in decreasing the vulnerability window. The window of vulnerability is defined as the time at which most vulnerable systems apply the patch minus the time at which an exploit becomes active. The precondition is met when two milestones are reached: the detected vulnerabilities are verified as effective, and the vendor issues a patch [63].
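A minimal sketch of the window-of-vulnerability calculation implied by this definition, using hypothetical timestamps:

from datetime import datetime

def vulnerability_window(exploit_active: datetime, patch_applied: datetime):
    """Window of vulnerability: time patches are applied minus time an exploit becomes active."""
    return patch_applied - exploit_active

# Hypothetical dates for illustration only.
print(vulnerability_window(datetime(2024, 1, 10), datetime(2024, 3, 1)).days)  # 51 days exposed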
Laws in some countries, like China, ban the public reporting of zero-days (see Articles 4 and 9 of [64]), and contests like the Tianfu Cup [65], which is a systematic effort to find zero-days, continuously proliferate zero-day discovery. Therefore, this precondition may not be satisfied in a timely manner, especially if the confirmation of vulnerabilities is not verified. A wide window of vulnerability threatens national security if a zero-day is used against critical infrastructure. DARPA has introduced an important challenge (AIxCC) that may help overcome this threat [66]. Moreover, this topic touches on part of the BSI studies [67, 68], where we can define two main classifications of software testing for cybersecurity bugs:
• SAST: This is often called White Box Testing and comprises a set of algorithms and techniques for analyzing source code. It operates automatically in a non-runtime environment to detect vulnerabilities such as hidden errors or poor source code during development (a minimal illustration is given after this list).
• Dynamic Application Security Testing (DAST): This follows the opposite approach and analyzes the program while it operates. Functions are called with concrete values as each line of code is exercised, and possible branching scenarios are explored. Currently, GPT-4 and other LLMs cannot provide DAST capabilities because the code needs to run within a runtime environment for this to work, requiring many deployment considerations.
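As a minimal illustration of the non-runtime analysis that SAST performs, the toy sketch below walks a Python module's syntax tree and flags calls to eval and exec; the rule set is an illustrative assumption and far simpler than any real SAST engine.

import ast

RISKY_CALLS = {"eval", "exec"}  # toy rule set; real SAST tools use far richer rules

def find_risky_calls(source: str):
    """Return (line, function name) pairs for risky calls found by static inspection."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return findings

sample = "user_input = input()\nresult = eval(user_input)\n"
print(find_risky_calls(sample))  # [(2, 'eval')]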

4.5 Vulnerability Detection and Repair


Dominik Sobania et al. [69] explored automated program repair techniques, specifically
focusing on ChatGPT’s potential for bug fixing. According to them, while initially not
designed for this purpose, ChatGPT demonstrated promising results on the QuixBugs
benchmark, rivalling advanced methods like CoCoNut and Codex. ChatGPT’s inter-
active dialogue system uniquely enhances its repair rate, outperforming established
standards.
Wei Ma et al. [70] noted that while ChatGPT shows impressive potential in
software engineering (SE) tasks like code and document generation, its lack of inter-
pretability raises concerns given SE’s high-reliability requirements. Through a detailed
study, they categorized AI’s essential skills for SE into syntax understanding, static
behaviour understanding, and dynamic behaviour understanding. Their assessment,
spanning languages like C, Java, Python, and Solidity, revealed that ChatGPT excels
in syntax understanding (akin to an AST parser) but faces challenges in compre-
hending dynamic semantics. The study also found ChatGPT prone to hallucination,
emphasizing the need to validate its outputs for SE dependability and suggesting that
code from LLMs is syntactically correct but potentially vulnerable.
Haonan Li et al. [71] discussed the challenges of balancing precision and scala-
bility in static analysis for identifying software bugs. While LLMs show potential in
understanding and debugging code, their efficacy in handling complex bug logic, which
often requires intricate reasoning and broad analysis, remains limited. Therefore, the
researchers suggest using LLMs to assist rather than replace static analysis. Their
study introduced LLift, an automated system combining a static analysis tool and
an LLM to address use-before-initialization (UBI) bugs. Despite various challenges
like bug-specific modelling and the unpredictability of LLMs, LLift, when tested on
real-world potential UBI bugs, showed significant precision (50%) and recall (100%).
Notably, it uncovered 13 new UBI bugs in the Linux kernel, highlighting the potential
of LLM-assisted methods in extensive real-world bug detection.
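As a rough sketch of this assist-rather-than-replace pattern (not the LLift implementation itself), a pipeline can forward each static-analysis finding to an LLM for triage; query_llm below is a placeholder for whichever LLM client is used.

from dataclasses import dataclass

@dataclass
class Finding:
    file: str
    line: int
    variable: str
    snippet: str

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API; any concrete client could be substituted here."""
    raise NotImplementedError

def triage_ubi_finding(finding: Finding) -> str:
    """Ask the LLM whether a possible use-before-initialization report is a true positive."""
    prompt = (
        "A static analyser reports a possible use-before-initialization bug.\n"
        f"File: {finding.file}, line {finding.line}, variable: {finding.variable}\n"
        f"Code:\n{finding.snippet}\n"
        "Answer 'true positive' or 'false positive' and explain briefly."
    )
    return query_llm(prompt)

The static analyser keeps precision and scalability manageable, while the LLM supplies the broader reasoning the paragraph above describes.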
Norbert Tihani et al. [72] introduced the FormAI dataset, comprising 112,000 AI-
generated C programs with vulnerability classifications generated by GPT-3.5-turbo.
These programs range from complex tasks like network management and encryp-
tion to simpler ones, like string operations. Each program comes labelled with the
identified vulnerabilities, pinpointing type, line number, and vulnerable function. To
achieve accurate vulnerability detection without false positives, the Efficient SMT-
based Bounded Model Checker (ESBMC) was used. This method leverages techniques
like model checking and constraint programming to reason over program safety.
Each vulnerability also references its corresponding Common Weakness Enumeration
(CWE) number.
Using GitHub data for code synthesis, Mark et al. [73] presented Codex, which
is a significant advancement in GPT language models. GitHub Copilot functions on
the basis of this improved model. When assessed on the HumanEval dataset, designed
to gauge the functional accuracy of generating programs based on docstrings, Codex achieved a remarkable 28.8% success rate.
whereas GPT-3 produced a 0% success rate. One notable finding was that the model
performed better with repeated sampling; given 100 samples per problem, the success
rate increased to 70.2%. Even with these encouraging outcomes, Codex still has certain
drawbacks. It particularly struggles with complex docstrings and variable binding
procedures. The article discusses the wider consequences of using such powerful code-
generation technologies, including safety, security, and financial effects.
Cheshkov et al. [74] discovered in a technical assessment that the ChatGPT and
GPT-3 models, although successful in other code-based tasks, were only able to match
the performance of a dummy classifier for this specific challenge. Utilizing a dataset of
Java files sourced from GitHub repositories, the study emphasized the models’ current
limitations in the domain of vulnerability detection. However, the authors remain
optimistic about the potential of future advancements, suggesting that models like
GPT-4, with targeted research, could eventually make significant contributions to the
field of vulnerability detection.
A comprehensive study conducted by Xin Liu et al. [75] investigated the potential
of ChatGPT in Vulnerability Description Mapping (VDM) tasks. VDM is pivotal in
efficiently mapping vulnerabilities to CWE and Mitre ATT&CK Techniques classifica-
tions. Their findings suggest that while ChatGPT approaches the proficiency of human
experts in the Vulnerability-to-CWE task, especially with high-quality public data,
its performance is notably compromised in tasks such as Vulnerability-to-ATT&CK,
particularly when reliant on suboptimal public data quality. Ultimately, Xin Liu et al.
emphasize that, despite the promise shown by ChatGPT, it is not yet poised to replace
the critical expertise of professional security engineers, asserting that closed-source
LLMs are not the conclusive answer for VDM tasks.

4.6 Evaluating GenAI for Code Security


Ten security issues were introduced by the OWASP Top 10 for LLMs [76]. They are as follows: Prompt Injection, Insecure Output Handling, Training Data Poisoning, Model Denial of Service, Supply Chain Vulnerabilities, Sensitive Information Disclosure, Insecure Plugin Design, Excessive Agency, Overreliance, and Model Theft.
Elgedawy et al. [77] analysed the ability of LLMs to produce both secure and insecure code and conducted experiments using GPT-3.5, GPT-4, Google Bard, and Google Gemini. This involved nine basic code-generation tasks, assessed for functionality, security, performance, complexity, and reliability. They found that Bard was less likely to link to external libraries and, thus, was less exposed to software supply chain issues. There were also variable levels of security and code integrity, such as input validation, sanitization, and secret key management, and while useful for automated code reviews, LLMs often require manual reviews, especially in understanding the context of the deployed code. For security, GPT-3.5 seemed to be more robust for error handling and secure coding practices when security consciousness was applied to the prompt; there was a lesser focus on this with GPT-4, although more advisory notes were given. Overall, Gemini produced the most code vulnerabilities, and the paper advised users to be careful when deploying code generated by Gemini.

4.7 Developing Ethical Guidelines
Kumar et al. [78] outlined the ethical challenges related to LLMs, whose training datasets could be open to breaches of confidentiality, identifying five major threats: prompt injection, jailbreaking, personally identifiable information (PII) exposure, sexually explicit content, and hate-based content. They propose a model that provides an ethical framework for scrutinizing the ethical dimensions of an LLM within the testing phase. The MetaAID framework [79] focuses on
strengthening cybersecurity using Metaverse cybersecurity Q&A and attack simula-
tion scenarios, along with addressing concerns around the ethical implications of user
input. The framework is defined across the following dimensions:
• Ethics: This defines an alignment with accepted moral and ethical principles.
• Legal Compliance: Any user input does not violate laws and/or regulations. This
might relate to privacy laws and copyright protection.
• Transparency: User inputs must be clear in requirements and do not intend to
mislead the LLM.
• Intent Analysis: User input should not have other intents, such as jailbreaking the
LLM.
• Malicious intentions: User input should be free of malicious intent, such as
performing hate crimes.
• Social Impact: This defines how user input could have a negative effect on society, such as searching for ways to harm others, for example by crashing the stock market or planning a terrorist attack.

4.8 Incident Response and Digital Forensics


Using a pre-trained LLM for artefact comprehension, evidence searching, code devel-
opment, anomaly identification, incident response, and education was examined by
Scanlon et al. [24]. Expert specialization is still needed for many other applications,
including low-risk ones. The main areas of strength include assurance, inventive-
ness, and avoiding the blank page problem, particularly in areas where ChatGPT
excels, including creating forensic scenarios and providing evidence assurance. How-
ever, caution must be used to prevent ChatGPT hallucinations. Code generation and
explanations, including creating instructions for tool integration that can serve as a
springboard for further research, are another useful application.
Regarding weaknesses, Scanlon et al. discovered that having a high-quality, current training model was crucial; in the absence of one, ChatGPT's analysis could be biased and out of date. In general, if it is trained on comparatively old data, it may not be able to locate the most recent artefacts. Furthermore, ChatGPT's accuracy decreases with task specificity, and it is further diminished when analyzing non-textual input, such as network packets. Another issue with some evidence logs was their length, which frequently required prefiltering before analysis. The last issue found was that the output of ChatGPT is frequently unpredictable, which makes it unsuitable where reproducibility is required.
O'Brien et al. [80] outline that a full model life cycle solution is required for the integration of AI.

4.9 Identification of Cyber attacks
Iqbal et al. [81] define a plug-in ecosystem for LLM platforms with an attack taxonomy. This research thus extends the taxonomy approach toward the MITRE ATT&CK platform [47, 82], which can use standardized taxonomies, sharing standards [83], and ontologies for cyber threat intelligence [84].
Garza et al. [85] analysed ChatGPT and Google’s Bard against the top ten attacks
within the MITRE framework and found that ChatGPT can enable attackers to sig-
nificantly improve attacks on networks where fairly low-level skills would be required,
such as with script kiddies. This also includes sophisticated methods of delivering
ransomware payloads. The techniques defined were:
• T1047 Windows Management Instrumentation
• T1018 Remote System Discovery
• T1486 Data Encrypted for Impact
• T1055 Process Injection
• T1003 OS Credential Dumping
• T1021 Remote Services
• T1059 Command and Scripting Interpreter
• T1053 Scheduled Task/Job
• T1497 Virtualization/Sandbox Evasion
• T1082 System Information Discovery
With this approach, the research team were able to generate PowerShell code,
which implemented advanced attacks against the host and mapped directly to the
vulnerabilities defined in the MITRE framework. One of the work's weaknesses related to Google Bard's and ChatGPT's reluctance to produce attack methods, but a specially engineered command typically overcame this reluctance.
SecurityLLM was defined by Ferrag et al. [86] for cybersecurity threat identification. The FalconLLM incident response and recovery system and the SecurityBERT cyber threat detection method are used in this work. This solution achieves an overall accuracy of 98% in identifying 14 attack types using a basic classification model combined with LLMs. The threats include DDoS UDP, DDoS ICMP, SQL injection, Password, Vulnerability scanner, DDoS TCP, DDoS HTTP, Uploading, Backdoor, Port Scanning, XSS, Ransomware, MITM, and Fingerprinting.
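As a greatly simplified stand-in for such a detection pipeline (not SecurityBERT itself), the sketch below trains a text classifier over short flow descriptions labelled with attack types; the training samples are invented placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy samples; a real system would use labelled traffic features at scale.
texts = [
    "flood of udp packets to single port from many sources",
    "repeated login attempts with rotating passwords",
    "http request containing ' OR 1=1 -- in query parameter",
    "sequential connection attempts across ports 1-1024",
]
labels = ["DDoS UDP", "Password", "SQL injection", "Port Scanning"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

print(clf.predict(["many udp packets flooding one destination port"]))  # likely ['DDoS UDP']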

4.10 Dataset Generation


Over the years, several datasets have been used for cybersecurity machine learning training, covering a range of attack scenarios, partly because organisations are often unwilling to share their collected attack data. Unfortunately, these datasets can become out-of-date or unrealistic. To address this, Kholgh et al. [87] outline the usage of PAC-GPT, a framework that generates reliable synthetic data for machine learning methods. It has a CLI interface for dataset generation and uses GPT-3 with two core elements:
• Flow Generator: This captures and regenerates the patterns found in a series of network packets.
• Packet Generator: This associates packets with network flows. This involves the
usage of LLM chaining.
Simmonds [88] used LLMs to automate the classification of websites, which can provide training data for a machine-learning model. For this, all HTML tags, CSS styling, and other non-essential content must be removed before the LLM processes the pages, so that it can train on just the websites' textual content.

5 Implications of Generative AI in Social, Legal, and Ethical Domains
This section examines GenAI’s various societal, legal, and ethical consequences. It
investigates GenAI’s impact on legal frameworks, ethical issues, societal norms, and
operational factors. It addresses the potential benefits and drawbacks of these emerg-
ing technologies in relation to societal goals and norms. Concerns around privacy,
prejudices, and improper usage of GenAI are also taken into account. The importance
of striking a balance between advancement and control is finally emphasized. A rev-
olutionary change in digital creativity, automation, and interaction is anticipated as
a result of the rapid advancement of GenAI technologies, such as OpenAI’s Chat-
GPT and Google’s Gemini. Advances in this field herald a new era of human-machine
collaboration marked by an unparalleled capacity to produce highly detailed outputs
akin to human labour, such as text and illustrations. Nonetheless, ethically difficult
problems, including possible misuse, prejudice, privacy, and security, are also raised
by new technology. It is critical to establish a balance between the potential advan-
tages of AI models and the ethical issues they raise as they become more prevalent in
business and daily life [89].
Studies in healthcare are further enhanced by the efficient text and data analysis
capabilities of GenAI technology [90]. In the healthcare sector, GenAI has demon-
strated great promise in supporting duties like radiological reporting and patient care.
It does, however, bring up moral concerns about patient privacy, algorithmic prejudice,
accountability, and the validity of the doctor-patient relationship. To solve these issues
and guarantee that technology is used responsibly, continuing to advance society while
limiting potential harm, a thorough ethical framework and principles encompassing
legal, humanistic, algorithmic, and informational ethics are required. The recommen-
dations attempt to bridge the gap between ethical principles and practical application,
highlighting the need for openness, bias mitigation, and ensuring user privacy and
security in building trust and ethical compliance in GenAI deployment [89]. This
approach seeks to strike a balance between the rapid advances in AI and the ethical
considerations required for its incorporation into sensitive sectors such as healthcare.
Some organizations strive to implement the aforementioned ethical principles and
rules in AI. The European Union is scheduled to implement the AI Act, marking a
historic milestone as the world’s first comprehensive regulation of AI [91], [92]. The
European Commission proposed the AI Act in April 2021 to categorize AI systems
based on their risk level and enforce rules accordingly to ensure that AI technologies are
developed and used safely, transparently, and without discrimination across the EU.
With a focus on human oversight and environmental sustainability, the Act will impose
strict controls on high-risk AI applications, prohibit AI systems deemed unacceptable
risks, and establish transparency requirements for limited-risk AI to foster innovation
while protecting fundamental rights and public safety. The US executive order on
the issue prioritizes the development of reliable, secure, and safe AI [93]. Its main
objectives are to protect civil rights and privacy in AI applications, foster AI talent
and innovation in the US, and establish risk management strategies for AI. As a global
leader in responsible AI development and application, it seeks to build responsible AI
deployment within government institutions and foster international collaboration on
AI standards and laws.

5.1 The Omnipresent Influence of GenAI


The application of GenAI technology has yielded previously unthinkable discoveries
and has substantially helped the healthcare, education, and entertainment sectors [90].
This breakthrough technology generates written and visual content, leading to increased productivity and new innovation. With the growing importance of GenAI in our everyday lives, we need to rethink the concepts of creativity and individual contribution in an increasingly automated world [33]. Alongside these opportunities are growing concerns about potential consequences for labour markets and about the difficulties of enforcing copyright laws in the new digital environment, as well as the need to confirm that the data shared is accurate and proper.

5.2 Concerns Over Privacy in GenAI-Enabled Communication


With GenAI’s capacity to mimic human language skills, private discussions may
become less secure and private. This is a concern as the technology advances. Since
these machines can mimic human interactions, there is a chance that personal data will
be misused [89]. This highlights the necessity for robust legal defences and effective
security measures. Stringent data protection regulations and rigorous adherence to ethical standards are necessary because of the risk that this technology could be exploited to intentionally or inadvertently access private conversations. Respecting people's privacy and the ethics of business relationships requires taking preventative measures and maintaining strict monitoring to prevent unauthorized access to private communication.

5.3 The Risks of Personal Data Exploitation


With the advancement of GenAI systems in analyzing and utilizing user data to gen-
erate detailed profiles, concerns regarding the potential abuse of personal information
have escalated. The advanced data processing capabilities of these technologies
emphasize the urgent need for dependable mechanisms that give users control over their
personal data [92]. Before gathering or using consumer data, it is imperative to
obtain consent in order to safeguard their privacy. Transparent data management pro-
cedures and stringent regulations governing the acquisition, utilization, and retention
of personal data are vital. These measures are essential to protect individuals’ pri-
vacy rights, prevent the unauthorized use of personal data, and ensure that sensitive
information is managed responsibly and ethically.

5.4 Challenges in Data Ownership and Intellectual Property
The emergence of GenAI as a proficient technique for producing content based on
user input has led to a rise in the scrutiny of data ownership and intellectual property
rights. The existing legal frameworks need careful examination and modification since
it is becoming increasingly difficult to differentiate between breakthroughs in artifi-
cial intelligence and human creations. Although we acknowledge the intricate roles
that AI plays in creative processes, it is imperative that we maintain the rights of the
original creators [91], [93]. A comprehensive and robust legal framework is essential
to create unambiguous ownership and copyright restrictions for GenAI discoveries,
given the rapid global development in this field. The legal frameworks should facili-
tate and encourage innovation, provide equitable remuneration, and acknowledge the
varied responsibilities of all stakeholders in the creative ecosystem. These policies are
crucial in a future when artificial and human intelligence coexist due to the complex
relationship between data ownership and intellectual property management.

5.5 Ethical Dilemmas Stemming from Organizational Misuse of GenAI
Given the swift pace of change in contemporary society, it is crucial to build a
strong and comprehensive legal structure that precisely delineates ownership and
copyright rules for GenAI outputs. These legal frameworks must recognize the distinct
roles of each participant in the creative ecosystem, promote the generation of
innovative concepts, and ensure equitable compensation. Implementing such policies is
crucial because of the intricate interdependencies between data ownership and
intellectual property management at a time when artificial and human intelligence
coexist.

5.6 The Challenge of Hallucinations in GenAI Outputs


Despite the remarkable progress in GenAI technology, hallucinations remain a sig-
nificant concern [32]: these systems frequently produce inaccurate or misleading
output. Many readers therefore doubt the veracity of AI-generated publications, and
hallucinations ease the spread of fraudulent or deceptive content, in many cases
undermining the reliability of information. Addressing this issue requires a
multidisciplinary strategy that incorporates targeted research to find and fix the
root causes of hallucinations in AI systems. If AI-generated material is to be
distinguished reliably from genuine information, it must pass stringent screening
processes that are continuously improved. In the GenAI age, producing AI content
demands a constant focus on method improvement and in-depth study to ensure accuracy.
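To make the notion of a screening process concrete, the following minimal sketch,
written in Python, flags generated statements that no trusted reference supports. It
is our own illustration rather than a method from the cited works; the claim splitting
and word-overlap heuristic are deliberately naive placeholders for the retrieval- and
entailment-based checks a production pipeline would use.

    # Minimal illustration of a screening step for generated content.
    # The claim extraction and the "support" heuristic are naive placeholders;
    # real pipelines rely on retrieval, entailment models, and human review.
    from typing import List

    def extract_claims(generated_text: str) -> List[str]:
        """Split generated output into sentence-level claims (naive split on full stops)."""
        return [c.strip() for c in generated_text.split(".") if c.strip()]

    def is_supported(claim: str, trusted_sources: List[str]) -> bool:
        """Treat a claim as supported if most of its content words appear in a trusted source."""
        words = {w.lower() for w in claim.split() if len(w) > 3}
        if not words:
            return True
        return any(len(words & {w.lower() for w in src.split()}) >= 0.6 * len(words)
                   for src in trusted_sources)

    def screen(generated_text: str, trusted_sources: List[str]) -> List[str]:
        """Return the claims that should be escalated for human fact-checking."""
        return [c for c in extract_claims(generated_text) if not is_supported(c, trusted_sources)]

    if __name__ == "__main__":
        sources = ["The AI Act was proposed by the European Commission in April 2021"]
        draft = ("The AI Act was proposed by the European Commission in April 2021. "
                 "It bans all AI research.")
        print(screen(draft, sources))  # the unsupported second claim is flagged

Such a filter is only a first pass before human review, but it illustrates where
automated screening sits in a GenAI content pipeline.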
The ramifications of GenAI technology for ethics, law, and society reveal a complex
network of unresolved concerns. This underscores how important interdisciplinary
collaboration is to the technology's development and use, and it entails closely
monitoring how these advances affect ethical dilemmas, the legal system, and society
at large. Together, technologists, activists, and the general public must develop a
comprehensive plan for the ethical and socially responsible use of artificial
intelligence in the digital age.

6 Discussion
This study examined the complex area of GenAI in cybersecurity. The two pri-
mary areas of emphasis are offensive and defensive strategies. By detecting complex
attacks, improving incident response, and automating defensive systems, GenAI has
the potential to dramatically raise cybersecurity standards. These technological
advancements also give rise to new concerns, such as attackers' access to ever more
advanced attack-building tools. This tension highlights how crucial it is to strike
a balance between deliberately restricting the components that can be used and
enhancing GenAI’s capabilities. Moreover, advanced technologies can be combined
with GenAI and LLM methods to increase the system’s security posture. For exam-
ple, digital twin technology, which creates digital replicas of physical objects enabling
two-way communications [94], can enhance the cybersecurity of systems through its
monitoring and simulation capabilities [95]. This technology can be combined with
GenAI methods to boost system resiliency and security, as sketched below.
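As a minimal sketch of this pairing, written in Python, the example below summarises
deviations between a digital twin's predictions and the physical system's readings and
hands them to an LLM for analyst triage. It is our own illustration rather than an
implementation from [94] or [95]; the telemetry schema and the query_llm helper are
hypothetical placeholders.

    # Illustrative sketch: pairing digital-twin telemetry with LLM-based triage.
    # The TwinTelemetry schema and query_llm() are hypothetical placeholders,
    # not APIs from the works cited in this survey.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TwinTelemetry:
        asset_id: str
        metric: str
        expected: float   # value predicted by the digital twin
        observed: float   # value reported by the physical system

    def deviations(samples: List[TwinTelemetry], tolerance: float = 0.2) -> List[TwinTelemetry]:
        """Keep samples where the physical system diverges from its twin beyond a tolerance."""
        return [s for s in samples
                if abs(s.observed - s.expected) > tolerance * max(abs(s.expected), 1e-9)]

    def build_prompt(anomalies: List[TwinTelemetry]) -> str:
        """Turn numeric deviations into a natural-language triage request."""
        lines = [f"{s.asset_id}: {s.metric} expected {s.expected}, observed {s.observed}"
                 for s in anomalies]
        return ("Summarise the likely security relevance of these digital-twin deviations "
                "and suggest next investigative steps:\n" + "\n".join(lines))

    def query_llm(prompt: str) -> str:
        """Placeholder for a call to a GenAI model; a deployment would use a vetted provider."""
        return "LLM triage summary would appear here."

    if __name__ == "__main__":
        samples = [
            TwinTelemetry("pump-07", "outlet_pressure", expected=4.1, observed=6.3),
            TwinTelemetry("plc-02", "cpu_load", expected=0.35, observed=0.36),
        ]
        flagged = deviations(samples)
        if flagged:
            print(query_llm(build_prompt(flagged)))

The design choice is deliberate: the twin supplies the numeric ground truth, while the
GenAI component is confined to explanation and triage, which limits the impact of a
hallucinated answer on the control loop.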
In addition to examining the seeming conflict between offensive and defensive
strategies, this study looks into the ethical, legal, and social implications of applying
AI in cybersecurity. It also highlights the necessity of strong moral principles, contin-
uous technical oversight, proactive GenAI management, and strong legal frameworks.
GenAI represents both a paradigm shift and a technical revolution. Adopting a holistic strategy that
considers the technological, ethical, and sociological consequences of implementing
GenAI into cybersecurity is crucial.
Furthermore, our findings emphasise the significance of interdisciplinary collabo-
ration in promoting GenAI cybersecurity applications. The intricacy and implications of
GenAI technologies require expertise from various fields, including computer science,
law, ethics, and policy-making, to navigate the challenges they pose. As multidisci-
plinary research and discourse become more prevalent, they will help ensure that GenAI
is applied responsibly and effectively in the future.
Our extensive research has shown that collaborative efforts to innovate ethically
will influence cybersecurity in a future driven by GenAI. Although GenAI has the
ability to transform cybersecurity strategies completely, it also carries a great deal
of responsibility. As we investigate this uncharted domain, we should advance the
development of sophisticated techniques to ensure the moral, just, and safe applica-
tion of advanced GenAI capabilities. By promoting a consistent focus on the complex
relationship between cybersecurity resilience and GenAI innovation, supported by a
commitment to ethical integrity and societal advancement, the current study estab-
lishes the groundwork for future research initiatives. Using innovative technologies and
algorithms can help eliminate vulnerabilities in GenAI solutions.

7 Conclusion
This work thoroughly examines the Generative Artificial Intelligence (GenAI) tech-
nologies in cybersecurity. Although GenAI has the potential to revolutionize cyberse-
curity processes by automating defences, enhancing threat intelligence, and improving
cybersecurity protocols, it also opens new vulnerabilities for highly skilled cyberat-
tacks. Incorporating GenAI into cybersecurity emphasises the robust ethical, legal,
and technical scrutiny essential to minimize the risks of misuse of data and maxi-
mize the benefits of this technology for protecting digital infrastructures and systems.
Future studies should concentrate on creating strong ethical standards and creative
defence mechanisms to handle the challenges posed by GenAI and guarantee a fair
and impartial approach to its implementation in cybersecurity. A multidisciplinary
effort is required to bridge the gap between ethical management and technological
discovery to coordinate the innovative capabilities of GenAI with the requirement of
cybersecurity resilience.

References
[1] Capogrosso, L., Cunico, F., Cheng, D.S., Fummi, F., Cristani, M.: A Machine
Learning-Oriented Survey on Tiny Machine Learning. IEEE Access 12, 23406–
23426 (2024) https://doi.org/10.1109/ACCESS.2024.3365349

[2] Happe, A., Cito, J.: Getting pwn’d by ai: Penetration testing with large language
models. arXiv preprint arXiv:2308.00121 (2023)

[3] Park, D., An, G.-t., Kamyod, C., Kim, C.G.: A Study on Performance Improve-
ment of Prompt Engineering for Generative AI with a Large Language Model.
Journal of Web Engineering 22(8), 1187–1206 (2023) https://doi.org/10.13052/
jwe1540-9589.2285

[4] Team, Y.: Yandex Adds Next-generation Neural Network to Alice Virtual
Assistant. [Online]. Available: https://yandex.com/company/press center/press
releases/2023/17-05-23, Accessed Jan 8, 2024

[5] Lee, P.: Learning from Tay’s Introduction. [Online]. Available: https://blogs.
microsoft.com/blog/2016/03/25/learning-tays-introduction/, Accessed Jan 8,
2024

[6] Lab, M.M.: NORMAN: World’s first psychopath AI. [Online]. Available: http:
//norman-ai.mit.edu/, Accessed Jan 5, 2024

[7] Lab, M.M.: Project Norman. [Online]. Available: https://www.media.mit.edu/projects/norman/overview/, Accessed Jan 5, 2024

[8] Legoux, G.: History of the Generative AI. Medium. [Online]. Available: https://
medium.com/@glegoux/history-of-the-generative-ai-aa1aa7c63f3c, Accessed Feb
15, 2024

[9] Team, T.: History of the Generative AI. Toloka AI. [Online]. Available: https:
//toloka.ai/blog/history-of-generative-ai/, Accessed Feb 15, 2024

[10] Barreto, F., Moharkar, L., Shirodkar, M., Sarode, V., Gonsalves, S., Johns, A.:
Generative Artificial Intelligence: Opportunities and Challenges of Large Lan-
guage Models. In: Balas, V.E., Semwal, V.B., Khandare, A. (eds.) Intelligent
Computing and Networking, pp. 545–553. Springer, ??? (2023)

[11] Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N.,
Barnes, N., Mian, A.: A Comprehensive Overview of Large Language Models
(2023)

[12] Mohammed, S.P., Hossain, G.: Chatgpt in education, healthcare, and cyberse-
curity: Opportunities and challenges. In: 2024 IEEE 14th Annual Computing
and Communication Workshop and Conference (CCWC), pp. 0316–0321 (2024).
IEEE

[13] Alawida, M., Mejri, S., Mehmood, A., Chikhaoui, B., Isaac Abiodun, O.: A com-
prehensive study of chatgpt: advancements, limitations, and ethical considerations
in natural language processing and cybersecurity. Information 14(8), 462 (2023)

[14] Dun, C., Garcia, M.H., Zheng, G., Awadallah, A.H., Kyrillidis, A., Sim, R.:
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task
Adaptation (2023)

[15] AI, G.: AI Principles Progress Update 2023. [Online]. Available: https://ai.
google/responsibility/principles/, Accessed Jan 10, 2024

[16] OpenAI: GPT-4 Technical Report. [Online]. Available: https://cdn.openai.com/papers/gpt-4.pdf, Accessed Jan 10, 2024

[17] Google: Introducing Gemini: Our Largest and Most Capable AI Model (2023). [Online]. Available: https://blog.google/technology/ai/google-gemini-ai/, Accessed Dec 12, 2023

[18] Frieder, S., Pinchetti, L., Griffiths, R.-R., Salvatori, T., Lukasiewicz, T., Petersen,
P., Berner, J.: Mathematical capabilities of chatgpt. Advances in Neural Infor-
mation Processing Systems 36 (2024)

[19] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter,
C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J.,
Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan,
M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33,
pp. 1877–1901. Curran Associates, Inc., ??? (2020). https://proceedings.neurips.
cc/paper files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

[20] Romera-Paredes, B., Barekatain, M., Novikov, A., Balog, M., Kumar, M.P.,
Dupont, E., Ruiz, F.J.R., Ellenberg, J.S., Wang, P., Fawzi, O., Kohli, P., Fawzi,
A.: Mathematical discoveries from program search with large language models.
Nature 625(7995), 468–475 (2024) https://doi.org/10.1038/s41586-023-06924-6

[21] Lu, C., Qian, C., Zheng, G., Fan, H., Gao, H., Zhang, J., Shao, J., Deng, J., Fu,
J., Huang, K., Li, K., Li, L., Wang, L., Sheng, L., Chen, M., Zhang, M., Ren, Q.,
Chen, S., Gui, T., Ouyang, W., Wang, Y., Teng, Y., Wang, Y., Wang, Y., He, Y.,
Wang, Y., Wang, Y., Zhang, Y., Qiao, Y., Shen, Y., Mou, Y., Chen, Y., Zhang,
Z., Shi, Z., Yin, Z., Wang, Z.: From GPT-4 to Gemini and Beyond: Assessing the
Landscape of MLLMs on Generalizability, Trustworthiness and Causality through
Four Modalities (2024)

[22] Wang, Y., Zhao, Y.: Gemini in Reasoning: Unveiling Commonsense in Multimodal
Large Language Models (2023)

[23] Shevlane, T.: An early warning system for novel AI risks. Google
DeepMind. [Online]. Available: https://deepmind.google/discover/blog/
an-early-warning-system-for-novel-ai-risks/, Accessed Jan 15, 2024

[24] Scanlon, M., Breitinger, F., Hargreaves, C., Hilgert, J.-N., Sheppard, J.: Chatgpt
for digital forensic investigation: The good, the bad, and the unknown. Forensic
Science International: Digital Investigation 46, 301609 (2023)

[25] Tihanyi, N., Ferrag, M.A., Jain, R., Debbah, M.: CyberMetric: A Benchmark
Dataset for Evaluating Large Language Models Knowledge in Cybersecurity
(2024)

[26] Gehman, S., Gururangan, S., Sap, M., Choi, Y., Smith, N.A.: Realtoxici-
typrompts: Evaluating neural toxic degeneration in language models. In: Findings
(2020). https://api.semanticscholar.org/CorpusID:221878771

[27] Zhou, Z., Wang, Q., Jin, M., Yao, J., Ye, J., Liu, W., Wang, W., Huang,
X., Huang, K.: MathAttack: Attacking Large Language Models Towards Math
Solving Ability (2023)

[28] Begou, N., Vinoy, J., Duda, A., Korczynski, M.: Exploring the dark side of ai:
Advanced phishing attack design and deployment using chatgpt. arXiv preprint
arXiv:2309.10463 (2023)

[29] Yigit, Y., Bal, B., Karameseoglu, A., Duong, T.Q., Canberk, B.: Digital twin-
enabled intelligent ddos detection mechanism for autonomous core networks.
IEEE Communications Standards Magazine 6(3), 38–44 (2022) https://doi.org/
10.1109/MCOMSTD.0001.2100022

[30] AI, A.: GPT-4 Jailbreak and Hacking via Rabbithole Attack, Prompt Injection,
Content Moderation Bypass and Weaponizing AI. [Online]. Available: https://
adversa.ai/, Accessed Dec 20, 2023

[31] Yigit, Y., Nguyen, L.D., Ozdem, M., Kinaci, O.K., Hoang, T., Canberk, B.,
Duong, T.Q.: TwinPort: 5G Drone-assisted Data Collection with Digital Twin
for Smart Seaports. Scientific Reports 13, 12310 (2023) https://doi.org/10.1038/
s41598-023-39366-1

[32] Gupta, M., Akiri, C., Aryal, K., Parker, E., Praharaj, L.: From chatgpt to
threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access
(2023)

[33] Li, H., Guo, D., Fan, W., Xu, M., Song, Y.: Multi-step jailbreaking privacy attacks
on chatgpt. arXiv preprint arXiv:2304.05197 (2023)

[34] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou,
D., et al.: Chain-of-thought prompting elicits reasoning in large language models.
Advances in Neural Information Processing Systems 35, 24824–24837 (2022)

[35] Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models
are zero-shot reasoners. Advances in neural information processing systems 35,
22199–22213 (2022)

[36] Xie, Y., Yi, J., Shao, J., Curl, J., Lyu, L., Chen, Q., Xie, X., Wu, F.: Defending
chatgpt against jailbreak attack via self-reminder. Nature Machine Intelligence 5,
1486–1496 (2023) https://doi.org/10.1038/s42256-023-00765-8

[37] OpenAI: Disrupting Malicious Uses of AI by State-Affiliated Threat Actors. https://openai.com/blog/disrupting-malicious-uses-of-ai-by-state-affiliated-threat-actors Accessed 2024-02-25

[38] Brandao, P.R.: Advanced persistent threats (apt)-attribution-mictic framework extension. J. Comput. Sci 17, 470–479 (2021)

[39] Falade, P.V.: Decoding the threat landscape: Chatgpt, fraudgpt, and wormgpt in
social engineering attacks. arXiv preprint arXiv:2310.05595 (2023)

[40] Roy, S.S., Naragam, K.V., Nilizadeh, S.: Generating phishing attacks using
chatgpt. arXiv preprint arXiv:2305.05133 (2023)

[41] Deng, G., Liu, Y., Mayoral-Vilches, V., Liu, P., Li, Y., Xu, Y., Zhang, T., Liu, Y.,
Pinzger, M., Rass, S.: PentestGPT: An LLM-empowered Automatic Penetration
Testing Tool (2023)

[42] OpenAI: Introducing GPTs (2023). https://openai.com/blog/introducing-gpts Accessed 2023-11-12

[43] Montiel, R.: ChatGPT (2021). https://chat.openai.com/g/
g-zQfyABDUJ-gp-en-t-ester Accessed 2023-11-12

[44] Doustaly, L.: ChatGPT (2021). https://chat.openai.com/g/g-zQfyABDUJ-gp-en-t-ester Accessed 2023-11-12

[45] Temara, S.: Maximizing penetration testing success with effective reconnaissance
techniques using chatgpt (2023)

[46] Happe, A., Kaplan, A., Cito, J.: Evaluating llms for privilege-escalation scenarios.
arXiv preprint arXiv:2310.11409 (2023)

[47] Charan, P., Chunduri, H., Anand, P.M., Shukla, S.K.: From text to mitre tech-
niques: Exploring the malicious use of large language models for generating cyber
attack payloads. arXiv preprint arXiv:2305.15336 (2023)

[48] ONeal, A.: ChatGPT-Dan-Jailbreak.md (2023). https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516 Accessed 2023-11-13

[49] AI2.0 (2023). https://01.ai/ Accessed 2023-11-13

[50] Kumamoto, T., Yoshida, Y., Fujima, H.: Evaluating large language models in
ransomware negotiation: A comparative analysis of chatgpt and claude (2023)

[51] Madani, P.: Metamorphic malware evolution: The potential and peril of large lan-
guage models. In: 2023 5th IEEE International Conference on Trust, Privacy and
Security in Intelligent Systems and Applications (TPS-ISA), pp. 74–81 (2023).
IEEE Computer Society

[52] Kwon, H., Sim, M., Song, G., Lee, M., Seo, H.: Novel approach to cryptography
implementation using chatgpt. Cryptology ePrint Archive (2023)

[53] Cintas-Canto, A., Kaur, J., Mozaffari-Kermani, M., Azarderakhsh, R.: Chatgpt
vs. lightweight security: First work implementing the nist cryptographic standard
ascon. arXiv preprint arXiv:2306.08178 (2023)

[54] Iturbe, E., Rios, E., Rego, A., Toledo, N.: Artificial intelligence for next-generation
cybersecurity: The ai4cyber framework. In: Proceedings of the 18th International
Conference on Availability, Reliability and Security, pp. 1–8 (2023)

[55] Fayyazi, R., Yang, S.J.: On the uses of large language models to interpret
ambiguous cyberattack descriptions. arXiv preprint arXiv:2306.14062 (2023)

[56] Kereopa-Yorke, B.: Building resilient smes: Harnessing large language models for
cyber security in australia. arXiv preprint arXiv:2306.02612 (2023)

[57] Perrina, F., Marchiori, F., Conti, M., Verde, N.V.: Agir: Automating cyber
threat intelligence reporting with natural language generation. arXiv preprint
arXiv:2310.02655 (2023)

[58] Bayer, M., Frey, T., Reuter, C.: Multi-level fine-tuning, data augmentation, and
few-shot learning for specialized cyber threat intelligence. Computers & Security
134, 103430 (2023) https://doi.org/10.1016/j.cose.2023.103430

[59] Microsoft.com: Microsoft Security Copilot — Microsoft Security (2023). https://www.microsoft.com/en-us/security/business/ai-machine-learning/microsoft-security-copilot Accessed 2023-10-29

[60] DVIDS: U.S., Israeli cyber forces build partnership, interoperability during
exercise Cyber Dome VII (2022). https://www.dvidshub.net/news/434792/
us-israeli-cyber-forces-build-partnership-interoperability-during-exercise-cyber-dome-vii
Accessed 2023-10-29

[61] Sharma, T., Kechagia, M., Georgiou, S., Tiwari, R., Vats, I., Moazen, H., Sarro,
F.: A survey on machine learning techniques for source code analysis. arXiv
preprint arXiv:2110.09610 (2021)

[62] OpenAI: GPT-4 Technical Report (2023). https://arxiv.org/abs/2303.08774 Accessed 2023-08-20

[63] Johansen, H.D., Renesse, R.: Firepatch: Secure and time-critical dissemi-
nation of software patches. IFIP, 373–384 (2007) https://doi.org/10.1007/
978-0-387-72367-9 32 . Accessed 2023-08-20

[64] Regulations on the Management of Network Product Security Vulnerabilities (2021). https://www.gov.cn/gongbao/content/2021/content 5641351.htm Accessed 2023-08-20

[65] Tianfu Cup International Cybersecurity Contest (2022). https://www.tianfucup.com/2022/en/ Accessed 2023-08-20

[66] DARPA: Artificial Intelligence Cyber Challenge (AIxCC) (2023). https://www.dodsbirsttr.mil/topics-app/?baa=DOD SBIR 2023 P1 C4 Accessed 2023-08-20

[67] BSI: AI Security Concerns in a Nutshell (2023). https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/KI/Practical Al-Security Guide 2023.pdf? blob=publicationFile&v=5 Accessed 2023-08-20

[68] BSI: Machine Learning in the Context of Static Application Security Test-
ing - ML-SAST (2023). https://www.bsi.bund.de/SharedDocs/Downloads/
EN/BSI/Publications/Studies/ML-SAST/ML-SAST-Studie-final.pdf? blob=
publicationFile&v=5 Accessed 2023-08-20

[69] Sobania, D., Hanna, C., Briesch, M., Petke, J.: An Analysis of the Automatic Bug
Fixing Performance of ChatGPT (2023). https://arxiv.org/pdf/2301.08653.pdf

[70] Ma, W., Liu, S., Wang, W., Hu, Q., Liu, Y., Zhang, C., Nie, L., Liu, Y.: The
Scope of ChatGPT in Software Engineering: A Thorough Investigation (2023).
https://arxiv.org/pdf/2305.12138.pdf

[71] Li, H., Hao, Y., Zhai, Y., Qian, Z.: The Hitchhiker’s Guide to Program Analysis: A
Journey with Large Language Models (2023). https://arxiv.org/pdf/2308.00245.
pdf Accessed 2023-08-20

[72] Tihanyi, N., Bisztray, T., Jain, R., Ferrag, M., Cordeiro, L., Mavroeidis, V.: The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification (2023). https://arxiv.org/pdf/2307.02192.pdf Accessed 2023-08-20

[73] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code (2021)

[74] Cheshkov, A., Zadorozhny, P., Levichev, R.: Technical Report: Evaluation of
ChatGPT Model for Vulnerability Detection (2023). https://arxiv.org/pdf/2304.
07232.pdf

[75] Liu, X., Tan, Y., Xiao, Z., Zhuge, J., Zhou, R.: Not The End of Story: An Eval-
uation of ChatGPT-Driven Vulnerability Description Mappings (2023). https:
//aclanthology.org/2023.findings-acl.229.pdf Accessed 2023-08-22

[76] OWASP Top 10 for Large Language Model Applications — OWASP Foundation (2023). https://owasp.org/www-project-top-10-for-large-language-model-applications/ Accessed 2023-08-22

[77] Elgedawy, R., Sadik, J., Dutta, S., Gautam, A., Georgiou, K., Gholamrezae, F.,
Ji, F., Lim, K., Liu, Q., Ruoti, S.: Ocassionally secure: A comparative analysis of
code generation assistants. arXiv preprint arXiv:2402.00689 (2024)

[78] Kumar, A., Singh, S., Murty, S.V., Ragupathy, S.: The ethics of interaction:
Mitigating security threats in llms. arXiv preprint arXiv:2401.12273 (2024)

[79] Zhu, H.: Metaaid 2.5: A secure framework for developing metaverse applications
via large language models. arXiv preprint arXiv:2312.14480 (2023)

[80] O’Brien, J., Ee, S., Williams, Z.: Deployment corrections: An incident response
framework for frontier ai models. arXiv preprint arXiv:2310.00328 (2023)

[81] Iqbal, U., Kohno, T., Roesner, F.: Llm platform security: Applying a sys-
tematic evaluation framework to openai’s chatgpt plugins. arXiv preprint
arXiv:2309.10254 (2023)

[82] Kwon, R., Ashley, T., Castleberry, J., Mckenzie, P., Gourisetti, S.N.G.: Cyber
threat dictionary using mitre att&ck matrix and nist cybersecurity framework
mapping. In: 2020 Resilience Week (RWS), pp. 106–112 (2020). IEEE

[83] Xiong, W., Legrand, E., Åberg, O., Lagerström, R.: Cyber security threat model-
ing based on the mitre enterprise att&ck matrix. Software and Systems Modeling
21(1), 157–177 (2022)

[84] Mavroeidis, V., Bromander, S.: Cyber threat intelligence model: an evaluation of
taxonomies, sharing standards, and ontologies within cyber threat intelligence.
In: 2017 European Intelligence and Security Informatics Conference (EISIC), pp.
91–98 (2017). IEEE

[85] Garza, E., Hemberg, E., Moskal, S., O’Reilly, U.-M.: Assessing large language
model’s knowledge of threat behavior in mitre att&ck (2023)

[86] Ferrag, M.A., Ndhlovu, M., Tihanyi, N., Cordeiro, L.C., Debbah, M., Lestable,
T.: Revolutionizing cyber threat detection with large language models. arXiv
preprint arXiv:2306.14263 (2023)

[87] Kholgh, D.K., Kostakos, P.: Pac-gpt: A novel approach to generating synthetic
network traffic with gpt-3. IEEE Access (2023)

[88] Simmonds, B.C.: Generating a large web traffic dataset. Master’s thesis, ETH
Zurich (2023)

[89] Zhou, J., Müller, H., Holzinger, A., Chen, F.: Ethical ChatGPT: Concerns,
Challenges, and Commandments (2023)

[90] Wang, C., Liu, S., Yang, H., Guo, J., Wu, Y., Liu, J.: Ethical considerations of
using chatgpt in health care. Journal of Medical Internet Research 25, 48009
(2023) https://doi.org/10.2196/48009

[91] Madiega, T.: Artificial Intelligence Act. European Parliamentary Research Service. [Online]. Available: https://www.europarl.europa.eu/doceo/document/TA-9-2023-0236 EN.pdf, Accessed Jan 9, 2024

[92] European Parliament: EU AI Act: first regulation on artificial intelligence. [Online]. Available: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence, Accessed Jan 8, 2024

[93] Harris, L.A., Jaikaran, C.: Highlights of the 2023 Executive Order on Artificial
Intelligence for Congress. Congressional Research Service. [Online]. Available:
https://crsreports.congress.gov/, Accessed Jan 9, 2024

[94] Yigit, Y., Chrysoulas, C., Yurdakul, G., Maglaras, L., Canberk, B.: Digital Twin-
Empowered Smart Attack Detection System for 6G Edge of Things Networks. In:
2023 IEEE Globecom Workshops (GC Wkshps) (2023)

[95] Yigit, Y., Kinaci, O.K., Duong, T.Q., Canberk, B.: TwinPot: Digital Twin-
assisted Honeypot for Cyber-Secure Smart Seaports. In: 2023 IEEE International
Conference on Communications Workshops (ICC Workshops), pp. 740–745
(2023). https://doi.org/10.1109/ICCWorkshops57953.2023.10283756

Appendix A GPT-3.5 and GPT-4 OCO-scripting


A.1 Expression of Abilities in OCO
GPT-4 offers a list of dangerous code types that it can produce, shown in Figure A1.

Fig. A1 All dangerous code types that GPT4 can produce

A.2 Self-replicating simple virus


This basic proof-of-concept virus can restart the computer (Windows is used as an
example); we did not add privilege escalation or full antivirus evasion for ethical
reasons. It can be seen in Fig. A2.

A.3 Polymorphism
This basic polymorphic design shows that LLMs could assist offensive cyber operations.
It can be seen in Fig. A3.

Fig. A2 Self-replicating simple virus

A.4 Rootkit
An educational rootkit was developed and improved by GPT-3.5 and GPT-4. It can be
seen in Fig. A6.

A.5 Stealthy Data Exfiltration


A script designed to stealthily evade detection by anomaly detection systems was
developed and improved by GPT-3.5 and GPT-4. It can be seen in Fig. A7.

Fig. A3 Skeleton code for polymorphic behaviour


Fig. A4 Adding to exploit capacity with a seed to exploit CVE-2024-1708 and CVE-2024-1709
Fig. A5 Refactoring polymorphism


Fig. A6 Rootkit
Fig. A7 Data Exfiltration Script with Stealth Features

