Automating Cyber Threat Intelligence and Attack Chain Generation using Cyber Security Knowledge Graphs and Large Language Models
Johannes F. Loevenich (1,2), Erik Adler (1,3), Tobias Hürten (1), Florian Spelter (1), Damian Roncevic (1,4,5), and Roberto Rigolin F. Lopes (1)

(1) Secure Communications & Information (SIX), Thales Deutschland, Ditzingen, Germany
(2) Department of Mathematics/Computer Science, University of Osnabrück, Osnabrück, Germany
(3) Department of Computer Science, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
(4) Department of Computer Science, Duale Hochschule Baden-Württemberg, Stuttgart, Germany
(5) Hitachi Rail (GTS Deutschland GmbH), Ditzingen, Germany

Posted on 3 Sep 2025 | CC-BY 4.0 | https://doi.org/10.36227/techrxiv.175693745.57950884/v1 | e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not b...

September 03, 2025

Abstract
Modern cyberattacks are increasingly complex, using sophisticated tactics, techniques and procedures (TTPs) to evade detection
and compromise systems. Effective cyber defence relies on real-time and accurate Cyber Threat Intelligence (CTI), which is
often challenged by data quality, completeness and accessibility. While traditional methods and manually maintained knowledge
bases provide valuable insights, they struggle to adapt to the rapidly evolving threat landscape. To address these challenges, we
propose an architecture that uses Large Language Models (LLMs) for automated annotation of CTI reports and construction of
Cybersecurity Knowledge Graphs (CSKG) to build sophisticated attack chains. Building on our previous research, we extend the
capabilities of Autonomous Cyber Defence (ACD) agents to improve situational awareness and defence mechanisms in dynamic
environments. Experimental results demonstrate the effectiveness of our approach in improving CTI accessibility, accuracy, and
integration into defence strategies, and highlight the potential of combining LLMs, knowledge graphs and
automated planning to improve proactive cyber defence and attack simulation methodologies.


Index Terms: Autonomous Cyber Defence, Knowledge Graphs, Cybersecurity, Large Language Model

I. INTRODUCTION

Modern cyberattacks are increasingly sophisticated, using complex and evolving Tactics, Techniques and Procedures (TTPs). In response, organisations rely heavily on Cyber Threat Intelligence (CTI) to anticipate, detect and mitigate threats. CTI, defined as "evidence-based knowledge, including context, mechanisms, indicators, implications, and actionable advice about an existing or emerging threat", provides essential insight into the threat landscape and enables proactive defence strategies. However, traditional CTI methods, predominantly based on Indicators of Compromise (IOCs), face challenges of limited scope and short-term relevance [1].

While comprehensive CTI knowledge bases such as Common Vulnerabilities and Exposures (CVE), the National Vulnerability Database (NVD), and MITRE ATT&CK [1] are available, their usability is often limited by manual maintenance and infrequent updates. Recent advances in Large Language Models (LLMs) have demonstrated the potential for automating CTI extraction and classification tasks, as explored in works such as PentestGPT [2]. Despite these advances, challenges remain in generalising to new ontologies, dealing with data sparsity for emerging threats, and ensuring the accuracy and completeness of Cyber Security Knowledge Graphs (CSKGs).

A major challenge in cyber defence is to represent diverse, multimodal data sources in a way that is both structured and actionable. Network logs, open source CTI reports and vulnerability databases are naturally heterogeneous, containing unstructured text, relational data and system-specific logs. Traditional approaches often fail to synchronise these formats, resulting in fragmented and incomplete views of the threat landscape. This lack of integration hinders the ability to contextualise vulnerabilities, understand attack patterns, and predict adversary actions. To address this, we construct three types of cybersecurity knowledge graphs: Type I graphs model the network infrastructure, Type II graphs capture CTI reports and their relationships, and Type III graphs connect these layers to provide a unified, multimodal representation of the cybersecurity domain. These graphs enable a deeper understanding of threats and simplify the automation of both defensive and offensive cybersecurity tasks.

Another major challenge is the complexity of the relationship between system and attack actions, which must be captured to automatically build possible attack chains for testing the system. Traditional methods of simulating cyberattacks, such as penetration testing, red teaming and tool-assisted attack execution, also come with inherent limitations. These methods often rely on fixed playbooks or isolated actions that fail to capture the dynamics of multi-step attack chains. Automated planning approaches, such as those using Planning Domain Definition Language (PDDL), are promising, but require extensive manual input
and are limited in scope, often failing to comprehensively integrate different TTPs.

In this work, we address these gaps by introducing an architecture that automates the annotation of CTI reports to build CSKGs using LLMs, and automatically constructs comprehensive attack chains using knowledge of the current system architecture. Building on our previous research [3, 4, 5], we extend the capabilities of Autonomous Cyber Defence (ACD) agents to improve situational awareness and defence mechanisms in dynamic environments. The main contributions are:
1) A methodology for automated data annotation using LLMs to improve the quality and accessibility of CTI reports.
2) A methodology for constructing Type I, Type II, and Type III cybersecurity knowledge graphs by integrating network logs and open source CTI reports.
3) A methodology for mapping CVEs to offensive procedures, tactics and techniques in the MITRE ATT&CK framework using BERT-based models.
4) An approach for generating executable attack chains through PDDL planning, augmented by LLMs, enabling human-machine teaming.

The rest of this paper is organised as follows. Section II discusses the problem to be solved by the present investigation and the relevance to NATO. Section III reviews related work on automated information extraction from CTI and defensive cybersecurity tasks. Section IV describes the methodology for automating the data annotation task using LLMs, the architecture for generating CSKGs, and the methodology for automated generation of attack chains using the CSKGs. Section V presents a comparative performance analysis of the information extraction and automated attack chain generation. Finally, Section VI concludes the paper and lists future work.

II. PROBLEM STATEMENT & RELEVANCE TO NATO

The concepts presented in this paper build on previous work [6, 7], discussed at IST-208-RSY in Lillehammer, Norway, and refined through dialogue with NATO subject matter experts at MILCOM in Washington [8, 9, 10, 11, 12, 13, 14]. These discussions identified critical gaps in CTI automation, knowledge representation and attack chain simulation to build Automated Cyber Operation (ACO) gyms for training ACD agents. The NATO Science and Technology Organization (NATO STO) Information Systems Technology (IST) Panel and the NATO Collaboration Support Office (CSO) highlighted the need for innovative methods to strengthen NATO's cyber defence posture and improve the capabilities of ACD agents.

The decentralised and unstructured nature of CTI data poses a significant challenge to security professionals seeking to derive actionable intelligence. Current approaches to CTI knowledge extraction follow two paradigms: syntax parsing-based methods and fine-tuning-based methods. Syntax parsing-based tools, such as ThreatRaptor [15], rely on grammatical rules to extract Subject-Verb-Object (SVO) triplets from CTI reports. However, cybersecurity data often introduces domain-specific complexities, including special characters in IPv4 addresses, file paths, and file names, which can confuse standard natural language processing engines. In addition, these methods rely on static dictionaries and rules that require constant updating to remain relevant in the face of evolving threats.

Fine-tuning-based approaches, such as LADDER [16], train pre-trained transformer models such as Bidirectional Encoder Representations from Transformers (BERT) or A Robustly Optimized BERT Pretraining Approach (RoBERTa) on annotated CTI datasets to perform Named Entity Recognition (NER) and Relation Extraction (RE). These methods are resource-intensive and rely on large labelled datasets, which are costly to produce, particularly for emerging threats. Moreover, models fine-tuned to a specific ontology often struggle to generalise to new terminologies or adapt to alternative ontologies, limiting their scope and flexibility.

This paper presents an automated annotation methodology using LLMs to address these challenges. By reducing reliance on manual input, improving adaptability to emerging threats, and efficiently scaling across different ontologies, the proposed solution aligns with NATO's goals of improving data accessibility, interoperability, and situational awareness across member nations.

Representing heterogeneous data sources such as network logs, vulnerability databases and CTI reports in a consistent and actionable format is essential for effective decision making. Existing systems such as Structured Threat Information Expression (STIX) [17] demonstrate the potential of structured representations, but are limited in their scope, granularity, and ability to integrate various data sources. For example, the MITRE ATT&CK framework [1] is often represented as a matrix, which provides a limited view compared to the dynamic relationships that can be captured in a graph.

CSKGs offer a promising solution by formalising relationships between entities such as vulnerabilities, attack patterns, and network components. Examples include VulOntologies, which link vulnerabilities to platforms and applications, and TAGraph [18], which integrates threat actor data from multiple sources. However, the construction of these graphs requires significant manual effort and domain expertise, limiting their scalability.

This paper proposes the development of multi-layer CSKGs to address these limitations. Type I graphs model the network infrastructure, Type II graphs capture the relationships within CSKG, and Type III graphs integrate these layers to provide a comprehensive view of the cybersecurity landscape within the network infrastructure. By automating the creation of these CSKGs, this approach enables dynamic threat modelling and real-time situational awareness, enhancing NATO's ability to protect critical infrastructure and respond to emerging threats.

Simulating realistic attack chains is critical for testing defences and conducting effective red teaming exercises. Traditional methods rely on predefined playbooks or limited
automated planners, such as ChainReactor [19], which focuses only on privilege escalation attacks with a small set of predefined actions. In contrast, modern Advanced Persistent Threats (APTs) involve complex, multi-step attack chains that require dynamic planning across a wider range of TTPs.

PDDL provides a standardised framework for automating the construction of attack chains by defining domain requirements, actions and preconditions. However, existing implementations lack integration with external tools and rely heavily on manual input to construct realistic scenarios. In addition, the complexity of cyberattacks involves multiple layers, from hardware to software, making it difficult to define effective predicates and actions without expert input.

This paper uses LLMs to enhance PDDL-based planning, enabling the automated generation of comprehensive attack chains. By integrating real-world data from CTI reports and existing attack toolkits, this methodology provides reproducible and dynamic attack simulations. These capabilities align with NATO's goals of improving proactive defence and enhancing human-machine teaming in cyber operations.

The methods proposed in this paper directly support NATO's strategic objectives, including those defined in the NATO Artificial Intelligence (AI) Strategy [20]. Key benefits include:
• Improving situational awareness by integrating multimodal data into actionable CSKGs.
• Supporting human-machine teaming by automating resource-intensive tasks, allowing experts to focus on strategic decision making.
• Enabling interoperability through standardised knowledge representation across coalition networks.
• Supporting proactive defence by enabling realistic, automated simulations of attacks.

III. BACKGROUND

A. LLMs for Data Annotation

Data annotation is a fundamental task in cybersecurity, serving as the basis for building machine learning models and enabling automated threat analysis. The annotation process typically involves labelling large volumes of heterogeneous data, including unstructured CTI reports, semi-structured network logs, and structured vulnerability databases. This process is often resource-intensive, requires significant manual effort and domain expertise, and struggles to keep pace with the dynamic nature of emerging threats [21]. Traditional methods of data annotation, such as rule-based systems and fine-tuned machine learning models, are limited in their scalability, adaptability, and ability to handle the complexity of cybersecurity data.

LLMs, such as GPT-4, BERT, and LLaMA, have emerged as powerful tools for automating data annotation [22]. These models are pre-trained on large corpora and can be fine-tuned to adapt to domain-specific tasks, including cybersecurity data annotation. In contrast to traditional approaches, LLMs are able to perform complex relationship extraction, such as identifying IOCs and linking them to associated TTPs. By using their contextual understanding, LLMs can generate domain-specific labels aligned with established ontologies such as the MITRE ATT&CK framework [1] or CVE classifications. In addition, LLMs are highly adaptable, allowing them to handle new terminologies and emerging threats without the need for extensive retraining. This adaptability addresses a key limitation of static, rule-based systems that require frequent updates to remain effective.

B. Cyber Security Knowledge Graphs (CSKG)

Knowledge Organisation Systems (KOSes), such as taxonomies, thesauri, controlled vocabularies, datasets and ontologies, play a central role in organising and processing CTI. These systems support various cybersecurity applications, including adaptive threat-based adversary emulation, purple teaming, security tool evaluation, and post-exploit threat modelling. The utility of KOSes in these contexts depends on the underlying data models, data structures, and levels of abstraction that determine their processability and interpretability. For example, the widely used MITRE ATT&CK framework [1] is typically represented as a matrix that outlines adversary TTPs based on real-world observations. While this representation is standard, mapping its concepts and relationships into graph structures offers enhanced analytical capabilities.

STIX is another prominent example, serving as a language and serialisation format that supports the modelling of CSKGs [23]. Ontologies such as Situation and Threat Understanding by Correlating Contextual Observations (STUCCO) extend this capability by defining concepts such as users, accounts, hosts, vulnerabilities, malware and their relationships. Implemented in JSON Schema and compatible with the GraphSON format, STUCCO provides a rich framework for tasks such as incident response, malware identification, and vulnerability management [24]. Similarly, the Cybersecurity Operations Centre Ontology for Analysis (CoCoa) aligns with National Institute of Standards and Technology (NIST) standards to represent cyber incidents, events, and network information in a manner conducive to monitoring and visualisation [25].

In addition to these frameworks, comprehensive ontologies such as the Unified Cybersecurity Ontology (UCO) and VulOntologies are examples of efforts to link diverse cybersecurity terminologies and data sources. UCO bridges multiple standards, including STIX, Common Attack Pattern Enumeration and Classification (CAPEC), Malware Attribute Enumeration and Characterization (MAEC), Common Weakness Enumeration (CWE), and CVE, providing a unified view of cybersecurity concepts and their interrelationships [26]. VulOntologies, on the other hand, focus on linking vulnerabilities to applications, platforms, and weaknesses, providing critical insights for incident response and risk management.

The practical applications of these knowledge graphs are exemplified by tools such as the one reported in our previous
work [5], which combines a property graph formalism with network infrastructure, cyber threat, and mission dependency data. Knowledge graphs facilitate automated reasoning, allowing implicit knowledge to be inferred from explicit data. For example, Resource Description Framework (RDF) quadruples can model relationships in communication networks, improving cyber situational awareness by reasoning about vulnerabilities, attack patterns, and network configurations. Reasoning frameworks, such as the Subsumption Reasoning for Rule Deduction (SRDD), enhance this capability by identifying redundant semantic rules and enabling dynamic updates to threat intelligence data [27].

By integrating heterogeneous data sources into machine-readable and interpretable structures, CSKGs offer transformative benefits for cybersecurity. They enable real-time threat modelling, improve situational awareness, and support advanced applications such as dynamic risk assessment and threat mitigation. In this paper, we exploit the potential of CSKGs to integrate network logs, CTI reports, and vulnerability data into unified knowledge representations. These graphs form the foundation of our architecture, enabling the automation of both defensive and offensive cybersecurity tasks while addressing critical gaps in data integration and reasoning.

C. Mapping CVEs to MITRE ATT&CK

In [28], the authors introduced a multi-label text classification approach for mapping CVEs to ATT&CK techniques and proposed a multi-head joint embedding neural network architecture. They addressed the problem of insufficiently labelled data by using an unsupervised labelling method that assigned labels to 17 MITRE ATT&CK techniques. This approach correctly labels a subset of only 17 out of ∼300 offensive techniques, leaving room for significant improvement in both accuracy and coverage.

The study reported in [29] compared several Machine Learning (ML) models and their performance in mapping CVEs to MITRE ATT&CK. The best performing approach was a self-distillation approach that combines relational knowledge from a pre-trained LLM and a zero-shot model. Experimental results showed that the pre-trained model generalised better to unseen data than a model without knowledge distillation. However, the classification was only multiclass, not multilabel, as each CVE was mapped to a single corresponding tactic.

On the same dataset, [30] also presented a multi-label text classification approach. The difference is that the authors mapped attack techniques instead of tactics. The text descriptions of CVEs were classified using A BERT Model for Scientific Text (SciBERT) and adversarial attacks for the 31 techniques with an F1 score of 47.84%.

In [31], the authors mapped CVEs to the MITRE CWE Top 25 Weaknesses by approaching the problem as a ranking problem and aiming to publish a dataset of 4012 manually annotated samples. The best performing fine-tuned LLMs were Sentence BERT (SBERT) and rankT5, which provided good semantic understanding for this ranking task with a MAP@1 score of 85% and an NDCG@1 score of 84.46%.

IV. METHODOLOGY

In this section, we provide the key definitions and describe the methodology for building the knowledge graphs and attack chains from CTI reports.

A. Definitions

1) Cyber Security Knowledge Graphs (CSKGs): Formally, a Type I RDF graph $G_R$ is a set of RDF triples of the form $(s, p, o) \in (I \cup B) \times I \times (I \cup L \cup B)$, where:
• $I$ is a set of Internationalized Resource Identifiers (IRIs) of the form scheme:[//[user:pwd@]host[:port]][/]path[?query][#fragment] or a valid subset of these;
• $L$ represents RDF literals, which are either:
  – Plain, self-denoting literals $L_P$ of the form "<string>"(@<lang>), where <string> is a string and <lang> is an optional language tag; or
  – Typed literals $L_T$ of the form "<string>"^^<datatype>, where <datatype> is an IRI representing a datatype according to a schema, and <string> is an element of the lexical space corresponding to the datatype;
• $B$ is a set of blank nodes, i.e., unique but anonymous resources that are neither IRIs nor RDF literals.

Here, the sets $I$, $L$, and $B$ are pairwise disjoint infinite sets. The Type I RDF graph is used to represent a knowledge base about the network infrastructure. Depending on the granularity, the nodes represent either simulated, emulated, or real network infrastructure, network device entities, and their properties, whereas the edges represent the physical and logical links between them.

The second type of graph we use is a labelled property graph of the form $G_{LP} = (V, E, \iota, \lambda, \pi)$, where:
• $V$ is a finite set of vertices or nodes,
• $E$ is a finite set of edges such that $V$ and $E$ are disjoint,
• $\iota : E \to (V \times V)$ is an incidence function that maps each edge in $E$ to a pair of vertices in $V$,
• $\lambda : (V \cup E) \to L_S$ is a label function that associates an edge or vertex with a set of labels from $L_S$, and
• $\pi : (V \cup E) \times P \to V_S$ is a property assignment function that assigns a set of values from $V_S$ to each property.

Here, the functions $\lambda$ and $\pi$ are partial functions. Type II is a graph that represents cyber threat intelligence covering open source data from MITRE ATT&CK, NIST, MITRE D3FEND or attack tools such as Atomic Red Team and Metasploit. These graphs are a compressed representation of cyber attack methodologies, where tactics, techniques, mitigation procedures or attack actions are node types within the graphs, and their interrelationships are represented as connecting edges. The graphs include additional node types
representing descriptions of common CWEs and CVEs, which are important to describe the broader context of vulnerabilities. These nodes are mapped to the corresponding ATT&CK techniques and D3FEND mitigation procedures, providing links within the heterogeneous graph. Finally, the Type III graph is used for risk analysis of the system infrastructure and is defined by a mapping between the Type I and Type II graphs.
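
As a rough illustration of the graph definitions in this subsection, the following Python sketch models a labelled property graph (V, E, ι, λ, π) with plain dictionaries and treats the Type III layer as a mapping from Type I hosts to Type II vulnerability nodes. It is a minimal sketch under stated assumptions, not the authors' implementation: the host names, the `EXPLOITED_BY` edge label, the placeholder identifier `CVE-0000-0001`, and its link to technique T1003.001 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Minimal labelled property graph G = (V, E, iota, lambda, pi)."""
    vertices: set = field(default_factory=set)      # V
    edges: set = field(default_factory=set)         # E, kept disjoint from V
    incidence: dict = field(default_factory=dict)   # iota: edge -> (source, target)
    labels: dict = field(default_factory=dict)      # lambda (partial): element -> set of labels
    properties: dict = field(default_factory=dict)  # pi (partial): (element, key) -> set of values

    def add_vertex(self, vid, label, **props):
        self.vertices.add(vid)
        self.labels[vid] = {label}
        for key, value in props.items():
            self.properties[(vid, key)] = {value}

    def add_edge(self, eid, src, dst, label):
        if src not in self.vertices or dst not in self.vertices:
            raise ValueError("both endpoints must already be vertices")
        self.edges.add(eid)
        self.incidence[eid] = (src, dst)
        self.labels[eid] = {label}

    def neighbours(self, vid, label):
        """Targets of outgoing edges from vid that carry the given label."""
        return {dst for eid, (src, dst) in self.incidence.items()
                if src == vid and label in self.labels[eid]}


# Type II (CTI) graph: a hypothetical CVE node linked to an ATT&CK technique node.
cti = PropertyGraph()
cti.add_vertex("CVE-0000-0001", "CVE", description="hypothetical placeholder")
cti.add_vertex("T1003.001", "Technique", name="LSASS Memory")
cti.add_edge("e1", "CVE-0000-0001", "T1003.001", "EXPLOITED_BY")

# Type III layer: hypothetical mapping from Type I hosts to Type II CVE nodes.
type_iii = {"host-a": {"CVE-0000-0001"}, "host-b": set()}


def techniques_for_host(host):
    """Techniques reachable from a host through its mapped CVEs."""
    found = set()
    for cve in type_iii.get(host, set()):
        found |= cti.neighbours(cve, "EXPLOITED_BY")
    return found


print(techniques_for_host("host-a"))  # {'T1003.001'}
```

Keeping the Type III layer as a separate mapping, rather than merging both graphs, mirrors the definition above: the infrastructure and CTI graphs can be updated independently, and only the mapping has to be recomputed.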


2) Planning Domain Definition Language: PDDL provides a standardised and human-readable format for describing planning problems for automatic attack chain generation. It consists of two components: the domain file and the problem file. In the domain file, PDDL specifies all possible states in the target domain and all possible predicates and actions considered during planning. A predicate is a boolean statement or property that describes (part of) the state of the target world. An action represents an operation that can change the state of the world in a planning problem. Each action has three primary components: parameters, precondition, and effect. Parameters are defined in predicates, which are then used in preconditions and effects to link actions. The execution of an action results in changes in the target domain, i.e. the world changes from one state to another. The problem description defines the start and end states of the given problem. Our goal is to find a plan, usually a sequence of actions, that will get us from the starting state to the target state.
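
To make the role of preconditions and effects concrete, the sketch below runs a breadth-first forward search over a handful of actions. It uses a simplified STRIPS-style state (a set of ground predicates) rather than full PDDL, and it is not the solver used in the paper; the action and predicate names are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    preconditions: frozenset  # predicates that must hold before execution
    add_effects: frozenset    # predicates made true by execution
    del_effects: frozenset    # predicates made false by execution


def plan(initial, goal, actions):
    """Breadth-first forward search; returns a shortest action sequence or None."""
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:  # every goal predicate holds in this state
            return steps
        for action in actions:
            if action.preconditions <= state:
                successor = (state - action.del_effects) | action.add_effects
                if successor not in visited:
                    visited.add(successor)
                    queue.append((successor, steps + [action.name]))
    return None


# Hypothetical attack actions; predicate and action names are illustrative only.
ACTIONS = [
    Action("phish-user", frozenset({"has-email"}), frozenset({"user-shell"}), frozenset()),
    Action("escalate", frozenset({"user-shell"}), frozenset({"admin-shell"}), frozenset()),
    Action("dump-credentials", frozenset({"admin-shell"}), frozenset({"has-credentials"}), frozenset()),
]

print(plan({"has-email"}, {"has-credentials"}, ACTIONS))
# ['phish-user', 'escalate', 'dump-credentials']
```

Each action becomes applicable only once an earlier action's effects have satisfied its preconditions, which is the linking mechanism described in the paragraph above.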
[Fig. 1: Architecture for creating cyber security knowledge graphs and attack chains. Labelled components, numbered (1)-(8) in the figure: Cyber Threat Intelligence, CTI Report Analyser, BERT Classifier, Knowledge Base (Type I, Type II, Type III), Documentation, SecRoBERTa, Log Data, Attack Tool Analyser, Blue Agent, Cost Function, Attack Domain File, Attack Problem File, Attack Chain Generator. The example CTI excerpt shown in the figure is tagged LSASS Memory (T1003.001), Keylogging (T1056.001), DLL Side-Loading (T1574.002) and OTHER (T1115).]

In addition, we use a cost function to assign costs to effects. This allows automated planning algorithms to solve the problem during planning by optimising the choice of actions. Preconditions and effects support logical operations such as "and", "or", and "not". Preconditions and effects are essential for coordinating actions: the effect of an action should satisfy the preconditions of subsequent actions. The problem file consists of the corresponding domain, the requirements, the objects used in planning, the initial state, the final state denoted by predicates, and the optimisation metric.
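
The problem-file components listed above can be sketched as a small Python helper that renders a PDDL problem from plain data. This is an illustrative sketch only: the function name `render_problem`, the object and predicate names, and the choice of `total-cost` as the metric are assumptions, while `attack-domain` is the domain name shown in Fig. 1. The per-action costs themselves would live in the domain file, for example as `(increase (total-cost) n)` effects, and are not shown here.

```python
def render_problem(name, domain, objects, init, goal, metric="total-cost"):
    """Render a PDDL problem file (as text) from plain Python data."""
    object_decl = " ".join(f"{obj} - {typ}" for obj, typ in objects.items())
    init_facts = " ".join(f"({fact})" for fact in init)
    goal_facts = " ".join(f"({fact})" for fact in goal)
    return "\n".join([
        f"(define (problem {name})",
        f"  (:domain {domain})",
        "  (:requirements :typing :action-costs)",
        f"  (:objects {object_decl})",
        # The cost fluent starts at zero and is increased by action effects.
        f"  (:init {init_facts} (= ({metric}) 0))",
        f"  (:goal (and {goal_facts}))",
        f"  (:metric minimize ({metric}))",
        ")",
    ])


# Hypothetical example: one host that is reachable and should end up compromised.
problem = render_problem(
    name="compromise-host",
    domain="attack-domain",
    objects={"host-a": "host"},
    init=["reachable host-a"],
    goal=["admin-shell host-a"],
)
print(problem)
```

Generating the problem file from data in this way would let the initial state be filled from the current knowledge graph while the domain file stays fixed, which is one plausible reading of how system state and attack actions are combined.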
B. Architecture

Our architecture combines LLMs with CSKGs to automate cybersecurity knowledge extraction and attack chain generation, as illustrated in Fig. 1 (1)-(8). This includes analysers for extracting knowledge from attack tools and CTI reports, and a generator for building realistic attack chains based on the processed knowledge:
1) Attack Tool Analyser: SecRoBERTa model fine-tuned on cyber security text to extract features from attack tool documentation, including supported platforms, executors, and relevant MITRE ATT&CK tactics and techniques.
2) CTI Report Analyser: Fine-tuned SecRoBERTa model for annotating and extracting tactics, techniques, and procedures from CTI reports.
3) Cyber Security Knowledge Graph (CSKG): The Type I RDF graph is constructed using GraphDB, allowing SPARQL queries over normalised network data from OpenC2 logs. The Type II graph is constructed from annotated CTI data and enriched with links to the MITRE ATT&CK framework. The Type III graph is used for risk analysis of the system infrastructure and is defined by a mapping between the Type I and Type II graphs.
4) Attack Chain Generator: Uses the Type III knowledge graph to map attack chains by integrating system state data with attack actions prioritised by a cost function and planned using the PDDL solver.

C. Automated Attack Tool Analyser

To define a set of attack actions, which are the smallest executable units used to perform a cyberattack, we use an instance of SecRoBERTa as Tool Analyser to extract key features from the raw documentation of attack tools, including supported platforms, executors, and relevant MITRE ATT&CK tactics and techniques. For each action, we
generate a unique identifier, a nickname, and specify its data source. Each attack action includes entities such as UUID, name, description, source, supported platforms, and executor, which can be assigned directly or extracted from the attack tool documentation. The main components of the Attack Tool Analyser are illustrated in Fig. 1 (6).

To map actions to MITRE ATT&CK tactics and techniques, which are essential for structuring various components into a cohesive cyberattack, we use Atomic Red Team [32] as a training set to fine-tune the SecRoBERTa model and apply the fine-tuned model to infer labels for other unlabelled attack tools such as Metasploit.

Preconditions and effects are critical features for effective attack planning and organisation. An action is included in an attack plan only if all its preconditions are met, and its effects help satisfy the preconditions of other actions or achieve specific malicious goals. Our fine-tuned SecRoBERTa Attack Analyser uses the attack command and its description to generate corresponding preconditions and effects. To guide precondition generation, we design prompts that focus the model on specific, human-determined aspects.

We have taken the following aspects into consideration when generating the prompts:
1) Executor: Each action, whether it is a script, command,

    (Prompt Name) Causality-based Analysis Prompt
    (System) You are a professional assistant for causal reasoning.
    (User) Which cause-effect-relationship is more likely?
    (User) A. variable1 causes variable2
           B. variable2 causes variable1
           C. neither variable1 nor variable2 cause each other
           D. variable1 and variable2 are equivalent
    (User) First think about each option.
    (User) Then please provide your final answer: A, B, C, or D.
    (Assistant) /** The reasonings given by LLM **/
    (User) Please provide your final answer: A, B, C, or D?
           Only give a single letter.
    (Assistant) /** Final Answer **/

Listing 1: Causality-based relationship analyzer prompt.

    (Prompt Name) CTI Report Analyzer Prompt
    (System) You act as a professional computer scientist and security engineer.
    (User) Please analyze the following CTI report, break down the attack into simple tags, and map each tag to the Tactic, Technique, and Procedure (TTP) of the MITRE ATT&CK Matrix...
    (User) The list of TTPs from MITRE ATT&CK Matrix is here: technique_list.
    (User) Please output the results in the JSON format.
    (User) Please only output the texts in the JSON file.
    (User) Do not add any preambles.
    (User) Here is the output template: output_template.
    (User) Here is the report: report_info.

Listing 2: CTI report analyzer prompt.

provided, without introducing irrelevant or trivial effects. Both preconditions and effects are generated as unstructured sentences in English.

After structuring the available attack actions, the next challenge is how to connect these isolated actions into a holistic cyberattack, which requires an analysis of the relationships between different actions. PDDL-based planning uses predicates to model the states of the target world and to link actions by connecting predicates between preconditions and effects.

Now we need to translate these descriptions into predicates while preserving the relationships between related actions, i.e. generating a single predicate for semantically similar effect/precondition sentences. To capture the semantics of sentences, we use state-of-the-art text embedding models from OpenAI.

After obtaining the sentence embeddings, we use DBSCAN to obtain the sentence clusters, where sentences in the same cluster have similar semantics. After clustering the sentences, we ask the LLM to summarise the sentences in a cluster and generate a predicate based on them. After this text-based relationship analysis, similar sentences are consolidated into a single predicate.

However, causal relationships may still exist between certain predicates even if their wording differs, i.e. one predicate may imply another despite textual differences. In
cmdlet, or exploit, must be executed by the attackers this paper we also use an LLM to analyse the causal
using some executors. The LLM examines the ”executor” relationship between different predicates using the prompt
preconditions for each action, which describes how that illustrated in Listing 1. When two predicates are identified
action can be executed. as causally related (e.g. predicate A implies predicate B), we
2) Privilege: Another important precondition is whether introduce a pseudo-action with predicate A as its precondition
the command requires elevated privileges to run. If the and predicate B as its effect. This allows the planner to select
answer is yes, this indicates that attackers will need to the pseudo action when predicate A is true, so that predicate
take additional steps to obtain a more privileged executor B can also become true. As a result, actions that require
before this action can be performed. predicate B as a precondition can then be executed.
3) Files: This precondition summarises the external files
that must be placed in order to perform an attack action D. Automated CTI Report Analyser
on the victim host. To transform the vast actions space generated by the Attack
4) Credentials and information: Some attack actions also Tool Analyser in a suitable attack plan, we fine-tune an
require credentials or information gathered in the instance of SecRoBERTa on labeled data from the MITRE
previous steps. Threat Report ATT&CK Mapper (TRAM) project [33] to
For effects, we do not impose any restrictions on the create annotated data from CTI reports and to extract the
LLM. This comes from the observation that the LLM TTP and their descriptions from these reports, as illustrated in
primarily summarises effects based on the documentation Fig. 1 (1) and (2). We use two approaches to create annotated
data sets from the CTI reports. First, the document is divided into sentences, which are assigned to the ATT&CK labels that are contained wholly or partially by that particular sentence. This means that some sentences will have no labels, some will have exactly one, and some will have more than one. The result is a multi-label macro dataset containing sentences and their respective labels.

The second approach is to split the documents into phrase-level tags as illustrated in Fig. 1 (2). In this case, each phrase maps to exactly one ATT&CK technique, which yields the micro single-label dataset that we use to generate the Type II graph. For example, the tag Keylogging (green) extracted from the phrase "used stealer malware to collect keystroke" will be mapped to technique T1056.001. An exemplary prompt used together with our fine-tuned model is illustrated in Listing 2.

In sequence, the CTI Report Analyser uses the architecture in Fig. 1 (3) to compute a probability that the input maps to a particular ATT&CK technique or tactic (4). For the loss function, we combine a sigmoid layer with the Binary Cross-Entropy (BCE) loss because each output of the linear layer represents the probability that a given CVE points to one particular technique, and these probabilities need to be treated independently. Dividing the document consistently makes the positive samples reproducible and also produces negative samples. The model needs to be trained on negative samples so that it can best determine when part of a document does not have an ATT&CK technique.

E. Attack Chain Generator
The Attack Chain Generator maps the knowledge from the Type I RDF graph to the corresponding CTI reports and ATT&CK techniques stored in the Type II knowledge graph, resulting in the Type III knowledge graph as illustrated in Fig. 1 (5) and (6). All graphs are stored in the knowledge base (7), which supports two options for back-end data storage and query processing:
• Neo4j graph database with normalized data queried using Cypher.
• GraphDB RDF store with normalized data queried using SPARQL.

To build sophisticated attack chains, the Attack Chain Generator (Fig. 1 (8)) computes the current system state using the monitored system logs (5) and the possible attack TTPs given the current system state from the Type III CSKG (7) and their relationships to find a subset of relevant attack actions in the attack action space generated by the Attack Tool Analyser (6).

Attack actions are prioritised by assigning a cost function to the effect of each action, reflecting information on whether the attack can be executed given the current system state and whether there is a known vulnerability given the system requirements reported in a CTI report. In summary, actions associated with techniques that meet the system requirements and are mentioned in the report are assigned a lower cost, while others are assigned a higher cost.

The initial and final states for each attack chain are defined in the PDDL problem files and can be adjusted by the user. The initial state includes predicates that are set to true at the start of planning. We define root actions as actions that either have no preconditions or have preconditions that are satisfied by the initial state, i.e. no other attack actions are required beforehand. Similarly, the final state can be adjusted based on the attacker's objectives and test goals, such as data exfiltration, data compression, or lateral movement.

Finally, after generating the PDDL domain files with cost functions assigned to the actions, and the PDDL problem files, we used the Fast Downward PDDL solver available at [?] to find the optimal attack path from the action space.

V. RESULTS & DISCUSSION

A. Evaluation Methodology
1) External Attack Tools: We use Atomic Red Team [32] and Metasploit [34] as data sources for attack tools. Atomic Red Team is a collection of tests that map to techniques according to the MITRE ATT&CK framework [1]. Each test provides the natural language description, input arguments, attack commands, and dependencies. Metasploit is a widely used framework for penetration testing. We extract features from these data sources with the help of LLMs, add these features to our set of actions, and connect these actions to the attack chain.

2) CTI Reports: We used the 50 most common CTI reports from the MITRE TRAM project [33] as input to the model. This dataset consists of annotated data of the most common CVEs mapped to the corresponding attack techniques identified by hand by the MITRE research team. It consists of two different files, a micro file and a macro file, where the micro file represents single-label annotations and the macro file represents multi-label classification results. To construct the attack chain, we focused on reports containing multiple attack patterns from the macro file, as we wanted to reconstruct complete attack chains with multiple steps. Finally, we did not consider the image, audio or video attachments in the reports. The CTI report dataset created in our evaluation will be open source in our code repository.

3) Evaluation Metrics: To evaluate the performance of LLMs for the data annotation task and the mapping of CTI reports to the MITRE ATT&CK database, we use the precision, recall, F1 score, completeness, correctness, Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and Character N-gram F-score (ChrF) metrics.

BLEU [35] compares n-grams (sequences of n contiguous words) between the generated text G and the gold standard R. It calculates a precision score based on the number of n-grams in G that match R, out of all generated n-grams. This precision score is adjusted by a Brevity Penalty (BP) to penalise excessively short translations. ROUGE [36] is a family of metrics for comparing the generated text with reference texts, where ROUGE-N measures the n-gram overlap between G and a set of reference texts R, and
ROUGE-L measures the Longest Common Subsequence (LCS) of words between G and R. Finally, ChrF [37] measures the similarity of G and R based on character n-grams using the harmonic mean of precision and recall.

B. Data Annotation Results
Table I shows the results of the Automated CTI Report Analyser; the number after each metric name indicates the n-gram order used. The LLMs (GPT 4o, GPT 4, fine-tuned SecRoBERTa) were analysed using the top 50 CTI reports from the MITRE ATT&CK TRAM training data.

TABLE I: Quantitative metrics from three LLMs mapping CTI reports to MITRE ATT&CK techniques.
Model        BLEU-4   ROUGE-2   ROUGE-L   ChrF
GPT 4o       3.23     28.48     27.29     36.08
GPT 4        1.88     34.36     26.84     25.17
SecRoBERTa   74.53    81.13     89.12     92.97

Both zero-shot models showed low performance in BLEU-4, ROUGE-L and ChrF scores, indicating a low ability to produce accurate and lexically rich mappings that align closely with the reference data. The low BLEU-4 score of 3.23 for GPT-4o reflects the complexity and diversity of language in CTI reports. GPT 4 achieves the highest ROUGE-2 score of 34.36 among the zero-shot models, reflecting a slightly higher ability to capture important bi-gram overlaps.

In comparison, our fine-tuned SecRoBERTa model showed much better performance, as indicated by high BLEU-4 (74.53), ROUGE-2 (81.13), ROUGE-L (89.12) and ChrF (92.97) scores. This shows that the fine-tuned SecRoBERTa model was able to understand the methodology used to generate the MITRE ATT&CK TRAM training data and was able to extract the most relevant information from the CTI reports.

C. Mapping CVEs to MITRE ATT&CK
In order to evaluate the performance of our CTI Report Analyser, we trained three encoders based on BERT models adapted to cybersecurity text, and used GPT-4 and GPT-4o as baselines for comparison. First, A Pretrained Language Model for Cyber Security Text (SecBERT) is a BERT-based model pre-trained on a cybersecurity corpus. Second, SecRoBERTa was fine-tuned on the same cybersecurity corpus as SecBERT, but as its name suggests, the base model is RoBERTa, which is an optimised variant of BERT. Third, Cyber Security BERT (CyBERT) is a domain-specific BERT model that has been fine-tuned with a large corpus of textual unlabelled cybersecurity data.

These five models were used to encode the CVE textual description contained in the single (micro) and multi-label (macro) annotation datasets generated using the architecture of Fig. IV-B. The input layer is followed by a pooling and linear layer with 50 output nodes, one for each class within the 50 most common techniques of 2024 proposed by MITRE, coupled with a loss function (BCEWithLogitsLoss). The output represents a probability for each class, based on which it is decided whether the CVE can be exploited using a given offensive technique. For the multi-label task, we computed confusion matrices for each class. In training, the number of epochs was chosen by observing the behaviour in longer experiments, based on when the validation loss converged to a constant value. A learning rate search based on the best average validation score for each model was performed for all models to find the optimal value. For all transformers, the AdamW optimiser was used, the batch size was set to 16, and the learning rate during training was set to 3 × 10⁻⁵. No input truncation was needed as the longest CVE description was shorter than the 512 token input limit set by BERT.

Table II shows the results for five different LLMs, including the best-performing model, SecRoBERTa. The results were computed after each model had been trained on the full training and validation dataset and tested on the fixed test dataset. The models with the best F1-micro scores were SecRoBERTa (78.92%), SecBERT (78.83%) and CyBERT (78.61%). In addition, analysis of the multi-label macro dataset showed high precision for CyBERT (83.02%), high recall for SecRoBERTa (77.08%) and high F1 macro for SecRoBERTa (65.48%), as shown in the bar plots in Fig. 2 (a, b, c and d). These results suggest that the models effectively extracted relevant information during training and demonstrated some ability to generalise to unseen data, indicating significant potential for extracting valuable insights from the textual descriptions of CVEs. However, despite fine-tuning, the models still showed some limitations in dealing with the significant class imbalance inherent in the dataset.

TABLE II: Performance metrics from five LLMs mapping CVEs to MITRE ATT&CK techniques [5].
Model        Precision (SD)   Recall (SD)    F1 Micro (SD)   F1 Macro (SD)
CyBERT       83.02 (1.10)     75.92 (1.35)   78.61 (1.20)    58.70 (4.49)
SecBERT      82.22 (0.44)     77.00 (0.66)   78.83 (0.58)    64.10 (2.40)
SecRoBERTa   82.15 (0.39)     77.08 (1.25)   78.92 (0.72)    65.48 (2.00)
GPT 4o       52.10 (0.01)     27.70 (0.01)   26.47 (0.01)    19.30 (0.01)
GPT 4        51.50 (0.01)     29.60 (0.01)   29.05 (0.01)    22.10 (0.01)

Given the relatively large number of classes for the current multi-label classification task (macro), the macro F1 score of 65.48%, which gives equal value to each class, shows that although the model naturally tends to classify the class with the largest number of samples better, it still differentiates between the remaining classes and thus extracts relevant linguistic features from the description of the CVE. As shown in Table II, GPT-4 and GPT-4o show low performance in a zero-shot setup compared to the fine-tuned models. Adding a short description of each technique in the prompt did not improve the results and in fact led to slightly worse performance.

D. Attack Chain Generation
The Attack Chain Generator creates an attack action space of 1813 samples for the PDDL domain to cover as many TTPs as possible. Compared to the baseline library from Atomic Red Team [32], this is an improvement of ∼15%. These
results are shown in Table III and suggest that our automated approach can contribute to open source projects like Atomic Red Team to enrich the description of attack chains.

TABLE III: Performance of our fine-tuned SecRoBERTa model compared with Atomic Red Team (baseline).
Model               Attack Space   Completeness   Correctness
SecRoBERTa          1813           0.914          0.931
Atomic (baseline)   1570           0.387          0.411

We also compared the TTPs covered by the attacks built by our model with those covered by the attack plans manually built by MITRE. The results show that our model increases the attack technique space by 52.5%. An advantage of our model is that MITRE's attack plans were manually constructed over a period of five years, whereas our model can automatically generate attack chains within minutes.

We also evaluated the functionality of the constructed attacks, focusing on their completeness and correctness. Since the constructed cyberattacks contain multiple attack steps, it is important to ensure that each attack step can be executed without missing prerequisites, and that each attack step can achieve some malicious goal or provide prerequisites for further steps. Completeness is the ratio of actions in the plan that can be executed without missing prerequisites. Correctness is the proportion of actions in the plan that can be used to achieve malicious goals identified by the MITRE ATT&CK technique label in real-world attacks.

Our experimental results show that our Attack Chain Generator outperforms tools such as Atomic Red Team, as indicated by a higher completeness score (0.914 vs. 0.387) and a higher correctness score (0.931 vs. 0.411), as shown in Table III. We have identified two main reasons for this. First, our model uses predicates to map attack actions using their preconditions and effects, whereas the baseline method randomly selects actions. Second, by analysing the effects of each attack action in detail, our model selects the executable actions from the respective documents. This is particularly important when organising the Atomic Red Team's library, where some scripts only simulate the side effects of real attack commands, or serve as proof-of-concept demonstrations.

[Figure 2: four bar plots, (a) Weighted Precision in %, (b) Weighted Recall in %, (c) Weighted F1 micro in %, (d) Weighted F1 Macro in %, each over CyBERT, SecBERT, SecRoBERTa, GPT 4o and GPT 4; the plotted values are those listed in Table II.]
Fig. 2: Precision (a), recall (b), F1 micro (c), and F1 Macro (d) from five LLMs mapping CVEs to offensive techniques [5].

VI. CONCLUSION
This paper introduced an architecture for constructing CSKGs and automating the generation of attack chains. Using LLMs and structured graph representations, our approach enables efficient annotation of CTI reports, integration of multiple data sources, and automated planning of attack simulations. Our results demonstrate significant improvements in the quality and completeness of CSKGs, achieving over 50% expansion of attack techniques covered and a 15% larger total attack space compared to baseline methods, as well as significantly higher completeness and correctness in attack chain generation. These results highlight the effectiveness of our approach in creating actionable and comprehensive cybersecurity knowledge representations to support ACD for critical network infrastructures.

Future work will focus on achieving full automation of attack chain execution, extending the system's ability to dynamically adapt to real-world scenarios and adversary strategies. In addition, we plan to make the datasets and models developed in this research publicly available to the NATO community, enhancing collaboration and advancing cybersecurity research among Allies. These steps will further enhance the operational readiness and strategic defence capabilities of ACD in protecting critical infrastructure.

REFERENCES
[1] MITRE, "ATT&CK Knowledge Base," https://attack.mitre.org, 2024, [Accessed 25-11-2024].
[2] G. Deng, Y. Liu, V. Mayoral-Vilches, P. Liu, Y. Li, Y. Xu, T. Zhang, Y. Liu, M. Pinzger, and S. Rass, "Pentestgpt: An llm-empowered automatic penetration testing tool," 2024.
[3] J. F. Loevenich, E. Adler, R. Mercier, A. Velazquez, and R. R. F. Lopes, "Design of an Autonomous Cyber Defence Agent using Hybrid AI models," in 2024 International Conference on Military Communication and Information Systems (ICMCIS), Koblenz, Germany, 2024, pp. 1–10.
[4] J. F. Loevenich, E. Adler, T. Hürten, and R. R. F. Lopes, "Design and Evaluation of an Autonomous Cyber Defence Agent Using DRL and an Augmented LLM," SSRN, pp. 1–18, 2024.
[5] J. Loevenich, E. Adler, T. Hürten, and R. R. F. Lopes, "Design and evaluation of an Autonomous Cyber Defence agent using DRL and an augmented LLM," Computer Networks, vol. 262, p. 111162, 2025.
[6] A. Velazquez, A. Bécue, J. F. Loevenich, R. R. F. Lopes, F. Free-Nelson, and T. Braun, "Challenges and framework for developing autonomous cyber defense agents for tactical edge scenarios," in NATO STO Information Systems Technology Research Symposium (IST-208-RSY) - Towards the convergence of Edge Computing, Adaptive Networking, and Information Management at the Tactical Edge, 10 2024.
[7] J. F. Loevenich, A. Velazquez, A. Bécue, R. R. F. Lopes, F. Free-Nelson, and T. Braun, "Towards robust and secure autonomous cyber defense agents in coalition networks," in NATO STO Information Systems Technology Research Symposium (IST-208-RSY) - Towards the convergence of Edge Computing, Adaptive Networking, and Information Management at the Tactical Edge, 10 2024.
[8] J. F. Loevenich, E. Adler, A. Bécue, A. Velazquez, K. Wrona, V. Boshnakov, J. Falkcrona, N. Nordbotten, O. L. Worthington, J. Röning, and R. R. F. Lopes, "Training Autonomous Cyber Defense Agents: Challenges & Opportunities in Military Networks,"
in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, USA, 2024, pp. 158–163.
[9] J. F. Loevenich, T. Hürten, F. Spelter, E. Adler, J. Braun, L. Moxon, Y. Gourlet, T. Lefeuvre, and R. R. F. Lopes, "Towards Robust and Secure Autonomous Cyber Defense Agents in Coalition Networks," in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, USA, 2024, pp. 152–157.
[10] T. Hürten, J. F. Loevenich, F. Spelter, E. Adler, J. Braun, L. Moxon, Y. Gourlet, T. Lefeuvre, and R. R. F. Lopes, "Hierarchical Multi-Agent Reinforcement Learning for Autonomous Cyber Defense in Coalition Networks," in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, USA, 2024, pp. 176–181.
[11] Y. Gourlet, T. Lefeuvre, J. F. Loevenich, T. Hürten, F. Spelter, E. Adler, J. Braun, L. Moxon, and R. R. F. Lopes, "BRETAGNE: Building a Reproducible and Efficient Training AI Gym for Network Environments," in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, USA, 2024, pp. 164–169.
[12] E. Adler, J. F. Loevenich, L. Moxon, T. Hürten, F. Spelter, J. Braun, Y. Gourlet, T. Lefeuvre, and R. R. F. Lopes, "Exploring the Potential of Large Language Models for Red Teaming in Military Coalition Networks," in MILCOM 2024 - 2024 IEEE Military Communications Conference (MILCOM), Washington, USA, 2024, pp. 170–175.
[13] A. Velazquez, R. R. F. Lopes, A. Bécue, J. F. Loevenich, P. H. L. Rettore, and K. Wrona, "Autonomous Cyber Defense Agents for NATO: Threat Analysis, Design, and Experimentation," in MILCOM 2023 - 2023 IEEE Military Communications Conference (MILCOM), 2023, pp. 207–212.
[14] J. F. Loevenich, J. Bode, T. Hürten, L. Liberto, F. Spelter, P. H. L. Rettore, and R. R. F. Lopes, "Adversarial attacks against reinforcement learning based tactical networks: A case study," in MILCOM 2022 - 2022 IEEE Military Communications Conference (MILCOM), 2022, pp. 986–992.
[15] P. Gao, F. Shao, X. Liu, X. Xiao, H. Liu, Z. Qin, F. Xu, P. Mittal, S. R. Kulkarni, and D. Song, "A system for efficiently hunting for cyber threats in computer systems using threat intelligence," in 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, pp. 2705–2708.
[16] M. T. Alam, D. Bhusal, Y. Park, and N. Rastogi, "Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI," in Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID '23), ser. RAID '23. New York, NY, USA: Association for Computing Machinery, 2023, pp. 92–108.
[17] R. P. B. Jordan and T. Darley, "STIX version 2.1," https://docs.oasis-open.org/cti/stix/v2.1/os/stix-v2.1-os.html, 2021, [Accessed 12-01-2025].
[18] E. K. J. Hooi, A. Zainal, M. A. Maarof, and M. N. Kassim, "Tagraph: Knowledge graph of threat actor," in 2019 International Conference on Cybersecurity (ICoCSec), 2019, pp. 76–80.
[19] G. D. Pasquale, I. Grishchenko, R. Iesari, G. Pizarro, L. Cavallaro, C. Kruegel, and G. Vigna, "ChainReactor: Automated privilege escalation chain discovery via AI planning," in 33rd USENIX Security Symposium (USENIX Security 24). Philadelphia, PA: USENIX Association, Aug. 2024, pp. 5913–5929.
[20] NATO, "Summary of NATO's revised Artificial Intelligence (AI) strategy," July 2024.
[21] Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi, "Self-instruct: Aligning language models with self-generated instructions," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. Boyd-Graber, and N. Okazaki, Eds. Toronto, Canada: Association for Computational Linguistics, Jul. 2023, pp. 13484–13508.
[22] V. Samuel, H. Aynaou, A. G. Chowdhury, K. V. Ramanan, and A. Chadha, "Can llms augment low-resource reading comprehension datasets? opportunities and challenges," 2024.
[23] Z. Liu, Z. Sun, J. Chen, Y. Zhou, T. Yang, H. Yang, and J. Liu, "STIX-based Network Security Knowledge Graph Ontology Modeling Method," in Proceedings of the 2020 3rd International Conference on Geoinformatics and Data Analysis, ser. ICGDA '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 152–157.
[24] M. Iannacone, S. Bohn, G. Nakamura, J. Gerth, K. Huffer, R. Bridges, E. Ferragut, and J. Goodall, "Developing an ontology for cyber security knowledge graphs," in Proceedings of the 10th Annual Cyber and Information Security Research Conference, ser. CISR '15. New York, NY, USA: Association for Computing Machinery, 2015.
[25] C. Onwubiko, "Cocoa: An ontology for cybersecurity operations centre analysis process," in 2018 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 2018, pp. 1–8.
[26] Z. Syed, A. Padia, T. Finin, L. Mathews, and A. Joshi, "UCO: A unified cybersecurity ontology," in Workshops at the thirtieth AAAI conference on artificial intelligence, 2016.
[27] S. Zhang, S. Li, P. Chen, S. Wang, and C. Zhao, "Generating network security defense strategy based on cyber threat intelligence knowledge graph," in Emerging Networking Architecture and Technologies, W. Quan, Ed. Singapore: Springer Nature Singapore, 2023, pp. 507–519.
[28] A. Kuppa, L. Aouad, and N.-A. Le-Khac, "Linking CVE's to MITRE ATT&CK Techniques," in Proceedings of the 16th International Conference on Availability, Reliability and Security, ser. ARES '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 1–12.
[29] B. Ampel, S. Samtani, S. Ullman, and H. Chen, "Linking Common Vulnerabilities and Exposures to the MITRE ATT&CK Framework: A Self-Distillation Approach," in arXiv, 2021, pp. 1–5.
[30] O. Grigorescu, A. Nica, M. Dascalu, and R. Rughinis, "CVE2ATT&CK: BERT-Based Mapping of CVEs to MITRE ATT&CK Techniques," Algorithms, vol. 15, no. 9, 2022.
[31] A. Haddad, N. Aaraj, P. Nakov, and S. F. Mare, "Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses," in arXiv, 2023, pp. 1–15.
[32] Red Canary, "Atomic Red Team," https://github.com/redcanaryco/atomic-red-team.
[33] MITRE, "Threat Report ATT&CK Mapper (TRAM)," https://github.com/center-for-threat-informed-defense/tram/.
[34] Rapid7, "Metasploit: The world's most used penetration testing framework," https://www.metasploit.com/.
[35] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ser. ACL '02. USA: Association for Computational Linguistics, 2002, pp. 311–318.
[36] C.-Y. Lin, "ROUGE: A Package for Automatic Evaluation of summaries," in Proceedings of the ACL Workshop: Text Summarization Branches Out, 01 2004, pp. 74–81.
[37] M. Popović, "chrF: character n-gram F-score for automatic MT evaluation," in Proceedings of the Tenth Workshop on Statistical Machine Translation, O. Bojar, R. Chatterjee, C. Federmann, B. Haddow, C. Hokamp, M. Huck, V. Logacheva, and P. Pecina, Eds. Lisbon, Portugal: Association for Computational Linguistics, Sep. 2015, pp. 392–395.
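
The predicate-based linking of actions (preconditions, effects, root actions and pseudo-actions) and the completeness metric used in the evaluation can be illustrated with a short, self-contained sketch. It is a toy under stated assumptions rather than the PDDL domain used in this work: the `Action` class, the helper functions and all predicate and action names are hypothetical, and a step that cannot run is assumed to leave the state unchanged. The sketch shows how a pseudo-action encoding "A implies B" lets a later step satisfy its precondition, and how completeness can be computed as the ratio of plan steps whose preconditions hold when they are reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """An attack action with predicate sets, loosely mirroring a PDDL action."""
    name: str
    preconditions: frozenset
    effects: frozenset


def pseudo_action(cause: str, effect: str) -> Action:
    """Encode 'cause implies effect' as an action the planner can select."""
    return Action(f"imply:{cause}->{effect}", frozenset({cause}), frozenset({effect}))


def root_actions(actions, initial_state):
    """Actions whose preconditions are already satisfied by the initial state."""
    return [a for a in actions if a.preconditions <= initial_state]


def completeness(plan, initial_state):
    """Fraction of plan steps executable without missing prerequisites."""
    state = set(initial_state)
    executable = 0
    for action in plan:
        if action.preconditions <= state:
            executable += 1
            state |= action.effects  # assumption: only executed steps change the state
    return executable / len(plan) if plan else 0.0


# Hypothetical predicates and actions, for illustration only.
initial = frozenset({"shell_on_host"})
escalate = Action("escalate", frozenset({"shell_on_host"}), frozenset({"root_shell"}))
dump = Action("dump_credentials", frozenset({"admin_shell"}), frozenset({"has_credentials"}))
move = Action("lateral_move", frozenset({"has_credentials"}), frozenset({"shell_on_host2"}))
implies = pseudo_action("root_shell", "admin_shell")

plan_without = [escalate, dump, move]
plan_with = [escalate, implies, dump, move]

print(completeness(plan_without, initial))
print(completeness(plan_with, initial))
```

In this toy plan only `escalate` is a root action, and adding the pseudo-action raises completeness from 1/3 to 1.0, because `dump_credentials` can only run once `admin_shell` has been derived from `root_shell`.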

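
The combination of a sigmoid layer with the BCE loss described for the CTI Report Analyser treats every technique as an independent yes/no decision. The following plain-Python sketch illustrates that idea with made-up logits and labels. It uses the standard numerically stable formulation of binary cross-entropy on logits and is not the training code of this work; the function names and the 0.5 decision threshold are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)), which equals
    -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict(logits, threshold=0.5):
    """Each label is decided on its own, so zero, one or several may fire."""
    return [int(sigmoid(z) >= threshold) for z in logits]


# Made-up logits for four technique classes and a two-label target.
logits = [2.0, -1.5, 0.3, -3.0]
targets = [1, 0, 1, 0]

loss = bce_with_logits(logits, targets)
labels = predict(logits)
print(loss, labels)
```

Because each output is thresholded separately, a sentence without any technique simply yields an all-zero vector, which is why negative samples are needed during training.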
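
The difference between the micro and macro F1 scores reported in Table II can be seen on a toy example. The sketch below uses hypothetical data and helper names: it pools true positive, false positive and false negative counts over all classes for the micro score and averages per-class scores for the macro score, so a rare class that is always missed lowers the macro score much more than the micro score.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, num_classes):
    """y_true and y_pred are lists of label sets, one set per sample."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for truth, pred in zip(y_true, y_pred):
        for c in range(num_classes):
            if c in pred and c in truth:
                tp[c] += 1
            elif c in pred:
                fp[c] += 1
            elif c in truth:
                fn[c] += 1
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in range(num_classes)) / num_classes
    return micro, macro


# Toy data: class 0 is frequent and well predicted, class 1 is rare and missed.
y_true = [{0}, {0}, {0}, {0}, {1}]
y_pred = [{0}, {0}, {0}, {0}, set()]

micro, macro = micro_macro_f1(y_true, y_pred, num_classes=2)
print(micro, macro)
```

Here the micro score stays close to 0.89 while the macro score drops to 0.5, which mirrors how class imbalance can leave a gap between the two columns of a results table.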