Downloaded from The University of Kent's Academic Repository (KAR): https://kar.kent.ac.uk/90779/
MANAGING CYBERSECURITY AND
PRIVACY RISKS OF CYBER THREAT
INTELLIGENCE
By
Adham Albakri
March 2021
A thesis submitted to
The University of Kent, School of Computing
in the subject of Computer Science for the degree of Doctor of Philosophy.
Supervised by:
Prof. Eerke Boiten
Prof. Peter Rodgers
Declaration
I declare that the work was solely conducted during registration for the above
award with the University of Kent, under University supervision. I declare that
no material contained in the thesis has been used in any other submission for an
academic award.
I confirm that the work represented in this submission was undertaken solely by
myself except where otherwise attributed.
“This work has received funding from the European Union Framework Pro-
gramme for Research and Innovation Horizon 2020 under grant agreement No
675320.”
Acknowledgements
The journey is easier and more enjoyable when you travel with a good companion. This thesis is the result of four years of hard work during uncertain times and the COVID-19 pandemic, throughout which I have been encouraged and supported by many obliging people.
First, I would like to express my deepest gratitude to my supervisor, Prof Eerke
Boiten, for his precious observations and continuous support since the early stage
of this research. I am very blessed to have such a dedicated supervisor.
Second, my sincere thanks to Prof Peter Rodgers for his valuable comments and
suggestions. Also, I would like to thank Dr Rogério de Lemos for his supervision
and feedback during the first 1.5 years of my research.
I am highly grateful for the grant from the NeCS project, a network funded by
the European Union Horizon2020 Marie Skłodowska-Curie Actions program. This
life-changing opportunity has contributed to my research, training, and network-
ing.
I would also like to express my gratitude to my doctoral committee for their encouragement and invaluable comments. Furthermore, I would like to thank the School of Computing at the University of Kent and the Cyber Technology Institute at De Montfort University, where I spent much of my research time as a research visitor.
My appreciation also goes out to my family for their support throughout these years. I am especially grateful to my parents for their infinite care. I also want to acknowledge the support I have received from my fellow researchers and friends. My special thanks go to the friends who were my main supporters during my PhD: Imad, Dr Ali, M Khattab, Alva, Fenia and many others who were always there during this exciting journey.
Abstract
Contents
Declaration ii
1 Introduction 1
1.1 Research Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Structure of the thesis . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 List of publications . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.3 Cyber threat intelligence standards . . . . . . . . . . . . . 43
2.4.4 Information sharing challenges . . . . . . . . . . . . . . . 51
2.4.5 Threat types . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.4.6 Legal requirements on cyber information sharing . . . . . . 56
2.5 Risk assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.6.1 Risks of sharing cyber threat information . . . . . . . . . 67
2.6.2 Risk assessment of sharing cyber threat information . . . 70
2.6.3 Sharing cyber threat information under laws and regulations 72
4.2 Associated Risk Model (ARM) . . . . . . . . . . . . . . . . . . . . 107
4.2.1 Dataset analysis . . . . . . . . . . . . . . . . . . . . . . . . 108
4.2.2 Threat analysis . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2.3 Total Associated Risk (TAR) . . . . . . . . . . . . . . . . 110
4.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.3.1 Experiment set up . . . . . . . . . . . . . . . . . . . . . . 111
4.4 Expert selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.5.1 Use Case 1: CTI contains malware information & personal
information - sharing for detections . . . . . . . . . . . . . 116
4.5.2 Use Case 2: “CTI contains malware information & personal
information – aggregation of data” . . . . . . . . . . . . . 128
4.5.3 Use Case 3: “Cyber threat intelligence contains malware
information and personal information - sharing for detection" 136
4.6 Threats to validity . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.8 Risk assessment questionnaire . . . . . . . . . . . . . . . . . . . . 147
5.2.3 Use Case 3: Sharing CTI incident report for legal obligation 166
5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6 Conclusion 171
6.1 Revisiting the Contribution . . . . . . . . . . . . . . . . . . . . . 172
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Bibliography 176
List of Tables
21 Severity value and Associated threats . . . . . . . . . . . . . . . . 118
22 Threats and matched property . . . . . . . . . . . . . . . . . . . . 118
23 UC1 Likelihood and total risk value (public sharing communities) 120
24 Likelihood and total risk value (trusted communities) . . . . . . . 121
25 UC1 Likelihood and total risk value for sub-dataset . . . . . . . . 122
26 UC1 Summary: Responses Returned . . . . . . . . . . . . . . . . 124
27 UC1 Part1, Threat Summary . . . . . . . . . . . . . . . . . . . . 125
28 UC1 Part2, Threat Summary . . . . . . . . . . . . . . . . . . . . 125
29 UC2 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
30 UC2 Associated threats and Severity value . . . . . . . . . . . . . 130
31 UC2 Threats and matched property . . . . . . . . . . . . . . . . . 130
32 UC2 Likelihood and total risk value (public sharing communities) 131
33 UC2 Likelihood and total risk value (trusted communities) . . . . 132
34 UC2 Likelihood and total risk value for sub-dataset . . . . . . . . 133
35 UC2 Analysis Summary: Responses Returned . . . . . . . . . . . 133
36 UC2 Part1, Threat Summary . . . . . . . . . . . . . . . . . . . . 134
37 UC2 Part2, Threat Summary . . . . . . . . . . . . . . . . . . . . 135
38 Use Case 3 (CTI Dataset) . . . . . . . . . . . . . . . . . . . . . . 138
39 UC3 Associated threats and Severity value . . . . . . . . . . . . . 139
40 UC3 Threats and matched property . . . . . . . . . . . . . . . . . 140
41 UC3 Likelihood and total risk value (public sharing communities) 141
42 UC3 Likelihood and total risk value (trusted communities) . . . . 141
43 UC3 Likelihood and total risk value for sub-dataset . . . . . . . . 142
44 UC3 Responses Returned . . . . . . . . . . . . . . . . . . . . . . . 143
45 UC3 Part1, Threat Summary . . . . . . . . . . . . . . . . . . . . 144
46 UC3 Part2, Threat Summary . . . . . . . . . . . . . . . . . . . . 144
47 ABE attribute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
48 Proposed DataTags relating to four proposed classes of access . . 157
49 UC3 - Sample of the Cyber Incident Report . . . . . . . . . . . . 167
List of Figures
Chapter 1
Introduction
Over the past three decades, the internet has become a crucial part of the way we live and work. It has entered every sector and transformed how we communicate, store, transfer and process information. At the same time, communities, businesses and governments have come to rely on these technologies. The digital economy is expanding rapidly, driven by generating, collecting and analysing information. This digital information grows from individuals' digital footprints, business work streams, the evolving internet of things (IoT) and other online activities. This growth creates
many new roles and opportunities; for example, according to 2017 and 2019 digital
economy reports by the United Nations, more than 100 million individuals work
in the ICT sector. In 2015, e-commerce sales were about $25.3 trillion [4]. By
2030, general-purpose technology such as data analytics is expected to provide an
additional economic output of around $13 trillion [5]. Most people now use smartphones, computers and IoT devices that connect to the internet. We store personal and confidential information on them and use them for online banking, shopping and communication via email and social media. At the same time, cybersecurity threats are
evolving. Therefore, we need to exercise caution and define countermeasures to
protect the confidentiality, integrity, and availability of the systems and services.
It is difficult to come up with a specific definition of cybersecurity and what it
contains. Cyber attacks are becoming more sophisticated and creative, so cybersecurity requires sustained attention and commitment. Currently, it is not difficult for any user to obtain malware and attempt a cyberattack against an organisation. Cybersecurity protects the devices connected to an organisation's infrastructure, and the systems accessed by users, from unauthorised access and potential damage. Its main goal is protecting an organisation's infrastructure and systems against cyber attacks and attackers. Most businesses depend on the internet to reach customers and provide services. Therefore, it is essential to define and implement measures to prevent cybercriminals from gaining access to and stealing users' data and devices. Like individuals, organisations need to protect their IT assets from cyber attacks arising from internal or external threats. Organisations need to convert unknown threats into known threats so they can identify and mitigate them based on business risks.
The relationship between defenders and attackers is asymmetric. The defenders
need to prepare and be aware of all threats that may exploit their organisations'
systems and infrastructure. On the other hand, the attackers need to exploit one
vulnerability to gain access and cause damage to the organisation. This asymmetry gives the attacker a significant head start compared to the defender. The defenders need to collect information from all available sources, whether public or closed.
For example, in 2020 [6] a threat actor was able to inject malicious code into an update of SolarWinds Orion [7], an IT system that helps manage and monitor organisations' networks and infrastructure and is used by thousands of organisations. This system can give attackers a complete view of those networks, allowing them to steal sensitive information. The attack therefore enabled access to critical infrastructure and industrial control system organisations. The attackers gained access to more than 18,000 private and government organisations. They had the ability to take control of any affected installation because
of malware installed in the previous version. The breach was discovered when FireEye [8], a cybersecurity company that uses this software, suffered a breach itself: unauthorised access to its Red Team tools, a set of tools used by its security engineering team to exploit vulnerabilities in organisations' infrastructure.
FireEye shared a list of countermeasures and rules in various standards to help
the community detect the Red Team tools in their products and avoid any future
attacks through these tools. Sharing information by affected organisations about
the course of action and how to respond to this intrusion is essential to mitigate
this risk for others. There are various potential sources of cyber threat data such
as vendors, governments, private sources and open sources such as VirusTotal [9]
and Cisco Talos Intelligence [10].
Cyber Threat Intelligence (CTI) has various definitions and meanings based on
its goal and use. Henry Dalziel [11] indicated that cyber threat intelligence should
have three main features: be relevant, actionable and valuable. Therefore, CTI
should relate to the organisation's business and enable defenders to make mean-
ingful and productive decisions. It should also be proactive instead of following a
reactive tactic by providing awareness and insight about potential attacks.
In order to support sharing and analysing threat information, researchers and
organisations are working to develop formats and standards for exchanging CTI
information. The main standards are Structured Threat Information Expression (STIX) [12], currently the most widely applied, the Incident Object Description Exchange Format (IODEF) [13], and OpenIOC [14].
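As a concrete (hedged) illustration of what such a standard looks like on the wire, the sketch below builds a minimal STIX 2.1-style Indicator object as a Python dictionary. The field names follow the STIX 2.1 specification, while the identifier, timestamps and pattern value are invented for illustration; a real producer would use a dedicated STIX library.

```python
import json

# A minimal STIX 2.1-style Indicator object (illustrative only: the UUID,
# timestamps and URL below are made up, not taken from a real feed).
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": "indicator--d81f86b9-975b-4c0b-875e-810c5ad45a4f",
    "created": "2020-01-01T00:00:00.000Z",
    "modified": "2020-01-01T00:00:00.000Z",
    "name": "Malicious URL used in phishing campaign",
    "pattern": "[url:value = 'http://example.com/malicious']",
    "pattern_type": "stix",
    "valid_from": "2020-01-01T00:00:00Z",
}
print(json.dumps(indicator, indent=2))
```

Note that even this tiny object already illustrates the tension discussed later in this thesis: the pattern field carries an identifiable network artefact that a sharing organisation may need to assess before releasing.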
A reason for increasing the sharing of CTI information is the cost of data breaches,
the number of attacks, and threat actors' motivations and capabilities [15]. Sharing helps organisations build better defences and increase threat detection accuracy [16]. For example, the authors of [17] found that sharing lists of URLs related to malicious activity with hosting providers minimises the possibility of those URLs being used to exploit systems. Moreover, sharing such lists contributes to blocking and
stopping a malware attack quickly and effectively and identifying attack types.
CTI may contain sensitive and identifiable information about the victim's network
infrastructure, existing vulnerabilities, credentials, business processes and finan-
cial information. As information sharing has become more common, privacy and confidentiality have become a major concern and challenge. It is therefore essential to evaluate the risk of sharing CTI whenever sharing is considered.
Privacy is a difficult-to-define concept across different communities. In the realm
of laws and regulations, privacy may relate to personal information (e.g. an ad-
dress or date of birth). In this thesis, we will use privacy to refer to any iden-
tifiable information in CTI datasets. There are various privacy-enhancing tech-
nologies (PETs) that can preserve privacy, confidentiality and mitigate potential
threats against some adversaries. Different techniques have been proposed such as
anonymisation techniques and models including k-anonymity [18][19] and differen-
tial privacy [20]. Chapter 2 provides a more in-depth exploration of the literature
and privacy-enhancing technologies.
Sharing CTI datasets is desirable, but sharing without a qualified evaluation that quantifies the risks of sharing a CTI dataset would put the organisation at risk; for example, disclosing a vulnerability to the public encourages attackers to exploit systems, especially when organisations need more time to patch them [21]. In the same context, organisations may well be justified in perceiving risks in sharing and disclosing cyber incident information, but they appear reluctant to express such worries in clear and well-defined terms.
Such concerns could also arise because of the risks of breaching regulations and
laws in relation to privacy. With regulations such as the General Data Protection
Regulation (GDPR) designed to protect citizens' data privacy, the managers of
CTI datasets need clear guidance on how and when it is legal to share such in-
formation. Thus, it is paramount for CTI analysts and managers to understand
and delineate the legal risks and to manage them through a model that supports the right sharing decision.
However, evaluating the risk of sharing CTI datasets is challenging due to the
nature of the CTI context, which is associated with the evolution of the threat
landscape and new cyber attacks that are difficult to evaluate. Currently, a qual-
ified evaluation remains unavailable. CTI managers face a tricky situation when
deciding what to share, when, how and with whom. The scope of the challenge
requires a coherent model that can assess the CTI dataset before sharing.
In this chapter, we introduce the research aims and research questions, summarise the major contributions, and present the dissertation's outline.
1. Define and identify the risks of sharing cyber threat intelligence and
determine the associated threats.
3. How to assess the risk of sharing CTI datasets caused by sharing with dif-
ferent entities in various situations?
4. How to evaluate the legal requirements for supporting decision making when
sharing CTI?
1.3 Contribution
This thesis presents five main contributions by answering the research questions
in Section 1.2. The contributions of this thesis are the following:
2. A quantitative risk model to assess the risks of sharing CTI datasets with different entities in various situations
4. A set of guidelines for disciplined use of the STIX incident model in order
to reduce information security risk
5. A model for evaluating the legal requirements for supporting decision making
when sharing CTI, which also includes advice on the required protection level
• Chapter 5 answers the fourth and fifth research questions by defining the
impact that GDPR legal aspects may have on the sharing of CTI. In addi-
tion, it defines a flow diagram related to cybersecurity information sharing
and adequate protection levels for sharing CTI to ensure compliance with
the GDPR. It also presents a model for evaluating the legal requirements for
supporting decision making when sharing CTI, which also includes advice
on the required protection level. Finally, we evaluate the model through two
use cases of sharing CTI datasets between different entities and discuss the
results.
• Chapter 6 concludes the thesis and discusses outstanding research issues for
future research.
• Albakri, A., Boiten, E. and De Lemos, R., 2018, August. Risks of shar-
ing cyber incident information. In Proceedings of the 13th International
Conference on Availability, Reliability and Security (pp. 1-10).
Chapter 3 has been extended by presenting the full table analysis and expanding the analysis to other cyber threat information sharing standards.
Chapter 4 is based on the work published in the following paper.
• Albakri A., Boiten E., Smith R.,2020. Risk Assessment of Sharing Cyber
Threat Intelligence. In: Boureanu I. et al. (eds) Computer Security. ES-
ORICS 2020. Lecture Notes in Computer Science, vol 12580. Springer,
Cham. https://doi.org/10.1007/978-3-030-66504-3_6
The chapter has been extended by presenting additional experiments, an evaluation of an open-source STIX dataset, and an additional use case study.
Chapter 5 is based on the work published in the following paper.
• Albakri, A., Boiten, E. and De Lemos, R., 2019, June. Sharing Cyber Threat
Intelligence Under the General Data Protection Regulation. In Annual Pri-
vacy Forum (pp. 28-41), vol 11498. Springer, Cham. https://doi.org/10.1007/978-
3-030-21752-5_3
Chapter 2
This chapter serves as background for the remainder of the thesis. We first look at the concept of privacy, then we study privacy-preserving techniques. After that, we delve into cyber threat intelligence and its standards, challenges and legal requirements. Finally, we discuss related work on the risks of sharing cyber threat information, risk assessment, and sharing cyber threat intelligence under laws and regulations.
is “the right of the individual to decide what information about himself should
be communicated to others and under what circumstances”. In this definition,
they give the user the right to control which information they want to share, with
whom and how. Moreover, in [25] Solove defines privacy as “a concept in disarray.
Nobody can articulate what it means. As one commentator has observed, privacy
suffers from an embarrassment of meanings.”. Solove has published a taxonomy of
privacy harm aiming to classify the harms that may occur from privacy violation.
Solove aimed at providing further understanding of privacy violations in various
contexts. The main harmful activities Solove included are the following:
Information collection: for example, surveillance, which includes monitoring all of a user's activities, and interrogation, which consists of various forms of questioning or probing for information. This can be seen when network traffic surveillance is followed by traffic analysis. The analysis might determine what type of data is being exchanged and which network protocols are in use [26].
Information processing: this group covers activities related to how data is stored, operated on and used. Aggregation is the grouping of different pieces of data about an individual, while identification is the linking of information to a specific person. Insecurity is the failure to protect the data against a data breach or unauthorised access. Secondary use is using the data for a different purpose without the data subject's consent. Finally, exclusion entails preventing the data subject from knowing what data others store about them.
Information dissemination: the main activities in this group include breach of confidentiality, which is failing to keep information confidential. Disclosure is revealing information that might have an impact on someone. Exposure involves exposing specific physical or emotional characteristics of an individual to others. Increased accessibility involves expanding the accessibility of information. Blackmail is the activity of threatening someone to reveal a piece of
attacks [30] against a dataset where each person has a “secret bit” to be protected. Revealing “ordinary” information might be problematic if an individual follows a specific behaviour over time, such as buying bread, and then stops doing so. An analyst might then conclude that a specific individual in the dataset is trying to lose weight, and that could be harmful to that individual. On the other hand, giving the analyst supervised access to the dataset will produce better research results. At the same time, the analyst must convince the provider that the analysis is respectful of individual privacy. The differential privacy [29] approach also works in this context, potentially distorting the data to
provide privacy at a minimal expense to accuracy. Differential privacy offers a
way to protect all data subjects, even the outlier individuals in the dataset whose
privacy could be at risk due to various statistical attack types. This model allows
the data analyst to run queries adaptively, deciding the level of accuracy of the
answer. All these approaches will be discussed in more detail in the following
sections.
addresses or any personal information. Shortly afterwards, the New York Times identified one individual from that dataset and published their information. For example, over a three-month period, user No. 4417749 conducted a series of queries containing information ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.” After following this user's queries, it became easier to identify the person, especially once geographical information appeared, such as “landscapers in Lilburn, Ga,” and “homes sold in shadow lake subdivision Gwinnett county Georgia.” Ultimately, the researchers were able to identify the
user.
In another example, as part of the Netflix Prize contest, Netflix, the world's biggest streaming media service, publicly released a dataset consisting of the movie ratings of 500,000 Netflix customers. The prize was $1 million for developers who could improve the accuracy of the company's movie recommendation system, which is based on personal viewing and movie rating history [32]. The dataset was intended to be anonymous, and all personally identifying information had been removed. In [33], researchers were nevertheless able to identify users in the Netflix database by using the Internet Movie Database (IMDb) from imdb.com as an external dataset. They proposed an algorithm that can be used against any dataset containing anonymous individual records, such as transactions and preferences. Therefore, removing identifying information is not enough for anonymity and does not provide a sufficient privacy guarantee. Targeted re-identification could occur after a normal conversation with a colleague at work about movies they watched and how they rated them; this alone could put their privacy at risk. The researchers determined how much an attacker needs to know about a Netflix customer to identify their record, if it exists, in the dataset.
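A toy sketch of this kind of linkage (not the algorithm from [33]): an attacker holding a few (movie, rating) pairs about a target matches them against the anonymised records. All names and ratings below are invented for illustration.

```python
def link(aux, anon_db, threshold=2):
    """Toy linkage attack: an attacker with auxiliary (movie, rating) pairs
    about a target matches any anonymous record that shares at least
    `threshold` of those pairs."""
    matches = []
    for record_id, ratings in anon_db.items():
        overlap = len(aux & set(ratings.items()))
        if overlap >= threshold:
            matches.append(record_id)
    return matches

# A tiny "anonymised" ratings database (identifiers instead of names).
anon_db = {
    "user_17": {"Movie A": 5, "Movie B": 2, "Movie C": 4},
    "user_42": {"Movie A": 3, "Movie D": 1},
}
# What a colleague might learn from casual conversation:
aux = {("Movie A", 5), ("Movie C", 4)}
print(link(aux, anon_db))  # ['user_17'] -> the record is re-identified
```

Even two overlapping ratings single out one record here, which mirrors the finding that very little auxiliary information suffices for re-identification in sparse datasets.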
and replacing the removed bits by zeros. Different algorithms may be used depending on the type of data stored. For example, the black marker algorithm [37] removes and replaces a field's values with a fixed value, but like most anonymisation methods, this reduces the usefulness of the dataset. The enumeration algorithm [36], on the other hand, sorts the data, selects a value higher than the first (smallest) value, and adds this value to all fields; it can be applied only to numeric fields [36]. These algorithms come close to perturbation, which we discuss in more detail in Section 2.3.2.
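A rough sketch of these two algorithms (an illustrative reading of [36][37], not their exact specifications; the log records and marker value are invented):

```python
def black_marker(records, field, marker="XXX"):
    """Black marker: replace every value of `field` with one fixed value."""
    return [{**record, field: marker} for record in records]

def enumeration(values, offset=15):
    """Enumeration (numeric fields only): sort the values, then add a
    constant offset (chosen larger than the smallest value), preserving
    order while hiding the original magnitudes."""
    return [v + offset for v in sorted(values)]

logs = [{"ip": "192.0.2.7", "bytes": 512}, {"ip": "198.51.100.3", "bytes": 128}]
print(black_marker(logs, "ip"))   # IP addresses replaced by the fixed marker
print(enumeration([30, 10, 20]))  # [25, 35, 45]
```

The black marker output illustrates the utility loss mentioned above: once every IP is "XXX", the field can no longer distinguish hosts.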
Generalisation: in this technique, we group values together. The idea is to replace the values of an attribute with a more general value from the same domain. For example, the value ‘31’ of the attribute age would be replaced by the range [25-35]. Some algorithms are designed for time, such as the random time shift algorithm [36], which adds a random offset to the timestamp attribute. For example [35], Table 1 shows the health information of the patients in a hospital. The table contains quasi-identifier attributes (zip code and age), and the sensitive attribute is the disease. Table 2 shows a 3-anonymous version derived from Table 1. The “*” refers to a suppressed value; for example, “age = 2*” represents an age between 20 and 29.
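The generalisation and suppression steps can be sketched in code. The following minimal Python example (with an invented toy table, not Table 1 itself) generalises the quasi-identifiers and computes the resulting k:

```python
from collections import Counter

def generalize(record):
    """Generalise quasi-identifiers: zip code -> 3-digit prefix plus '**',
    age -> decade band; the sensitive disease attribute is kept as-is."""
    zipc, age, disease = record
    decade = (age // 10) * 10
    return (zipc[:3] + "**", f"{decade}-{decade + 9}", disease)

def k_of(table):
    """The k in k-anonymity: size of the smallest quasi-identifier group."""
    groups = Counter((z, a) for z, a, _ in table)
    return min(groups.values())

# Toy table in the spirit of Table 1: (zip code, age, sensitive disease).
patients = [
    ("36677", 28, "Heart Disease"),
    ("36602", 22, "Heart Disease"),
    ("36673", 27, "Heart Disease"),
    ("48951", 43, "Flu"),
    ("48980", 49, "Cancer"),
    ("48934", 47, "Flu"),
]

anonymised = [generalize(r) for r in patients]
print(k_of(anonymised))  # 3: each generalised group contains 3 records
```

Note that the first generalised group in this toy data shares a single disease value, which is exactly the homogeneity problem discussed next.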
hospital and Alice wanted to discover what Bob's disease was, having access to the 3-anonymous table. She already knew that he lives in the zip code 36677 and is 28 years old; since all the patients in the matching group have the same disease, Heart Disease, she found out that Bob has Heart Disease as well. Accordingly, to prevent background knowledge and homogeneity attacks, an extended technique has been proposed: l-diversity.
l-diversity [34] is an extension of k-anonymity that protects published data against background knowledge and homogeneity attacks. l-diversity ensures not only that all users are k-anonymous, but also that each group of users shares a variety of sensitive values. This variation ensures that sensitive attributes are adequately distributed within each group, avoiding attribute disclosure.
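A minimal sketch of checking distinct l-diversity, the simplest variant of the property (the column layout and rows below are invented for illustration):

```python
from collections import defaultdict

def distinct_l(table):
    """Distinct l-diversity: the smallest number of distinct sensitive
    values in any equivalence class (rows grouped by their
    quasi-identifier columns; the last column is the sensitive one)."""
    classes = defaultdict(set)
    for *qid, sensitive in table:
        classes[tuple(qid)].add(sensitive)
    return min(len(values) for values in classes.values())

rows = [
    ("366**", "20-29", "Heart Disease"),
    ("366**", "20-29", "Flu"),
    ("366**", "20-29", "Cancer"),
    ("489**", "40-49", "Flu"),
    ("489**", "40-49", "Cancer"),
    ("489**", "40-49", "Heart Disease"),
]
print(distinct_l(rows))  # 3: every class holds 3 distinct diseases
```

A table where some class held only one disease would return 1, i.e. it would be k-anonymous but vulnerable to the homogeneity attack described above.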
To achieve l-diversity, one may need to insert dummy data to increase the variation of sensitive information; hence, extracting useful information may be a big challenge. Also, l-diversity is subject to various types of attack that can cause attribute disclosure [35]. The first is the skewness attack: because l-diversity does not consider the overall distribution of sensitive values, equivalence classes can leak information when that distribution is skewed. Assume a single sensitive attribute with two values whose levels of sensitivity differ; the classes then present different levels of privacy risk.
The second is the similarity attack, which applies when the values of the sensitive attribute in an equivalence class are distinct but semantically similar. In this case, an attacker can infer important information about individuals. For example [35], Table 4 shows the 3-diverse version of Table 3. Compared with the 3-anonymous table, Alice cannot learn from the database the association between Bob's record and his sensitive attribute value. But if an intruder knows that Bob's record belongs to the first group, they can infer that Bob has a stomach-related disease and a relatively low salary.
As a result, it has been argued that it is challenging to achieve l-diversity, and even
the granularity and makes the distribution of the sensitive attribute in any equivalence class close to the distribution of the attribute in the entire table. In this approach, the researchers measured privacy based on the information gain of an observer. They defined the information gain as the difference between the observer's expectation of the sensitive attribute value of an individual before the data is released and the posterior expectation after the data is released and the value is seen. The novelty of this approach lies mainly in separating the information gain into two parts: the first concerns the full set of attribute values in the released data, and the second concerns specific individuals. The t-closeness principle is defined as follows: “An equivalence class is said to have t-closeness if the distance between the distribution of a sensitive attribute in this class and the distribution of the attribute in the whole table is no more than a threshold t. A table is said to have t-closeness if all equivalence classes have t-closeness” [35]. For example [35], Table 6 shows an anonymisation with 0.167-closeness with regard to Salary and 0.278-closeness with regard to Disease, derived from Table 5.
Table 6: Table that has 0.167-closeness with regard to Salary and 0.278-closeness
with regard to Disease
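The principle can be sketched in code. The example below uses total variation distance as a simple stand-in for the Earth Mover's Distance used in the original t-closeness work, so the numbers are illustrative rather than the paper's exact measure; the rows are invented.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def t_closeness(table, class_of, sensitive_of):
    """Largest distance between any equivalence class's sensitive-value
    distribution and the whole-table distribution. Total variation
    distance is used here in place of the Earth Mover's Distance."""
    overall = distribution([sensitive_of(r) for r in table])
    classes = {}
    for r in table:
        classes.setdefault(class_of(r), []).append(sensitive_of(r))

    def tvd(p, q):
        keys = set(p) | set(q)
        return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

    return max(tvd(distribution(vals), overall) for vals in classes.values())

# Toy rows: (equivalence class, sensitive disease).
rows = [("A", "Flu"), ("A", "Cancer"), ("B", "Flu"), ("B", "Flu")]
t = t_closeness(rows, class_of=lambda r: r[0], sensitive_of=lambda r: r[1])
print(round(t, 2))  # 0.25: the table has (at best) 0.25-closeness here
```

A table satisfies t-closeness for a threshold t exactly when this maximum distance does not exceed t.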
This method's limitation is that it requires the distribution of a sensitive attribute in every equivalence class to be close to the distribution in the overall table, which becomes a challenge when there are multiple sensitive attributes. Moreover, the relationship between the value t and the information gain used to measure privacy is vague. From the three techniques above, we can see that anonymisation techniques preserve the consistency of the database, with all operations at the record level. All of these techniques try to reduce information loss to make the released data more useful. To support this, many algorithms focus on the information loss of the released dataset, and many information loss metrics have been proposed, such as the Classification Metric (CM) [39], the Discernibility Metric (DM) [40] and the Generalized Loss Metric [39]. Some of these measures suit specific data mining algorithms and cannot be used for general applications. In addition, [41] proposed a framework to identify the utility of attributes in a dataset. Choosing the correct anonymisation method depends on the dataset and on the types of attributes, such as quasi-identifier and
sensitive attributes.
In [42], researchers proposed a cyber threat intelligence framework prototype whose main component is an anonymisation tool. The purpose of this tool is to anonymise attributes such as IP addresses, NI numbers and e-mail addresses. They used specific regular expressions to extract identifiable information from the dataset, and defined a set of anonymisation rules that can be activated at different anonymity levels depending on the TLP protocol [43]; the rules can be modified according to the organisation's requirements.
2.3.2 Perturbation
hid    Income (original)    Income (perturbed)
8393   1360                 5588.49
3236   4243                 5588.49
7188   11163                5588.49
9503   18145                23291.45
9204   25149                23291.45
2866   26310                23291.45
5386   32538                33479.17
8787   32600                33479.17
6376   35300                33479.17
7781   36099                37109.44
3672   37228                37109.44
5089   38001                37109.44
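The perturbation shown above replaces each group of three sorted incomes by a single representative value, in the style of microaggregation. A minimal sketch of that idea (illustrative only; the published values above differ slightly from plain group means):

```python
def microaggregate(values, k=3):
    """Sort, split into runs of k, and replace each run by its mean
    (a short final run keeps its own mean)."""
    ordered = sorted(values)
    out = []
    for i in range(0, len(ordered), k):
        group = ordered[i:i + k]
        out.extend([round(sum(group) / len(group), 2)] * len(group))
    return out

incomes = [1360, 4243, 11163, 18145, 25149, 26310]
print(microaggregate(incomes))
# [5588.67, 5588.67, 5588.67, 23201.33, 23201.33, 23201.33]
```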
non-numerical data.
∀ T ⊆ Y: Pr[M(x, q) ∈ T] ≤ e^ε · Pr[M(x′, q) ∈ T]    (1)
The value of ε is small; for example, ε = 0.1; a smaller ε provides better privacy.
In DP there is a ‘privacy budget’, expressed in terms of ε: each query consumes
part of the budget, and once the budget is exhausted the user can no longer run
queries on the dataset.
Figure 1 illustrates how the steps for DP can be applied to big data [1]. In the
process, there is no direct access to the database that contains the original data,
but there is an intermediate layer called DP privacy guard between the analyst
and the database to preserve the privacy. The steps are:
1. The analyst is able to send a query to the database through the DP guard.
2. The guard inspects the query the analyst wants to ask of the database and
measures the privacy consequences of this query combined with the queries
that have come before it. This evaluation relies only on the sequence of
queries the analyst asks, without considering the real data in the database.
3. The guard sends the request to the database and receives the response with-
out noise.
4. The guard adds noise to the query based on the privacy risk and sends the
new result to the analyst.
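These steps can be sketched as a small guard object; the Laplace mechanism and the budget book-keeping are standard, but the class, its interface and the error behaviour are illustrative assumptions:

```python
import random

class DPGuard:
    """Minimal sketch of the intermediate privacy guard."""

    def __init__(self, budget):
        self.budget = budget  # total epsilon available
        self.spent = 0.0

    def query(self, true_answer, epsilon, sensitivity=1.0):
        if self.spent + epsilon > self.budget:
            raise PermissionError("privacy budget exhausted")
        self.spent += epsilon
        # Laplace(0, sensitivity/epsilon) noise, sampled as the
        # difference of two exponential variates.
        scale = sensitivity / epsilon
        noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
        return true_answer + noise

guard = DPGuard(budget=1.0)
print(guard.query(true_answer=120, epsilon=0.5))  # noisy count near 120
```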
To be able to run queries against the full database without requiring the
providers to trust or understand the analysis, a programming language for dif-
ferential privacy, Privacy Integrated Queries (PINQ) [48], has been proposed.
Analysts write PINQ queries much as they would write ordinary database
queries, and receive the aggregate result after noise has been added to it.
The global approaches are based on homomorphic encryption and multiparty com-
putation, which enforce privacy by design. The degree of confidentiality and
privacy achieved depends on the adversary model. Cryptographic approaches can
deal with different adversarial models; there are two main types: the
honest-but-curious (HBC) adversary and the malicious adversary.
Honest-But-Curious (HBC) Adversary: In this model, all parties act exactly as
honest parties and follow the actions prescribed by the protocol; however, the
adversaries try to learn more about the private information of other parties.
Malicious adversary: In this model, the adversaries may deviate arbitrarily from
the specified actions in the protocol, such as sending distorted messages or ac-
tively colluding with other malicious parties to violate the privacy or integrity of
the other players’ private data.
Homomorphic encryption [49] is a type of encryption that allows computation
on encrypted data, generating an encrypted result which, when decrypted,
matches the result of the same operations on the original data. This preserves
the confidentiality and the privacy of the data during the operations. There are
many applications which can employ homomorphic encryption schemes, such as
cloud computing [50].
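As an illustration of the additive case (not taken from the cited works), the following is a toy Paillier-style scheme; the parameters are deliberately tiny and completely insecure:

```python
import math
import random

# Toy Paillier parameters: n = p*q, g = n + 1 (insecure, illustration only).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Multiplying ciphertexts adds the underlying plaintexts:
a, b = 12, 30
print(decrypt((encrypt(a) * encrypt(b)) % n2))  # 42
```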
As a definition [51], consider a cryptosystem C with an encryption function E,
plaintexts x1, . . . , xn, and some operation △.
C is considered additively homomorphic if:

E(x1) △ E(x2) = E(x1 + x2)
Therefore, we can apply this definition to any other operation. There are two
forms of homomorphic encryption: Fully homomorphic encryption and partially
secret will be reconstructed only after combining enough parts together through a
trusted client. Extension schemes of the secret sharing have been developed, such
as multi-secret sharing [55] and verifiable multi-secret sharing [56] to improve the
existing protocols and provide solutions against attacks.
Secure multi-party computation (MPC) was introduced by [57]. In MPC, each
party holds some private data; the parties jointly perform a computation on this
private data, and only the receiver can reconstruct the output. One of the
methodologies for secure MPC is secret sharing. There are many practical appli-
cations of MPC [58], such as collecting and analysing financial data for a consortium
of information and communication technology companies [59].
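A minimal sketch of one MPC building block, additive secret sharing over a prime field (the modulus and the salary figures are invented):

```python
import random

PRIME = 2_147_483_647  # field modulus for the shares

def share(secret, parties):
    """Split a secret into additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Three parties each hold a private salary; they jointly compute the total
# without any party revealing its own input.
salaries = [52_000, 61_000, 48_500]
all_shares = [share(s, 3) for s in salaries]

# Party i locally adds the i-th share of every secret...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
# ...and only the combined partials reveal the aggregate.
print(reconstruct(partial_sums))  # 161500
```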
In [60], researchers provide a comprehensive analysis of the mechanisms and chal-
lenges of privacy preservation in big data, focusing on the infrastructure and all
big data life cycle stages, such as generation, storage and processing. In the data
generation phase, the main techniques either restrict access or distort the data;
tools such as MaskMe [61] can distort data by hiding online identity.
In the data storage phase, we need to ensure that the data is secure against
any disclosure threat in distributed environments. The techniques used in this
phase are encryption-based, and the current approaches include:
• Storage Path Encryption [65]: In this scheme, the user employs a trapdoor
function to secure the storage of data. Instead of encrypting the big data
itself, only the storage path is encrypted.
Organisations might share private sets for union, intersection and difference
operations. This sharing brings privacy risks, such as disclosing an organisation’s
entire set rather than only the result of the multiset operation. In [66], researchers
proposed a framework for privacy-preserving operations such as union, intersection
and element reduction. One practical example of the set intersection problem is
the ‘do-not-fly’ list, which requires a private intersection between the government’s
list and the airlines’ passenger lists.
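Real private set intersection protocols rely on cryptographic blinding; the salted-hash comparison below is only an illustrative sketch of the idea, and is vulnerable to brute-force guessing over small input domains (all names are invented):

```python
import hashlib

def blind(item, salt):
    """Hash an item under a shared salt; only digests are exchanged."""
    return hashlib.sha256(salt + item.encode()).hexdigest()

salt = b"shared-session-salt"  # agreed out of band

government_list = {"p. cruella", "j. mallory"}
airline_manifest = {"a. turing", "j. mallory", "g. hopper"}

gov_digests = {blind(x, salt) for x in government_list}
matches = {x for x in airline_manifest if blind(x, salt) in gov_digests}
print(matches)  # {'j. mallory'}
```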
One definition of sticky policy is “machine readable policies that can stick to data
to define allowed usage and obligations as it travels across multiple parties en-
abling users to improve control over their personal information” [67]. In sticky
policy, an obligation management system in service providers manages informa-
tion lifecycle management depending on personal preferences and organisational
guidelines. In many situations, a company needs to reveal personal or even
sensitive information in order to obtain a specific service. However, achieving the
goal of sharing requires a mechanism that ensures all policies are addressed. The
main characteristics of a sticky policy are: defining the purpose of sharing, such
as research; restricting use of the data to a specific technical environment; defining
what recipients can and cannot do with the data; defining the retention policy;
and, finally, defining the list of trusted authorities that can provide assurance and
accountability in the procedure of granting access to the protected data. Therefore,
with these features,
the company will be able to define how their data should be processed, stored and
shared by defining their conditions explicitly.
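These conditions could be represented as a machine-readable policy attached to the data; the field names and the check below are hypothetical, shown only to make the idea concrete:

```python
# Hypothetical machine-readable sticky policy travelling with a dataset.
sticky_policy = {
    "purpose": ["research"],
    "environment": {"encryption_at_rest": True, "region": "EU"},
    "permitted_actions": ["aggregate", "analyse"],
    "forbidden_actions": ["re-identify", "resell"],
    "retention_days": 365,
    "trusted_authorities": ["example-ta.org"],
}

def may_process(policy, purpose, action):
    """Check a proposed use of the data against its attached policy."""
    return (purpose in policy["purpose"]
            and action in policy["permitted_actions"]
            and action not in policy["forbidden_actions"])

packet = {"data": {"incident_id": "inc-42"}, "policy": sticky_policy}
print(may_process(packet["policy"], "research", "analyse"))   # True
print(may_process(packet["policy"], "marketing", "analyse"))  # False
```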
The main advantages of sticky policy according to [68] are as follows: the data
owner can set and manage their preferences on their data before sharing them
with others because the policy transfers with the data, protecting the entire data
life cycle. The management of the policies and access control would be easier since
the third party will be responsible for supervising and managing policy enforce-
ment systems. Besides these advantages, there are shortcomings. It is difficult to
provide an adequate set of policies when data is coming from different domains,
with different formats and semantics. Consequently, it is challenging to develop
a standard. The computational cost of processing and transmitting the policies
along with the data is high. The main shortcoming is that the sender needs to
trust recipients to respect the sticky policy. In addition, to address one of the previous
challenges, many models have been proposed for defining sticky policies associated
with data sharing agreements [69] [70]. These models introduced a novel way to
represent sticky policy generically and structurally.
Privacy patterns are another way of ensuring privacy and providing practical
guidance for software engineering. To represent privacy concerns among different
parties we need to provide privacy patterns that might help to standardize lan-
guage for privacy preserving technologies, identify the standard solution to privacy
issues, and help designers to pinpoint and deal with privacy concerns [71] [72]. As
Alexander in 1977 wrote about patterns in general [73] “Each pattern describes
a problem which occurs over and over again in our environment, and then de-
scribes the core of the solution of the problem, in such way that you can use this
solution a million times over, without ever doing it the same way twice”. There
are different types of patterns, introduced for different privacy problems, such as
onion routing for anonymous communication, the obligation management pattern
and the aggregation gateway pattern. The obligation management
assets that can be used to inform decisions regarding the subject’s response to
that menace or hazard” [84]. [85] proposed another definition of cyber threat in-
telligence derived from a definition of intelligence: “Intelligence is the collecting
and processing of that information about threats and their agents which is needed
by an organization for its policy and for security, the conduct of non-attributable
activities outside the organisation’s boundaries to facilitate the implementation
of policy, and the protection of both process and product, as well as persons and
organizations concerned with these, against unauthorized disclosure”. Thus, cyber
threat intelligence is defined as “Cyber Threat Intelligence is nothing more than
the application of intelligence principles and tradecraft to information security. Its
outcome is nothing different from traditional intelligence: to inform and empower
decision-making at all levels with knowledge of threats.” Another definition was
introduced by Lee [86] as “The process and product resulting from the interpre-
tation of raw data into information that meets a requirement as it relates to the
adversaries that have the intent, opportunity and capability to do harm.”
In summary, all definitions endeavour to delineate the purpose of threat intelli-
gence from different aspects. For the rest of this thesis, we will rely on McMillan’s
definition, as it is the most comprehensive.
The intelligence process life cycle consists of the following six phases [87]: planning
and direction; collection; processing and exploitation; analysis and production;
dissemination and integration; and evaluation and feedback.
Planning and direction: This phase includes various tasks such as the iden-
tification and prioritisation of intelligence requirements. It determines the goal
for collecting this intelligence, whether strategic, technical, tactical or operational
and defines a collection plan for the intelligence. Another task is designing an
result of the previous phase will be delivered to and consumed by the intelli-
gence consumer, automatically or manually. These results include context on in-
dicators of compromise (IoCs), threat actors' tactics, techniques and procedures
(TTPs), prioritised and filtered security alerts and threats, threat intelligence re-
ports, and high-level business strategic reports. Intelligence reports are intended
to meet decision makers' requirements at strategic, operational, tactical and
technical levels. Disseminating and sharing cyber threat intelligence enhances
defensive and mitigation strategies for specific risks. Sharing helps the organisation
to maintain efficient situational awareness and supports cyber risk management.
Evaluation and feedback: This stage also provides continuous feedback about
the cyber threat intelligence lifecycle. The feedback determines the quality and
usability of the extracted intelligence by the consumer, avoiding requirement gaps,
which can be conspicuous once the intelligence report is generated.
which helps security engineers update endpoint devices and security controls with
IoCs feed. This information is shared via open source, public forums, and trusted
communities to stop similar attacks automatically.
Information sharing architectures can be divided into three main types based
on [92]: centralised, peer-to-peer and hybrid. In a centralised architecture, the
participants share CTI information with a hub that acts as a repository for
intelligence, and then
the hub distributes this data to other participants. Distributing this information
could be direct to all participants or after performing analysis operations such as
aggregation, correlation or adding more context via enrichment of this data. A key
advantage of using a centralised architecture is to reduce the cost because it uses
standard data formats and protocols. Since the data will go through the central
hub, less maintenance and operations are required as they need few connections.
The main downside of using this architecture is that a single point of failure might
cause a risk of delaying the exchange of information. Another downside is that
the centralized hub may raise the motivation for cyberattacks because of the cen-
tralisation of a huge amount of sensitive information. Finally, it is a single point
of trust, so all community members must trust the centralised hub, and they will
be affected in the case where the central hub is not able to keep the sensitive
information safe. For example, in the UK the NCSC [93] and its Cyber Security
Information Sharing Partnership (CiSP) [94] constitute a centralised informa-
tion sharing organisation. The NCSC is considered the authority for sharing
knowledge and threat intelligence and for providing guidance on cybersecurity.
It also provides effective cyber incident management and response against cyber
threats to reduce harm to the UK. This body is the interface that builds collab-
oration between government, industry, SMEs and the public to keep the UK safe
online.
In the peer to peer architecture model, members share information directly with
one another instead of sharing it with the central repository. This form would
be useful when sharing sensitive information about a specific attack with specific
peers. The main advantage of this model is that the information is exchanged
swiftly between the sharing community members, and there is no single point of
failure for a targeted attack. On the other hand, if organisations do not support
standard formats and protocols, it would be hard for the participants to share ef-
fectively. Also, in this type, it is the member’s responsibility to enrich each event
and perform the analysis operations. Thus, the operations’ costs will increase
significantly as the number of members increases.
Finally, the hybrid form is a combination of the previous architecture types,
where sharing decisions differ based on context and situation.
For example, an organisation may share more Indicators of Compromise (IoCs)
such as hash files and IP addresses using peer-to-peer architecture, while using
centralized architecture for sharing enriched events.
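Such a context-dependent sharing decision can be sketched as a simple dispatch rule; the event fields and the routing choices below are hypothetical illustrations of the hybrid model:

```python
# Hypothetical dispatch rule for a hybrid sharing architecture: raw IoCs
# go peer-to-peer, enriched events go through the central hub.
def choose_channel(event):
    if event.get("enriched"):
        return "central-hub"
    if event.get("type") in {"file-hash", "ip-address"}:
        return "peer-to-peer"
    return "central-hub"

print(choose_channel({"type": "file-hash", "enriched": False}))      # peer-to-peer
print(choose_channel({"type": "campaign-report", "enriched": True})) # central-hub
```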
There are various sources whereby information can be collected [95] [96] [97]. The
first type of sources is internal, such as system logs and network events. The sec-
ond type is known as Externally Sourced Observables or Feeds such as abuse.ch
[98], blocklist.de [99] and MISP (CIRCL) [100]. Organisations can also draw on
open-source intelligence (OSINT), publicly available sources used during the data
collection phase. One example of OSINT is social media, a key source for tracing
information shared by academics, professionals, commercial organisations and
research centres. A Whois lookup identifies information about the owner of a
website, the domain name and registrant details; threat actors may, for example,
use a registered domain to start a social engineering campaign. Other information
can be collected through search engines, web services and website analysis.
To achieve cooperative cyberdefense, organisations need to extract the feeds which
are related to the cyber-attack and find the correlations with the threat landscape
and automate the process to support decision making. Measuring the feeds’
quality and using the proper system controls require an established standard that
helps various communities share and automate cyber threat information. At the
same time, the standard should be human-readable as well as machine-processable.
This allows cyber threat intelligence analysts to get more context, and allows
human processing and decisions where needed.
Software vendors and researchers have started to develop and provide models and
platforms that help sharing this information, such as “Threat Intelligence Sharing
Platforms” (TIPs) [101]. These platforms provide automated support to infor-
mation sharing and associated analysis. TIPs support organisations to start the
processes of collection, processing, analysis, production, and finally dissemination
and integration of threat intelligence. There are two main types of TIPs: open
source and commercial. This variety makes it challenging for experts to select
which one to implement, since most platforms provide similar operations to
aggregate and analyse data and support multiple data formats.
For example, Collective Intelligence Framework (CIF) [102] is an open source cy-
ber threat intelligence management system. CIF allows combining information
such as IP addresses and URLs from various sources and uses this information for
threat identification, detection and mitigation. GOSINT (Cisco, 2017) is an open
source threat intelligence platform that can be used for collecting, processing and
analysing indicators of compromise (IoCs). Cisco CSIRT developed this platform
with capabilities to parse indicators from different sources, such as Twitter.
It supports operations such as searching/sorting, editing and deleting indicators,
and the ability to add and remove an indicator’s tags.
Malware Information Sharing Platform (MISP) [103] is a free and open source
threat intelligence platform for sharing cyber threat intelligence. It supports col-
lecting, storing and processing the relationships between IoCs of cyber-attacks
and financial fraud. MISP users have created public and closed sharing commu-
nities, each with different joining conditions [104]. For this platform, vendors
created specific formats to support CTI platforms and applications; the MISP
format is a JSON-based format for exchanging events between MISP instances
[100].
Yeti [105] is another open source platform for categorising observables, such as
geolocated IPs, IoCs, cyber attacker techniques and information on threats, in a
centralised repository. It collects observables from various sources such as MISP
instances, malware trackers, XML feeds and JSON feeds. It provides a web API
as an interface for machines and a web interface for users, allowing integration
with other controls. This platform helps incident responders to skip the “Google
the artifact” step of incident response. There are also many other open source
platforms, such as MITRE’s Collaborative Research Into Threats (CRITs) [106]
and Palo Alto’s MineMeld [107].
There are commercial TIPs available in the market, such as Anomali Threat-
Stream [108], which supports threat detection by integrating with other security
solutions. It collects cybersecurity information from various sources, such as
commercial and open source intelligence providers and structured and unstruc-
tured feeds. IoCs can be sent automatically to other security controls for acting
and monitoring. EclecticIQ Platform [109] is also a commercial threat intelli-
gence platform. This platform supports STIX and TAXII standards and provides
analyst-friendly graphs with advanced search as well as the ability to integrate
with other threat intelligence providers.
Some additional commercial threat intelligence platforms are LookingGlass [110],
NC4 Soltra Edge [111], Micro Focus’ Threat Central [112], ThreatConnect [113],
ThreatQuotient ThreatQ platform [114] and TruSTAR threat intelligence ex-
change platform [115]. Both commercial and open source platforms have various
limitations [116] based on experts’ views, literature and feedback. Some of these
limitations are:
2. Recently, the focus was on building platforms, data formats and standards;
now it should be more about how to manage this information and reduce
the manual work which depends on the analyst [118] [117].
3. Most of the platforms look for tactical indicators and indicators of compro-
mise without always including the context. This limitation might prevent
the analysts and the recipients of the information from completing the analysis.
4. Most of the existing platforms focus on the collection phase more than the
analysis, production and dissemination phases.
6. The level of trust concerns both the users and the platform providers [101].
When sharing cyber threat intelligence, the main trust relationships are
between the organisations and the platform provider, which handles the
shared information and must prevent exposure of confidential information
to unauthorised participants. The organisations trust the rest of the partic-
ipants to handle the shared data appropriately, and the platform provider
and the participants trust the entity who shared the data regarding its
reliability and credibility.
8. Most of the feeds do not record a level of confidence which is related to the
quality of the shared information. Therefore, there is a need to measure the
quality and confidence of the information from different aspects such as the
receiver, the sender and the sharing community [118].
9. Most of the platforms do not provide a time-to-live property for the feeds.
This property is critical because the resulting intelligence is used to prioritise
actions during a specific time window [118].
10. Threat intelligence platforms do not support a capability to help the or-
ganisation calculate the risk of the CTI dataset they are willing to share.
In Chapter 4, we propose a new risk assessment model to evaluate the risk
of sharing cyber threat intelligence. Our model could be implemented as a
component inside the CTI platform.
In order to build an effective exchange of cyber threat intelligence and use the data
correctly for automation, data formats and standards are needed. Many standards
have been proposed, and others are still under development for the automated ex-
change of cyber threat information. These include Cyber Observable eXpression
(CybOX™) [119], Structured Threat Information Expression (STIX™) [120], An
Open Framework for Sharing Threat Intelligence (OpenIOC), Incident Object De-
scription Exchange Format (IODEF) [13] and Trusted Automated eXchange of
Indicator Information (TAXII) [121] [101].
Figure 2: A STIX Package includes the STIX individual component data models
[2]
• Campaign: The STIX campaign data model describes one or more in-
stances of cyber threat actors observed via sets of incidents and TTP that
intend to attack the organisation.
• Report: The STIX Report data model gives context to the STIX compo-
nents. It consists of properties such as Title and Time.
STIX versions 2 and 2.1 were developed based on STIX 1.X but use JSON instead
of XML as the serialisation mechanism. The language has become more light-
weight and dynamic, introducing several new objects, such as ‘sighting’, and rep-
resenting the relationship between objects via a ‘relationship’ object that can be
utilised to link any STIX objects [123].
STIX 2.1 represents the cyber threat intelligence by the following objects: STIX
Bundle Object and two main categories of objects, STIX Core Objects and STIX
Meta Objects. STIX Core Objects have three subtypes: STIX Domain Objects
(SDO), STIX Cyber-observable Objects (SCO) and STIX Relationship Objects
(SRO).
STIX Meta Objects contain two types: Language Content Objects and Marking
Definition Objects. STIX is a connected graph model consisting of nodes and
edges: SDOs and SCOs represent graph nodes, and SROs represent graph edges.
This graph-based language supports analysing related information and allows flex-
ible and agile representations of complex information of CTI.
Each STIX Domain Object consists of its own properties and related information.
We can group some SDOs into categories based on similarity.
For example, Attack Pattern, Infrastructure, and Malware represent types of tac-
tics, techniques, and procedures (TTPs). They describe behaviours and resources
that attackers use to perform their attacks and gain access.
For example, with the “Relationship” object we can define a relationship type
“uses” to represent the relationship between “Attack Pattern” object and “Tool”
object. The data in the example could describe a specific tool such as LOIC (Low
Orbit Ion Cannon) [124] which is used to create the behaviour identified in the
attack pattern such as a DDOS attack.
{
    "type": "relationship",
    "id": "relationship--XX",
    "spec_version": "2.1",
    "created": "2020-03-06T10:14:22.231Z",
    "modified": "2020-03-06T10:14:22.231Z",
    "relationship_type": "uses",
    "source_ref": "attack-pattern--05",
    "target_ref": "tool--06"
}
To share STIX reports, we can use a standard called Trusted Automated Ex-
change of Intelligence Information (TAXII), which defines the technical specifi-
cation, supporting documentation and the requirements for transporting STIX
messages. TAXII is an application layer protocol used to exchange cyber threat
information over HTTPS; it supports various protocol bindings and sharing data
in various formats [121].
As a result, there are various formats and standards used to represent cyber
threat intelligence, which adds challenges and limitations for existing cyber threat
intelligence platforms. Much work needs to be done in the
The researchers found that the best formats for automated exchange of cyber
incident information are STIX and STIX 2. This result was based on the fact
that STIX 1.X and 2.X provide a clear and detailed data model with less
ambiguity. STIX offers extensibility and easy automation because it is
machine-readable. STIX is also supported by various cybersecurity products,
services and user communities (MITRE, 2018). In summary, this support helps
to enhance this standard to enable effective sharing of cyber threat information.
In this thesis, we will use STIX 1.X, which is described thoroughly in Section 2.4.3.
There are barriers and challenges when sharing information for cyber intelligence
analysis. The most obvious of these is the risk associated with disclosing protec-
tive capabilities and sensitive information, such as identifiable information and
financial loss. Other barriers include business decisions [131], trust between users
and platform providers, different privacy laws, and technical issues arising from
different platforms and standards [132] [133]. There is the challenge of collecting,
processing and managing CTI information, primarily when information is
collected from multiple sources in multiple formats, as we mentioned previously.
There are various formats and standards that provide free text fields which make
it hard to perform an automated analysis [116].
The human aspect is essential in CTI sharing. The process of sharing and dealing
with CTI data still needs human users in the process, especially in the identifica-
tion, remediation and prevention phases [134]. The prioritisation of threats and
how to evaluate them differ between organisations where each organisation has a
different opinion about the severity of threats [135]. Each organisation will have
different types of assets that need to be evaluated and prioritised based on the
possible risks and the potential impact on the organisation’s business. Thus, they
need to identify the possible threats and adversaries that may target them. It
is necessary to establish a win-win environment where all entities benefit from
sharing information, while avoiding entities that do not cooperate but want to
benefit from the others (“free-riders”). Also, in general, trust between the
sharing partners needs to be established, for example, when they are potential
competitors. A straightforward method of achieving this is to share information
via a trusted central authority such as CERT-UK or CISP in the UK. Industry
sector regulators could also be considered for this, but regulation may be a factor
that inhibits the sharing of information. Also, one of the critical challenges when
sharing cyber threat intelligence is to preserve the confidentiality of individuals
In order to identify the cybersecurity and privacy threats associated with sharing
CTI, we need to examine the existing literature on cyber threat taxonomies.
In [136], the authors propose a taxonomy for cyber-physical threats where they
study the attack vectors and the impact on systems and users in the smart home
environment. In this taxonomy, they classified attack vectors into five main
categories: communication medium, supply chain, side channel, sensory
channel and control software. Also, they classify the impact on systems into
physical impact such as incorrect actuation and cyber impact such as integrity.
Finally, they classify the impact on domestic life (DL) into four main categories,
which are: emotion regulation and coping, emotional, user experience and direct
consequences. In [137], the authors build a taxonomy for classifying organisational
cyber harms based on the risk and impact of cyber-attacks. In this taxonomy,
they classify harms into five broad types. The first type is physical or
digital harms, such as ‘identity theft’ and ‘Pain’. Economic harms are the second
type, and it includes ‘Reduced profits’ and ‘Disrupted operations’. Also, psycho-
logical harms represent the third type and include ‘Confusion’ and ‘Discomfort’.
The fourth type is reputational harms, such as ‘Damage public perception’ and
‘Loss of key staff’. Finally, social or societal harms entail ‘Disruption in daily life
activities’.
There are other taxonomies for grouping threats, for example, the Open Threat
Taxonomy [138]. The Open Threat Taxonomy is a well-known, open source
description of the threats to information systems that organisations may face.
The mission of this project is “To maintain a free, community driven, open
source taxonomy of potential threats to information systems”. The authors
defined the threats based on the following components: threat providers or agents,
threat actions, threat targets and threat consequences. Based on these compo-
nents, we can characterise the following attack: a threat source such as Lazarus
Group [139] performed a threat action such as distributed denial-of-service (DDoS)
which led to threat consequences such as availability, confidentiality and integrity.
This taxonomy consists of 75 threats action classified into four main categories of
threats that could affect the confidentiality, integrity, or availability of information
systems such as the following:
• Resource threats: threats to the resources required by information systems.
Such threats could lead to failures of information systems due to disruption of
resources such as the water, fuel or electricity required for operations.
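The threat-source/action/target/consequence characterisation described above can be sketched as a simple record type. This is a Python illustration only; the `target` value is an assumption added for the example, not part of the taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreatRecord:
    """One attack characterised by the four Open Threat Taxonomy components."""
    agent: str            # threat provider or agent
    action: str           # threat action
    target: str           # threat target (value below is illustrative)
    consequences: tuple   # affected security properties

# The Lazarus Group DDoS example from the text, expressed as a record.
ddos = ThreatRecord(
    agent="Lazarus Group",
    action="distributed denial-of-service (DDoS)",
    target="public web services",
    consequences=("availability",),
)
```

Structuring attacks this way makes the later mapping from threat actions to consequences mechanical rather than ad hoc.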
As part of developing this taxonomy, there was an effort to rank the identified
threats and assign a score to each threat action. This ranking aims to help
organisations build their risk profiles and select controls to stop specific
threats.
The Taxonomy of Operational Cyber Security Risks (TOCSR) [140] was updated in
2014 [141] to follow the security and privacy controls of the 4th revision of NIST
SP 800-53 [142]. The authors also related it to other risk frameworks, such as the
Federal Information Security Management Act of 2002 (FISMA) [143] and OCTAVE [140].
This taxonomy defines operational cybersecurity risks as "operational risks to
information and technology assets that have consequences affecting the
confidentiality, availability, and integrity of information and information
systems". It identifies and classifies the sources of cybersecurity risk into four
main classes from a business risk perspective. The four main classes include:
• Failed internal processes: internal business processes that affect the capacity
to establish sustainable cybersecurity, such as process design, modelling,
execution and monitoring.
In this taxonomy, each class consists of subclasses, which are described by
properties.
Finally, there is the threat taxonomy from the European Union Agency for Network
and Information Security (ENISA) [116]. ENISA is a European centre of expertise,
launched in 2004 and based in Greece. It supports European Union (EU) member states
and organisations in managing cyber risks and meeting security policies and
regulations. In [144], Launius reviewed several existing threat taxonomies and
found that ENISA's taxonomy contained the most threat actions of those studied,
and ranked second in the clarity of its threat terms and in classifying events
under the right class. As a result, ENISA's taxonomy scores highly on the ability
to characterise all potential threats to organisations. Thus, in the systematic
analysis that follows in Chapter 3, we will use the ENISA threat taxonomy [145]
for categorising the threats. ENISA's taxonomy consists of three levels. The
top-level categories of this taxonomy include:
• Failures/ Malfunction
• Outages
• Legal
The next two levels of the taxonomy are threats and threat details. These detailed
levels make it one of the most comprehensive threat taxonomies. The taxonomy
focuses on perpetrators' actions that can harm or disrupt information systems and
places them into high-level threat categories. It defines legal threats as one of
these high-level categories, covering threats of legal or financial penalty due to
violation of the law, illegal use of data and court orders.
Cyber information sharing takes place in a legal context, which means we have to
consider different laws and regulations from different countries. Depending on the
country, laws and regulations may both encourage and inhibit aspects of cyber
information sharing [146]. Sharing CTI data yields more effective results when
more participants are involved in the process; thus, encouraging sharing through
laws and regulations would be ideal. In the USA, an executive order called
Improving Critical Infrastructure Cybersecurity was signed by President Barack
Obama [147]. Its main goal was to enhance the security and resiliency of critical
infrastructure by improving collaboration between federal agencies and the
voluntary private owners and operators of critical infrastructure. Subsequently,
a guideline describing what, when, and how to share cyber threat information was
introduced under the Federal Information Security Management Act
(FISMA) [148]. This guideline uses FISMA as the legal baseline for sharing cyber
threat information. In the EU, the main relevant laws in this context are the
General Data Protection Regulation (GDPR) and the Directive on Security of Network
and Information Systems (NIS Directive), both applicable as of May 2018.
The GDPR defines a number of key roles, including:
• Data Subject (Article 4(1)): a natural person about whom data is being collected
or processed.
• Supervisory Authority (Article 51): each member state shall provide one or more
independent public authorities responsible for monitoring the application of the
GDPR. The goal is to protect the fundamental rights and freedoms of natural
persons in relation to processing and to facilitate the free flow of personal data
within the Union. This role has in most cases been assigned to existing data
protection authorities, such as the ICO [151] in the UK.
Any processing of personal data needs legal grounds. Article 6(1) of the GDPR
defines the possible legal grounds for data processing, which include the
following:
• The data subject has given consent for one or more specific purposes.
• Processing is necessary for the performance of a task carried out in the public
interest or in the exercise of official authority vested in the controller.
The legitimate interest "must be real and not too vague". Recital 49 legitimises
the processing of personal data for cyber incident sharing, by admitting a
legitimate interest for "processing for the purpose of ensuring network and
information security, including preventing unauthorised access to electronic
communications networks, and stopping damage to computer and electronic
communication systems". Other legitimate interests are processing in order to
identify and prevent fraud (Recital 47), or the transmission of personal data
within a group of undertakings for internal administrative purposes, including
client and employee data (Recital 48) – both of these purposes are also related to
cyber intelligence sharing.
Article 33 introduces mandatory personal data breach notification to the relevant
supervisory authority, and Article 34 requires notification to the data subjects
when a data breach could cause them harm. Such notification can also be viewed as
a mandatory form of cyber incident information sharing. The GDPR is a main driver
for improving cyber security in Europe, as it requires organisations to "implement
appropriate technical and organisational measures to ensure a level of security
appropriate to the risk" (Art. 32). Such measures might include intelligence
sharing – but what if personal data is contained in that? Figure 4 shows some
indicative categories of data in cyber incident reports which are more and less
likely to contain personal
data under the GDPR (these are “properties” of the STIX 1.2 incident model,
which will be explained in more detail in Chapter 3).
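As an illustration of the kind of screening Figure 4 supports, the sketch below flags report fields that commonly hold personal data. The field names and their classification are assumptions made for this example, not the actual content of Figure 4 or of the STIX 1.2 model.

```python
# Illustrative only: which incident-report fields are likely to contain
# personal data. The names and the classification are assumptions.
LIKELY_PERSONAL = {"reporter", "victim", "responder", "coordinator"}

def needs_gdpr_review(properties):
    """Return the report fields that warrant a personal-data review."""
    return sorted(p for p in properties if p.lower() in LIKELY_PERSONAL)

flagged = needs_gdpr_review(["Status", "Reporter", "Victim", "Categories"])
```

Even a coarse filter like this shows why a per-field analysis is needed before a report can be shared lawfully.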
The collection and use of personal data requires transparency in systems, offering
the data subject the ability to confirm that the information held is accurate.
Transparency is a key data protection requirement (Articles 12-14 GDPR). It
ensures that what happens to data before, during and after processing can be
reconstructed at any time; thus, transparency should cover what will happen to the
data after the processing takes place. Transparency is also associated with
accountability, which includes clear documentation covering the source code,
technology, responsibilities, privacy policies, notification and the communication
with the data subject [116].
In a survey of 111 operators of essential services that implemented the NIS
Regulation, 61% of operators confirmed that their process of recovery from
cybersecurity incidents had improved.
The EU Cybersecurity Act
The EU Cybersecurity Act [155] brings new tasks and a permanent mandate for ENISA.
It improves the EU's ability to respond to cyber-attacks by strengthening ENISA to
scale up cybersecurity collaboration among EU Member States and EU institutions,
agencies and bodies. ENISA's main tasks include developing cybersecurity policy
for the critical infrastructure sectors identified by the NIS Directive, such as
energy, telecoms and finance. It also supports the network of Computer Security
Incident Response Teams (CSIRTs) at the EU level in cybersecurity operations and
incident handling. It will provide analysis and technical reports, becoming the
primary source of cybersecurity information from the EU institutions and bodies.
Furthermore, it helps EU Member States to improve skills and proficiency. Finally,
ENISA will conduct market-related activities within the new Cybersecurity
Certification Framework. In addition, the EU Cybersecurity Act introduces an EU
framework for cybersecurity certification, which aims to increase trust by
providing the technical requirements and rules to evaluate and certify companies'
products, processes and services across the EU.
The risk management process has four steps [157]. The first step, framing risk,
involves creating a risk context for the organisation; during this step, the
organisation defines its organisational risk frame and risk assessment
methodology. The second step is assessing risk, which involves evaluating risk
within the context of the organisational risk frame defined in the previous step;
it aims to identify threats and vulnerabilities, then to estimate the severity of
those threats and, finally, the likelihood of those threats facing the
organisation. The third step is responding to risk, which includes selecting
courses of action that align with the organisational risk tolerance and
implementing a risk response plan based on the chosen course of action. The last
step is monitoring risk, which involves following up on the current effectiveness
of the risk responses to the identified risks. The threat landscape is evolving,
and at any point that evolution might change the organisation's acceptable risk;
the organisation therefore needs to re-evaluate its risk responses against the
threat landscape, its business missions and processes, and its supply chain.
The risk assessment component is the most critical part of the risk management
process. There are established risk assessment methodologies for determining the
level of risk of a security threat, including NIST SP 800-30 [157], a framework
for conducting risk assessments of critical infrastructure systems and
organisations. This framework allows senior management to select a course of
action in response to specific threats. In the NIST framework, the risk assessment
methodology consists of four phases. It starts with the risk assessment process,
which explains how information security risk is evaluated. The second phase is
risk modelling, which describes risk factors such as threat source, threat event,
severity and likelihood, and defines the relationships between them. The third
phase is the assessment approach, such as quantitative or qualitative assessment.
Finally, the analysis approach is determined by the organisation to
decide how to combine and analyse threat factors. The analysis approach can be
classified as threat-oriented, asset/impact-oriented or vulnerability-oriented.
Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) [158]
focuses on identifying vulnerabilities in the organisation's structure and on
implementing security strategies and plans. The OCTAVE methodology consists of
three phases. The first phase prioritises the organisation's existing assets based
on threat profiles. The second phase identifies the security level of the critical
assets based on the organisation's infrastructure vulnerabilities and possible
attacks. Finally, the third phase draws on the selected critical assets and their
threat profiles: an evaluation of the risk associated with each critical asset is
conducted in order to respond to any possible risk.
Privacy risk assessment is quite close to security risk assessment. NIST defines
privacy risk assessment as "A privacy risk management sub-process for identifying
and evaluating specific privacy risks". Privacy risks are linked to privacy events
related to data processing. A privacy event is defined as "The occurrence or
potential occurrence of problematic data actions", a data action as a
"system/product/service data life cycle operation, including, but not limited to
collection, retention, logging, generation, transformation, use, disclosure,
sharing, transmission, and disposal", and data processing as "The collective set
of data actions". Thus, cybersecurity and privacy risk assessment are connected
when cybersecurity incidents arise from privacy events [3]. For example, consider
installing smart meters and smart appliances as part of the Smart Grid. The
ability of these meters to collect, process and manage detailed information about
energy use can be conducive to identifying details of individuals' behaviour and
daily life inside their houses. The smart meters are working as planned, but the
data processing could mean that people are effectively under surveillance.
Figure 5 demonstrates the overlap and relationship between
privacy risks and cybersecurity risks. NIST defines three privacy risk factors to
be assessed and combined to obtain the risk score: problematic data action,
likelihood and impact.
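A minimal sketch of combining two of these factors for a single problematic data action follows. Multiplying likelihood by impact is an assumption of this sketch; NIST does not mandate a single formula.

```python
def privacy_risk(likelihood: float, impact: float) -> float:
    """Combine two NIST privacy risk factors for one problematic data
    action. The multiplicative formula is an assumption of this sketch."""
    if not 0.0 <= likelihood <= 1.0 or impact < 0:
        raise ValueError("likelihood must be in [0, 1] and impact >= 0")
    return likelihood * impact

# e.g. the smart-meter surveillance inference: fairly likely, high impact
risk = privacy_risk(likelihood=0.6, impact=8.0)
```

Whatever formula is chosen, it is applied per problematic data action and the results are then linked to organisational impacts, as the text goes on to describe.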
There is also a relationship between privacy risks and organisational risks,
established by linking the privacy risks to defined organisational impacts such as
legal penalties and loss of reputation. Besides, privacy risk assessment involves
additional contextual descriptions and an extra level of granularity in the risk
level. Using only the defined level as the result of a privacy risk assessment
may give the impression that it is always a fuzzy decision, so the result will not
be as informative as necessary [159]. Even in the development life cycle, we can
find security by design integrated into every step of building secure software.
This integration aims to prevent an attacker from exploiting design flaws. Here,
threat modelling approaches such as STRIDE [160] play a significant part in
finding system security threats.
To build privacy threat modelling that can be associated with the software life
cycle, [161] proposed the LINDDUN methodology, inspired by STRIDE, to identify
privacy threats in software systems.
Finally, semi-quantitative risk assessment combines rules and methods for
evaluating risk based on levels, scales or numeric values that are meaningful in
the assessed context. For example, a range between 1 and 10 can easily be
converted into qualitative expressions that help risk communication with decision
makers: a score of 10 can be represented as extremely high, and on a larger scale
a score of 90 for a CTI dataset can represent a very high risk. The role of expert
judgment in assigning values is more evident in semi-quantitative risk assessment
than in a purely quantitative approach. Moreover, if the scales or sets of levels
provide sufficient granularity, relative prioritisation among results is better
supported than in a purely qualitative approach. As with qualitative and
quantitative models, all ranges and numeric values need to be defined and
explained with clear descriptions and examples.
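The score-to-label conversion described above might look like the following sketch. The band boundaries are illustrative assumptions; as the text notes, a real deployment must define and document its own ranges.

```python
def score_to_level(score: int) -> str:
    """Map a 1-10 semi-quantitative risk score to a qualitative label.
    The band boundaries below are illustrative assumptions."""
    if not 1 <= score <= 10:
        raise ValueError("score must be in 1..10")
    if score <= 3:
        return "low"
    if score <= 6:
        return "moderate"
    if score <= 8:
        return "high"
    return "extremely high"
```

For example, `score_to_level(10)` yields the "extremely high" label used in the text, while mid-range scores map to intermediate labels that still support relative prioritisation.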
In [148], Johnson looked at the threats associated with sharing sensitive
information and financial transactions between financial firms, and identified
connections between the number of leaked documents and the number of threats and
vulnerabilities. That study focused on peer-to-peer file-sharing networks,
especially sharing by employees, and on reducing the risk of disclosure, which in
turn reduces the threat activity arising from exploiting the leaks. Accidental
disclosure of sensitive information represents one of the primary information risks
against businesses. Existing tools and technologies for sharing sensitive
information create various security risks for these businesses.
Threat actors can apply queries to extract data from an organisation's leaked
files. For example, they might find that John, an employee, used Microsoft Office
2007 to create sensitive files which were leaked accidentally. This risk could be
reduced by introducing file-naming conventions and enforcing the associated
policies. However, the study discussed only the risks to business, without
covering the cybersecurity threats that might occur as a result of such sharing.
The authors in [162] focus their study on two factors: the willingness to share
cyber threat information and the usage of cyber threat information. They used a
factorial vignette survey with ordinal probability ranges and nominal data types
to assess the privacy risk, and classified data properties into categories such as
high-usage, low-risk versus low-usage, high-risk. The list of properties includes
passwords, usernames, keylogging data, e-mails, chat history, operating system
information and others. For example, the 'Usernames' property has high usage in
CTI datasets, but organisations are unwilling to share it with others due to its
high risk score. Organisations are more willing to share information such as IP
addresses and specific network information. Even for less sensitive information,
cybersecurity professionals call for secure sharing procedures, such as access
control and encryption; for sensitive information, they stress the use of data
minimisation techniques, such as anonymisation.
In this study, the authors only classified a specific list of properties that
might exist in CTI reports based on the level of willingness to share them. There
was no precise analysis of the threats associated with sharing the properties that
the experts are less willing to share.
Much work has been done on defining principles for sharing cybersecurity
information. [163] defined three principles for sharing security information
within and between organisations: Least Disclosure, Qualitative Evaluation, and
Forward Progress. The purpose of these principles is to decrease privacy risks in
cybersecurity data sharing.
First, the Principle of Least Disclosure: share the minimum information within or
between organisations to reduce the risk of sharing. The corollaries to this
principle are Internal Disclosure in the collection phase; Privacy Balance, to
choose the trade-off between utility and privacy carefully; and Inquiry-Specific
Release, giving access to the minimal amount of information based on approved
specific uses. There are many approaches to achieving this principle, such as
anonymisation and Minimal Requisite Fidelity. In [69], researchers proposed a
model for collaborative information analysis systems that addresses privacy
concerns through the trade-off between privacy leakage and utility; the model aims
to select the privacy preserving techniques that best optimise this trade-off.
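A toy illustration of the Least Disclosure principle: drop high-risk fields and coarsen network identifiers before sharing. The field names and the masking rule are assumptions for the sketch; a real pipeline would use vetted anonymisation tooling.

```python
DROP_FIELDS = {"username", "email"}  # illustrative high-risk fields

def minimise(record: dict) -> dict:
    """A crude least-disclosure pass before sharing: drop fields the
    organisation is unwilling to share and coarsen the source IP."""
    out = {k: v for k, v in record.items() if k not in DROP_FIELDS}
    if "src_ip" in out:  # keep network information, but reduce precision
        out["src_ip"] = ".".join(out["src_ip"].split(".")[:3] + ["0"])
    return out

shared = minimise({"src_ip": "198.51.100.23", "username": "jsmith",
                   "malware": "alpha"})
```

The design choice mirrors the Privacy Balance corollary: network information is retained at lower precision because it carries most of the defensive utility, while directly identifying fields are removed outright.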
Second, the Principle of Qualitative Evaluation combines technical and legal
constraints: without implementing legal constraints, we might not be able to share
information at all, while at the same time we cannot rely only on technical
methods for ensuring privacy. In [164], the authors propose a model for sharing
cybersecurity information that uses blockchain technology: a ledger stores all
transactions about shared CTI datasets, and smart contracts enable secure sharing
and collaboration. However, it addresses only this specific aspect of sharing CTI
datasets. Finally, the Principle of Forward Progress stipulates that organisations
should not stop sharing information under the pretext of legal requirements or
safety reasons, because that would prevent the benefits of sharing and of finding
solutions quickly.
The most relevant work targets privacy preserving techniques for cybersecurity
information sharing.
In [166], the authors addressed the types of information that could be shared
between SMEs while addressing the risk of disclosing cyber-attack scenarios.
However, the study was limited to SMEs and a small sample size with specific
security metrics, which could differ across business scenarios. In our work, we
evaluate the risk and propose a more general model, not tied to specific
businesses, for evaluating the risk of sharing CTI datasets.
In [167], the authors proposed a cybersecurity risk model using a Bayesian network
for the nuclear reactor protection system (RPS); they then applied the analytical
result to an event tree model. Their model focused on only four cyber threats and
six mitigation measures for the design specification of an RPS, and the evaluation
covered only the network layers, without covering other types of possible threats.
In [168], the authors proposed a quantitative asset- and vulnerability-centric
cybersecurity risk assessment methodology for IT systems. They defined and
extended metrics based on the Common Vulnerability Scoring System (CVSS) and
presented
a formula for computation and aggregation. The work focused only on Common
Vulnerabilities and Exposures (CVEs), without considering the impact of other
factors. Also, the calculation was based on the defined CVSS list, without
including zero-day attacks. The model did not consider the threat actor or the
attack vector, as the focus was only on the individual assets and their
vulnerabilities in the system design. They proposed a base risk assessment model
and an attack graph-based risk assessment model.
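As an illustration of CVSS-based aggregation (not the formula defined in [168]), per-vulnerability base scores might be combined into an asset-level value like this. The aggregation rule and the criticality weight are assumptions of the sketch.

```python
def asset_risk(cvss_scores, criticality: float = 1.0) -> float:
    """Aggregate per-vulnerability CVSS base scores (0-10) into one asset
    risk value, weighted by asset criticality. Taking the maximum plus a
    small contribution from the remaining scores is one plausible choice;
    it is an assumption, not the formula from [168]."""
    if not cvss_scores:
        return 0.0
    worst = max(cvss_scores)
    rest = (sum(cvss_scores) - worst) / (10 * len(cvss_scores))
    return min(10.0, (worst + rest) * criticality)
```

Note the limitation the text points out: any such formula sees only scored CVEs, so zero-day vulnerabilities and threat-actor context remain invisible to it.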
In [169], the authors propose a model to evaluate the correlation between
disclosed security risk factors and future security breach announcements reported
in the media, investigating how the market infers the context of security risk
factors in annual reports. They used text mining on the reports to enhance and
enrich the classification method, and developed a decision tree model which
categorises the occurrence of future security breaches according to the textual
contents of the disclosed security risk factors. They claim the model can
accurately associate disclosure features with breach announcements about 77% of
the time. The results indicate that disclosed security risk factors with
risk-mitigation themes are unlikely to be pertinent to future breach
announcements; that is, including mitigation steps in the disclosed security risk
factors would reduce the likelihood of future breach announcements.
This work focused on firms' disclosed reports, and the number of such reports was
limited; it also focused on the market and financial response without going into
the details of cybersecurity risks.
In [21], the authors looked at the effect of sharing the vulnerabilities of an ICT
system on the market and on responsible disclosure policies. They found that
sharing vulnerabilities with the public would immediately increase the probability
of a cyber-attack, as such sharing gives attackers a road map to gain access and
attack the systems.
This work considered only the disclosure of vulnerabilities, which are just one
part of CTI datasets. Also, they did not investigate legal or technical threats.
In [170], researchers propose a risk assessment and optimisation model that
extends the standard risk assessment process and finds the balance between the
existing network vulnerabilities and financial investments.
In [171], the authors proposed an architecture to compute a privacy risk value for
cyber threat information extracted from a STIX report. They built a factorial
vignette survey to collect data and used multi-level modelling.
However, these methods do not adopt a quantitative approach to risk evaluation
when sharing CTI datasets, such as the one presented in this thesis. In this
thesis, I propose a new model to compute risk by identifying the threats, severity
and probability associated with sharing CTI information, which will be described
in more detail in Chapter 4.
Many papers have addressed issues related to terms and rules extracted from
regulations and policies for protecting personal data. In [172], the authors
converted the precursor of the GDPR, the 1995 EU Data Protection Directive [173],
into executable rules to support access control policies, presenting a system
that automates legal access control policy so as to make automated decisions
concerning authorisation rights and obligations based on the related legal
requirements. In [174], the authors developed a specialised tool for privacy
control based on the GDPR to share sensitive research datasets. They used DataTags
to categorise datasets: a DataTag assigns a label to a dataset and may contain
human-readable and machine-actionable rules. Thus, a dataset is assigned a
specific label after a series of questions based on defined assertions
within a particular context. After assigning a label to the dataset, it is
possible to apply the associated machine-actionable actions to the dataset or to
build a custom data sharing agreement compliant with the human-readable rules.
The authors defined the security measures of the data tag levels based on the
DANS EASY repository [175], and focused on datasets managed by researchers in a
general context. In [176] [177], the authors extracted data access rights from the
legal text of the US Health Insurance Portability and Accountability Act (HIPAA).
They used an ontology to classify legal rules of privacy requirements from
regulations in order to decide whether to grant or deny an access right.
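A minimal DataTags-style sketch: answers to yes/no assertions map a dataset to a tag, and each tag carries a human-readable handling rule. The tag names, questions and rules here are invented for illustration, not taken from [174].

```python
# Tag names and handling rules are assumptions for this sketch.
RULES = {
    "open":       "Public release permitted.",
    "restricted": "Share only under a data sharing agreement.",
    "closed":     "Do not share outside the controller.",
}

def assign_tag(contains_personal_data: bool, is_anonymised: bool) -> str:
    """Answer two illustrative assertions and return the resulting tag."""
    if not contains_personal_data:
        return "open"
    return "restricted" if is_anonymised else "closed"

tag = assign_tag(contains_personal_data=True, is_anonymised=False)
```

The appeal of the approach is exactly this mechanisability: once the questionnaire encodes the legal analysis, the sharing decision for each dataset reduces to a lookup.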
In [178], researchers proposed a privacy by design solution for exchanging
cybersecurity incident information between CSIRTs. This solution focused only on
sharing information within closed user circles such as the CSIRTs. The authors
aimed to illustrate the legal requirements around sharing CTI datasets containing
personal information between CSIRTs, without giving CTI dataset managers a
systematic way to check the legality of sharing such information.
Previous research findings on the legal grounds for sharing CTI information under
the GDPR have been inconsistent [179] with our justification for sharing CTI. The
authors examined sharing under the legal bases of legitimate interest and public
interest, and argued that the legal grounds for sharing CTI can be justified under
the public interest, Article 6(1)(e) of the GDPR, and that the notification
requirements of Article 14 make relying on legitimate interest unjustifiable. A
full discussion of the possible requirements for sharing CTI datasets will be
given in Chapter 5. In our work, we aim to build a set of sharing requirements
that CTI dataset managers can check in order to reach a decision about sharing
CTI dataset(s) under the GDPR.
Chapter 3
In this chapter, we present a specific and granular analysis of the risks in cyber
incident information sharing, looking in detail at what information may be
contained in incident reports and which specific risks are associated with its
disclosure. We use the STIX incident model as indicative of the types of
information that might be reported. For each data field included, we identify and
evaluate the threats associated with its disclosure, including the extent to which
it identifies organisations and individuals. The main outcome of this analysis is
a detailed understanding of which information in cyber incident reports requires
protection, against specific threats with assessed severity. A secondary outcome
of the analysis is a set of guidelines for disciplined use of the STIX incident
model in order to reduce information security risk. This chapter is divided into
the following sections. Section 3.1 describes the methods used for threat
analysis. Section 3.2 discusses the analysis of disclosing cybersecurity incident
information in the STIX incident model with its key findings. Section 3.3 provides
an evaluation of other standards of sharing cyber threat information. Section 3.4
summarises this chapter. (This chapter is based on the conference paper "Risks of
Sharing Cyber Incident Information" [180].)
For categorizing the sensitivity of data items in cyber incident reports, we use
common characterizations from the literature on anonymisation and
de-identification methods [34], [35], which have been described in more detail in
Section 2.3. The attribute types are [181]:
In our systematic analysis that follows, we will use the threat taxonomy from
ENISA [145] for categorizing the threats and breaking down attacks in terms of
how they are accomplished. This and alternative choices were described in Section
2.4.5. The high-level categories of this taxonomy are:
• Outages: this category covers threats that rely on losing resources, such as the
loss of electric power needed to operate an IT infrastructure. The cause is likely
an external factor, such as a large loss of power in an area due to a fault in
underground power cables.
• Eavesdropping/ Interception/ Hijacking: this category covers threats that do not
need to install extra tools on the victim machine, for example a man in the
middle/session hijacking attack.
• Nefarious Activity/ Abuse: this category covers threats that need additional
steps, installing tools or software on the victim's machine, such as protocol
exploitation or spoofing.
• Legal: this category covers threats of legal or financial penalty arising from
existing legislation.
We have considered only those threats relevant to and associated with disclosing
cyber incident information. Table 9 shows the list of threats. In the analysis and
the mapping tables, we will use the value of the ID column instead of the value of
the Threat column.
In traditional risk assessment, risks are evaluated for impact and likelihood. The
latter is particularly problematic for risks that require action by an attacker to
materialise: we would need to find out how likely it is that some attacker will be
motivated to exploit a given weakness. To avoid having to guess that motivation,
we assess exposure: how easy would it be for a motivated attacker to exploit,
and what prejudicial effects might be caused? This approach is taken for privacy
risk in the standard for privacy risk management by the French data protection
authority CNIL (Commission Nationale de l’Informatique et des Libertés) [182].
We have generalized this to apply to cyber security risks as well. For privacy
risks, the exploitability depends on how easy it would be to identify a specific
ID Threat
T1 Social Engineering
T2 Loss of (integrity of) sensitive information
T3 Failure to meet contractual requirements
T4 Violation of laws or regulations / Breach of legislation
T5 Compromising confidential information (data breaches)
T6 Failure of business processes
T7 Identity theft (Identity Fraud/ Account)
T8 Unauthorized activities
T9 Targeted attacks (APTs etc.)
T10 Unauthorized physical access / Unauthorised entry to premises
T11 Terrorist attacks
T12 Loss of reputation
T13 Manipulation of information
T14 Misuse of information/ information systems
T15 Judiciary decisions/court orders
T16 Man in the middle/ Session hijacking
T17 Generation and use of rogue certificates
T18 Abuse of authorizations
T19 Information leakage/sharing due to human error
T20 Failure or disruption of main supply
T21 Failure or disruption of service providers (supply chain)
T22 Denial of service
T23 Malicious code/ software/ activity
T24 Abuse of Information Leakage
T25 Abuse of vulnerabilities, 0-day vulnerabilities
T26 Brute force
Table 9: Threat List
individual, i.e. the level of identification. Table 10 shows the description of the
scores for this on a 1-4 scale, as taken from [182].
The prejudicial effects value of each threat is also scored on a 1-4 scale as given
in [182]. Table 11 describes this.
Finally, the CNIL standard [182] computes the severity value by adding the
level of identification and prejudicial effects scores.
Property | Value
TTP Malware Type | Capture Stored Data, Remote Access Trojan
Indicator Name | File hash for malicious malware
Indicator Description | This file hash indicates that a sample of malware alpha is present.
Indicator Value | Hashes.‘SHA-256’ = ‘ef537f25c895bfa7jfdhfjns73748hdfjkk5d89fjfer8fjkdndkjn7yfb6c’; Windows-registry-key := “HKEY_LOCAL_MACHINE\\SYSTEM\\ControlSet001\\Services\\MSADL3”
Vulnerability | CVE-2009-3129, CVE-2008-4250, CVE-2010-3333, CVE-2012-0158, CVE-2011-3544
Incident Title | Incident associated with CyberA campaign. The malware was designed to steal encrypted files - and was even able to recover files that had been deleted.
Date | 2012-01-01T00:00:00
Reporter Name | Alex John
Reporter Email Address | [email protected]
Reporter Address | US-LA
Victim Name | CyberA / The CEO Device
Victim Sector | Financial sector
Victim Device | IP address: 146.227.239.19
Victim Email Address | [email protected] / [email protected]
Victim Address | CyberA Ltd, IT Department, LONDON, W5 5YZ
Affected Assets Type | Desktop, Mobile phone, Router, Server, Person
Affected Assets Property | Confidentiality (Classified, Internal, Credentials, Secrets, System); Integrity (Software installation, Modify configuration, Alter behaviour)
Incident Status | Not solved
Total Loss | £65,000
The Complex Type column indicates that the property’s type is a composite of
other types. Therefore, its analysis may be derived from that of the component
types.
The Include Free Text column indicates that the property or one of its con-
stituents is a free text field. In principle, any information could be exposed through
such an unconstrained field. Taking this to an extreme would trivialize our anal-
ysis: most of the information contained in an incident report would be potentially
sensitive.
The Identification column indicates whether the property could identify an indi-
vidual or the organisation. For each property, we provide an identification value,
which will be one of the following:
• Quasi Identifier (QI): the information could be linked with other information
or an external source to re-identify an individual or the organisation.
For identifying personal information that refers to individuals rather than or-
ganisations, we have added a Personal Information column to indicate that the
disclosure of the property could reveal personal information. This is also an indi-
cation of possible privacy risks and, consequently, of legal risks arising from a data
breach.
The Threat column indicates the possible threats when revealing information
associated with the property, based on the property description, sub-properties in
case it is complex, and the actual information.
The severity of the threats is given in the PS and CSS columns, with scores
assigned as described in Section 3.1.3. In the table we include the original scores,
e.g. 2+3, for exploitability and impact, without translating to the 1-4 scale as
per Table 12. The goal of this exercise is to identify potential threats when
sharing incident information, to explain what the sensitivity and identifiability
are, and ultimately to address the potential threat when disclosing information
associated with properties of the STIX incident model.
The full analysis is given in Table 3.4 at the end of this chapter. Here we ex-
plain a sample of this information in full detail. Table 15 gives an example of
some properties in the IncidentType class of the STIX incident model. Cells in
columns “Complex Type” (CT), “Include Free Text” (IFT) , “Sensitivity” (S),
“Identification” (I), “Personal Information” (PI), “Justification” and “Threat”
represent our analysis. The values in the column “Property” are summarized
from the STIX incident model. This table gives the grounds for our analysis of
the properties. Some properties have only a cybersecurity severity value, such as
“Security_Compromise”, and some properties have both privacy and cybersecurity
severity values, such as “COA_Requested”, which contains identifiable informa-
tion for the source of information, in addition to the sensitive information about
the system and the infrastructure as well. We explain the values of PS and CCS
for the following properties:
“Description” property: It is a free text field to describe the incident. It is
not unlikely that the reporter will include critical information in this field, which
could contain cybersecurity and identifiable information. The PS value is 2+2, as
the level of identification is 2: it is possible to identify individuals with difficulty.
The second value is the prejudicial effect, which is also 2 due to the possible disclo-
sure of the identity without further information. Similarly, the CSS is scored as
2+2 by assigning 2 as the difficulty of exploitation (any vulnerabilities are likely
described at a very high level in this field), and 2 as the prejudicial effects due to
the problem of the data breach.
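The additive scoring used above can be captured in a short sketch (an illustration of the CNIL-style computation described in Section 3.1.3; the function name is ours):

```python
# Severity = first score (level of identification for PS, ease of exploitation
# for CSS) plus prejudicial effects, each on the 1-4 CNIL scale.
def severity(first_score: int, prejudicial_effects: int) -> int:
    for s in (first_score, prejudicial_effects):
        if not 1 <= s <= 4:
            raise ValueError("scores must be on the 1-4 scale")
    return first_score + prejudicial_effects

# The "Description" property discussed above: PS = 2+2 and CSS = 2+2.
print(severity(2, 2))  # 4
```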
We have computed the severity values for each property of the STIX incident
model based on the method proposed in Section 3.1.3. In particular, Figure 6
shows the cybersecurity severity results for the first level properties of the STIX
incident model. Figure 7 shows the privacy severity results for the first level
properties of the STIX incident model.
At first glance it may be surprising that Prejudicial Effects never achieve the
highest score 4, for irrecoverable damage. Our explanation differs between the two
dimensions. In the privacy dimension, this is an effect of the particular context
of cyber incident reporting. Personal data never plays a central role in this, and
there is no sensitive personal data involved in this scenario at all. Thus, any
privacy risks will be limited. For the cyber security dimension, it is due to the
nature of cyber security itself. It is extremely rare for a successful cyber attack,
particularly in a critical infrastructure context, to exploit only a single vulnerability.
Conversely, exploiting a single vulnerability is unlikely to lead to irrecoverable
damage by itself.
This suggests that an extension to our analysis per property is necessary. For a
full awareness of overall risks, we need to look at combinations of properties that
together provide a feasible composite attack threat. Although exhaustive analysis
is infeasible in theory (nearly 2^123 combinations of the 123 different properties),
it can be triaged by focusing on known effective combinations of types of threats
and the most severe individual threats. As an illustration, we describe a composite
threat that could lead to irrecoverable damage to the system. In order to launch
any serious attack, the attackers need to collect data about the target's activity.
The ‘Reporter’ property will be an entry point for online research leading to a
social engineering attack. This may lead to the installation of a key logger or
other malware. The “Security_Compromise” property might then reveal which
security hole in a critical system can be exploited starting from the Reporter’s
computer. A real-world example of a successful attack against critical infrastruc-
ture is the Ukraine Attack [183]. This attack started by weaponising the network
with BlackEnergy malware using spear-phishing attacks, then hijacking SCADA
systems, and remotely controlling electricity substations.
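The triage just described — ranking properties by individual severity and examining only combinations among the most severe — can be sketched as follows (the scores and the cut-off are illustrative, not our full table values):

```python
from itertools import combinations

# Rank properties by an illustrative individual severity score and examine
# only pairs among the top-k, instead of all 2**123 property subsets.
severities = {"Reporter": 6, "Victim": 6, "Security_Compromise": 5,
              "Status": 4, "External_ID": 0}

def triage_pairs(scores, k=3):
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return list(combinations(sorted(top), 2))

print(triage_pairs(severities))
```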
Table 16 provides examples of attack vectors associated with STIX incident
properties that the adversaries could use to have an initial access within a sys-
tem or a network. Most of the attacks happen in one or more steps. Cells in
columns “Initial Access Attack” and “Description” are taken from the MITRE
ATT&CK framework [184]. The values in column “STIX incident Report Prop-
erty” are our proposals as to which attributes might be used to initiate the attack.
The values in column “Attacks example/ Threat actor groups” are real-world ex-
amples of successful attacks or threat actor groups targeting critical infrastruc-
ture. As an illustration, the adversary first investigates the intended victim to
gather necessary background information, so properties such as “Description”,
“Short_Description”, “Reporter”, “Responder”, “Coordinator”, “Victim”, “Con-
tact”, “History” and “Information_Source” could be very useful for this
step. Then, the adversary would be able to take advantage of this information to
choose specific individuals or entities and try to exploit a vulnerability on the
victim’s system. For example, APT19 [185] sent spearphishing emails containing
malicious attachments in RTF and XLSM formats to deliver and execute initial
exploits. This threat actor group targeted at least seven global law and investment
firms.
Initial Access Attack [184]: Exploit Public-Facing Application
Description [184]: The use of software, data, or commands to take advantage of a weakness in an Internet-facing computer system or program in order to cause unintended or unanticipated behaviour.
STIX incident Report Property: Description, Short_Description, COA_Taken, Related_Indicators, Related_Observables, Leveraged_TTPs, Related_Incidents, Security_Compromise, Discovery_Method, COA_Requested, Affected_Assets
Attacks example / Threat actor groups: CVE-2016-6662 [186], CVE-2014-7169 [187]

Initial Access Attack [184]: Hardware Additions
Description [184]: Computer accessories, computers, or networking hardware may be introduced into a system as a vector to gain execution. While public references of usage by APT groups are scarce, many penetration testers leverage hardware additions for initial access.
STIX incident Report Property: Description, Short_Description, COA_Taken, Related_Indicators, Related_Observables, Leveraged_TTPs, Related_Incidents, Security_Compromise, Discovery_Method, COA_Requested, Affected_Assets
Attacks example / Threat actor groups: Passive network tapping [188], Keystroke injection [189], adding new wireless access to an existing network [190]

Initial Access Attack [184]: Spearphishing Attachment/ Link/ via Service
Description [184]: All forms of spearphishing are electronically delivered social engineering targeted at a specific individual, company, or industry.
STIX incident Report Property: Description, Short_Description, Reporter, Responder, Victim, Coordinator, Contact, History, Information_Source
Attacks example / Threat actor groups: APT19 [185], APT28 [191], APT29 [192], APT32 [193], APT37 [194]

Initial Access Attack [184]: Supply Chain Compromise
Description [184]: Supply chain compromise is the manipulation of products or product delivery mechanisms prior to receipt by a final consumer for the purpose of data or system compromise.
STIX incident Report Property: Description, Short_Description, COA_Taken, Security_Compromise, Related_Incidents, COA_Requested, Affected_Assets
Attacks example / Threat actor groups: CCBkdr [195], Elderwood [196], Smoke Loader [197]

Initial Access Attack [184]: Valid Accounts
Description [184]: Adversaries may steal the credentials of a specific user or service account using Credential Access techniques or capture credentials earlier in their reconnaissance process through social engineering for means of gaining Initial Access.
STIX incident Report Property: Description, Short_Description, Reporter, Responder, Coordinator, Victim, Contact, History, Information_Source
Attacks example / Threat actor groups: APT28 [191], APT3 [198], APT32 [193], Carbanak [199], Cobalt Strike [200]

Table 16: Examples of attack methods associated with STIX incident properties
The analysis has provided a broad and detailed insight into the disclosure risks
associated with cyber incident reports, when encoded in the STIX incident model.
It has highlighted individual pieces of sensitive information as well as the specific
threats arising from their disclosure. The STIX incident model consists of a
hierarchy of classes containing 123 properties, and these were analysed separately.
Properties may be sensitive both through their immediate content and through
their specific context within complex properties. For example, the “Reporter”
property tells us not only an employee's name but also identifies the person who
reported the incident, and who is therefore likely in a central cybersecurity role in the
that some sensitivity arises also through class inheritance: it may be inherited
from a superclass, as well as arise in a specific subclass. In the following, we
present general observations that follow from the analysis performed on the STIX
incident model.
Controlled/Uncontrolled properties identified in STIX incident model.
STIX is designed to be flexible and liberal about the information contained and
how it is represented. The incident model suggests specific value sets for many
properties, but also allows the content creator to choose any arbitrary value. This
lack of constraints implies that undisciplined use may disclose arbitrary sensitive
information. In particular, many properties consist of free text, which may contain
critical information about the incident, including organisation name, IP addresses,
impact and Course of Action, that must be protected. Tools for extracting sensi-
tive and identifying information from text are available: these can be characterized
as rule-based or machine learning-based [201]. The rule-based tools usually handle
the re-identification goal with pattern matching, regular expressions and dictio-
nary lookups. For example, the strings “DDoS” and “146.227.156.60” within some
free text property could be classified into the categories of incident category and
IP addresses.
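A minimal rule-based detector of this kind, using pattern matching and a dictionary lookup, could look like the sketch below (the patterns and category names are our own illustration, not taken from the tools surveyed in [201]):

```python
import re

# Rule-based extraction: a regular expression for IPv4 addresses and a
# dictionary lookup for known incident-category terms.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
INCIDENT_TERMS = {"ddos", "phishing", "ransomware"}

def classify_free_text(text):
    """Return (category, match) pairs found in a free-text property."""
    findings = [("IP address", m) for m in IP_PATTERN.findall(text)]
    findings += [("incident category", w) for w in re.findall(r"[A-Za-z]+", text)
                 if w.lower() in INCIDENT_TERMS]
    return findings

print(classify_free_text("DDoS attack observed from 146.227.156.60"))
# [('IP address', '146.227.156.60'), ('incident category', 'DDoS')]
```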
Categories of information and associated threats. Intuitively, we ex-
pected to find threats relating to different kinds of information disclosure: per-
sonal, organizational, financial and cybersecurity. Indeed, most STIX properties
relate specifically to one of these kinds and have a matching set of associated
threats. Moreover, for each of these types a significant number of properties is
present in the STIX incident model.
Disclosing personal information. The number of properties that iden-
tify individuals in the organisations is high, such as the Reporter property that
characterizes the entity that reported the incident, and the Responder property
that characterizes the entity playing the role of the responder for the Incident.
Thus, disclosing any of these properties will be associated with multiple threats
including targeted attacks (APTs etc.) and social engineering attacks, such as
phishing and spear phishing. In [202], CERT-UK provides a case study of a
system administrator of a UK organization being targeted by a spear phishing
attack. The attackers identified the system administrator and sent them a spam
email. The goal of this attack was to install a RAT (Remote Access Trojan) and
take advantage of the administrator's permissions to gain access to the network
and collect sensitive information about the critical systems in the targeted
organization.
Disclosing the organisation’s information. The number of properties
that potentially identify organisations is high. For example, the Affected_Asset
property that specifies a list of one or more assets affected includes a description of
the asset and the security effect on the asset, for example, an HR database server for
an organisation. Thus, disclosing any of these properties will be associated with
threats including physical attack as well as targeted attacks and social engineering.
Disclosing financial information. The STIX incident model contains spe-
cific financial information that covers the estimated cost to the victim, which is
based on the loss of revenue from system downtime and operation cost to fix the
damage. For example, the Total_Loss_Estimation property specifies the total
estimated financial loss for the Incident and the Response_And_Recovery_Costs
property specifies the level of response and recovery-related costs. The loss of this
confidential information forms a data breach threat by itself but it also has an
associated threat of loss of reputation.
Disclosing cybersecurity information. The STIX incident model con-
tains cybersecurity information about the incident, such as the Course_Of_Action
property. This property refers to the course of action requested and taken for the
incident. In addition, it includes specific information about the incident, such as
whether non-public data was compromised and whether that data was encrypted
or not. The organisation’s analysis of the incident can be reported through the
Leveraged_TTPs property. Tactics, Techniques and Procedures (TTPs) consist
of the specific adversary behavior (attack patterns, malware, exploits) exhibited
and resources leveraged (tools, infrastructure, personas) [203]. This information
contributes to providing a complete understanding of the magnitude of the threat.
However, disclosing cyber information details like these could give hackers a road
map to conducting additional targeted attacks including physical ones.
Some information is critical only in combination. Some properties are
in general not sensitive, but become critical when combined with other properties
or externally available information. For example, the First_Malicious_Action
property specifies the time that the first malicious action related to the Inci-
dent occurred. This information is not sensitive by itself, but patterns in this
information may lead to attribution (identification of the attacker) [204]. In gen-
eral, privacy risks only materialise when a sensitive feature is revealed about an
identified actor but the identifying and sensitive features could occur in different
STIX properties. As an extreme example, for financial damages, strictly speaking
neither the Amount nor the Iso_currency_code property by itself is sensitive;
however, together they specify the estimated financial loss, which is sensitive. We
have discussed the issue of critical combinations of cyber security vulnerabilities
in detail in Section 3.2.3.
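A combination check of this kind could be sketched as follows (the property sets and the pairing rule are illustrative simplifications, not a complete model):

```python
# A release is flagged when a known sensitive pair co-occurs, or when an
# identifying property co-occurs with a sensitive one (illustrative sets).
SENSITIVE_PAIRS = {frozenset({"Amount", "Iso_currency_code"})}
IDENTIFYING = {"Reporter", "Victim"}
SENSITIVE = {"Security_Compromise", "Total_Loss_Estimation"}

def risky(shared_properties):
    props = set(shared_properties)
    if any(pair <= props for pair in SENSITIVE_PAIRS):
        return True
    return bool(props & IDENTIFYING) and bool(props & SENSITIVE)

print(risky({"Amount", "Iso_currency_code"}))         # True
print(risky({"Reporter", "First_Malicious_Action"}))  # False
```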
As our analysis above indicates, there are clear drawbacks to the flexibility of the
current STIX incident model. From the perspective of disclosure, free text fields
and unconstrained properties allow for information leaks. In addition, they offer
little perspective for data validation and thus scope for undetected human errors.
The potential for automated processing is also greatly reduced by variability of
inputs. This calls for disciplined use of the STIX model, which is likely most eas-
ily provided by ensuring that the more flexible fields are filled through templates,
possibly by a system generating STIX reports for the user from higher level infor-
mation. (As STIX 1.2 is XML based, which is not intended for human reading and
writing, some such interface is essential for human interaction in any case.) Sector
organisations could also develop custom versions of the STIX incident model that
specialize to their specific risk profile. Implementation of STIX in cyber informa-
tion sharing platforms could actively support this. In any case, consistent and
disciplined use of incident reporting should be supported by appropriate training
and policies within individual organisations.
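As an illustration of template-driven generation, the sketch below assembles the free-text Description from structured fields instead of letting the reporter type it; the element names are simplified and are not the schema-valid STIX 1.2 vocabulary:

```python
import xml.etree.ElementTree as ET

# Template-driven report builder: the reporter supplies constrained fields,
# and the Description element is generated from a fixed template.
def build_incident_report(category: str, asset_type: str, status: str) -> str:
    incident = ET.Element("Incident")
    ET.SubElement(incident, "Category").text = category
    ET.SubElement(incident, "AffectedAssetType").text = asset_type
    ET.SubElement(incident, "Status").text = status
    ET.SubElement(incident, "Description").text = (
        f"A {category} incident affecting a {asset_type} asset; status: {status}."
    )
    return ET.tostring(incident, encoding="unicode")

print(build_incident_report("DDoS", "Server", "Not solved"))
```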
In this section, we discuss the impact of the transition to STIX 2 [123], which is
promoted by OASIS [205], on our analysis of STIX 1.2 when sharing cyber threat
information. There are two main differences between STIX 1.2 and STIX 2, as
follows [206]:
STIX 2 currently defines twelve STIX Domain Objects (SDO), which are At-
tack Pattern, Campaign, Course of Action, Identity, Indicator, Intrusion Set, Mal-
ware, Observed Data, Report, Threat Actor, Tool, and Vulnerability. The incident
object has not been developed yet but it is intended to be included in STIX 2.1
[206]. However, many inherited classes in STIX 1.2 are defined as new objects
in STIX 2, such as the “Course of Action” and “Identity” objects. The “Course of
Action” object contains a “Description” property, a free text field that describes
the actions to prevent or respond to an attack. As we have mentioned earlier,
any free text field might contain sensitive information, hence any exposure of this
information would be associated with multiple threats.
Another property embedded in the STIX 1.2 incident model that was defined as a
new object in STIX 2 is the “Identity” object. The Identity object consists of many
properties such as name, description, and contact_information. This information
is sensitive because it can refer to the identity of the victim. Consequently, the
disclosure of this information can lead to threats such as loss of reputation and
spear phishing attacks.
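For illustration, a minimal STIX 2 Identity object might look as follows (the identifier and field values are invented, and required timestamp fields are omitted for brevity); note how name and contact_information concentrate the identifying detail in one object:

```python
import json

# Hypothetical STIX 2 Identity object (values invented for illustration).
identity = {
    "type": "identity",
    "id": "identity--f431f809-377b-45e0-aa1c-6a4751cae5ff",
    "name": "CyberA Ltd",
    "identity_class": "organization",
    "contact_information": "[email protected]",
    "description": "Victim organisation of the reported incident.",
}
print(json.dumps(identity, indent=2))
```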
Besides STIX 1.2 and STIX 2 for describing threat intelligence, we have looked at
Incident Object Description and Exchange Format (IODEF) version 2 [125], which
was released in November 2016. It uses XML to represent and share cyber inci-
dent information between Computer Security Incident Response Teams (CSIRTs).
Similar to STIX 1.2, IODEF consists of a hierarchy of many classes and sub classes
used to describe Assessment, Method such as the attacker techniques, contact in-
formation such as email addresses and phone numbers, and Mitigation such as the
course of action. IODEF contains an optional attribute called restriction. The
purpose of this attribute is to inform the receiver how they should deal with this
information. The suggested values are public, need-to-know and private. How-
ever, using this attribute does not force the receiver to comply with it. We found
that the IODEF incident class properties that constitute an incident are similar
to the properties already existing in STIX 1.2. It has a “Description” property
that is a free-text field to describe the incident, a “Method” property which
describes the techniques used to conduct the attack and the existing weaknesses,
and contact information and other properties, which match in general our
understanding of what kinds of information exist and the resulting consequences
of disclosing this information. Finally, our analysis indicates that the risks of
sharing cyber incident information remain essentially the same across different
standards for representing cyber threat information.
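A simplified IODEF-style fragment (not schema-complete) showing the restriction attribute; as noted above, nothing in the format forces a receiver to honour it:

```python
import xml.etree.ElementTree as ET

# Minimal IODEF-style Incident element carrying the restriction attribute.
incident = ET.Element("Incident", restriction="need-to-know")
ET.SubElement(incident, "Description").text = "DDoS against public web server"
print(ET.tostring(incident, encoding="unicode"))
```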
3.4 Conclusion
In this chapter, we have performed a comprehensive analysis of incident reporting
information through the STIX incident model to identify the threats of disclosing
sensitive and identifying information. We assigned the sets of possible threats
based on the ENISA threat taxonomy. We identified the threats associated with
each property, and evaluated those for severity in both the privacy and cyber
security dimension. We now have a full overview of which incident information
needs protecting, and why. In addition, we have provided guidance for disciplined
use of the STIX incident model to reduce and focus information security risks.
The following chapter will extend this work by proposing a new risk assessment
model for sharing cyber threat information validated by empirical evaluation.
Property | CT | IFT | S | I | PI | Justification | Threat | PS | CSS
Title | ✕ | ✓ | ✓ | ✕ | ✕ | This is a free text field which can refer to particular business information. | T5, T1 | 2+2 | 2+2
External_ID | ✕ | ✓ | ✕ | ✕ | ✕ | This refers to the report, not the incident. | N/A | 0 | 0
Time | ✓ | ✕ | ✓ | QI | ✕ | The “TimeType” class has sensitive and QI properties. | T1, T5, T3, T4, T22, T24, T26 | 0 | 2+2
Description | ✓ | ✓ | ✓ | ✓ | * | This is a free text field which can refer to particular business information and may contain sensitive and identifying information. | T5, T1, T4, T19, T20, T21, T24, T25 | 2+2 | 2+2
Short_Description | ✓ | ✓ | ✓ | ✓ | * | This is a free text field which can refer to particular business information and may contain sensitive and identifying information. | T5, T1, T4, T19, T20, T21, T24, T25 | 2+2 | 2+2
Reporter | ✓ | ✓ | ✓ | ✓ | ✓ | The identity of the reporter can be revealed. | T1, T7, T8, T10, T26 | 4+2 | 2+2
Responder | ✓ | ✓ | ✓ | ✓ | ✓ | The identity of the responder can be revealed. | T7, T1, T8, T10, T26 | 4+2 | 2+2
Coordinator | ✓ | ✓ | ✓ | ✓ | ✓ | The identity of the coordinator can be revealed. | T7, T1, T8, T10, T26 | 4+2 | 2+2
Victim | ✓ | ✓ | ✓ | ✓ | ✓ | Can refer to a particular critical infrastructure, and the identity of the victim can be revealed. | T1, T5, T7, T8, T9, T11, T12, T21, T24, T25, T26 | 4+2 | 2+3
Affected_Assets | ✓ | ✓ | ✓ | ✕ | ✕ | The “AffectedAssetsType” class has sensitive properties. | T1, T3, T4, T6, T9, T10, T11, T19, T20, T21, T22, T23, T24, T25, T26 | 0 | 2+2
Impact_Assessment | ✓ | ✓ | ✓ | ✕ | ✕ | The “ImpactAssessmentType” class has sensitive properties. | T1, T3, T4, T12, T13, T14 | 0 | 2+3
Status | ✓ | ✓ | ✓ | ✕ | ✕ | This field can refer to particular business information. | T3, T5, T9, T12, T22, T25, T26 | 0 | 2+2
Related_Indicators | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to specific information about the incident and adversary Tactics, Techniques, and Procedures (TTPs), and may contain identifying information about the adversary. | T1, T3, T5, T8, T9, T14, T22, T26 | 2+2 | 2+2
Related_Observables | ✓ | ✓ | ✓ | ✓ | ✓ | The “RelatedObservablesType” class has sensitive and identifying properties. | T1, T3, T5, T22, T26 | 2+2 | 2+2
Leveraged_TTPs | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to the adversary Tactics, Techniques, and Procedures (TTPs) and victim weaknesses, and it specifies the identity of the information source. | T1, T5, T9, T23, T24 | 2+2 | 2+2
Attributed_Threat_Actors | ✓ | ✓ | ✓ | ✓ | ✓ | The “AttributedThreatActorsType” class has sensitive and identifiable properties. | T1, T5, T9, T24 | 2+2 | 3+3
Discovery_Method | ✕ | ✕ | * | ✕ | ✕ | Not generally sensitive, but it reveals some security controls. | T5, T12, T24 | 0 | 2+2
History | ✓ | ✕ | ✓ | ✓ | ✓ | The “History” class has sensitive and identifiable properties. | T1, T5, T9 | 2+2 | 2+2
Affected_Asset | ✓ | ✓ | ✓ | ✓ | ✕ | This can refer to specific information about the organisation and the business (knowing the appropriate actions are taken for the affected asset). This class has sensitive and identifying information. | T1, T3, T4, T5, T6, T9, T10, T11, T12 | 4+2 | 2+2
Type | ✓ | ✕ | ✓ | QI | ✕ | This can refer to a specific business type, such as critical IT infrastructure in a key sector of the economy. Quasi-identifier: it can be combined with other properties to create a unique identifier. | T1, T9, T10 | 1+1 | 1+2
Description | ✕ | ✓ | ✓ | ✓ | * | This is a free text field which can refer to particular business information and may contain sensitive and identifying information. | T1, T4, T5, T9, T10, T11 | 2+2 | 2+2
Business_Function_Or_Role | ✕ | ✓ | ✓ | QI | ✕ | This is a free text field which can refer to particular business information and may contain sensitive and identifiable information. | T1, T4, T9 | 1+1 | 2+2
Nature_Of_Security_Effect | ✓ | ✓ | ✓ | ✕ | ✕ | This class is derived from a sensitive class which contains information about the PropertyAffectedType, including: the security property that was affected by the incident; a description of how the security property was affected; and in what manner the availability of this asset was affected. | T3, T4, T5, T9 | 0 | 2+2
Structured_Description | ✓ | ✓ | ✓ | ✓ | * | This class has sensitive and identifying attributes which can represent stateful properties or measurable events pertinent to the operation of computers and networks. | T1, T3, T9 | 2+2 | 2+2
vocab_name | ✕ | ✓ | ✓ | QI | ✕ | This refers to specific information about the type of the assets (Backup, Database, DHCP, Log, Mail, Manager, Camera, Person). | T1, T9, T10 | 1+1 | 2+2
Property_Affected | ✓ | ✓ | ✓ | ✓ | ✕ | This class contains sensitive and identifying information that can refer to specific information about the vulnerability, the confidentiality or integrity of the data, and the property. | T1, T3, T5, T9, T18 | 2+2 | 2+2
Description_Of_Effect | ✕ | ✓ | ✓ | ✓ | * | This is a free text field which can refer to particular business information and may contain sensitive and identifying information. | T5, T9, T18 | 2+2 | 2+2
Non_Public_Data_Compromised | ✓ | ✕ | ✓ | QI | * | This can refer to specific information about data secrecy and confidentiality, and the impact on the organisation’s reputation. | T1, T4, T5, T12 | 0 | 1+1
Vocab_name | ✕ | ✕ | ✓ | QI | ✕ | This can refer to specific information about the type of the assets (Backup, Database, DHCP, Log, Mail, Manager, Camera, Person). | T1, T9, T10 | 0 | 1+1
Data_encrypted | ✕ | ✕ | ✓ | ✕ | ✕ | This can refer to specific information about data secrecy and confidentiality. | T4, T5, T12 | 0 | 1+1
Total_Loss_Estimation | ✓ | ✕ | ✓ | ✕ | ✕ | This can refer to specific financial information about the business and organisation. | T5, T12, T15, T24 | 0 | 2+3
Actual_Total_Loss_Estimation | ✓ | ✕ | ✓ | ✕ | ✕ | This can refer to the estimated financial loss for the incident. | T5, T12, T24 | 0 | 2+2
Related_Indicator | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to specific information about the incident triggers, the adversary Tactics, Techniques, and Procedures (TTPs), and victim weaknesses. In addition, it can refer to identifying information. | T1, T5, T9, T19, T21, T22, T26 | 4+2 | 2+2
Related_Observable | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to specific information about the incident details. Besides, it can refer to identifying information for the source of information. | T1, T5, T19, T21, T22, T26 | 4+2 | 2+2
Threat_Actor | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to specific information about the cyberattack threat, including presumed intent and historically observed behaviour. The class “RelatedThreatActorType” contains an “InformationSourceType” property which gives detail about the source of given data. | T1, T5, T9, T24 | 4+2 | 2+2
History_Item | ✓ | ✓ | ✓ | ✓ | ✓ | This can refer to specific information about the actions taken during the handling of the incident. Moreover, it specifies the author of the JournalEntry note. | T1, T5, T9 | 4+2 | 2+2
Full analysis for Class “IncidentType”. In columns, (CT) stands for Complex Type, (IFT) for Include Free Text, (S) for Sensitivity, (I) for Identification,
(PI) for Personal Information, (PS) for Privacy Severity (Level of Identification + Prejudicial Effects), and (CSS) for Cybersecurity Severity (Ease of
Exploitation + Prejudicial Effects). For the values of the properties, ‘✓’ denotes ‘Yes’, ‘✕’ denotes ‘No’, and ‘*’ denotes ‘It depends’.
Chapter 4
In this chapter,1 we present a quantitative risk model to assess the risks enabled
by sharing CTI datasets with different entities in various situations. The
model enables the identification of the threats and evaluation of the impacts of
disclosing this information. We present three use cases that help to determine the
risk level of sharing a CTI dataset and consequently, the mitigation techniques to
enable responsible sharing. Risk identification and evaluation have been validated
using experts’ opinions.
4.1 Introduction
In the previous chapter, we performed a comprehensive analysis of incident
reporting information through the STIX incident model to identify the threats of
disclosing sensitive and identifying information. We identified the threats associated
with each property and evaluated them for severity in both the privacy and
cybersecurity dimensions. The next step is to provide a risk model for evaluating
1 This chapter is based on the conference paper “Risks of Sharing Cyber Incident Information” [207].
experts. Section 4.6 describes threats to the validity of the model. Finally, Section
4.7 summarises this chapter.
First, we need to identify the associated risk of disclosing any property of the
shared CTI dataset. Each property may have a different severity level in an
organisation. In Chapter 3, we estimated the cybersecurity severity score for
each property in the STIX 1.2 incident model [208]. The severity score range is
[1,8], where 1 is the lowest level of severity and 8 is the highest. Based on the
severity score, each property was assigned to one of four impact levels: negligible,
limited, significant and maximum, represented as 10, 50, 75 and 100 respectively.
There are limitations to using ordinal scales in risk assessments [209]. In [209],
researchers discuss the possibility of bias and subjectivity arising from different
levels of professional experience and from inconsistency in understanding each
incident’s factors and indicators. However, various risk assessment standards
considered “best practice” for estimating cyber risks use ordinal scores. For
example, the NIST 800-30 standard for conducting information systems risk
assessments [157] is based on ordinal scores. Moreover, the main advantage of
using an ordinal scale is the ease of comparison between variables. In this model,
the severity level was estimated based on the CNIL methodology [182], which has
a detailed and precise definition for each selected score. We have extended this
method to cybersecurity risks, estimating a cybersecurity severity (CSS) score.
Changing the scale would not affect the model, as it also depends on the
organisation’s risk profile, so we leave the organisation to decide how to handle
the risks and the acceptance level. Likewise, changing the number of ranks
on the severity scale would not change the final evaluation process, as it is
related to the organisation’s risk profile.
Let each property be represented as a single bit in the property vector:

\vec{P} = \{P_i\}, \quad P_i \in \{0, 1\}\ \forall i, \quad i = 1, 2, \ldots, n \qquad (4)
The second step in our model is to perform a threat analysis, which consists of
identifying the potential threat actions that may exploit the system or the
organisation based on the CTI information disclosure. Information about threats
can be collected from the organisation’s CTI database and threat taxonomies (see
Section 2.4.5), which define a list of potential threats to the organisation.
Let each threat be represented as a single bit in the threat vector:
\vec{T} = \{T_j\}, \quad T_j \in \{0, 1\}\ \forall j, \quad j = 1, 2, \ldots, m \qquad (5)
Total Associated Risk (TAR) is the sum of the associated sub-risks of disclosing
CTI information and can be computed as follows:
TAR = \sum_{i=1}^{n} \sum_{j=1}^{m} L_{ij} \cdot S_i \cdot P_i \cdot T_j, \quad \text{where } TAR \in \mathbb{R}^{+} \qquad (6)
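Equation (6) can be sketched in code. The following pure-Python function is an illustration with our own variable names, not the thesis's implementation; it evaluates TAR from a likelihood matrix, a severity vector and the binary property and threat vectors:

```python
def total_associated_risk(L, S, P, T):
    """Equation (6): TAR = sum over i, j of L[i][j] * S[i] * P[i] * T[j].

    L -- n x m matrix of likelihoods L_ij (property i, threat j)
    S -- length-n list of property severity scores
    P -- length-n 0/1 list: property i is present in the shared dataset
    T -- length-m 0/1 list: threat j is considered relevant
    """
    return sum(
        L[i][j] * S[i] * P[i] * T[j]
        for i in range(len(S))
        for j in range(len(T))
    )
```

With the likelihood column for one threat and a severity of 10 for each involved property, this reproduces the per-threat sub-risk sums worked through later in the chapter.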
4.3 Evaluation
To evaluate the ARM model, we have conducted an experiment on a repository
containing STIX documents [122] and another experiment on three case studies
that were analysed manually using our model by independent experts.
Dataset information
CTI datasets are collected using different platforms that are either open source
or commercially based. In this chapter, we have used the largest known public
dataset of STIX incidents as provided by MITRE [211] [122]. A sample of a STIX
incident report is shown in Figure 9.
The dataset consists of 4788 STIX incident reports. Our experiment focuses on
evaluating the risk of sharing these incidents. We have implemented a parser
that processes each file in the dataset, extracts the properties included in each
report, and inspects the value of each property in order to evaluate it against the associated
risk model.
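The extraction step can be sketched as follows. This is a hypothetical illustration, not the parser used in the experiment; since STIX 1.2 reports are XML, it simply collects the local element names present in each report:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def present_properties(stix_file):
    """Return the set of local element names occurring in one STIX XML report."""
    tree = ET.parse(stix_file)
    # Drop namespace prefixes ('{ns}Tag' -> 'Tag') and collect tag names.
    return {elem.tag.rsplit("}", 1)[-1] for elem in tree.iter()}

def scan_dataset(dataset_dir):
    """Map each report file in a directory to the set of properties it contains."""
    return {p.name: present_properties(p) for p in Path(dataset_dir).glob("*.xml")}
```

The resulting per-report property sets can then be matched against the property vector of the risk model.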
Parameter Settings
Based on Chapter 3, we defined the list of threats associated with disclosing these
properties. Table 19 shows the list of threats.
Property List: Title, Time, Description, Short Description, Reporter, Responder,
Related Incidents, Coordinator, Victim, Affected Assets, Impact Assessment,
Status, Related Indicators, Related Observables, Leveraged TTPs, Attributed
Threat Actors, Intended Effect, Security Compromise, Discovery Method.
Threat List: Social engineering; Failure to meet contractual requirements;
Violation of laws or regulations; Compromising confidential information;
Identity theft (Identity Fraud/Account); Unauthorised activities; Targeted
attacks (APTs etc.); Judiciary decisions/court orders; Loss of reputation;
Manipulation of information; Misuse of information/systems; Failed business
process; Man in the middle/Session hijacking; Generation and use of rogue
certificates; Abuse of authorisations.
Results
One of our goals is to evaluate the cybersecurity risk of sharing CTI data. To assess
the effectiveness of achieving this goal, we present some measurements based on
the results obtained from the associated risk values of the sample dataset. These
measurements are:
For the maximum value, we have used the worst-case scenario. We assumed
that we have an incident report which contains all the properties in this dataset
along with their values. The histogram of the data processed is presented in
Figure 10. The histogram has one peak for the risk range between 281 and 300.
The minimum value is 70 and the maximum value is 345. There are no gaps or
extreme outliers.
The analysis as a whole is also essentially a worst-case one: it assumes that any
information contained in a property represents all the associated threats. We can
observe that when the risk value is low then the incident report is unlikely to be
useful when sharing for analysis. For example, the incident with title “advertising
servers were compromised and made to serve up malware (darkleech)” has a risk
value of 115. This report only contains general and public information without
specific information about the victim, technical or business details.
On the other hand, with a high-risk value the report will contain more information
about the incident, which may be useful for analysis. For example, the incident
report with the title “Embedding Scripts in Non-Script Elements” contains
technical information about the attack pattern, which has a high severity impact, the
victim information and location, in addition to the affected assets information and
their properties such as the loss of availability for days.
This scenario consists of two cyber threat companies, CyberA and CyberB. Cy-
berA has been attacked by specific malware. This malware was designed to steal
encrypted files - and was even able to recover files that had been deleted. CyberA
wants to share this incident dataset with others in their sharing community. The
purpose of this sharing is to let recipients check if they have the same malware on
their system.
Table 20 shows the sample CTI dataset, which contains the properties that might
be shared.
Property Value
TTP Malware Type Capture Stored Data, Remote Access Trojan
Indicator Name File hash for malicious malware
Indicator Description This file hash indicates that a sample of malware alpha is present.
Hashes.’SHA-256’= ’ef537f25c895bfa7jfdhfjns73748hdfjkk5d89fjfer8fjkdndkjn7yfb6c’
Indicator Value Windows-registry-key:=
“HKEY_LOCAL_MACHINE\\SYSTEM\\ControlSet001\\Services\\MSADL3”
Vulnerability CVE-2009-3129, CVE-2008-4250, CVE-2010-3333, CVE-2012-0158,
CVE-2011-3544
Incident Title Incident associated with CyberA campaign. The malware was designed
to steal encrypted files - and was even able to recover files that had been deleted.
Date 2012-01-01T00:00:00
Reporter Name Alex John
Reporter Email Address [email protected]
Reporter Address US-LA
Victim Name CyberA / The CEO Device
Victim sector Financial sector
Victim Device IP address: 146.227.239.19
Victim Email Address [email protected] / [email protected]
Victim Address CyberA Ltd, IT Department, LONDON, W5 5YZ
Affected Assets Type Desktop, Mobile phone, Router, Server, Person
Confidentiality (Classified, Internal, Credentials, Secrets, System)
Affected Assets Property
Integrity (Software installation, Modify configuration, Alter behaviour)
Incident Status Not solved
Total loss £ 65,000
To compute the associated risk of sharing this CTI dataset, we apply our model
as follows. The first step is to identify and analyse the severity for each property
in the dataset. Table 21 defines the threats associated with disclosing the CTI
dataset as derived from Table 20. We have assigned the sets of potential threats
for each property and evaluated them for severity in a cybersecurity context.
Table 22 represents the same relationship between the threats and the properties
of the CTI dataset by focusing on the threats.
Based on the CTI dataset disclosure and the associated threats, we estimate
the likelihood of each threat occurring from the property value and the context,
which varies depending on the organisation’s requirements.
Table 23 presents our estimates of the likelihood Lij of the threats and the total
risk score TAR when sharing with public sharing communities. In Table 23, the
risk level for each threat combines the likelihood of the threat with the severity
of the exposure of the associated properties. From Table 23, we can evaluate the
sub-risk value of the properties (P1, P3, P4, P5, P6 and P9) when the threat is
“Social Engineering” (T2). Using equation (6), the total value is 28:
(1 * 10) + (0.1 * 10) + (0.1 * 10) + (1 * 10) + (0.5 * 10) + (0.1 * 10) = 28
The sub-risk associated with sharing the properties (P1, P3, P4, P5 and P6) when
the threat is “Targeted attacks (APTs etc.)” (T4) is:
(1 * 10) + (0.5 * 10) + (0.5 * 10) + (1 * 10) + (1 * 10) = 40.
The total risk is obtained by adding all the sub-risks, giving a total value of 275.
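The per-threat sub-risk arithmetic above can be checked with a few lines of Python (a small illustration; as in the text, a severity of 10 is assumed for each involved property):

```python
def sub_risk(likelihoods, severities):
    """Sub-risk for one threat: sum over properties of L_ij * S_i (with P_i = T_j = 1)."""
    return sum(l * s for l, s in zip(likelihoods, severities))

# "Social Engineering" (T2) over P1, P3, P4, P5, P6, P9:
t2 = sub_risk([1, 0.1, 0.1, 1, 0.5, 0.1], [10] * 6)   # 28
# "Targeted attacks (APTs etc.)" (T4) over P1, P3, P4, P5, P6:
t4 = sub_risk([1, 0.5, 0.5, 1, 1], [10] * 5)          # 40
```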
Table 24 presents the estimated likelihood of the threats and the total risk
score value when sharing with trusted communities.
Finally, we evaluated the risk in three different scenarios: sharing the CTI dataset
with public communities, sharing with a high level of trust in the receiver, and
finally, sharing after removing the unrelated information.
P1 P2 P3 P4 P5 P6 P7 P8 P9 SUB-RISK
T1 0.1 0 0 0 0 0 0 0 0.1 2
T2 1 0 0.1 0.1 1 0.5 0 0 0.1 28
T3 0.5 1 0.5 0.5 0 0 0 0 0 25
T4 1 0 0.5 0.5 1 1 0 0 0 40
T5 0 0 0.1 0.1 0 0 0 0 0 2
T6 0 1 0.1 0.1 0 1 1 1 0 82
T7 0 0 0 0 0.5 0 0 0 0 5
T8 0 0 0 0 0.5 0 0 0 0 5
T9 0 0 0 0 0.5 0 0 0 0 5
T10 0.5 0 0 0 0.5 0 1 0.5 0 65
T11 0 0 0 0 0 0 0.1 0.1 0 6
TAR 275
Table 23: UC1 Likelihood and total risk value (public sharing communities)
P1 P2 P3 P4 P5 P6 P7 P8 P9 SUB-RISK
T1 0.1 0 0 0 0 0 0 0 0 1
T2 0.5 0 0.1 0.1 0.5 0.1 0 0 0.1 14
T3 0.1 0.5 0.5 0.5 0 0 0 0 0 16
T4 0.1 0 0.1 0.1 0.1 0.5 0 0 0 9
T5 0 0 0.1 0.1 0 0 0 0 0 2
T6 0 0.5 0.1 0.1 0 0.5 0.5 0.5 0 42
T7 0 0 0 0 0.1 0 0 0 0 1
T8 0 0 0 0 0.1 0 0 0 0 1
T9 0 0 0 0 0.1 0 0 0 0 1
T10 0.1 0 0 0 0.1 0 0.5 0.5 0 32
T11 0 0 0 0 0 0 0 0 0 0
TAR 119
Table 24: UC1 Likelihood and total risk value (trusted communities)
When sharing with public communities, the risk value is 275. On the other
hand, sharing within trusted communities decreases the risk value to 119.
In this scenario, the purpose of sharing is to check for the existence of the same
malware; thus we need to know the type and description of the malware, in
addition to the indicators of compromise such as the file hash value and Windows
registry key. The properties needed for sharing are therefore P2 and P3.
Consequently, the associated risk value if we only share these essential properties
is reduced to 34, as shown in Table 25. Reducing the risk value is important for
encouraging CTI sharing, and to achieve that, the organisation filters out the
sensitive information that is not relevant to the purpose of this sharing.
P2 P3 SUB-RISK
T1 0 0 0
T2 0 0.1 1
T3 1 0.5 15
T4 0 0.5 5
T5 0 0.1 1
T6 1 0.1 11
T7 0 0 0
T8 0 0 0
T9 0 0 0
T10 0.1 0 1
T11 0 0 0
TAR 34
Table 25: UC1 Likelihood and total risk value for sub-dataset
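In this model, data minimisation amounts to zeroing entries of the property vector before recomputing the total risk. A minimal, self-contained sketch (the severities and likelihoods below are hypothetical, not taken from Tables 23-25):

```python
# Hypothetical values for four properties and two threats -- illustration only.
SEVERITY = [10.0, 10.0, 10.0, 50.0]            # S_i per property
LIKELIHOOD = [[1.0, 0.5],                      # L_ij: rows = properties,
              [0.1, 0.5],                      #       columns = threats
              [0.1, 1.0],
              [1.0, 1.0]]

def tar(property_mask, threat_mask=(1, 1)):
    """Equation (6) restricted to the selected properties and threats."""
    return sum(
        LIKELIHOOD[i][j] * SEVERITY[i] * property_mask[i] * threat_mask[j]
        for i in range(len(SEVERITY))
        for j in range(len(threat_mask))
    )

full_risk = tar([1, 1, 1, 1])          # share everything
minimised_risk = tar([0, 1, 1, 0])     # keep only the properties that serve the purpose
```

Dropping the properties that are not needed for the sharing purpose lowers the total risk, which is the effect the sub-dataset evaluation demonstrates.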
Our model allows individual risk assessments to be combined in different ways
for different purposes. For instance, Figure 11 demonstrates a risk assessment
visualisation for the same CTI dataset. For each field in the CTI dataset, we
display the sum of the risks posed by that property in case of disclosure. This
visualisation shows which properties of a CTI dataset pose the greatest risk when
sharing, and might be used in the context of raising organisational awareness of
the CTI dataset fields.
Figure 11: A risk assessment visualisation showing risk value per type of information
This section presents the results of the data collection from a questionnaire (see
Section 4.8) conducted within privacy and cybersecurity workshops with 15
experts in privacy and cybersecurity. The study provided anonymity to the
participants. The questionnaire contains three parts. The first part focuses on
identifying the threats associated with disclosing the CTI dataset: we proposed a
list of threats, plus free text for extra suggestions. This part validates our analysis
of the threats of disclosing sensitive and identifiable information in cyber incident
information, as proposed in Chapter 3. The second part focuses on the security
controls that might be applied to preserve the privacy of the dataset, such as
redaction/selection, anonymisation, aggregation and encryption. This part gives
insight into the required protection level and the technical methods that would
help organisations share CTI datasets while ensuring confidentiality. Finally, the
third part focuses on giving a risk value to the dataset in both cases, before and
after applying the security controls. This part helps validate our ARM model.
Fifteen experts filled out the questionnaire, and a summary of the data collected
is presented in Table 26 and discussed in more detail below.
Question Part 1: sharing with public Part 2: sharing with trusted entities
Q-1 15 12
Q-2 15 13
Q-3.1 (Redaction/Selection) 8 0
Q-3.2 (Anonymisation) 7 7
Q-3.3 (Aggregation) 6 7
Q-3.4 (Enc) 7 7
Q-3.5 (others) 3 3
Q4 14 14
Fifteen experts answered question Q1 for sharing the CTI dataset with the public
sharing community, and 12 experts answered the same question when sharing
with trusted communities. Nine experts selected in detail the possible associated
threats of disclosing this dataset. Table 27 presents the threats and how many
experts selected each threat as a possible threat in case of disclosing this CTI
dataset. For example, six experts out of nine agreed that disclosing this dataset
would be associated with the “Compromising confidential information” and “Loss
of reputation” threats. The remaining experts did not consider these as possible
threats. To reduce the effect of experts’ subjectivity, we measure the level of
agreement between all opinions in addition to comparing them to our own. To
find the level of agreement between our selection and the experts’ selections, we
compute the Fleiss’ kappa agreement score [213]. In this use case, we find a
“moderate” agreement level, k = 0.428, for data containing seven raters (six
experts plus our own rating) over 17 possible threats. Still, the rest of the experts
agreed with some of the proposed threats. Therefore, the result indicates that the
list we have proposed in Table 22 matches significantly with the experts’ selections
in Table 27.
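For reference, Fleiss' kappa is computed from a subjects-by-categories matrix of rating counts; a minimal pure-Python sketch (in our setting each threat is a subject, and the two categories are "selected as a possible threat" / "not selected"):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    ratings[s][c] = number of raters assigning subject s to category c;
    each row must sum to the same number of raters n.
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    total = n_subjects * n_raters
    # Per-subject observed agreement, averaged into P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_subjects
    # Chance agreement P_e from the overall category proportions.
    p_e = sum(
        (sum(row[c] for row in ratings) / total) ** 2
        for c in range(len(ratings[0]))
    )
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement, while 0 indicates agreement no better than chance.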
Table 28 presents the number of experts who decided which threats might be
associated with disclosing the CTI dataset when sharing with trusted entities. The
possible threats have decreased due to the increase of trust level among the sharing
organisations. However, the result still shows that the list we have proposed in
Table 22 matches the experts’ selections in Table 28.
For question Q2, eight experts indicated that we cannot share this dataset.
On the other hand, seven indicated that we could share after mitigation. This
result indicates that sharing this dataset without applying any security controls
will be a high risk to CyberA.
For questions Q3.1 and Q3.2, experts selected values that should be anonymised
or removed from the dataset before sharing, such as “Reporter Name”, “Reporter
For question Q3.5, three experts confirmed that specific fields such as IP addresses
and email addresses should be generalised.
For question Q4.1, experts were asked to evaluate overall risk on a 1-5 scale, with
5 being the worst. Nine experts indicated that the risks are between 4 and 5 which
constitutes a high level of risk. On the other hand, after applying the suggested
controls, five experts suggested that the risk value would be between 1 and 2 which
constitutes a low risk level. However, when sharing the CTI dataset with trusted
entities, the overall value changed from a medium risk level to a low risk level.
Eight experts stated that the risk value is between 2 and 3, and after applying
the security controls, eight stated it was between 1 and 2.
As a result, the case study findings suggest that sharing this CTI dataset is
possible after applying specific security controls, mainly by removing unrelated
data and applying encryption. The questionnaire results show that our model
matches acceptably with the judgements of the cybersecurity and privacy experts:
all the threats we identified were also identified by the experts. The experts
identified different controls to reduce the risk of sharing, and they agreed that
sharing this dataset without applying these controls would be high risk. Where
experts reached different decisions, the difference can be attributed to their
varying levels of expertise and their subjective views of the granularity of the
risk. Threat and technical details, such as network information, can also carry
different meanings for different security experts. For example, five experts did
not select encryption as a security control that should be applied before sharing,
and others focused mainly on anonymisation techniques. In our model, the
dataset administrator is free to select the security control of choice, for example
homomorphic encryption [49] [216] or secure multiparty computation [217] [59].
This scenario consists of the cyber threat company CyberA and other companies
which share threat intelligence with one another. CyberA has been attacked by
a specific threat actor and would like to know how many companies have been
attacked by the same threat actor. Sharing the threat actor information is
sensitive due to the possibility of identifying the techniques and procedures used
in the attack, the victim information and the targeted sector, such as the oil
business, health or diplomatic offices. The incentive for this sharing is to
understand and analyse this threat actor: CyberA needs to determine how many
companies have been targeted by the same threat actor.
In this case study, we have used the STIX report about the “Red October” Cam-
paign [218]. Before sharing the STIX report, we need to evaluate the associated
risk of sharing this information within the CTI sharing communities.
Table 29 shows the sample CTI dataset which contains the properties that might
be shared.
Property Value
TTP Malware Type Command and Control, capture stored data, Scan network,
Exploit vulnerability, Remote Access Trojan, Downloader, Export data,
Spyware/Keylogger, Brute force
TTP Attack Patterns CAPEC-98
Vulnerability CVE-2009-3129, CVE-2008-4250, CVE-2010-3333, CVE-2012-0158,
CVE-2011-3544
Title Incident associated with Red October campaign. Phishing email with malware
attachment leading to infection, C2, credential compromise, and lateral movement
through the network. Goal to steal classified info and secrets.
External ID 4F797501-69F4-4414-BE75-B50EDCF93D6B
Incident Date 2012-01-01T00:00:00
Reporter Alex John, W-baker org, [email protected], (LE1 9BH, Leicester, UK)
Victim Japan Fair Trade Commission – [email protected]
Victim Address International Affairs Division (16th floor), Japan Fair Trade
Commission, 6-B building, Chuo Godo Chosha, 1-1-1 Kasumigaseki,
Chiyoda-ku Tokyo 100-8987
Affected Assets Type Desktop, Mobile phone, Router or switch, Server, Person
Confidentiality (Classified, Internal, Credentials, Secrets, System)
Affected Assets Property
Integrity (Software installation, Modify configuration, Alter behaviour)
Security Compromise Yes
Discovery Method Ext - suspicious traffic
Threat Actor Title Lone Wolf Threat Actor Group
Threat Actor Description Notes: Based on registration data of command and
control servers and numerous artefacts left in executables of the malware, we
strongly believe that the attackers have Russian-speaking origins. Current
attackers and the executables developed by them have been unknown until
recently; they have never been related to any other targeted cyberattacks.
Threat Actor The Lone Wolf / Gookee Organisation
Threat Actor/ Country Russia
Threat Actor/ Administrative Area Moscow
Threat Actor Electronic /Address Identifier [email protected] / facebook.com/theLonewolf
Threat Actor Language Russian
Threat Actor Motivation Espionage
Threat Actor Observed TTPs “example:ttp-fcfe52c2-3060-448b-b828-3e09341485b1”
Analogous to use case 1, we have evaluated the associated risk of sharing the
CTI dataset by applying our model as follows. Table 30 defines the threats
associated with disclosing the CTI dataset and identifies the cybersecurity severity
for each property as derived from Table 29.
Table 31 then represents Table 30 in a different way, focusing on the threats.
Table 32: UC2 Likelihood and total risk value (public sharing communities)
Table 33: UC2 Likelihood and total risk value (trusted communities)
When sharing with public communities, the risk value is 498. On the other
hand, sharing within trusted communities decreases the risk value by 58% making
the value 208. To reduce the risk of sharing and preserve the privacy in the shared
information, data minimisation should be applied to exclude sensitive information
that is not relevant to the analysis from the original dataset. The sanitised dataset
would fulfil the purpose and usefulness of sharing. In this use case, we keep two
properties which are “TTPs” and “Threat_Actors”. The total risk score of the sub
dataset after removing unrelated properties will be reduced to 280 as explained
in Table 34.
Table 34: UC2 Likelihood and total risk value for sub-dataset
This section presents the results of the data collection using the same questionnaire
as used for Section 4.5.1. Eleven experts filled out the survey, and a summary of
the data collected is presented in Table 35 and discussed in more detail below.
Question Part 1: sharing with public Part 2: sharing with trusted entities
Q-1 11 10
Q-2 11 9
Q-3.1 (Redaction/Selection) 7 5
Q-3.2 (Anonymisation) 3 5
Q-3.3 (Aggregation) 3 1
Q-3.4 (Enc) 3 4
Q-3.5 (others) 0 0
Q4.1 9 9
Q4.2 6 6
The first question was answered by 11 experts for sharing the CTI dataset with
Table 37 presents the number of experts who decided which threats might be
associated with disclosing the CTI dataset when sharing with trusted entities. As
shown in Table 37 the set of possible threats has been reduced due to the increase
of trust level among the sharing organisations. However, the result still shows
that the list we have proposed in Table 30 is very similar to the experts' selections
in Table 37.
For question Q2, eleven experts indicated that we cannot share this dataset,
or we can share after applying specific security controls. This result indicates that
we need to apply security controls before sharing this dataset in order to reduce
the risk of sharing.
For questions Q3.1 and Q3.2, experts selected values that should be anonymised or
removed from the dataset before sharing. Many of the experts proposed removing
all personal information and victim information, such as the organisation’s name.
In this case the victim information is not related to the purpose of sharing, which
matches our model and evaluation.
For question Q3.3, three experts gave answers which included Address, Date and
Affected Assets. This indicates that some information needs to be grouped and
aggregated before sharing, as part of reducing the risk of sharing individual
information.
For question Q3.4, three experts indicated that some attributes should be
encrypted, such as threat actor and TTP information, and we can use techniques
This use case has been conducted within the AIR4ICS [220] project. The project
develops a new agile incident response framework for industrial control systems.
The events in the project simulated the high-pressure situation of a live cyber
incident, in cyber Red Team vs Blue Team scenarios. We conducted the
questionnaire during two events: the first event represented the infrastructure of
a UK deep seaport called CTI Port. This includes the docking and berthing of
ships, the loading and unloading of cargo, and the warehousing and distribution
of goods via road, rail and sea. The port’s systems consist of three main elements:
an enterprise network, an operational technology component and ship systems.
In the first scenario, CTI Port has been attacked by specific malware. This
malware was designed to steal encrypted files - and was even able to recover files
that had been deleted. CTI Port wants to share this incident dataset with others
in their sharing community, and the board has agreed to share this report. The
purpose of this sharing is to identify the threat actor’s behaviour and how they
got in, and also to check whether the attacker is targeting a specific business.
Table 38 shows the sample cyber threat intelligence dataset, which contains the
properties that might be shared.
Property Value
Incident Title Incident associated with CTI port campaign. The main techniques
are Brute force, credential compromise. The goal is to steal classified information
and secrets.
Incident Category Unauthorised Access (A group gains logical access without permission)
TTP Malware Type Command and Control, capture stored data, Scan network,
Exploit vulnerability, Remote Access Trojan, Downloader, Export data,
Spyware/Keylogger, Brute force
Indicator of Compromise File hash for malicious malware. This file hash indicates
that a sample of malware alpha is present.
Hashes.‘SHA-256’:
‘ef537f25c895bfa7jfdhfjns73748hdfjkk5d789c2b76589fjfer8fjkdndkjn7yfb6c’
Indicator Value Windows-registry-key:
“HKEY_LOCAL_MACHINE\\SYSTEM\\ControlSet001\\Services\\MSADL3”
IP: 147.228.151.30
Vulnerability CVE-2009-3129, CVE-2008-4250, CVE-2010-3333, CVE-2012-0158,
CVE-2011-3544
Incident Date 2019-09-25 10:18:00
Reporter Alex John, W-baker org, [email protected], - (LE1 9BH, Leicester, UK)
Victim CTI Port [email protected]
Victim Address CTI port - Main building (5th floor) - LE1 9BH, Leicester, UK
CRM/Finance, Web server, Finance Lead WS, Accounting WS
Affected Assets Type 192.168.125.112, 192.168.125.114,
192.168.125.129, 192.168.121.151
Confidentiality (Classified, Internal, Credentials, Secrets, System)
Affected Assets Property
Integrity (Software installation, modify configuration, Alter behaviour)
Security Compromise Yes
Discovery Method Monitoring Service (this incident was reported by a managed
security event monitoring service) - suspicious traffic
Threat Actor Title Lone Wolf Threat Actor Group
Threat Actor Description Based on the registration data of the CRM/Finance
server and several pieces of evidence left in executables of the malware, we
strongly believe that the attackers have Russian-speaking origins.
Threat Actor Org-Name The Lone Wolf / Gookee Organisation
Threat Actor Country Russia
Threat Actor Admin- Area Moscow
Threat Actor E- Address Identifier [email protected] / facebook.com/theLonewolf
Threat Actor Language Russian
Threat Actor Motivation Espionage
Threat Actor Observed TTPs “example:ttp-fcfe52c2-3060-448b-b828-3e09341485b1”,
“example:ttp-2a884574-bf2b-4966-91ba-3e9ff6fea2e3”
IP address: 147.228.151.33 / 147.228.151.35
Block communication between the threat actor agents
Course of Action and the Finance/CRM Server. This server contains records
about the shipments so there should be a high operational impact
Total Loss £75,000
Analogous to use cases 1 and 2, we have evaluated the associated risk of sharing
the CTI dataset by applying our model as follows. Table 39 defines the threats
associated with disclosing the CTI dataset and identifies the cybersecurity severity
for each property as derived from Table 38.
Table 41: UC3 Likelihood and total risk value (public sharing communities)
Table 42: UC3 Likelihood and total risk value (trusted communities)
When sharing with public communities, the risk value is 541. On the other
hand, sharing within trusted communities decreases the risk value by about 52%,
making the value 261. To reduce the risk of sharing and preserve the privacy in the shared
information, data minimisation should be applied to exclude sensitive information
that is not relevant to the analysis from the original dataset. The sanitised dataset
would fulfil the purpose and usefulness of sharing. In this use case we keep the
following properties which are “TTP Malware Type”, “Vulnerability”, “Threat
Table 43: UC3 Likelihood and total risk value for sub-dataset
This section presents the results of the data collection from a questionnaire, see
Section 4.8, conducted within the AIR4ICS [220] project.
We asked the Red Team to evaluate an incident report created based on the
first event’s scenario. All Red Team members had several years of Red Teaming,
cyber incident response and cyber security experience. Eight experts filled out
the questionnaire and a summary of the data collected is presented in Table 44
and discussed in more detail below.
All the experts answered Q1 for sharing the CTI dataset with both public
sharing communities and trusted communities. All experts selected in detail the
possible associated threats of disclosing this dataset. Table 45 presents the threats
and how many experts selected each one as a possible threat in case of disclosing
this CTI dataset. For example, most participants agreed that disclosing this
dataset would be associated with the “Social Engineering” and “Loss of
reputation” threats. To reduce the effect of experts’ subjectivity, we measure the
level of agreement between all opinions, including our own. To find the level of
agreement between our selection and the experts’ selections, we compute the
Fleiss’ kappa agreement score [213]. The kappa value measures the level of
agreement among raters: a value of 1 indicates perfect agreement, while a value
of 0 indicates agreement no better than chance. We find k = 0.417, which is
considered “moderate” agreement, for data containing nine raters, including our
own rating, evaluating 16 threats.
The result indicates that the list we have proposed in Table 39 matches the experts’
selections in Table 45.
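For reference, Fleiss' kappa can be computed directly from the rating counts. The implementation below is a standard one; the example ratings are illustrative placeholders, not the actual questionnaire data.

```python
# Fleiss' kappa for N subjects rated by n raters into k categories.
# Input: a table where row i gives, for subject i, the number of raters
# who assigned it to each category (every row must sum to n).

def fleiss_kappa(counts):
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement P_i per subject, averaged into P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Expected chance agreement from the category proportions p_j.
    total = n_subjects * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(len(counts[0])))
    return (p_bar - p_e) / (1 - p_e)

# Illustrative only: 16 threats, 9 raters, two categories
# (threat selected / not selected). These counts are made up.
ratings = [[9, 0], [8, 1], [7, 2], [9, 0], [2, 7], [1, 8], [5, 4], [6, 3],
           [9, 0], [0, 9], [3, 6], [8, 1], [7, 2], [4, 5], [9, 0], [1, 8]]
print(round(fleiss_kappa(ratings), 3))
```

When all nine raters agree on every threat the score is exactly 1; scores near 0 mean the observed agreement is no better than chance.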
Table 46 presents the number of experts who decided which threats might be associated with disclosing the CTI dataset when sharing with trusted entities. The values change significantly when cyber threat intelligence is shared with trusted entities: the threat selections decrease by 57% in total, due to the higher level of trust among the sharing organisations. Nevertheless, the result still shows that the list we proposed in Table 39 matches the experts' selections in Table 46.
For question Q2, three experts indicated that we cannot share this dataset, while three others indicated that we can share it after applying the selected security controls. This result indicates that sharing this dataset without mitigation poses a high risk. For question Q3.1, three experts selected the properties that should be shared to achieve the sharing goal.
The properties include information such as TTP Malware Type, Vulnerability, Threat Actor information, Indicator of Compromise and Incident Category. Most of them agreed to remove the Total loss, Victim information and Course of Action properties, which matches our sanitised dataset. Most of the experts suggested anonymisation and aggregation as security controls for specific properties such as reporter information, affected assets and course of action, but none of them suggested encryption techniques. The reason could be that none of them were familiar with techniques such as homomorphic encryption [49] [216] or secure multiparty computation [217] [59].
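The whitelist-style sanitisation the experts converged on can be sketched as a simple filter. The property names mirror this use case; the incident record and its values are hypothetical.

```python
# Data minimisation: keep only the properties needed for the sharing goal.
# The whitelist mirrors the properties retained in this use case; the
# incident record itself is a hypothetical example.

SHARE_WHITELIST = {
    "TTP Malware Type", "Vulnerability", "Threat Actor",
    "Indicator of Compromise", "Incident Category",
}

def minimise(incident: dict) -> dict:
    """Return a sanitised copy containing only whitelisted properties."""
    return {k: v for k, v in incident.items() if k in SHARE_WHITELIST}

incident = {
    "TTP Malware Type": "RAT",
    "Vulnerability": "CVE-2017-0199",       # hypothetical value
    "Threat Actor": "DarkHydrus",
    "Indicator of Compromise": "198.51.100.7",
    "Incident Category": "Malicious Code",
    "Total loss": "250000 GBP",             # removed before sharing
    "Victim Name": "Example Ltd",           # removed before sharing
    "Course of Action": "Patch and IDS",    # removed before sharing
}

sanitised = minimise(incident)
assert "Victim Name" not in sanitised and "Total loss" not in sanitised
```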
For question Q4.1, experts were asked to evaluate the overall risk on a 1-5 scale, with 5 being the highest risk. Most of the experts (75%) indicated that the risk is 4 or 5, which constitutes a high risk level; after applying the suggested controls, all experts indicated that the risk value would be 1 or 2, which constitutes a low risk level. When sharing the CTI dataset with trusted entities, the overall value changed from a medium risk level to a low risk level: four experts stated that the risk value is 3 and three stated that it is 4, and after applying the security controls, all experts stated that the risk value is 1 or 2.
As a result, the case study findings suggest that sharing this CTI dataset is possible after applying specific security controls, mainly by removing unrelated data. The questionnaire results show that our model closely matches the cybersecurity experts' assessments: all the threats we identified were also identified by the experts. The experts identified various controls to reduce the risk of sharing, and they agreed that sharing this dataset without applying them is high risk.
4.7 Conclusion
In this chapter, we present a new quantitative risk model for sharing CTI datasets.
The main objective of this model is to develop a framework to support sharing
decisions regarding which information to share, and with whom. We have extended our previous work: in Chapter 3 we performed a comprehensive analysis of incident reporting information through the STIX incident model to identify the threats of disclosing sensitive and identifying information. Here we have identified the potential threats associated with sharing a CTI dataset, computed the severity for each property, and proposed an estimation of the likelihood of the threats in case of property disclosure. Finally, we have calculated the total risk score of sharing a CTI dataset, addressing all risks associated with the data to be shared. Based on the risk value, organisations can select appropriate privacy preserving techniques to reduce the risk of sharing. In order to evaluate the model, we have asked for experts' opinions on risk identification and evaluation for three different use cases.
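As a rough structural sketch of such a score (not the exact formulas defined earlier in this chapter), the total risk can be assembled from per-property severities and audience-dependent threat likelihoods. All names and numbers below are illustrative.

```python
# Sketch of a quantitative sharing-risk score: each disclosed property has a
# severity, each threat a likelihood that depends on the sharing audience,
# and the total risk aggregates severity x likelihood over all pairs.
# Every name and number here is illustrative, not a value from the thesis.

SEVERITY = {"Reporter Email": 4, "Victim Name": 5, "Indicator of Compromise": 2}

# Likelihood of each threat materialising, per sharing community.
LIKELIHOOD = {
    "public":  {"Social Engineering": 0.8, "Loss of reputation": 0.7},
    "trusted": {"Social Engineering": 0.3, "Loss of reputation": 0.4},
}

# Which threats each property contributes to, if disclosed.
THREATS = {
    "Reporter Email": ["Social Engineering"],
    "Victim Name": ["Loss of reputation", "Social Engineering"],
    "Indicator of Compromise": ["Loss of reputation"],
}

def total_risk(properties, audience):
    """Sum severity x likelihood over every (property, threat) pair."""
    like = LIKELIHOOD[audience]
    return sum(SEVERITY[p] * like[t] for p in properties for t in THREATS[p])

props = ["Reporter Email", "Victim Name", "Indicator of Compromise"]
print(total_risk(props, "public"), total_risk(props, "trusted"))
```

Sharing with a trusted community lowers every threat likelihood and therefore the total score, mirroring the drop from 541 to 261 observed in the use case above.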
Q1 - Which of the following can be possible risk(s) of disclosing this incident information? Please choose all that apply.
Q3 - If they can share this dataset, what controls and modifications:
Q4 - In your opinion, what is the risk level of sharing this dataset in the following two cases:
Chapter 5
In this chapter¹, we consider how cyber intelligence sharing interacts with data
protection legislation. Specifically, we present a model for sharing cyber threat
intelligence under the GDPR. It is an approach for defining the required protection
level on cyber threat intelligence datasets, if they contain personal data, as defined
by the GDPR. Based on the GDPR rules, this approach would help to make the
decision of sharing and processing personal information clear. Moreover, it helps
to provide some practical and clear rules to build data sharing agreements between
organisations, because during the evaluation phase, we establish the purpose of
the sharing, the legal basis and security measures for compliance with the law.
This chapter has two main contributions. First, to provide a decision process
about sharing CTI datasets containing personal data in the context of the GDPR.
Second, to convert existing legal grounds into rules that help organisations share
¹ This chapter is based on the conference paper “Sharing Cyber Threat Intelligence Under the General Data Protection Regulation” [221].
A Albakri Chapter 5
such data whilst being legally compliant with the GDPR. These rules establish an association between the CTI policy space and the defined protection levels. This chapter is divided into the following sections. Section 5.1 describes the steps of the methodology used to build the approach. Section 5.2 gives several use cases of sharing CTI datasets to validate our approach. Section 5.3 summarises this chapter.
5.1 Methodology
This section presents the methodology we used to build an approach to evalu-
ate the possibility of sharing personal data in the context of CTI datasets under
the GDPR. The methodology consists of three main steps and is inspired by the
DataTags project [222]. The first step is to define the possible levels of security
requirements which agree with the principles considered by the GDPR when pro-
cessing personal data in CTI datasets. The second step is to identify a policy
space, i.e. a set of concepts, definitions, assertions and rules around the GDPR
to describe the possible requirements for sharing CTI datasets. The last step is
to build the decision graph, which defines the sequence of questions that should be traversed to establish and assess the legal requirements for CTI data sharing, with the outcome represented as so-called “tags”. The DataTags project, developed
by Latanya Sweeney’s group at Harvard University, helps researchers and insti-
tutions to share their data with guarantees that releases of the data comply with
the associated policy, including American health and educational legislation [223].
It consists of labelling a dataset with a specific tag based on a series of questions.
Each question is created based on a set of assertions under the applicable policy.
The first step to achieving our goal is to define the tags that will be the possible
decisions reached after a series of questions that interrogate CTI datasets for
GDPR requirements. The legal requirements of the GDPR indicate in the first
instance whether we can share or not. However, when the answer is positive,
additional obligations for such sharing arise out of the principles and articles of
the GDPR, in particular: the principle of data minimisation; the requirement that
personal data must be processed securely; and that the data must not be retained
when no longer relevant. Hence, the decision process also leads to conclusions
on how sharing can take place by translating these constraints into technical
requirements. All of this is represented in the “data tags” of the leaves of our
decision graph. The organisations that are sharing CTI datasets should ensure
that the receiving organisation understands the sensitivity of this information and
receives clear instructions on what they are allowed to do with the information,
e.g. potential on-sharing. We will follow the Traffic Light Protocol (TLP) [43]
levels as a springboard, and expand them by adding security measures for each
level in order to address the GDPR requirements of processing personal data when
sharing CTI datasets. TLP was created to facilitate the sharing of information
by tagging the information with a specific color. TLP has four colors, indicating
different levels of acceptable distribution of data, namely [43]:
• WHITE - Unlimited disclosure.
• GREEN - Limited disclosure, restricted to the community.
• AMBER - Limited disclosure, restricted to participants' organisations.
• RED - Not for disclosure, restricted to named recipients only.
This protocol records whether recipients may share the information with others. TLP is used by CSIRT communities, Information Sharing and Analysis Centres (ISACs) and various industry sectors. It is easy to use: the dataset is tagged with a specific colour, and organisations have a common understanding of these tags, which helps them apply TLP automatically without complex training and documentation. TLP's simplicity makes it suitable for many real-world scenarios. However, it is not optimal for automated sharing, and it does not cover complicated situations. For example, a cyber incident report could be TLP:RED for all the receiving entities except the sharer, who can change the information, so TLP:AMBER would be practical for the sharer [224]. We have extended this protocol by adding appropriate security measures
that are required for the legality of CTI sharing. To increase the trustworthiness between the entities and to encourage entities to share CTI, we require the receiving organisation to apply these security measures, whilst keeping in mind that, in general, organisations use different approaches and levels of security practice. However, ensuring that the receiver applies these security measures is a challenge in itself and is beyond the scope of this thesis, similar to the enforcement of sticky policies as discussed in Section 2.3.5. Table 48 shows the levels that we
are going to use in order to label the shared datasets. Cells in columns “Type”,
“Description”, and “Examples” are taken from the TLP description [225]. The
values in columns “Security Measures” and “Transfer/Storage” are our propos-
als to meet the legislative requirements for securely sharing this data. We have
proposed technical methods that would help organisations to achieve what the
GDPR mandates as a technical requirement to ensure confidentiality and protect
data subjects (Article 32). When proposing the security measures, we had to take
into consideration with whom we are going to share CTI datasets and their trust-
worthiness, because recipients who cannot be relied upon to protect the shared
information need to be eliminated from further sharing. We combine the notion of
privacy preservation of the data with the trust level of the recipient organisation,
and because of that, we recommend the use of the Attribute based Encryption
(ABE) technique [62] [63]. For encryption, ABE can use any combination of a set
of attributes as a public encryption key. Decryption privileges of the data in this
type of encryption are not restricted to a particular identity but to entities with
a set of attributes which may represent items such as business type and location.
For example, an organisation chooses to grant access to an encrypted log of its
internet traffic, but restricts this to a specific range of IP addresses. Traditional
encryption techniques would automatically disclose the log file in case the secret
decryption key is released. Table 47 lists example values of some attributes in the
data. The first attribute is the location of the organisation: due to the different legal systems governing international information transfers, we consider three levels, National, EU and International. The second attribute is the sector of the organisation, chosen because of the similarity of working processes and procedures and the likely similar threat models; the value might be energy, health, education, finance and so on. Finally, the size of the organisation may
be relevant because the number of employees has been empirically related to the
number of threats [148]. To use ABE, before sharing the data with other organisa-
tions and in case it is not shared to the public, the Setup Key Authority generates
a master secret key along with a public key. It publishes the public key so ev-
eryone has access to it. The key authority uses the master secret key to generate
a specific secret key for the participating organisation in the sharing community.
For example, there might be an organisation called “Alpha” which gets a specific
secret key from the key generator authority. “Alpha” is an organisation operat-
ing at the national level in the telecom sector. Before sharing any dataset with
“Alpha”, the user will encrypt the dataset that has its own specific access policy.
Hence, this user encrypts the dataset such that anyone at the national level working with the telecom business will be able to decrypt it. The organisation sharing CTI datasets generates ciphertext with this policy. As a result, the organisation “Alpha” will be able to decrypt the dataset.

Table 47: Example values of ABE attributes

Attribute: Value
Location: National, EU, Global
Organisation sector / similarity of business: Central authority, similar business, connected groups, . . .
Organisation size: Small, medium, big
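The decryption rule that ABE enforces cryptographically can be illustrated as a plain attribute check. The function below is a non-cryptographic sketch (a real deployment would use a CP-ABE implementation); the attribute names follow Table 47, and the organisations are examples.

```python
# Illustration only: the access policy that attribute-based encryption
# enforces cryptographically, written as ordinary Python. In a real system
# the policy is embedded in the ciphertext (e.g. CP-ABE) and decryption
# fails unless the key's attributes satisfy it; no encryption happens here.

def satisfies(policy: dict, attributes: dict) -> bool:
    """True if the recipient's attributes meet every policy constraint."""
    return all(attributes.get(name) in allowed
               for name, allowed in policy.items())

# Encryptor's access policy: national-level telecom organisations.
policy = {"location": {"National"}, "sector": {"telecom"}}

# "Alpha": a national telecom organisation, as in the running example.
alpha = {"location": "National", "sector": "telecom", "size": "medium"}
# A hypothetical EU energy company that must not be able to decrypt.
other = {"location": "EU", "sector": "energy", "size": "big"}

assert satisfies(policy, alpha)      # Alpha can decrypt
assert not satisfies(policy, other)  # others cannot
```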
At all levels, Green, Amber and Red, data will be encrypted using the ABE
method. In addition, we need to consider the data minimisation principle as de-
fined in GDPR Art.5(1)(c) “1. Personal data shall be: (c) adequate, relevant
and limited to what is necessary in relation to the purposes for which they are
processed (data minimisation)”. Hence, sharing should be designed to provide
only the required data to successfully achieve a specific goal. This implies that we
should use the minimum amount of personal information to decrease any privacy
risk on individuals whose personal data might be included. This corresponds with
the approach in the case studies in Chapter 4, where we chose to share only the
essential information. Doing so will reduce the risks of the following potential
privacy attacks on the data:
Identity disclosure [226] [227]: this threat occurs when the attacker is able
to connect a data subject with their record in a CTI dataset. For example, an
attacker might identify a victim because the dataset contains direct identifying
information such as an email address, IP address or credential information.
Membership disclosure [228]: this threat occurs when an attacker can derive
that a specific data subject exists in the dataset. For example, the dataset con-
tains information about specific malware victims. Any person established to be
in the dataset reveals that this victim has been hacked by this malware.
Attribute disclosure [34]: this threat occurs when data subjects are linked with
information about their sensitive attributes such as biometric data that is used
Table 48: Extended TLP levels with proposed security measures

WHITE
  Description: Information does not contain any personal data or sensitive information, so it can be shared publicly.
  Examples: Sharing public reports and notifications that give a better understanding of an existing vulnerability.
  Security Measures: Anonymisation (identity disclosure, membership disclosure, attribute disclosure).
  Transfer/Storage: Clear

GREEN
  Description: Information shared with a community or a group of organisations but not shared publicly.
  Examples: Sharing cybersecurity information within a close community, for example an email with a malware link targeting a specific sector.
  Security Measures: Anonymisation (identity disclosure); Attribute-Based Encryption (ABE).
  Transfer/Storage: Encrypted

AMBER
  Description: Information shared with a specific organisation; sharing confined within the organisation to take effective action based on it.
  Examples: Sharing cybersecurity information that contains indicators of compromise and courses of action with a specific community or sector, e.g. the financial sector.
  Security Measures: Anonymisation (identity disclosure); Attribute-Based Encryption (ABE).
  Transfer/Storage: Encrypted

RED
  Description: Information exclusively and directly given to a single identified party. Sharing outside is not legitimate.
  Examples: Sharing that you have been attacked, or notifying the central authority about an incident.
  Security Measures: Attribute-Based Encryption (ABE); data minimisation to share only relevant data.
  Transfer/Storage: Encrypted
We build the policy space of our model as a set of assertions using the context of
the CTI dataset. The evaluation of cases will be based on the defined assertions.
The assertions will contain the legal grounds under which personal data can be processed, in this case for the purpose of ensuring network and information security. For instance, assertions for sharing CTI information with other parties are based on both the purpose of sharing, which is “GDPR Recital 49 - ensuring network and information security”, such as preventing any access to a critical system after a credentials leak, and the related legal basis, which is “GDPR Art. 6(1)(f) - processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party”. These steps offer a clear, practical framework,
justifying the sharing of Cyber Threat Intelligence. The tagged data which meets
the rules based on applicable assertions will be derived from the decision graph. In
order to build the CTI policy space, we use a JSON file maintained by Computer
Incident Response Center Luxembourg CIRCL [225] for the related context of use
of data by CSIRTs. The goal of the file is to track processing personal information
activities and support automation. Many assertions refer to GDPR Art. 30 which
prescribes all the recordable details of processing activities. The main categories
of the assertions contain:
• Purpose: “The purpose of the processing. Ref GDPR Art. 30 (1) (b)”, for
example, “GDPR Recital 49 - the processing of personal data to the extent
strictly necessary and proportionate for the purposes of ensuring network
and information security”
• Personal data: “Personal data processed. Reference GDPR Art. 30 (1) (c).”,
for example, information extracted from computer and networking systems.
Based on the previous assertion list, we need to extract the relevant asser-
tion categories specifically related to CTI sharing. We will consider only those
assertions that are directly related to CTI sharing. In the GDPR, the purpose of processing personal data should be precise, and for that the GDPR offers clear recognition of “ensuring network and information security” (GDPR Recital 49) as the purpose of processing personal data for actors such as public authorities and CSIRTs. The legal grounds for processing personal data are provided in GDPR Art. 6 and Art. 5(1)(a). CIRCL has published a discussion [229] of the legal grounds of in-
formation leak analysis and the GDPR context of collection, analysis and sharing
information leaks. The legal grounds relevant in our context are “processing is
necessary for the compliance with a legal obligation to which the controller is sub-
ject” where it applies to CSIRTs and data protection authorities and “processing is
necessary for the purposes of the legitimate interests pursued by the controller or
by a third party” otherwise. Under “legitimate interests”, sharing CTI information
will enable organisations to better detect and prevent attacks, for example by identifying the IP address of a malware command and control hub. We do not consider “consent” (GDPR Art. 6(1)(a)) a credible legal basis for processing personal data in the context of sharing cyber threat intelligence: it is very hard to obtain the consent of data subjects when dealing with huge amounts of data [229] (e.g. 1 billion Yahoo accounts were compromised in a 2013 hack [230]), or when personal data such as IP addresses concerns the perpetrator of a cyber-attack. Similarly, vital interests (Art. 6(1)(d)) cannot feasibly justify sharing and processing CTI, as there is no personal data in CTI datasets that would relate to a threat to life. However, the public interest (Art. 6(1)(e)) would justify processing personal data when acting under specific authorisation from an official authority to check whether a cyber incident could affect the public interest. The description of the personal data that pertains directly to the GDPR is given in Art. 30(1)(c). The conditions under which personal data can be transferred to third countries or an international organisation are described in GDPR Art. 30(1)(e). As a result, the CTI policy space is described in Figure 12.
In this step, we propose an assessment based on the previous assertions. The assessment contains a set of questions, and the answer to each question leads either to further questions or to a final decision; as a result, we assign a specific tag to the CTI dataset, or in some cases the decision is not to share. This assessment is not definitive, but it gives a chance to reflect on our understanding of sharing CTI datasets under the GDPR. Figure 13 shows the decision graph for sharing CTI datasets under the GDPR. Some of the decisions in the graph still require human judgement, so we make no claims of the process being fully automatable. This judgement could be assisted by the Data Protection Officer
(DPO), whose main duties are ensuring compliance with the GDPR and providing support regarding data protection; the GDPR requires the appointment of a DPO in a public authority or in organisations performing specific risky types of processing (Article 37, Recital 97). The process first establishes whether the proposed data sharing falls within the scope of the GDPR. Then it establishes the legal basis for any special category data included; this is likely to be rare in CTI datasets, but we could imagine biometric data following an attack that included a physical breach. Next, it establishes the legal basis for the overall processing. Then it checks and selects appropriate retention and security protections. We assume the “trust level” node's value has been determined from previous knowledge of the trustworthiness of the entity we are looking to share with. The outcome matches one of the TLP tags as described in the previous section. Of course, CTI datasets are also likely to contain “sensitive” information about the infected asset and the exploitable vulnerability that should be protected, as discussed in detail in Chapter 3. The outcome reflects the data protection angle only; included information that is sensitive in a different dimension might independently require strengthening of the security measures.
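The traversal of the decision graph can be sketched as a chain of checks. Every field name and predicate below is invented for illustration; the real graph (Figure 13) contains more branches and, as noted above, requires human judgement at several nodes.

```python
# Highly simplified sketch of the decision graph for sharing a CTI dataset
# under the GDPR. All field names are invented for illustration; the real
# graph (Figure 13) is richer and needs human/DPO judgement at some nodes.

def decide(ds: dict) -> str:
    """Return a TLP-style tag, or 'DO NOT SHARE'."""
    if not ds["contains_personal_data"]:
        return "WHITE"                      # GDPR does not apply
    if ds["special_category_data"] and not ds["special_category_basis"]:
        return "DO NOT SHARE"               # no ground for Art. 9 data
    if not ds["legal_basis"]:               # e.g. Art. 6(1)(c) or 6(1)(f)
        return "DO NOT SHARE"
    if not (ds["retention_policy"] and ds["security_measures"]):
        return "DO NOT SHARE"
    if ds["recipient"] == "community":
        return "GREEN"
    return "AMBER" if ds["trust_level"] == "high" else "RED"

# Use case 1: a report to the central authority under Art. 6(1)(c).
uc1 = dict(contains_personal_data=True, special_category_data=False,
           special_category_basis=False, legal_basis="Art6(1)(c)",
           retention_policy=True, security_measures=True,
           recipient="single", trust_level="unknown")
print(decide(uc1))   # -> RED, matching the use case 1 outcome
```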
[Figure 12: CTI policy space. The assertions cover personal data (personal details, employment details, network information), potentially including special categories of data (political opinions; religious or other beliefs of a similar nature; sexual life; biometric data; genetic data; physical or mental health or condition; criminal proceedings, outcomes and sentences; offences, including alleged offences), together with the retention period (retention schedule/storage limitation, GDPR Art. 30(1)(f) and Art. 5(1)(e)) and security measures.]
In the third use case, an organisation reports a security breach to the central authority.
This case study consists of two organisations, A and C (Central Authority), where organisation A wants to report an incident to organisation C about a remote access tool (RAT) used by different threat actors. Before sharing the information, the reporter wants to be sure that sharing it is legitimate under the GDPR.
Organisation A is the owner of this dataset and has the right to process this information; hence, in this scenario, organisation A is considered the controller. Although the incident information contains personal data, it does not contain any special category data such as biometrics, political opinions, or religious or philosophical beliefs. In order to share this information with a Computer Security Incident Response Team (CSIRT) or the central authority, the reporter can rely on GDPR Art. 6(1)(c), where the legal ground states that “processing is necessary for the compliance with a legal obligation to which the controller is subject”. Organisation A has a retention policy in place. The security measures that should be applied to reduce the risk of harm to data subjects before sharing this dataset are encrypted storage and a secure protocol to transmit the information. Moreover, the data will be encrypted using ABE techniques with the attributes (National, CA, Big), so the final tag for this data will be RED. Figure 14 shows a sample questionnaire covering this case study.
Figure 14: Use case 1 decision graph (following the bold lines)
In the second use case, an organisation wants to share this information with the trusted company O2. Because the dataset contains personal information, sharing needs to be legitimate under the GDPR. The dataset does not contain any special category data, so we can continue and check the purpose of this sharing, which is GDPR Recital 49 - “ensuring network and information security”. The reporter can rely on GDPR Art. 6(1)(f); the legal ground for sharing this information is “processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party”. Presumably there is a retention policy in place. The security measures that will be applied before sharing this dataset are: encrypted storage with a secure protocol to transmit the information, anonymisation of reporter information against any identity disclosure, and encryption of the data using ABE techniques with the attributes (EU, Energy sector, Medium). By applying these controls, the shared
CTI dataset satisfies the data minimisation rules. The trust level, based on an assumed external calculation, is high, so the final tag for this data will be AMBER. Figure 15 shows a sample questionnaire covering this case study.
Figure 15: Use case 2 decision graph (following the bold lines)
This scenario covers reporting a security breach at organisation Alpha to the central authority. The incident covers the following: “Sensitive information belonging to jobseekers has been put at risk on the government's new Universal Jobmatch website, it has been reported. The security flaw was uncovered during a Channel 4 News investigation. Hackers were said to have been able to register as an employer on the site which is accessed through the Gov.uk portal – another
website that has just been launched by the government to deliver more public services online. The hackers were reported to have obtained information including passwords and passport and driving licence scans after posting a fake advert for a cleaner on Universal Jobmatch.” [231]
We have updated the report and completed the values of the STIX incident report. The new report contains personal information such as the reporter name, email address and victim information. In addition, it contains several sensitive properties: the impact assessment value is “Painful”, meaning the incident has a critical effect on the business process; the victim being an official website will lead to loss of reputation; and the initial compromise tells us when the attack was discovered, while further forensic investigation would reveal how long the attack had existed in the attacked system. There was no detailed information about the threat actors other than the location and the motivation, but this information may reveal extra information about the techniques used and the targeted victims. Table 49 shows the sample cyber incident report containing the properties that the reporter wants to share.

Table 49: Sample cyber incident report

Title: “Sensitive information belonging to jobseekers has been put at risk on the government's new Universal Jobmatch website…” (full text as quoted above) [231]
Initial_Compromise: 2012-01-01T00:00:00
Reporter Name: Alex John
Reporter Affiliation: LLC
Reporter Email: [email protected]
Reporter Addresses: GB-London
Victim Name: Universal Jobmatch
Victim Addresses: GB-London
Affected_Assets: Web application
Property_Affected: Confidentiality (Personal Information)
Impact_Qualification: Painful
Leveraged_TTP: Used Malware
Security_Compromise: Yes
Discovery_Method: Agent Disclosure (this incident was disclosed by the threat agent, via private blackmail)
Threat_Actors: DarkHydrus
Threat actor description: DarkHydrus [232] is a threat group that has targeted multiple victims, including government authorities and educational institutions in the Middle East, since at least 2016. The group uses open-source tools and custom payloads to achieve successful attacks.
Threat actor Motivation: Financial or Economic
5.3 Conclusion
In this chapter, we have presented an approach that can help different entities make GDPR-compliant decisions when sharing CTI datasets. We have suggested adequate privacy preserving methods that should be applied when sharing CTI datasets. We then defined the policy space related to CTI in the context of the GDPR, and finally built the decision graph that checks the legal requirements and provides a decision on how to share this information. There are limitations to our approach. In complex use cases, the decisions in the assessment graph may still be very demanding, such as whether the Recital 49 objective justifies any privacy impacts on the data subject. Furthermore, including additional regulations or local policies, and the way they interact with the GDPR requirements, would make the decision graph more complex; additional legal and technical requirements might make the data tag collection harder to structure and manage, as well as complicating the decision process. In Chapter 3, we identified the threats associated with disclosing CTI. Here we have specifically addressed the legal risks associated with sharing CTI datasets. Our overall work aims to mitigate all threats associated with sharing CTI datasets and improve the sharing process.
Chapter 6
Conclusion
Sharing cyber threat intelligence may help organisations to better protect them-
selves against future cyber attacks. However, disclosing of organisation's threat
information may increase the risks for the organisation. This process entails risks
in various aspects, such as privacy, technical, legal, business, reputation, and or-
ganisational aspects. These risks can be evaluated and assessed by providing the
right risk model. Cyber threat intelligence enables organisations to continuously
monitor and support their business and strategic goals by providing insights re-
garding existing threat actors and perpetrators trying to target their business.
However, sharing such information should be evaluated and assessed to enhance
and stimulate cyber threat intelligence sharing, while mitigating the potential
adverse effects. Moreover, sharing cyber threat intelligence among industry members
and governments poses a legal challenge. Thus, it is necessary to provide a model
that can help organisations to share cyber threat intelligence and stay compliant
with the law.
This chapter presents a thorough discussion of the conclusions of our research,
restates the contributions, and identifies issues and opportunities for future research.
A Albakri Chapter 6
2. This research provides a means to apply risk assessment to the cyber threat
intelligence sharing process. It presents a methodology for evaluating the
risks of sharing threat intelligence based on quantitative assessments of the
properties in the dataset before sharing. It extends the first contribution, so
that after identifying the potential threats associated with sharing a CTI
dataset and computing the severity for each property, it proposes an estimate
of the likelihood of the threats in case of property disclosure. Finally,
it computes the total risk score of sharing a CTI dataset. Based on the risk
value, organisations can select appropriate privacy-preserving techniques
to mitigate sharing risk. During the creation of the risk model, the
methodology was tested on an open-source dataset and multiple use cases. Then,
the risk model was empirically evaluated using experts' opinions. Three teams
3. This research supports the effort to progress cyber threat intelligence sharing
by presenting an approach that takes into account the legal dimension. It
has suggested adequate techniques for protecting the privacy of data subjects
in relation to cyber threat intelligence datasets under the GDPR. Then it
defines a policy space as a set of assertions. These assertions consist of the
legal grounds under which personal data can be processed under the GDPR.
Finally, it builds the model as a decision graph based on the identified
assertions, and the final decision assigns a specific tag encoding the
appropriate level of handling and sharing for the cyber threat intelligence
dataset (objective 3 and objective 4).
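The risk computation summarised in the second contribution can be sketched roughly as follows; the property names, the numeric scales, and the sum-of-products combination of severity and likelihood are illustrative assumptions, not the exact formulation used in the thesis:

```python
# Illustrative risk aggregation for sharing a CTI dataset: each shared
# property carries a severity and an estimated likelihood that a threat is
# realised if the property is disclosed. All values are hypothetical.

def property_risk(severity: float, likelihood: float) -> float:
    """Risk contributed by a single shared property."""
    return severity * likelihood

def dataset_risk(properties: dict) -> float:
    """Total risk score for sharing the dataset: sum of per-property risks."""
    return sum(property_risk(s, l) for s, l in properties.values())

props = {
    "source_ip":    (3.0, 0.8),  # (severity, likelihood) -- example values
    "email":        (4.0, 0.6),
    "malware_hash": (1.0, 0.2),
}
total = dataset_risk(props)
```

A high total would then prompt the organisation to apply stronger privacy-preserving techniques before sharing, as described above.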
content to see whether the property is essentially absent. Further studies are
required to refine this analysis using natural language processing techniques
to assess the risks of sharing CTI datasets and support the right sharing decision.
• Organisations have different cyber risk profiles [234] based on sectors,
operational standards, needs and regulations. Therefore, it is unlikely that a
single approach for sharing cyber threat information fits all organisations
and governments. For example, there are vast numbers of cyber attacks
against the banking sector. Therefore, information sharing platforms and
methodologies should be designed to consider sector specific requirements.
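As a crude baseline for the text-analysis direction mentioned in the first item above, one might simply check whether a structured property type (here, IPv4 addresses) still occurs in a free-text field before any real natural language processing is applied; the pattern and function name are illustrative assumptions:

```python
# Naive pattern check: does the free text of a CTI report still contain a
# given property type (here, IPv4-like tokens)? A crude placeholder for the
# NLP-based analysis proposed as future work.
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def property_essentially_absent(text: str) -> bool:
    """True if no IPv4-like token appears in the free-text content."""
    return IPV4.search(text) is None
```

For example, `property_essentially_absent("beacon to 203.0.113.7 observed")` returns `False`, while text with no address-like tokens returns `True`.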
Bibliography
[2] Sean Barnum, Aharon Chernin, Desiree Beck, and Rich Piazza. STIX Version
1.2.1. Part 1: Overview. Mitre Corporation, 2016.
[3] Kaitlin R Boeckl and Naomi B Lefkovitz. NIST Privacy Framework: A Tool
for Improving Privacy Through Enterprise Risk Management, Version 1.0.
2020.
[5] UNCTAD. Digital economy report 2019: Value creation and capture –
implications for developing countries. United Nations, Geneva,
https://unctad.org/system/files/official-document/der2019_en.pdf, 2019.
[11] Henry Dalziel. How to define and build an effective cyber threat intelligence
capability. Syngress, 2014.
[17] Marie Vasek, Matthew Weeden, and Tyler Moore. Measuring the Impact
of Sharing Abuse Data with Web Hosting Providers. In Proceedings of the
2016 ACM on Workshop on Information Sharing and Collaborative Secu-
rity, WISCS ’16, page 71–80, New York, NY, USA, 2016. Association for
Computing Machinery.
[20] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Cali-
brating noise to sensitivity in private data analysis. In Shai Halevi and Tal
Rabin, editors, Theory of Cryptography, pages 265–284, Berlin, Heidelberg,
2006. Springer.
[21] Sabyasachi Mitra and Sam Ransbotham. Information Disclosure and the
Diffusion of Information Security Attacks. Information Systems Research,
26(3):565–584, September 2015.
[23] Samuel D Warren and Louis D Brandeis. The right to privacy. Harvard Law
Review, pages 193–220, 1890.
[24] Alan F Westin. Privacy and freedom. Washington and Lee Law Review,
25(1):166, 1968.
[26] Alissa Cooper, Hannes Tschofenig, Bernard Aboba, Jon Peterson, J Morris,
Marit Hansen, and Rhys Smith. Privacy considerations for internet proto-
cols. Internet Architecture Board, 2013.
[29] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differ-
ential privacy. Foundations and Trends in Theoretical Computer Science,
9(3-4):211–407, 2014.
[30] Irit Dinur and Kobbi Nissim. Revealing information while preserving pri-
vacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART
symposium on Principles of database systems, pages 202–210, 2003.
[31] Michael Barbaro, Tom Zeller, and Saul Hansell. A face is exposed for aol
searcher no. 4417749. New York Times, 9(2008):8, 2006.
[36] Adam J Slagell, Kiran Lakkaraju, and Katherine Luo. FLAIM: A Multi-level
Anonymization Framework for Computer and Network Logs. In Large Installation
System Administration Conference (LISA), volume 6, pages 3–8, 2006.
[37] Adam Slagell and William Yurcik. Sharing computer network logs for se-
curity and privacy: A motivation for new methodologies of anonymization.
In Workshop of the 1st International Conference on Security and Privacy
for Emerging Areas in Communication Networks, 2005, pages 80–89. IEEE,
2005.
[40] Roberto J Bayardo and Rakesh Agrawal. Data privacy through optimal
k-anonymization. In 21st International conference on data engineering
(ICDE’05), pages 217–228. IEEE, 2005.
[41] Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, and Ada Wai-Chee
Fu. Utility-based anonymization using local recoding. In Proceedings of the
[42] Thomas D Wagner, Esther Palomar, Khaled Mahbub, and Ali E Abdallah.
Towards an anonymity supported platform for shared cyber threat intelli-
gence. In International Conference on Risks and Security of Internet and
Systems, pages 175–183. Springer, 2017.
[46] Sajal Kanti Das, Shrikant Kumar Gupta, and Mohammad Kauser. Micro
aggregation Through DBSCAN for PPDM: Privacy-Preserving Data Min-
ing. International Journal of Advance Research in Science and Engineering-
IJARSE, 1(2):15–21, 2011.
[49] Craig Gentry. Fully homomorphic encryption using ideal lattices. In Pro-
ceedings of the forty-first annual ACM symposium on Theory of computing,
pages 169–178, 2009.
[50] Steven Y Ko, Kyungho Jeon, and Ramsés Morales. The hybrex model for
confidentiality and privacy in cloud computing. HotCloud, 11:8–8, 2011.
[52] Michael Naehrig, Kristin Lauter, and Vinod Vaikuntanathan. Can homo-
morphic encryption be practical? In Proceedings of the 3rd ACM workshop
on Cloud computing security workshop, pages 113–124, 2011.
[56] Ali Nakhaei Amroudi, Ali Zaghain, and Mahdi Sajadieh. A verifiable (k,
n, m)-threshold multi-secret sharing scheme based on NTRU cryptosystem.
Wireless Personal Communications, 96(1):1393–1405, 2017.
[57] David Chaum, Claude Crépeau, and Ivan Damgard. Multiparty uncon-
ditionally secure protocols. In Proceedings of the twentieth annual ACM
symposium on Theory of computing, pages 11–19, 1988.
[58] Peter Bogetoft, Dan Lund Christensen, Ivan Damgard, Martin Geisler,
Thomas Jakobsen, Mikkel Krøigaard, Janus Dam Nielsen, Jesper Buus
Nielsen, Kurt Nielsen, Jakob Pagter, et al. Secure multiparty computa-
tion goes live. In International Conference on Financial Cryptography and
Data Security, pages 325–343. Springer, 2009.
[59] Dan Bogdanov, Riivo Talviste, and Jan Willemson. Deploying secure multi-
party computation for financial data analysis. In International Conference
on Financial Cryptography and Data Security, pages 57–64. Springer, 2012.
[60] Abid Mehmood, Iynkaran Natgunanathan, Yong Xiang, Guang Hua, and
Song Guo. Protection of big data privacy. IEEE Access, 4:1821–1834, 2016.
[61] Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, and Yong Ren. Information
security in big data: privacy and data mining. IEEE Access, 2:1149–1176,
2014.
[62] Vipul Goyal, Omkant Pandey, Amit Sahai, and Brent Waters. Attribute-
based encryption for fine-grained access control of encrypted data. In Pro-
ceedings of the 13th ACM conference on Computer and communications
security, pages 89–98, 2006.
[65] Hongbing Cheng, Chunming Rong, Kai Hwang, Weihong Wang, and Yanyan
Li. Secure big data storage and sharing scheme for cloud tenants. China
Communications, 12(6):106–115, 2015.
[66] Lea Kissner and Dawn Song. Privacy-preserving set operations. In Advances
in Cryptology – CRYPTO 2005, pages 241–257, Berlin, Heidelberg, 2005.
Springer Berlin Heidelberg.
[67] Siani Pearson and Marco Casassa-Mont. Sticky policies: An approach for
managing privacy across multiple parties. Computer, 44(9):60–68, 2011.
[69] Fabio Martinelli, Andrea Saracino, and Mina Sheikhalishahi. Modeling pri-
vacy aware information sharing systems: A formal and general approach. In
2016 IEEE Trustcom/BigDataSE/ISPA, pages 767–774. IEEE, 2016.
[70] Oleksii Osliak, Andrea Saracino, and Fabio Martinelli. A scheme for the
sticky policy representation supporting secure cyber-threat intelligence anal-
ysis and sharing. Information and Computer Security, 27(5):687–710, 2019.
[74] Lawrence A Gordon, Martin P Loeb, William Lucyshyn, and Lei Zhou. The
impact of information sharing on cybersecurity underinvestment: A real
options perspective. Journal of Accounting and Public Policy, 34(5):509–
519, 2015.
[76] Alexandre Dulaunoy, Gérard Wagener, Marc Stiefer, and Cynthia Wagner.
The void–an interesting place for network security monitoring. In Proceed-
ings of the 30th TERENA networking conference (TNC’14). Dublin, Ireland.
Citeseer, 2014.
[77] Sami Mokaddem, Gérard Wagener, and Alexandre Dulaunoy. AIL-The de-
sign and implementation of an Analysis Information Leak framework. In
2018 IEEE International Conference on Big Data (Big Data), pages 5049–
5057. IEEE, 2018.
[79] Aaron Boyd. How FBI Cyber Division helps agencies investigate intru-
sions. https://www.federaltimes.com/enterprise-view/2015/10/30/how-fbi-
cyber-division-helps-agencies-investigate-intrusions, 2015.
[81] Tom Bergin and Jim Finkle. Exclusive: Swift confirms new cyber
thefts, hacking tactics. https://www.reuters.com/article/uk-usa-cyber-swift-
idUKKBN1412NU?edition-redirect=uk, December 12, 2016.
[83] Tom Bergin and Nathan Layne. Special report: Cyber thieves exploit
banks’ faith in swift transfer network. https://es.reuters.com/article/us-
cyber-heist-swift-specialreport/special-report-cyber-thieves-exploit-banks-
faith-in-swift-transfer-network-idUSKCN0YB0DD, 2016.
[86] Robert M Lee. Intelligence defined and its impact on cyber threat intel-
ligence. https://www.robertmlee.org/intelligence-defined-and-its-impact-on-
cyber-threat-intelligence, 2, 2016.
[92] Christopher Johnson, Mark Badger, David Waltermire, Julie Snyder, and
Clem Skorupka. Guide to cyber threat information sharing. Technical
report, https://www.nist.gov/publications/guide-cyber-threat-information-
sharing, National Institute of Standards and Technology, 2016.
[96] Frank Fransen and Richard Kerkdijk. Cyber threat intelligence sharing
through national and sector-oriented communities. In Collaborative Cyber
Threat Intelligence, pages 187–224. Auerbach Publications, 2017.
[97] Greg Farnham and Kees Leune. Tools and standards for cy-
ber threat intelligence projects. https://www.sans.org/reading-
room/whitepapers/warfare/tools-standards-cyber-threat-intelligence-
projects-34375, SANS Institute, 2013.
[106] Mike Goffin. CRITs: Collaborative Research Into Threats. The MITRE
Corporation, http://crits.github.io, 2014.
[117] Christian Sillaber, Clemens Sauerwein, Andrea Mussmann, and Ruth Breu.
Data quality challenges and future research directions in threat intelligence
sharing practice. In Proceedings of the 2016 ACM on Workshop on Infor-
mation Sharing and Collaborative Security, pages 65–70, 2016.
[118] Sarah Brown, Joep Gommers, and Oscar Serrano. From cyber security
information sharing to threat management. In Proceedings of the 2nd ACM
workshop on information sharing and collaborative security, pages 43–49,
2015.
[120] Sean Barnum. Standardizing cyber threat intelligence information with the
Structured Threat Information eXpression (STIX). Mitre Corporation, 11:1–
22, 2012.
[121] Julie Connolly, Mark Davidson, and Charles Schmidt. Trusted Automated
eXchange of Indicator Information (TAXII). The MITRE Corporation,
pages 1–20, 2014.
[123] Desiree Beck, Ivan Kirillov, and Rich Piazza. STIX™ Version 2.0. Part 2:
STIX Objects. Mitre Corporation, pages 1–58, 2017.
[126] VERIZON. The Vocabulary for Event Recording and Incident Sharing
(VERIS). http://veriscommunity.net/, 2016.
[131] Ross Anderson and Tyler Moore. The Economics of Information Security.
American Association for the Advancement of Science, 314(5799):610–613,
2006.
[132] Boris Petrenj, Emanuele Lettieri, and Paolo Trucco. Information sharing
and collaboration for critical infrastructure resilience–a comprehensive re-
view on barriers and emerging capabilities. International journal of critical
infrastructures, 9(4):304–329, 2013.
[133] Eric Luiijf and Marieke Klaver. On the sharing of cyber security information.
In Mason Rice and Sujeet Shenoi, editors, Critical Infrastructure Protection
IX, pages 29–46, Cham, 2015. Springer International Publishing.
[134] Tomas Sander and Joshua Hailpern. UX aspects of threat information
sharing platforms: An examination & lessons learned using personas. In
Proceedings of the 2nd ACM Workshop on Information Sharing and Collaborative
Security, pages 51–59, 2015.
[135] David Mann, J Brooks, and Joe DeRosa. The Relationship between
Human and Machine-Oriented Standards and the Impact to Enter-
prise Systems Engineering. The MITRE Corporation, Bedford, MA,
https://www.mitre.org/sites/default/files/pdf/102 335.pdf, 2011.
[137] Ioannis Agrafiotis, Jason R C Nurse, Michael Goldsmith, Sadie Creese, and
David Upton. A taxonomy of cyber-harms: Defining the impacts of cyber-
attacks and understanding how they propagate. Journal of Cybersecurity,
4(1), 10 2018.
[140] James L Cebula and Lisa R Young. A taxonomy of operational cyber secu-
rity risks. Technical report, Carnegie-Mellon Univ Pittsburgh Pa Software
Engineering Inst, 2010.
[141] James J Cebula, Mary E Popeck, and Lisa R Young. A taxonomy of op-
erational cyber security risks version 2. Technical report, Carnegie-Mellon
Univ Pittsburgh Pa Software Engineering Inst, 2014.
[145] Louis Marinos. ENISA Threat Taxonomy: A tool for structuring threat
information. ENISA, Heraklion, 2016.
natural persons with regard to the processing of personal data and on the
free movement of such data, and repealing Directive 95/46/EC. Official Journal
of the European Union (OJ), 59(May):1–88, 2016.
[150] Laurence Kalman. The GDPR and NIS Directive: A new age of accountability,
security and trust. https://owasp.org/www-chapter-cambridge, 2017.
[151] ENISA. European Union Agency for Network and Information Security.
https://www.enisa.europa.eu, 2004.
[152] ENISA. European Union Agency for Network and Information Security.
https://www.enisa.europa.eu/topics/cyber-exercises/cyber-europe-programme
(accessed 01/12/2020), 2004.
[153] Department for Digital Culture Media Sport The Rt Hon Oliver Dowden
CBE MP and Matt Warman MP. Post-Implementation Review of the Net-
work and Information Systems Regulations 2018. CP 242(May), 2020.
[157] Ronald S Ross. Guide for conducting risk assessments (NIST sp-800-30rev1).
The National Institute of Standards and Technology (NIST), Gaithersburg,
2012.
[158] Christopher Alberts, Audrey Dorofee, James Stevens, and Carol Woody.
Introduction to the OCTAVE Approach. Technical report, Carnegie-Mellon
Univ Pittsburgh Pa Software Engineering Inst, 2003.
[159] Isabel Wagner and Eerke Boiten. Privacy Risk Assessment: From Art
to Science, by Metrics. Data Privacy Management, Cryptocurrencies and
Blockchain Technology, page 225–241, 2018.
[160] Michael Howard and Steve Lipner. The Security Development Lifecycle:
SDL: A Process for Developing Demonstrably More Secure Software. Mi-
crosoft Press, page 352, 2006.
[161] Kim Wuyts and Wouter Joosen. LINDDUN privacy threat modeling: a
tutorial. CW Reports, 2015.
[162] Jaspreet Bhatia, Travis D Breaux, Liora Friedberg, Hanan Hibshi, and
Daniel Smullen. Privacy risk in cybersecurity data sharing. In Proceedings
of the 2016 ACM on Workshop on Information Sharing and Collaborative
Security, pages 57–64, 2016.
[163] Gina Fisk, Calvin Ardi, Neale Pickett, John Heidemann, Mike Fisk, and
Christos Papadopoulos. Privacy principles for sharing cyber security data.
In 2015 IEEE Security and Privacy Workshops, pages 193–197. IEEE, 2015.
[166] Riyana Lewis, Panos Louvieris, Pamela Abbott, Natalie Clewley, and Kevin
Jones. Cybersecurity information sharing: a framework for information
security management in UK SME supply chains. 2014.
[167] Jinsoo Shin, Hanseong Son, and Gyunyoung Heo. Cyber security risk eval-
uation of a nuclear I&C using BN and ET. Nuclear Engineering and Tech-
nology, 49(3):517–524, 2017.
[168] M Ugur Aksu, M Hadi Dilek, E İslam Tatlı, Kemal Bicakci, H Ibrahim Dirik,
M Umut Demirezen, and Tayfun Aykır. A quantitative CVSS-based cyber
security risk assessment methodology for IT systems. In 2017 International
Carnahan Conference on Security Technology (ICCST), pages 1–8. IEEE,
2017.
[169] Tawei Wang, Karthik N Kannan, and Jackie Rees Ulmer. The association
between the disclosure and the realization of information security risk fac-
tors. Information Systems Research, 24(2):201–218, 2013.
[170] Valentina Viduto, Carsten Maple, Wei Huang, and David LóPez-PeréZ.
A novel risk assessment and optimisation model for a multi-objective net-
work security countermeasure selection problem. Decision Support Systems,
53(3):599–610, 2012.
[171] Daniel M Best, Jaspreet Bhatia, Elena S Peterson, and Travis D Breaux. Im-
proved cyber threat indicator sharing by scoring privacy risk. In 2017 IEEE
International Symposium on Technologies for Homeland Security (HST),
pages 1–5. IEEE, 2017.
[172] Kaniz Fatema, David W Chadwick, and Brendan Van Alsenoy. Extracting
access control and conflict resolution policies from european data protec-
tion law. In IFIP PrimeLife International Summer School on Privacy and
Identity Management for Life, pages 59–72. Springer, 2011.
[174] Peter Doorn and Emily Thomas. Tagging Privacy-Sensitive Data Ac-
cording to the New European Privacy Legislation: GDPR DataTags -
a Prototype. https://dans.knaw.nl/en/current/first-gdpr-datatags-results-
presented-in-workshop, 2017.
[176] Travis D Breaux and Annie I Antón. A systematic method for acquiring
regulatory requirements: A frame-based approach. RHAS-6, Delhi, India,
2007.
[177] Travis Breaux and Annie Antón. Analyzing regulatory rules for privacy
and security requirements. IEEE Transactions on Software Engineering,
34(1):5–20, 2008.
[179] Clare Sullivan and Eric Burger. “In the public interest”: The privacy
implications of international business-to-business sharing of cyber-threat
intelligence. Computer Law and Security Review, 33(1):14–29, 2017.
[180] Adham Albakri, Eerke Boiten, and Rogério De Lemos. Risks of sharing cyber
incident information. In Proceedings of the 13th International Conference
on Availability, Reliability and Security, ARES 2018, Hamburg, Germany,
2018. Association for Computing Machinery.
[181] Ken Naganuma, Masayuki Yoshino, Hisayoshi Sato, and Yoshinori Sato.
Privacy-preserving analysis technique for secure, cloud-based big data ana-
lytics. Hitachi Rev, 63(9):577–583, 2014.
[183] Defense Use Case. Analysis of the cyber attack on the Ukrainian power grid.
Electricity Information Sharing and Analysis Center (E-ISAC), 388, 2016.
[185] Ian Ahl. Privileges and Credentials: Phished at the Request of Coun-
sel. https://www.fireeye.com/blog/threat-research/2017/06/phished-at-the-
request-of-counsel.html, 2017.
[189] Hak5. Stealing Files with the USB Rubber Ducky – USB Exfiltration Ex-
plained. https://www.hak5.org/blog/main-blog/stealing-files-with-the-usb-
rubber-ducky-usb-exfiltration-explained, 2005.
[190] Robert Mcmillan. The Pwn Plug is a little white box that can hack your net-
work. https://arstechnica.com/information-technology/2012/03/the-pwn-
plug-is-a-little-white-box-that-can-hack-your-network/, 2012.
[191] Bryan Lee, Mike Harbison, and Robert Falcone. Sofacy Attacks Multiple
Government Entities. https://unit42.paloaltonetworks.com/unit42-sofacy-attacks-
multiple-government-entities, 2018.
[193] Tomáš Foltýn. OceanLotus ships new backdoor using old tricks.
https://www.welivesecurity.com/2018/03/13/oceanlotus-ships-new-
backdoor, 2018.
[195] Edmund Brumaghin, Ross Gibb, Warren Mercer, Matthew Molyett, and
Craig Williams. CCleanup: A Vast Number of Machines at Risk. Cisco Talos
Intelligence Group Blog, https://blog.talosintelligence.com/2017/09/avast-
distributes-malware.html, 2017.
[196] Gavin O’Gorman and Geoff McDonald. The Elderwood Project. Syman-
tec Corporation, https://www.infopoint-security.de/medien/the-elderwood-
project.pdf, 2012.
[201] Stephane M Meystre, F Jeffrey Friedlin, Brett R South, Shuying Shen, and
Matthew H Samore. Automatic de-identification of textual documents in the
electronic health record: a review of recent research. BMC medical research
methodology, 10(1):70, 2010.
[207] Adham Albakri, Eerke Boiten, and Richard Smith. Risk Assessment of
Sharing Cyber Threat Intelligence. In European Symposium on Research in
Computer Security, pages 92–113. Springer, 2020.
[209] Douglas Hubbard and Dylan Evans. Problems with scoring methods and
ordinal scales in risk assessment. IBM Journal of Research and Development,
54(3):2–1, 2010.
[210] Rebecca M. Blank. Guide for Conducting Risk Assessments, 2011.
[212] Ronen Feldman, James Sanger, et al. The text mining handbook: advanced
approaches in analyzing unstructured data. Cambridge university press,
2007.
[213] Joseph L Fleiss. Measuring nominal scale agreement among many raters.
Psychological bulletin, 76(5):378, 1971.
[215] Alexander D Kent and Lorie M Liebrock. Secure communication via shared
knowledge and a salted hash in ad-hoc environments. In 2011 IEEE 35th
Annual Computer Software and Applications Conference Workshops, pages
122–127. IEEE, 2011.
[216] Frederik Armknecht, Colin Boyd, Christopher Carr, Kristian Gjøsteen, An-
gela Jäschke, Christian A Reuter, and Martin Strand. A guide to fully
homomorphic encryption. IACR Cryptol. ePrint Arch., 2015:1192, 2015.
[217] Andrew C Yao. Protocols for secure computations. In 23rd annual sympo-
sium on foundations of computer science (sfcs 1982), pages 160–164. IEEE,
1982.
[221] Adham Albakri, Eerke Boiten, and Rogério De Lemos. Sharing Cyber
Threat Intelligence Under the General Data Protection Regulation. In Pri-
vacy Technologies and Policy, pages 28–41, Cham, 2019. Springer Interna-
tional Publishing.
[222] Michael Bar-Sinai, Latanya Sweeney, and Merce Crosas. Datatags, data
handling policy spaces and the tags language. In 2016 IEEE Security and
Privacy Workshops (SPW), pages 1–8. IEEE, 2016.
[223] Latanya Sweeney, Mercè Crosas, and Michael Bar-Sinai. Sharing sensitive
data with confidence: The datatags system. Technology Science, 2015.
[227] Xiaokui Xiao and Yufei Tao. Personalized privacy preservation. In Proceed-
ings of the 2006 ACM SIGMOD international conference on Management
of data, pages 229–240, 2006.
[228] Mehmet Ercan Nergiz, Maurizio Atzori, and Chris Clifton. Hiding the pres-
ence of individuals from shared databases. In Proceedings of the 2007 ACM
SIGMOD international conference on Management of data, pages 665–676,
2007.
[229] CIRCL. AIL information leaks analysis and the GDPR in the
context of collection, analysis and sharing information leaks.
https://www.circl.lu/assets/files/information-leaks-analysis-and-gdpr.pdf,
2018.
[230] Sam Thielman. Yahoo hack: 1bn accounts compromised by biggest data
breach in history. The Guardian, 15:2016, 2016.
[233] Fenia Ferra, Isabel Wagner, Eerke Boiten, Lee Hadlington, Ismini Psy-
choula, and Richard Snape. Challenges in assessing privacy impact: Tales
from the front lines. Security and Privacy, 3(2):e101, 2020.
[234] Denise E Zheng and James A Lewis. Cyber threat information sharing.
Center for Strategic and International Studies, 2015.