

Using Virtual Machine Allocation Policies to Defend against Co-Resident Attacks in Cloud Computing

Yi Han, Jeffrey Chan, Tansu Alpcan, and Christopher Leckie

• Y. Han and C. Leckie are with the Department of Computing and Information Systems, The University of Melbourne, Melbourne, Vic. 3010, Australia. E-mail: [email protected], [email protected].
• J. Chan is with the School of Computer Science and Information Technology, RMIT University, Melbourne, Vic. 3000, Australia. E-mail: [email protected].
• T. Alpcan is with the Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Vic. 3010, Australia. E-mail: [email protected].

Manuscript received 16 Jan. 2015; revised 2 Apr. 2015; accepted 25 Apr. 2015. Date of publication 4 May 2015; date of current version 18 Jan. 2017. Digital Object Identifier no. 10.1109/TDSC.2015.2429132.

Abstract—Cloud computing enables users to consume various IT resources in an on-demand manner, and with low management
overhead. However, customers can face new security risks when they use cloud computing platforms. In this paper, we focus on one
such threat—the co-resident attack, where malicious users build side channels and extract private information from virtual machines
co-located on the same server. Previous works mainly attempt to address the problem by eliminating side channels. However, most of
these methods are not suitable for immediate deployment due to the required modifications to current cloud platforms. We choose to
solve the problem from a different perspective, by studying how to improve the virtual machine allocation policy, so that it is difficult for
attackers to co-locate with their targets. Specifically, we (1) define security metrics for assessing the attack; (2) model these metrics,
and compare the difficulty of achieving co-residence under three commonly used policies; (3) design a new policy that not only
mitigates the threat of attack, but also satisfies the requirements for workload balance and low power consumption; and (4) implement,
test, and prove the effectiveness of the policy on the popular open-source platform OpenStack.

Index Terms—Cloud computing security, co-resident attack, virtual machine allocation policy, security metrics modelling

1 INTRODUCTION

SECURITY is one of the major concerns against cloud computing. From the customer's perspective, migrating to the cloud means they are exposed to the additional risks brought about by the other tenants with whom they share resources—are these neighbours trustworthy, or may they compromise the integrity of others? This paper concentrates on one form of this security problem: the co-resident attack (also known as the co-location, co-residence, or co-residency attack).

Virtual machines (VMs) are a commonly used resource in cloud computing environments. For cloud providers, VMs help increase the utilisation rate of the underlying hardware platforms. For cloud customers, virtualisation enables on-demand resource scaling, and outsources the maintenance of computing resources. However, apart from all these benefits, it also brings a new security threat [1]. In theory, VMs running on the same physical server (i.e., co-resident VMs) are logically isolated from each other. In practice, nevertheless, malicious users can build various side channels to circumvent the logical isolation, and obtain sensitive information from co-resident VMs, ranging from the coarse-grained, e.g., workloads and web traffic rates [1], to the fine-grained, e.g., cryptographic keys [2]. For clever attackers, even seemingly innocuous information like workload statistics can be useful. For example, such data can be used to identify when the system is most vulnerable, i.e., the time to launch further attacks, such as Denial-of-Service attacks.

A straightforward solution to this novel attack is to eliminate the side channels, which has been the focus of most previous works [3], [4], [5], [6]. However, most of these methods are not suitable for immediate deployment due to the required modifications to current cloud platforms. In our work, we approach the problem from a completely different perspective. Before the attacker is able to extract any private information from the victim, they first need to co-locate their VMs with the target VMs. It has been shown that the attacker can achieve an efficiency rate of as high as 40 percent [1], which means that four out of every 10 of the attacker's VMs can co-locate with a target. This motivates us to study how to effectively minimise this value. From a cloud provider's point of view, the VM allocation policy (also known as VM placement—we use the two terms interchangeably in this paper) is the most important and direct control that can be used to influence the probability of co-location. Consequently, we aim to design a secure policy that substantially increases the difficulty for attackers to achieve co-residence.

In our earlier work [7], we proposed a prototype of such a secure policy, called the previous-selected-server-first policy (PSSF). However, this prototype policy only focuses on the problem of security, and hence has obvious limitations in terms of:

1. Workload balance—Workload here refers to the VM requests. From the cloud provider's point of view, spreading VMs among the servers that have already been switched on helps reduce the probability of servers being over-utilised, which may cause SLA (service level agreement) breaches. From the customer's perspective, it is also preferable if their VMs are distributed across the system, rather than being allocated together on the same server; otherwise, the failure of one server will impact all of that user's VMs.
2. Power consumption—It has been estimated that an average datacentre consumes as much power as 25,000 households [8], and this consumption is expected to double every five years [9]. Managing servers in an energy-efficient way is therefore crucial for cloud providers in order to reduce power consumption and hence the overall cost. This has also been the focus of many previous works [10], [11], [12], [13], [14].

In this paper, we take all three aspects—security, workload balance and power consumption—into consideration, to make PSSF more applicable to existing commercial cloud platforms. Since these three objectives conflict to some extent, we improve our earlier policy by applying multi-objective optimisation techniques. In addition, we have implemented PSSF in the simulation environment CloudSim [15], [16], as well as on the real cloud platform OpenStack [17], and performed large-scale experiments involving hundreds of servers and thousands of VMs, to demonstrate that it meets the requirements of all three criteria. Specifically, our contributions include: (1) we define security metrics that measure the safety of a VM allocation policy, in terms of its ability to defend against co-resident attacks; (2) we model these metrics under three basic but commonly used VM allocation policies, and conduct extensive experiments on the widely used simulation platform CloudSim [15], [16] to validate the models; (3) we propose a new secure policy, which not only significantly decreases the probability of attackers co-locating with their targets, but also satisfies the constraints on workload balance and power consumption; and (4) we implement and verify the effectiveness of our new policy using the popular open-source cloud software OpenStack [17], as well as on CloudSim.

The rest of the paper is organised as follows. In Section 2, we give a survey of previous work on co-resident attacks and current VM allocation policies. In Section 3, we describe our research aim and formally define the problem. In Sections 4 and 5, we model the security metrics under three existing allocation policies, and give an experimental verification. In Section 6, we introduce our new policy and summarise the test results on CloudSim. In Section 7, we present the implementation on OpenStack. Finally, Section 8 concludes the paper, and gives directions for future work.

2 BACKGROUND AND RELATED WORK

In this section, we start by surveying previous work on co-resident attacks, including their definition, security risks and possible countermeasures. We then summarise commonly used VM allocation policies as context for our proposed method.

2.1 Survey on Co-Resident Attacks

The co-resident attack discussed in this paper comprises the following two steps. First, the attacker has a clear set of target VMs, and their goal is to co-locate their VMs with these targets on the same physical servers. Second, after co-residence is achieved, the attacker constructs different types of side channels to obtain sensitive information from the victim.

Note that this is different from [18], [19], [20], where attackers do not have specific targets, and their goal is to obtain an unfair share of the cloud platform's capacity.

In order to co-locate with the targets, the attacker can either use a brute-force strategy—start as many VMs as possible (the number may be limited by the cost)—or take advantage of the sequential and parallel locality in VM placement. It has been shown in [1] that in the Amazon EC2 cloud [21], if one VM is started immediately after another one is terminated, or if two VMs are launched almost at the same time, it is more likely that these two VMs will be allocated to the same server.

2.1.1 Security Risks

Theoretically speaking, co-resident VMs should not influence each other. However, such interference can still occur in real-world cloud systems, which is the reason why attackers are able to build side channels between VMs and obtain private information. We can categorise these side channels as either coarse grained or fine grained.

1) Coarse grained. Experiments in [1] show that because the cache utilisation rate has a large impact on the execution time of the cache read operation, attackers can infer the victim's cache usage and workload information by applying the Prime+Probe technique [22], [23]. Similarly, they can estimate the victim's web traffic rate, which also has a strong correlation with the execution time of cache operations. As we mentioned in the introduction, even such coarse-grained information can be useful to clever attackers to maximise the damage of further attacks.

2) Fine grained. It is demonstrated in [2] that attackers can exploit shared hardware resources, such as the instruction cache, to extract cryptographic keys. Specifically, they show how to overcome the following challenges: dealing with core migrations and determining whether an observation is associated with the victim, filtering out hardware and software noise, and regaining access to the target CPU core with sufficient frequency.

In addition, a number of side channels have been explored [24], [25], [26], [27] in order to transfer sensitive information between VMs, which is prohibited by security policies. In particular, in contrast to exploiting side channels to launch attacks, Zhang et al. [28] discuss how to use side channels to detect whether the isolation of a VM has been violated.

2.1.2 Existing Countermeasures

Previous studies have proposed the following five types of possible defense methods:

1) Eliminating the side channels. Side channel attacks are not unique to cloud systems. Prior to the popularisation of cloud platforms, different methods [29], [30] had already been proposed to mitigate the threat of
side channels. However, these methods are at the hardware layer, and hence are normally costly to adopt. In cloud environments, many side channels rely on high-resolution clocks; therefore, Vattikonda et al. [3] propose to remove such clocks, Wu et al. [4] choose to add latency to potentially malicious operations, while the approach of Aviram et al. [5] is to "eliminate all internal reference clocks". An alternative solution is to enforce isolation by preventing the sharing of sensitive resources, e.g., Shi et al. [6] use page colouring to limit cache-based side channels. In particular, Szefer et al. [31] take a further step by proposing to remove the hypervisor, and to use hardware mechanisms to isolate access to shared resources. Nevertheless, the problem with these methods is that they often require substantial changes to existing cloud platforms, and hence are unlikely to be adopted by cloud providers any time soon. More recently, Zhang and Reiter [32] propose to perform periodic time-shared cache cleansing, in order to make the side channel noisy. In addition, Varadarajan et al. [33] show that a scheduling mechanism called the minimum run time (MRT) guarantee is effective in preventing cache-based side channel attacks. These two methods require fewer changes and hence are easier to deploy.

2) Increasing the difficulty of verifying co-residence. The easiest way to determine whether two VMs are on the same server is based on network measurements [1]: by performing a TCP traceroute operation, the attacker can obtain the IP address of a VM's Dom0, a privileged VM that manages all VMs on a host. If two Dom0 IP addresses are the same, the corresponding VMs are co-resident. Cloud providers can prevent Dom0's IP address from being exposed to customers, so that attackers are forced to resort to other options that do not rely on network measurements, and often require greater effort. However, as more and more methods of detecting co-residence have been proposed [34], [35], [36], simply hiding Dom0's IP address is not sufficient.

3) Detecting the features of co-resident attacks. Sundareswaran and Squicciarini [37] and Yu et al. [38] observed that when attackers use the Prime+Probe technique to extract information from the victim, there are abnormalities in the CPU and RAM utilisation, system calls, and cache miss behaviours. They propose different methods to detect these features, and design defense mechanisms accordingly.

4) Migrating VMs periodically. Li and Zhang et al. [39], [40] tackle the problem by applying a Vickrey-Clarke-Groves (VCG) mechanism to migrate VMs periodically. Specifically, they discuss the number of VMs to be migrated as well as the destination hosts. In addition, they propose a method to generate a VM placement plan in order to decrease the overall security risk. However, frequently migrating VMs causes extra power consumption, and may lead to performance degradation, which increases the probability of cloud providers breaking their SLAs.

5) Using the VM allocation policy to make it difficult to achieve co-residence. This is also the method we choose. To the best of our knowledge, only [41] adopts a similar method. Their co-location resistant (CLR) algorithm labels all servers as either open or closed, where open (closed) means the server can (cannot) receive more VMs. At any time, CLR keeps a fixed number (Nopen) of servers open, and allocates a new VM to one of these servers at random. If the selected server cannot take more VMs due to this allocation, it is marked closed, and a new server is opened. The larger Nopen is, the more co-location resistant the algorithm becomes; therefore, the best case scenario is that all servers are open, in which case CLR works the same as the Random policy defined in the next section.

2.1.3 Our Earlier Work

We have been looking at this problem from a new perspective: instead of concentrating on countermeasures applied after attackers co-locate with their targets, we explore different ways to make it difficult for attackers to achieve co-residence in the first place. In [42], we propose a game-theoretic approach to compare three commonly used virtual machine allocation policies, in terms of security (i.e., the ability to defend against co-resident attacks), workload balance and power consumption. In [7], we defined security metrics for assessing the attack, and mathematically modelled these metrics under the three policies. In addition, we proposed a new policy, PSSF, which can significantly decrease the probability of attackers co-locating with the targets. In this paper, we propose a more generally applicable extension of the PSSF policy, and provide a detailed analysis of its performance.

2.2 Summary of Virtual Machine Allocation Policies

In cloud computing environments, there are two kinds of VM placement: (1) initial placement, which admits new requests for VMs and assigns them to specific hosts; and (2) live migration [43], which optimises the current allocation according to certain metrics. Initial placement can be further divided into two steps—searching for a data centre within the system, and then for a host within the chosen data centre. In this paper, we focus on the initial VM allocation within a data centre.

Although dozens of algorithms [9], [10], [11], [12], [13], [14] have been proposed based on various criteria and goals, judging from the final allocation process they can be generally classified into two types: stacking and spreading. In other words, the VMs are either concentrated onto a small number of physical servers, in order to decrease the power consumption and maximise the utilisation rate, or distributed across the whole data centre, for the purpose of workload balance and higher reliability. Table 1 summarises some commonly used or widely cited allocation policies.

In this paper, we only consider three fundamental yet representative VM allocation policies [11]: the Least VM policy (Workload Balancing), the Most VM policy (Workload Stacking), and the Random policy (Random).
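To make these three baseline policies concrete, the following minimal Python sketch (ours, for illustration only—the paper's experiments use CloudSim and OpenStack, not this code) ranks the legitimate servers the way each policy prescribes. The Server and VMRequest structures are assumptions introduced here for readability, and each selector assumes at least one legitimate server exists.

  import random
  from dataclasses import dataclass, field

  @dataclass
  class Server:
      free_cpu: int
      free_ram: int
      vms: list = field(default_factory=list)   # VMs currently hosted

  @dataclass
  class VMRequest:
      cpu: int
      ram: int

  def legitimate(servers, req):
      # "Legitimate" servers: those with enough remaining resources.
      return [s for s in servers if s.free_cpu >= req.cpu and s.free_ram >= req.ram]

  def least_vm(servers, req):
      # Workload Balancing: the legitimate server hosting the fewest VMs.
      return min(legitimate(servers, req), key=lambda s: len(s.vms))

  def most_vm(servers, req):
      # Workload Stacking: the legitimate server hosting the most VMs.
      return max(legitimate(servers, req), key=lambda s: len(s.vms))

  def random_policy(servers, req):
      # Uniformly random choice among legitimate servers.
      return random.choice(legitimate(servers, req))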
TABLE 1
Popular VM Allocation Policies

Type: Stacking
  First Fit — All servers are ordered by their identifier, and a new VM is allocated to the legitimate server (i.e., a server with enough remaining resources that satisfies any other requirements, if there are any) with the smallest identifier.
  Workload Stacking — One example is to allocate a new VM to the legitimate server with the largest number of VMs (started by any user). This is what we call the "Most VM policy".
  Energy/Cost Aware — A more advanced type of policy than Workload Stacking, which also aims to minimise the (cost of) power consumption. Specifically, a new VM is allocated to the server that will cause the least additional power consumption/cost, the calculation of which differs from policy to policy.

Type: Random
  Random — The simplest policy, which selects at random from the legitimate servers.

Type: Spreading
  Next Fit — Similar to First Fit, except that the search begins from the server that was last selected.
  Workload Balancing — A group of similar policies that spread the VM requests based on different criteria. For example, a new VM is allocated to the legitimate server (1) with the least number of VMs (started by any user)—this is what we call the "Least VM policy"; (2) with the most free CPU cores; or (3) with the largest ratio of free CPU cores to total CPU cores.

3 PROBLEM FORMULATION AND ANALYSIS

In this section, we first summarise the aim of our research. Then we describe our problem definition, and briefly introduce our proposed solution.

3.1 Research Aim

The aim of this research is to design a secure VM allocation policy, in order to mitigate the threat of co-resident attacks. Specifically, we determine whether an allocation policy is secure based on the following three metrics (the definitions of all notation are given in Table 2):

1) Efficiency—For attackers, it is clearly desirable to co-locate with as many targets as possible while starting the minimum number of VMs. Hence, we define Efficiency as the gains divided by the costs. More precisely, it equals the number of servers on which malicious VMs are co-located with at least one of the T targets, divided by the total number of VMs launched by the attacker, i.e.,

  Efficiency(|VM(A,t)|) = \frac{|Servers(SuccVM(A,t))|}{|VM(A,t)|}.

The reason why we use |Servers(SuccVM(A,t))| instead of just |SuccVM(A,t)| is that when two malicious VMs co-locate with the same target, the second VM should not be counted. Note that we focus on preventing attackers from co-locating with their targets, and consider that once co-residence is achieved, attackers are able to construct side channels. Although a second co-resident VM can make it easier for attackers to extract sensitive information from the victim, in this paper we focus on preventing any co-residence.

2) Coverage—Another criterion to measure the success of an attack is the percentage of conquered targets, i.e., Coverage, which equals the number of target VMs co-located with malicious VMs started in the attack, divided by the number of targets T, i.e.,

  Coverage(|VM(A,t)|) = \frac{|SuccTarget(A,t)|}{T}.

3) VMmin—This is defined as the minimum number of VMs that the attacker needs to start so that at least one of them co-locates with at least one target. It is an estimate of the minimum effort an attacker has to make in order to achieve co-residence.

TABLE 2
Definitions Regarding the Security Metrics

  K                        The total number of servers
  A                        The attacker
  L                        A legal user; the target of A is the set of VMs started by L
  VM(L,t)                  The set of VMs started by L at time t
  VM(A,t)                  The set of VMs started by A during one attack at time t
  Target(A)                The set of target VMs that A intends to co-locate with: Target(A) = ∪_t VM(L,t), |Target(A)| = T
  SuccTarget(A,t)          The subset of Target(A) that co-locates with at least one VM from VM(A,t)
  SuccVM(A,t)              The subset of VM(A,t) that co-locates with at least one of the T targets
  Servers({a set of VMs})  The servers that host the given set of VMs
Fig. 1. An example to explain attack efficiency and coverage.

The following example illustrates the definitions of attack efficiency and coverage. As shown in Fig. 1, a legal user L starts four VMs (VM_L1, VM_L2, VM_L3 and VM_L4), running on four different servers (Server 1, Server 2, Server 3 and Server 4). Then attacker A starts eight VMs (VM_A1, VM_A2, ..., VM_A8), four of which co-locate with three VMs of L. In this case, the attack efficiency is 3/8 (instead of 4/8, as VM_A2 and VM_A4 co-locate with the same target VM_L2), and the coverage is 3/4.
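Both metrics can be computed mechanically from the placement maps. The short Python sketch below (ours; the dictionaries are an assumed encoding of Fig. 1, mapping VM names to server numbers) reproduces the numbers in this example.

  def efficiency_and_coverage(attack_hosts, target_hosts):
      # attack_hosts / target_hosts: VM name -> hosting server.
      target_servers = set(target_hosts.values())
      attack_servers = set(attack_hosts.values())
      # Servers(SuccVM(A,t)): servers where a malicious VM meets a target.
      succ_servers = attack_servers & target_servers
      # SuccTarget(A,t): targets sharing a server with a malicious VM.
      succ_targets = [v for v, s in target_hosts.items() if s in attack_servers]
      return (len(succ_servers) / len(attack_hosts),    # Efficiency
              len(succ_targets) / len(target_hosts))    # Coverage

  targets = {"VM_L1": 1, "VM_L2": 2, "VM_L3": 3, "VM_L4": 4}
  attack = {"VM_A1": 1, "VM_A2": 2, "VM_A3": 3, "VM_A4": 2,   # A2, A4 both hit VM_L2's server
            "VM_A5": 5, "VM_A6": 6, "VM_A7": 7, "VM_A8": 8}
  print(efficiency_and_coverage(attack, targets))   # (0.375, 0.75) = (3/8, 3/4)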
In addition to security, we also take workload balance and power consumption into consideration, since these are another two important factors when cloud providers design their VM allocation policies. In other words, the new policy must not only mitigate the threat of co-resident attacks, but also satisfy the constraints on workload balance and power consumption.

3.2 Problem Definition

Consider the following scenario: in a cloud computing system of K servers S = {s1, s2, ..., sK}, M users U = {u1, u2, ..., uM} start N virtual machines V = {v1, v2, ..., vN}. A mapping X: U × V → S allocates each VM of each user to a specific server, X_VSU = {x_{v,s,u} | x_{v,s,u} = 1 iff VM v of user u is allocated to server s}. An attacker A intends to co-locate their VMs with the VMs of a legal user L, i.e., Target(A) = ∪_t VM(L,t). During one attack started at time t, A launches |VM(A,t)| VMs, and the goal is to maximise the efficiency and/or coverage rate.

Given this attack scenario, our new policy should satisfy the following objectives:

1) Security—Under the new policy, the attacker has to start a large number of VMs to achieve a non-zero efficiency or coverage rate, i.e., VMmin is high. In addition, the coverage rate does not increase, or increases only very slowly, with |VM(A,t)|. In order to achieve these two points, one extreme case is that VMs of different users are never allocated to the same server. Based on this idea, we minimise the average number of users per server, i.e.,

  S: \min \frac{1}{K} \sum_{s \in S} \left| \{ u \mid u \in U, \exists v \in V : x_{v,s,u} = 1 \} \right|.

2) Workload balance—As we mentioned in the introduction, the importance of workload balance is twofold. For cloud providers, evenly distributing VMs helps decrease the probability of servers being over-utilised. For simplicity, in our new policy we use the number of running VMs per server as the criterion for spreading the workload (the same as the Least VM policy). In addition, customers would also prefer that their VMs are not all allocated together on the same server. In other words, on average the number of servers that host a user's VMs should be maximised, i.e.,

  W: \max \frac{1}{M} \sum_{u \in U} \left| \{ s \mid s \in S, \exists v \in V : x_{v,s,u} = 1 \} \right|.

3) Power consumption—How to effectively reduce power consumption is a crucial issue for cloud providers [12]. Different techniques for energy-aware VM placement have been widely discussed in previous papers [9], [10], [11], [12], [13], [14]. In order to simplify the problem, here we only consider the most straightforward approach—minimising the number of running servers, i.e.,

  P: \min \left| \{ s \mid s \in S, \exists u \in U, v \in V : x_{v,s,u} = 1 \} \right|.

In addition, we make the following assumptions:

1) The capacity of a server is not explicitly modelled. However, when a new VM request is processed, only the servers with sufficient resources left are considered—we refer to these as legitimate servers. In other words, we focus on designing an algorithm that ranks these legitimate servers and allocates the new VM to the top-ranked server, so that the above three objectives are satisfied;

2) The multi-objective optimisation is performed for every incoming VM request when it arrives, such that only the current system state and the VM request are taken into consideration, i.e., no lookahead mechanism is used;

3) VM live migration is not taken into consideration, which means that once a VM is allocated to a server, it will run on that server until it is stopped or terminated by the user;

4) Cloud providers do not have any prior knowledge of the attackers or victims, and all VM requests are treated equally.

There are many different ways to solve this multi-objective optimisation problem; since the security objective is the most important in our case, we choose the ε-constraints method [44].

3.2.1 Problem Statement

Formally, the problem is as follows: how do we design a rule for the mapping X such that the number of users per server is minimised, under the constraints on workload balance and power consumption, i.e.,

  S: \min \frac{1}{K} \sum_{s \in S} \left| \{ u \mid u \in U, \exists v \in V : x_{v,s,u} = 1 \} \right|
  s.t. W': \forall s \in S, \forall u \in U : \left| \{ v \mid v \in V, x_{v,s,u} = 1 \} \right| \le \bar{N}
       P: \left| \{ s \mid s \in S, \exists u \in U, v \in V : x_{v,s,u} = 1 \} \right| \le \bar{K},

where N̄ and K̄ are predefined positive thresholds.
Note that we re-write the workload balance condition W as "no more than N̄ VMs of one single user are hosted on the same server", as this is more practical for implementation.

Because this multi-objective optimisation problem is NP-hard—it contains both the knapsack problem (the security objective and the workload balance constraint) and the bin-packing problem (the power consumption constraint)—we aim to find a heuristic solution for it.

Specifically, in order to minimise the average number of users per server, PSSF gives a higher priority to (1) servers that currently host VMs from the same user—assigning a new VM to one of these servers will not increase the number of different users on that server; and (2) servers that once hosted VMs from the same user—this covers the case where none of the user's VMs is currently running, but the user has created VMs before. Selecting one of these servers prevents attackers from constantly switching their VMs on and off to circumvent the restriction.

In addition, the workload balance constraint, i.e., no more than N̄ VMs of one single user on the same server, is directly implemented in PSSF—if a server already hosts N̄ VMs from a user, it will not be chosen again when that user creates more VMs, even if it has enough remaining resources.

Finally, in order to decrease the power consumption, we combine the ideas of spreading and stacking workload. A more detailed description of our PSSF policy is given in Section 6.

4 SECURITY ANALYSIS OF BASIC VM ALLOCATION POLICIES

Before proposing our new VM allocation policy, we first compare three basic existing policies—the Least VM policy, the Most VM policy and the Random policy—in terms of their ability to defend against co-resident attacks. We start by analysing the attacker's strategies for maximising their efficiency and coverage rates under the different allocation policies. Then we introduce our models of VMmin, Efficiency(|VM(A,t)|) and Coverage(|VM(A,t)|) for each policy.

4.1 Attackers' Strategies under Different VM Allocation Policies

We assume that the attacker can figure out which VM allocation policy (or at least which type) is being used in the system, and optimise their strategy accordingly. Therefore, before comparing the three policies, we first need to understand the worst-case scenario, i.e., how the attacker maximises either the efficiency or the coverage rate under these policies. One straightforward approach is to spread the VMs, i.e., to occupy as many servers as possible with the minimum number of VMs.

Under the Least VM policy, the attacker should start as many VMs as possible at once (the number being restricted by costs and other constraints), because a server will not be selected twice within a short period of time under this policy.

Under the Most VM policy, instead of launching a large number of VMs at the same time, the attacker should start their VMs in batches, i.e., start B VMs at a time (B < |VM(A,t)|), and repeat |VM(A,t)|/B times. Otherwise, all their VMs will be stacked onto a small number of servers. For a more detailed analysis of the size of B, please refer to our earlier paper [7] (where we used the different notation M).

Under the Random policy, how the attacker starts their VMs does not have an obvious impact on the number of different servers that these VMs are assigned to, since the host for each VM is chosen randomly.

4.2 Modelling VMmin: Minimum Number of VMs the Attacker Needs to Start

In the following two sections, we present our models for the three security metrics. Note that hereafter all discussions assume that the attacker applies the best strategies described in the last section. We start with VMmin, i.e., the minimum effort the attacker has to make in order to co-locate with their targets.

Let p_i, 1 ≤ i ≤ VMmin, be the probability that the ith VM co-locates with at least one of the T targets. Then the probability that VMmin = n equals the probability that none of the first (n − 1) VMs co-locates with any target, and the last VM does:

  P(VM_{min} = n) = p_n \prod_{i=1}^{n-1} (1 - p_i).    (1)

Let K_i be the number of servers to which the ith VM can be assigned, and T′ be the number of servers that host the T target VMs; then p_i = T′/K_i. Note that T′ ≤ T, as a subset of the target VMs may co-locate with each other. An analysis of T′ is given in Section 5.2.

4.2.1 Least VM Allocation Policy

Under this policy, if all n VMs are started at the same time, previously selected hosts will not be chosen again. In other words, the ith VM (i = 1, 2, ..., n) can only be assigned to one of the remaining K − (i − 1) servers, i.e., K_i = K − (i − 1). Hence:

  p_i = \frac{T'}{K - (i - 1)}    (2)

  P(VM_{min} = n) = \frac{T'}{K - n + 1} \prod_{i=0}^{n-2} \left( 1 - \frac{T'}{K - i} \right).    (3)

4.2.2 Most VM Allocation Policy

As analysed above, in this case the attacker starts their VMs in batches of B, and because B is sufficiently small, all B VMs within a batch will be allocated to different servers. In other words, no co-location of the attacker's own VMs occurs within a single batch. However, it is possible for VMs from different batches to co-locate with each other, i.e., K_i decreases as i increases within a batch, but resets to K for every new batch: K_i = K − (i − 1) mod B. Hence:

  p_i = \frac{T'}{K - (i - 1) \bmod B}    (4)

  P(VM_{min} = n) = \frac{T'}{K - (n - 1) \bmod B} \prod_{i=0}^{n-2} \left( 1 - \frac{T'}{K - i \bmod B} \right).    (5)
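For reference, Equations (1)-(5) are straightforward to evaluate numerically. Below is a small Python sketch (ours, not the authors' code), which also covers the Random policy of Section 4.2.3 below, where the pool K_i is simply the constant K:

  def p_vmmin(n, K, T_prime, policy, B=5):
      # P(VMmin = n), Eq. (1): the first n-1 VMs all miss, the nth hits.
      def p(i):   # p_i = T'/K_i, with K_i depending on the policy
          if policy == "least_vm":
              K_i = K - (i - 1)             # Eq. (2)
          elif policy == "most_vm":
              K_i = K - (i - 1) % B         # Eq. (4)
          else:                             # "random"
              K_i = K                       # Eq. (6)
          return T_prime / K_i
      prob = p(n)
      for i in range(1, n):
          prob *= 1 - p(i)
      return prob

  # e.g., K = 150 servers, T' = 20 target servers:
  print(sum(p_vmmin(n, 150, 20, "least_vm") for n in range(1, 11)))  # P(VMmin <= 10)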
4.2.3 Random Allocation Policy

Under the Random policy, as every server is equally likely to be selected, the probability of a new VM co-locating with at least one of the T targets is:

  p_i = \frac{T'}{K}    (6)

  P(VM_{min} = n) = \left( 1 - \frac{T'}{K} \right)^{n-1} \cdot \frac{T'}{K}.    (7)

4.3 Modelling Attack Efficiency and Coverage

In this section, we present the models for the other two metrics: the attack efficiency and coverage.

Let q_i, 1 ≤ i ≤ |VM(A,t)|, be the probability that the ith VM is assigned to a server that (1) hosts at least one of the T targets, and (2) does not host any of the previous (i − 1) attacking VMs. Then the efficiency of starting |VM(A,t)| VMs is:

  Efficiency(|VM(A,t)|) = \frac{1}{|VM(A,t)|} \sum_{i=1}^{|VM(A,t)|} q_i.    (8)

Similarly, let r_i, 1 ≤ i ≤ |VM(A,t)|, be the probability that the ith VM co-locates with a new target (i.e., a target that does not co-locate with any of the previous (i − 1) VMs). Then the coverage after starting |VM(A,t)| VMs is:

  Coverage(|VM(A,t)|) = \frac{1}{T} \sum_{i=1}^{|VM(A,t)|} r_i.    (9)

Note that when the attacker starts the ith VM, the previous (i − 1) VMs already co-locate with \sum_{j<i} r_j targets. In other words, only T − \sum_{j<i} r_j target VMs, and T′ − \sum_{j<i} q_j target servers, remain. Therefore, q_i and r_i under the three policies take the following forms (note that the denominator is the same as for p_i):

1) Under the Least VM policy,

  q_i = \frac{T' - \sum_{j=1}^{i-1} q_j}{K - (i - 1)}    (10)

  r_i = \frac{T - \sum_{j=1}^{i-1} r_j}{K - (i - 1)}.    (11)

2) Under the Most VM policy,

  q_i = \frac{T' - \sum_{j=1}^{i-1} q_j}{K - (i - 1) \bmod B}    (12)

  r_i = \frac{T - \sum_{j=1}^{i-1} r_j}{K - (i - 1) \bmod B}.    (13)

3) Under the Random policy,

  q_i = \frac{T' - \sum_{j=1}^{i-1} q_j}{K}    (14)

  r_i = \frac{T - \sum_{j=1}^{i-1} r_j}{K}.    (15)
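The recurrences (8)-(15) also lend themselves to direct numerical evaluation. A Python sketch (ours) of the coupled sequences q_i and r_i:

  def efficiency_coverage(n_vms, K, T, T_prime, policy, B=5):
      # Evaluate Eqs. (8)-(15); q_i and r_i share the denominator K_i of p_i.
      q, r = [], []
      for i in range(1, n_vms + 1):
          if policy == "least_vm":
              K_i = K - (i - 1)             # Eqs. (10)-(11)
          elif policy == "most_vm":
              K_i = K - (i - 1) % B         # Eqs. (12)-(13)
          else:
              K_i = K                       # Eqs. (14)-(15)
          q.append((T_prime - sum(q)) / K_i)   # remaining target servers
          r.append((T - sum(r)) / K_i)         # remaining target VMs
      return sum(q) / n_vms, sum(r) / T        # Efficiency (8), Coverage (9)

  # Under the Least VM policy the coverage grows almost linearly in
  # |VM(A,t)|, matching the observation in Section 5.3:
  print(efficiency_coverage(50, K=150, T=20, T_prime=20, policy="least_vm"))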
5 EXPERIMENTAL VERIFICATION

In this section, we present the experimental verification of the above models for VMmin, Efficiency(|VM(A,t)|), and Coverage(|VM(A,t)|).

5.1 Experimental Environment

CloudSim [15], [16] is a powerful and popular simulation platform for cloud computing. It provides support for simulating large-scale data centres and self-defined VM allocation policies, as well as many other useful functionalities. We therefore choose it as our experimental environment for testing the models. The settings of the environment are as follows.

5.1.1 Physical Servers and Virtual Machines

As shown in Table 3, a data centre with 150 servers and more than 3,500 VMs (as background traffic) is used in the simulations. Note that there are two sets of configurations for the servers, the difference being the potential bottleneck resource: either the CPU capacity or the RAM capacity. All experiments are repeated under each of these two configurations separately.

TABLE 3
Configurations of Servers and VMs in CloudSim

            Quantity    CPU speed (MIPS^a)    CPU cores    RAM (MB)
  Servers   150         2,600                 16           24,576     (configuration 1)
            150         2,600                 12           49,152     (configuration 2)
  VMs       random^b    2,500                 1            870
            random^b    2,000                 1            1,740
            random^b    1,000                 1            1,740
            random^b    500                   1            613

  a. The metric of CPU speed in CloudSim is MIPS instead of MHz.
  b. Each VM request randomly selects one of the four VM types.

5.1.2 Background Workload

We have shown in our earlier work [45] that in cloud computing systems, both the VM request arrival and departure processes follow a power-law distribution, and hence exhibit self-similarity. In order to make the results as close to the real case as possible, we use the program in [46] to generate self-similar background workloads.

5.1.3 Experimental Settings

In each experiment, the number of running VMs in the system reaches a relatively stable value after one and a half hours. At the 18,000th second, a legal user L starts 20 VMs, and a certain time later (this time difference is called the lag) an attacker A starts |VM(A,t)| VMs at the tth second (t = 18,000 + lag), using the strategies analysed in Section 4.1 under the different policies.

We set |VM(A,t)| = 1, 5, 10, 15, 20, 25, ..., 100, and lag = 1, 5, 10, 20, 30, 40, ..., 100 (minutes). For every possible combination of VM allocation policy, |VM(A,t)| and lag, the experiment is repeated 50 times.

5.2 Results on VMmin: Minimum Number of VMs the Attacker Needs to Start

The legal user L starts their VMs (i.e., the target VMs) at the same time in our experiments. Consequently:

1) Under the Least VM policy, all these VMs are allocated to different servers, i.e., T′ = T.

2) Under the Most VM policy, a subset of the VMs co-locate on the same servers. Specifically, in our experiment, under the two sets of server configurations the 20 target VMs are allocated to 16 or four servers on average, i.e., T′ = 0.8 · T or T′ = 0.2 · T, depending on the configuration.

3) Under the Random policy, let t_i be the probability that the ith VM is allocated to a new server, i.e., does not co-locate with any of the previous (i − 1) VMs; then t_1 = 1, t_2 = 1 − t_1/K, t_3 = 1 − (t_1 + t_2)/K, ... Considering that K is sufficiently large, we have t_i ≈ 1 − (i − 1)/K, and hence:

  T' \approx \sum_{i=0}^{T-1} \left( 1 - \frac{i}{K} \right) = T - \frac{T(T-1)}{2K}.

Regarding the results (Fig. 2), we find that under the Most VM policy and the Random policy, the models fit the results quite well in all cases. However, under the Least VM policy, the models only work when the lag is larger than 10 minutes. When the lag is small, the experimental results are much lower (this part of the results is excluded), which means that if the attacker starts their VMs immediately after the legal user, it is extremely unlikely that these VMs will co-locate with any target. The reason is as stated before—under the Least VM policy, one server will not be selected again within a short period of time.

5.3 Results on Attack Efficiency and Coverage

As can be seen from Figs. 3 and 4, the models for attack efficiency and coverage also fit the results well. Specifically, the following points should be noted:

1. Attack efficiency—Under the Least VM policy, the efficiency stays approximately stable as |VM(A,t)| increases. This is desirable from the attacker's point of view, as all their VMs have the same probability of co-locating with new targets. Under the other two policies, the efficiency decreases gradually, which means it is less likely for later-started VMs to be co-resident with the targets.

2. Attack coverage—Under the Least VM policy, since the efficiency stays the same as |VM(A,t)| increases, the coverage rate increases almost linearly, which is faster than in the other two cases.

3. Comparison of policy performance—Regarding the performance of the three policies, the conclusions differ between the two sets of configurations. Under the first configuration (where each server has 16 CPU cores and 24 Gigabytes of RAM), all three policies perform similarly, although they allocate VMs in completely different ways. Under the second configuration (where each server has 12 CPU cores and 48 Gigabytes of RAM), however, the Most VM policy performs better. We explain the reason below.

4. Model fitness—In terms of model fitness, we find that the difference between the estimated values and the simulation results is larger for the Most VM policy under the second configuration. This is due to the effect of oversubscription, as described below.

5.3.1 Effect of Oversubscription

Oversubscription is a common practice in cloud computing [47], which allows providers to allocate more resources to users than the actual capacity of their servers, provided that they do not break SLAs. In our simulation experiment, oversubscription of the CPU capacity is enabled. Therefore, under the Most VM policy, even if the attacker starts VMs in batches (B is set to 5), it is still possible that a subset of these VMs are allocated to the same server. In other words, it is more difficult to spread the VMs under the Most VM policy, no matter how the attacker starts them. In this case, equations (4), (5), (12) and (13) do not hold, which explains the last point mentioned above, and is also part of the reason for the third observation.

But what happens under the first configuration? In this situation, the RAM capacity is the bottleneck resource, and hence even though the scheduler tries to allocate more VMs to the same server, there is not enough RAM left, i.e., the oversubscription does not take effect.

5.3.2 Effect of Background Traffic

Background traffic is another factor that causes the three policies to perform similarly under the first set of configurations. Consider the server selection process. When the system starts, under the Most VM policy the scheduler fills up one server after another, which is almost the opposite of the behaviour under the other two policies. However, after a sufficiently long time, no matter which policy is applied, at any moment there are a number of servers that are not fully utilised, due to the background traffic. In this circumstance, under the Most VM policy, newly started VMs will be assigned to these servers first, resulting in a server selection process much like that under the other two policies. An examination of the server selection sequences shows that they are all nearly random.

In summary, the conclusion here is that if oversubscription is enabled, and the servers are properly configured according to the deployed VM allocation policy, then stacking the workload is more effective in defending against co-resident attacks.

6 A NEW BALANCED VM ALLOCATION POLICY

A lesson from the previous results is that if the number of servers to which each user's VMs can be assigned is limited, so that the target VMs are less exposed to the attacker, then the impact of co-resident attacks will be mitigated. Based on this idea, we design a new balanced policy: Previously-selected-servers-first (PSSF).
We introduce step by step how PSSF satisfies the objectives of security, workload balance and power consumption.

1. Security—In order to minimise the average number of users per server, when a user ui creates new VMs, they will first be assigned to those servers that already host, or once hosted, VMs started by ui (i.e., previously selected servers). We explained the reason in Section 3.2.

2. Workload balance—In the following three circumstances, the new VMs will not be assigned to previously selected servers: (1) every previously selected server already hosts N̄ VMs of ui; (2) none of the previously selected servers has enough resources left; or (3) the user has never started VMs before. In these three cases, PSSF spreads the workload instead, e.g., it chooses the servers with the least number of VMs.

3. Power consumption—One main reason why the Least VM policy and the Random policy perform poorly in power consumption is that an excessive number of servers are switched on. The most straightforward way to minimise the number of running servers is stacking, i.e., allocating new VMs to the same server until not enough resources remain. However, this clearly breaks the rule of workload balance. Therefore, we propose a compromise: logically divide all servers into groups of size NG; within each group, the workload is spread; and the next group of servers is not started until the servers in all former groups are fully utilised. For simplicity, the group index equals the server's index, i.e., 0, 1, ..., K−1, divided by the group size NG.

In summary, our VM allocation policy PSSF works as shown in Algorithm 1 when a user u needs to create a new VM.

Algorithm 1. Previously-selected-servers-first policy

1: PSSList = {}, NPSSList = {}
2: foreach server si in S
3:   if (si has enough remaining resources)
4:     if (si already hosts or once hosted u's VMs)
5:       if (si hosts fewer than N̄ of u's VMs)
6:         PSSList.add(si)
7:     else
8:       NPSSList.add(si)
9: if (!PSSList.isEmpty())
10:   return PSSList.get(random(PSSList.size()))
11: else
12:   Sort(NPSSList, group index, resources left)
13:   i = the number of servers with the same group index and remaining resources as the first server in NPSSList (NPSSList.get(0))
14:   Mark NPSSList.get(random(i)) as "previously selected" for u, and return it

Another reason why we choose to spread the workload, instead of concentrating or randomly distributing it (in fact, we tried all three options—please refer to our earlier paper [7] for details), is that spreading further helps lower the probability of attackers co-locating with the targets. Consider the following example (shown in Fig. 5): the victim user starts ten VMs, and they are equally assigned to two servers, s1 and s2. As a result, these two servers now host more VMs than the other servers, and it is unlikely that further VM requests (from other users) will be assigned to them until all other servers host the same number of VMs. Even then, it is difficult for attackers to achieve co-residence, since the target VMs are allocated together and are therefore less exposed.

In order to verify that PSSF is effective in defending against the co-resident attack, while satisfying the requirements on workload balance and power consumption, we re-ran the same experiments as in the last section.

It should be noted that under our new policy, if the attacker intends to spread their VMs, they need to use multiple accounts and keep creating new accounts (otherwise, they have to keep all their VMs running in order for the newly started VMs to be assigned to different servers), each of which starts only one VM at a time. All the experiments below are carried out in this worst-case scenario.
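For readers who prefer running code to pseudocode, the sketch below is our Python rendering of Algorithm 1 together with the grouping rule. The Host structure, the use of remaining CPU as the "resources left" sort key, and the bookkeeping of the "previously selected" mark are illustrative assumptions, not the authors' CloudSim/OpenStack implementation; the caller is expected to deduct cpu/ram and update user_vms after each placement.

  import random
  from dataclasses import dataclass, field

  @dataclass
  class Host:
      index: int                   # group index = index // NG
      free_cpu: float
      free_ram: float
      user_vms: dict = field(default_factory=dict)   # user -> running VM count
      ever_hosted: set = field(default_factory=set)  # users hosted now or before

  def pssf(hosts, user, cpu, ram, n_bar=3, ng=15):
      # Assumes at least one host fits the request.
      fits = [h for h in hosts if h.free_cpu >= cpu and h.free_ram >= ram]
      # Lines 2-8: previously selected servers, unless N-bar is already reached.
      pss = [h for h in fits
             if user in h.ever_hosted and h.user_vms.get(user, 0) < n_bar]
      if pss:                                          # lines 9-10
          return random.choice(pss)
      # Lines 11-12: sort by group index, then spread within the group
      # (most resources left first); a later group is opened only when needed.
      fits.sort(key=lambda h: (h.index // ng, -h.free_cpu))
      best = fits[0]
      ties = [h for h in fits if h.index // ng == best.index // ng
              and h.free_cpu == best.free_cpu]         # line 13
      chosen = random.choice(ties)                     # line 14
      chosen.ever_hosted.add(user)                     # mark "previously selected"
      return chosen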
6.1 Simulation Experiment on CloudSim

We start by ignoring the constraints on workload balance (W) and power consumption (P), and considering only the security objective, which is to minimise the average number of users per server. We denote this as "PSSF + Least VM". As can be seen from Figs. 2, 3 and 4, all the values of the three security metrics are quite close to zero, regardless of the number of VMs started by the attacker, the reason being that in this case it is as if every user has their own dedicated servers.

Fig. 2. Comparison of VMmin between the predicted values and the simulation results.

We continue by adding the workload balance constraint (W), and denote this as "PSSF + Least VM + Limit per user". In our simulation experiment, the limit N̄ is set to 3 (i.e., every server can host no more than three VMs from the same user, which we believe is strict enough). We can see from the results that even though this constraint has an obvious negative impact in terms of security, all the metrics are still significantly lower than under the existing policies. In addition, we calculate for each server the number of times it is selected. The standard deviations are almost the same—in the range of [3.9, 4.3]—under this policy as under the Least VM policy, which indicates that the two policies perform similarly in balancing the workload.

Finally, we take the power consumption objective (P) into consideration, and denote this as "PSSF + Mixed (group size = NG) + Limit per user", where NG = 10, 15, 25, 30. In this situation, the points below should be noted:

1) Generally speaking, from the security perspective the grouping brings further negative effects, especially when |VM(A,t)| ≤ 20. This is because if the attacker's VMs are allocated to the same group as the target VMs, it is more likely that they co-locate with each other due to the limited group size.

2) As |VM(A,t)| increases, the negative impact becomes less obvious—in terms of the attack efficiency, the difference between "PSSF + Mixed (group size = NG) + Limit per user" and "PSSF + Least VM" decreases gradually, while for the coverage rate, the difference between "PSSF + Mixed (group size = NG) + Limit per user" and the existing policies increases. In some cases (NG = 10, 15), the values of the metrics are even lower. This is because the malicious VMs are assigned to other groups.

3) As can be seen from Fig. 6, the power consumption is considerably lower, as far fewer servers are turned on. In addition, the number of servers used is insensitive to the group size NG and the limit per user N̄, which means K̄ is mainly decided by the system workload.

4) As long as the current group(s) of servers have sufficient resources, the workload is balanced across the system. The only exception occurs when a new group of servers is brought into use. As VMs from the previous group(s) leave the system, and/or new VMs are allocated to the new group of servers, the workload gradually becomes balanced again.

As for the optimal group size, our current experiments suggest that 10 percent of the total number of servers gives the best result. In addition, we find that a large value (of 20 percent) causes an obvious degradation in security. This is because, on the one hand, it is likely that the attacker's VMs and the target VMs are allocated to the same group; on the other hand, the pool of servers within a group is smaller than the whole system, making it easier for the attacker to achieve co-location once inside the right group.

Fig. 3. Comparison of Efficiency between the predicted values and the simulation results.
6.1.1 Comparison between PSSF and CLR

In order to compare our PSSF policy with the co-location resistant algorithm proposed in [41], we implemented CLR on CloudSim and re-ran the same experiment under the second set of configurations. Specifically, we tested three values—60, 90 and 150—for the parameter Nopen. As can be seen from Fig. 7, PSSF (with group size = 15, N̄ = 3) performs much better in terms of preventing the attacker from achieving co-residence (due to space limitations, we only show the attack coverage here). It was mentioned in [41] that the larger Nopen is, the more co-location resistant the CLR algorithm becomes, which is also confirmed by our results. In the extreme case where all servers are open (i.e., Nopen = 150), CLR becomes the Random policy.

Fig. 4. Comparison of Coverage between the predicted values and the simulation results.

Fig. 5. An illustrative example of why spreading the workload further helps to prevent attackers from co-locating with their targets.

Fig. 6. Power consumption under different versions of the new policy, for the two sets of configurations.

7 IMPLEMENTATION ON OPENSTACK

In order to further test our new allocation policy, we implemented it on the popular open-source cloud platform OpenStack and ran similar experiments, with the following changes:

1. Most RAM policy instead of Least VM policy—In OpenStack, the default VM allocation policy allocates new VMs to the server with the largest amount of remaining RAM (the Most RAM policy). Therefore, when spreading the workload, we use this Most RAM policy instead of the Least VM policy.

2. The lag is fixed to 20 minutes—We found in our earlier experiments that the lag (the time difference between when the legal user and the attacker start their VMs) does not have an obvious impact on the results in most cases, so we fix it to 20 minutes.

3. NeCTAR workload—In order to make the results as close to the real situation as possible, instead of using a randomly generated background workload, we use traces obtained from an operational cloud service called NeCTAR (National eResearch Collaboration Tools and Resources) [48].

The configuration of our experimental environment on OpenStack is shown in Table 4. We first create 51 VMs in NeCTAR (which itself runs on OpenStack). These VMs comprise our own private cloud, and act as servers in our experiment (we call them servers from now on). Then we set up OpenStack on this private cloud, where one server runs as the cluster controller, while the others run as compute nodes (servers that host VMs). Initially, each compute node has two CPU cores and 8 Gigabytes of RAM, but we configure the parameters "cpu_allocation_ratio" and "ram_allocation_ratio" so that each of them acts as if it has 128 CPU cores and 64 Gigabytes of RAM.
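For context, these two parameters are standard Nova scheduler options. An illustrative nova.conf fragment matching the numbers above (2 physical cores × 64 = 128 vCPUs; 8 GB × 8 = 64 GB) would look like the following; this is our reconstruction, and the exact option placement and defaults depend on the OpenStack release in use:

  [DEFAULT]
  cpu_allocation_ratio = 64.0
  ram_allocation_ratio = 8.0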
Fig. 7. Comparison of Coverage between PSSF and CLR (Server configuration 2: each server has 12 CPU cores, and 48 Gigabytes of RAM).

Fig. 8. Comparison of VMmin between different VM allocation policies (OpenStack).

Fig. 9. Comparison of Efficiency between different VM allocation policies (OpenStack).

Fig. 10. Comparison of Coverage between different VM allocation policies (OpenStack).

The configuration of our experimental environment on OpenStack is shown in Table 4. We first create 51 VMs in NeCTAR (which itself runs on OpenStack). These VMs comprise our own private cloud, and act as servers in our experiment (we call them servers from now on). Then we set up OpenStack on the private cloud, where one server runs as the cluster controller, while the others run as compute nodes (servers that host VMs). Initially, each compute node has two CPU cores and 8 Gigabytes of RAM, but we configure the parameters "cpu_allocation_ratio" and "ram_allocation_ratio" so that each node acts as if it had 128 CPU cores and 64 Gigabytes of RAM (an illustrative configuration excerpt is shown after Table 4). Another point worth mentioning is that because all compute nodes actually have just 8 Gigabytes of RAM, the memory size of each VM hosted on these nodes must also be less than this value (this is the reason why the largest VM in our configuration has 7 Gigabytes of RAM). In addition, the legal user starts only 10 instead of 20 VMs, as there are fewer servers in this case; the attacker starts one, five, 10, 15 or 20 VMs; NG is set to 5 or 10; and the limit per user (N) is set to 3 or 5. Each experiment is done at least 10 times.

TABLE 4
Configurations of Servers and VMs in OpenStack

              Quantity   No. of CPU cores   RAM (MB)
    Servers   50         128                65,536
    VMs       500        1                  512
              300        2                  1,024
              10         4                  2,048
              150        8                  4,096
              50         16                 7,168
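For reference, this oversubscription corresponds to two standard nova configuration options. The values below follow directly from the numbers above (2 physical cores presented as 128 vCPUs, and 8 Gigabytes of RAM presented as 64 Gigabytes); the exact section in which the options live varies across nova releases, so treat this as an illustrative excerpt rather than a drop-in configuration.

    # /etc/nova/nova.conf (illustrative excerpt)
    [DEFAULT]
    # 2 physical cores x 64 = 128 schedulable vCPUs per compute node
    cpu_allocation_ratio = 64.0
    # 8 GB of physical RAM x 8 = 64 GB of schedulable RAM per compute node
    ram_allocation_ratio = 8.0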
As can be seen from Figs. 8, 9 and 10, the results are consistent with the former ones:

1. Security metrics—The three metrics under all variants of our new policy are better than under the default Most RAM policy, i.e., both the efficiency and the coverage rates are lower, and VMmin is larger. The difference becomes more obvious as |VM(A,t)| increases.
2. Limit per user (N)—We test two limits, three and five, i.e., no more than three/five VMs of any one user will be placed on the same server. Such a requirement of spreading each user's workload has a negative effect on security, and the stricter the limit (i.e., the smaller the value), the larger the effect.
3. Group size (NG)—We test two group sizes, 5 and 10. Similar to our earlier result, the larger group size of 10 (20 percent of all servers) makes it easier to achieve co-residence, while the smaller value of 5 (10 percent of all servers) has less impact on the attack efficiency and coverage rate, which is the desired result.
4. Power consumption—25 and 30 (out of 50) servers are used when NG is set to 5 and 10, respectively. Consequently, the power consumption is reduced by 50 and 40 percent, respectively (for simplicity, we only consider the number of servers in use when calculating the power consumption).

In summary, our new VM allocation policy PSSF meets the objectives in security, workload balance and power consumption.
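To make the behaviour behind these results concrete, the sketch below condenses the PSSF decision rule as described in this paper: prefer servers that the requesting user already occupies, respect the per-user-per-server limit N, fall back to a lightly loaded server, and open server groups one at a time. The data layout, tie-breaking choices, and function name are illustrative assumptions, not our actual OpenStack scheduler code.

    def pssf_select(servers, user, group_size, n_limit):
        """Previously selected servers first (illustrative sketch).

        servers: list of dicts with keys 'vms' (user -> VM count),
        'load' (total hosted VMs) and 'capacity'; consecutive blocks of
        group_size servers form the groups that are opened one at a time.
        """
        def has_room(s):
            return s['load'] < s['capacity']

        def under_limit(s):
            # no more than n_limit VMs of this user on any one server
            return s['vms'].get(user, 0) < n_limit

        # Open groups one by one: stop at the first group with spare room,
        # so a new group is not used until the existing ones are full.
        open_servers = []
        for g in range(0, len(servers), group_size):
            group = servers[g:g + group_size]
            open_servers.extend(group)
            if any(has_room(s) for s in group):
                break

        # 1) Prefer a server this user was previously allocated to.
        previous = [s for s in open_servers
                    if s['vms'].get(user, 0) > 0
                    and has_room(s) and under_limit(s)]
        if previous:
            return max(previous, key=lambda s: s['vms'][user])

        # 2) Otherwise fall back to a lightly loaded open server.
        fallback = [s for s in open_servers if has_room(s) and under_limit(s)]
        if not fallback:
            raise RuntimeError("no feasible server in the open groups")
        return min(fallback, key=lambda s: s['load'])

Note that the limit N only caps how many of one user's VMs may share a single server; within that cap, PSSF deliberately stacks a user's VMs together, which is what reduces the attacker's exposure discussed next.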
In terms of defending against co-resident attacks, PSSF limits the number of servers that one user can use, and hence increases the co-location of VMs belonging to the same user. As a result, the victims are less exposed to attackers, and it is also harder for attackers to spread their VMs. Consequently, this strategy is effective in decreasing the probability of attackers achieving co-residence. As for workload balance, if PSSF does not find any legitimate server that was selected previously, it allocates the new VM to a lightly loaded server; therefore, most of the time the workload is balanced across the system. Finally, in order to bring down the energy consumption, PSSF manages the servers in groups, and a new group of servers is not used until all existing ones are fully utilised, which proved to be effective in decreasing the number of servers being used.

Table 5 summarises the performance of the different VM allocation policies discussed in this paper.

TABLE 5
Performance Comparison of VM Allocation Policies in Security, Workload Balance, and Power Consumption

    Policy                             Security   Workload balance            Power consumption
    Spread workload (e.g., Least VM)   Low        Balanced                    High
    Stack workload (e.g., Most VM)     Medium     Not balanced                Low
    Random                             Low        Balanced                    High
    PSSF                               High       Balanced most of the time   Low

8 CONCLUSIONS

This paper provides a new perspective on countering the co-resident attack. Instead of looking for solutions after attackers have co-located with their targets, cloud providers can mitigate the threat by minimising the probability of attackers co-locating with the targets in the first place. Specifically, we first compare the difficulty of achieving co-residence under three basic yet widely used VM allocation policies, and find that if over-subscription is enabled and the servers are properly configured, the Most VM policy, or more generally speaking stacking the workload, performs better than the other options.¹ In addition, we propose and implement a new policy that is effective not only in defending against the co-resident attack, but also in balancing the workload and decreasing the power consumption.

In the future, we will take additional practical factors into consideration to improve the policy: (1) distributed scheduling—in this paper we only consider the centralised version of the different allocation policies, and assume that the current system state is always known; (2) live VM migration—live migration may enable the attacker to exploit potential loopholes in the migration algorithm to increase their probability of co-locating with the targets; (3) testing the policy in larger cloud computing systems that consist of multiple data centres.

ACKNOWLEDGMENTS

The authors thank the NeCTAR support team at the University of Melbourne for sharing workload traces of VM usage.

¹ We have tested several other policies that also stack the workload, and found that they perform similarly to the Most VM policy. Due to space limitations, we omit these additional results.

REFERENCES

[1] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds," in Proc. 16th ACM Conf. Comput. Commun. Security, 2009, pp. 199–212.
[2] Y. Zhang, A. Juels, M. Reiter, and T. Ristenpart, "Cross-VM side channels and their use to extract private keys," in Proc. 19th ACM Conf. Comput. Commun. Security, 2012, pp. 305–316.
[3] B. Vattikonda, S. Das, and H. Shacham, "Eliminating fine-grained timers in Xen," in Proc. 3rd ACM Workshop Cloud Comput. Security, 2011, pp. 41–46.
[4] J. Wu, L. Ding, Y. Lin, N. Min-Allah, and Y. Wang, "XenPump: A new method to mitigate timing channel in cloud computing," in Proc. 5th IEEE Int. Conf. Cloud Comput., 2012, pp. 678–685.
[5] A. Aviram, S. Hu, B. Ford, and R. Gummadi, "Determinating timing channels in compute clouds," in Proc. ACM Workshop Cloud Comput. Security, 2010, pp. 103–108.
[6] J. Shi, X. Song, H. Chen, and B. Zang, "Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring," in Proc. 41st Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw. Workshops, 2011, pp. 194–199.
[7] Y. Han, J. Chan, T. Alpcan, and C. Leckie, "Virtual machine allocation policies against co-resident attacks in cloud computing," in Proc. IEEE Int. Conf. Commun., 2014, pp. 786–792.
[8] J. M. Kaplan, W. Forrest, and N. Kindler, "Revolutionizing data center energy efficiency," McKinsey & Company, 2008. [Online]. Available: http://www.mckinsey.com/clientservice/bto/pointofview/pdf/revolutionizing_data_center_efficiency.pdf
[9] A. Hameed, A. Khoshkbarforoushha, R. Ranjan, P. Jayaraman, J. Kolodziej, P. Balaji, S. Zeadally, Q. Malluhi, N. Tziritas, A. Vishnu, S. Khan, and A. Zomaya, "A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems," Computing, pp. 1–24, 2014, doi: 10.1007/s00607-014-0407-8.
[10] P. Graubner, "Energy-efficient management of virtual machines in Eucalyptus," in Proc. 4th IEEE Int. Conf. Cloud Comput., 2011, pp. 243–250.
[11] R. Jansen and P. R. Brenner, "Energy efficient virtual machine allocation in the cloud: An analysis of cloud allocation policies," in Proc. Int. Green Comput. Conf. Workshops, 2011, pp. 1–8.
[12] K. Le, R. Bianchini, J. Zhang, Y. Jaluria, J. Meng, and T. D. Nguyen, "Reducing electricity cost through virtual machine placement in high performance computing clouds," in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2011, pp. 1–12.
[13] K. Mills, J. Filliben, and C. Dabrowski, "Comparing VM-placement algorithms for on-demand clouds," in Proc. 3rd IEEE Int. Conf. Cloud Comput. Technol. Sci., 2011, pp. 91–98.
[14] A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing," Future Gener. Comput. Syst., vol. 28, no. 5, pp. 755–768, 2012.
[15] CloudSim. [Online]. Available: http://www.cloudbus.org/cloudsim/, 2009.
[16] R. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, and R. Buyya, "CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Softw. Pract. Exper., vol. 41, no. 1, pp. 23–50, 2011.
[17] OpenStack. [Online]. Available: http://www.openstack.org/, 2010.
[18] F. Zhou, M. Goel, P. Desnoyers, and R. Sundaram, "Scheduler vulnerabilities and coordinated attacks in cloud computing," in Proc. 10th IEEE Int. Symp. Netw. Comput. Appl., 2011, pp. 123–130.
[19] V. Varadarajan, T. Kooburat, B. Farley, T. Ristenpart, and M. Swift, "Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense)," in Proc. ACM Conf. Comput. Commun. Security, 2012, pp. 281–292.
[20] Z. Yang, H. Fang, Y. Wu, C. Li, B. Zhao, and H. H. Huang, "Understanding the effects of hypervisor I/O scheduling for virtual machine performance interference," in Proc. 4th IEEE Int. Conf. Cloud Comput. Technol. Sci., 2012, pp. 34–41.
[21] Amazon Elastic Compute Cloud (EC2). [Online]. Available: http://aws.amazon.com/ec2/, 2006.
[22] D. A. Osvik, A. Shamir, and E. Tromer, "Cache attacks and countermeasures: The case of AES," in Proc. Cryptographers' Track RSA Conf. Topics Cryptol., 2006, pp. 1–20.
[23] E. Tromer, D. A. Osvik, and A. Shamir, "Efficient cache attacks on AES, and countermeasures," J. Cryptol., vol. 23, no. 1, pp. 37–71, 2010.
[24] K. Okamura and Y. Oyama, "Load-based covert channels between Xen virtual machines," in Proc. ACM Symp. Appl. Comput., 2010, pp. 173–180.
[25] H. Hlavacs, T. Treutner, J.-P. Gelas, L. Lefevre, and A.-C. Orgerie, "Energy consumption side-channel attack at virtual machines in a cloud," in Proc. 9th IEEE Int. Conf. Dependable, Autonomic Secure Comput., 2011, pp. 605–612.
[26] J. Wu, L. Ding, Y. Wang, and W. Han, "Identification and evaluation of sharing memory covert timing channel in Xen virtual machines," in Proc. 4th IEEE Int. Conf. Cloud Comput., 2011, pp. 283–291.
[27] Y. Xu, M. Bailey, F. Jahanian, K. Joshi, M. Hiltunen, and R. Schlichting, "An exploration of L2 cache covert channels in virtualized environments," in Proc. 3rd ACM Workshop Cloud Comput. Security, 2011, pp. 29–39.
[28] Y. Zhang, A. Juels, A. Oprea, and M. K. Reiter, "HomeAlone: Co-residency detection in the cloud via side-channel analysis," in Proc. IEEE Symp. Security Privacy, 2011, pp. 313–328.
[29] Z. Wang and R. B. Lee, "Covert and side channels due to processor architecture," in Proc. 22nd Annu. Comput. Security Appl. Conf., 2006, pp. 473–482.
[30] Z. Wang and R. B. Lee, "New cache designs for thwarting software cache-based side channel attacks," in Proc. 34th Annu. Int. Symp. Comput. Archit., 2007, pp. 494–505.
[31] J. Szefer, E. Keller, R. Lee, and J. Rexford, "Eliminating the hypervisor attack surface for a more secure cloud," in Proc. 18th ACM Conf. Comput. Commun. Security, 2011, pp. 401–412.
[32] Y. Zhang and M. K. Reiter, "Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud," in Proc. ACM SIGSAC Conf. Comput. Commun. Security, 2013, pp. 827–838.
[33] V. Varadarajan, T. Ristenpart, and M. Swift, "Scheduler-based defenses against cross-VM side-channels," in Proc. 23rd USENIX Security Symp., 2014, pp. 687–702.
[34] A. Bates, B. Mood, J. Pletcher, H. Pruse, M. Valafar, and K. Butler, "Detecting co-residency with active traffic analysis techniques," in Proc. ACM Workshop Cloud Comput. Security, 2012, pp. 1–12.
[35] S. Yu, X. Gui, J. Lin, X. Zhang, and J. Wang, "Detecting VMs co-residency in cloud: Using cache-based side channel attacks," Elektronika ir Elektrotechnika, vol. 19, no. 5, pp. 73–78, 2013.
[36] A. Bates, B. Mood, J. Pletcher, H. Pruse, M. Valafar, and K. Butler, "On detecting co-resident cloud instances using network flow watermarking techniques," Int. J. Inf. Security, vol. 13, no. 2, pp. 171–189, 2014.
[37] S. Sundareswaran and A. Squicciarini, "Detecting malicious co-resident virtual machines indulging in load-based attacks," in Proc. 15th Int. Conf. Inf. Commun. Security, 2013, pp. 113–124.
[38] S. Yu, X. Gui, and J. Lin, "An approach with two-stage mode to detect cache-based side channel attacks," in Proc. Int. Conf. Inf. Netw., 2013, pp. 186–191.
[39] M. Li, Y. Zhang, K. Bai, W. Zhang, M. Yu, and X. He, "Improving cloud survivability through dependency based virtual machine placement," in Proc. Int. Conf. Security Cryptography, 2012, pp. 321–326.
[40] Y. Zhang, M. Li, K. Bai, M. Yu, and W. Zang, "Incentive compatible moving target defense against VM-colocation attacks in clouds," in Proc. 27th IFIP TC 11 Inf. Security Privacy Conf., 2012, pp. 388–399.
[41] Y. Azar, S. Kamara, I. Menache, M. Raykova, and B. Shepard, "Co-location-resistant clouds," in Proc. 6th ACM Workshop Cloud Comput. Security, 2014, pp. 9–20.
[42] Y. Han, T. Alpcan, J. Chan, and C. Leckie, "Security games for virtual machine allocation in cloud computing," in Proc. 4th Int. Conf. Decision Game Theory Security, 2013, pp. 99–118.
[43] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proc. 2nd USENIX Symp. Netw. Syst. Des. Implementation, 2005, pp. 273–286.
[44] R. T. Marler and J. S. Arora, "Survey of multi-objective optimization methods for engineering," Struct. Multidisciplinary Optimization, vol. 26, no. 6, pp. 369–395, 2004.
[45] Y. Han, J. Chan, and C. Leckie, "Analysing virtual machine usage in cloud computing," in Proc. IEEE Int. Workshop Perform. Aspects Cloud Service Virtualization, 2013, pp. 370–377.
[46] G. Kramer, "Synthetic self-similar traffic generation." [Online]. Available: http://glenkramer.com/trf_research.shtml, 2000.
[47] S. A. Baset, L. Wang, and C. Tang, "Towards an understanding of oversubscription in cloud," in Proc. 2nd USENIX Conf. Hot Topics Manage. Internet, Cloud, Enterprise Netw. Serv., 2012, pp. 7–12.
[48] Research Cloud NeCTAR. [Online]. Available: http://www.nectar.org.au/, 2011.

Yi Han received the BEng degree in 2007 and the MEng degree in computer science in 2009, both from Wuhan University, China. He is currently working toward the PhD degree at the University of Melbourne. His research interests include networking, distributed systems, and network and system security.

Jeffrey Chan received the BEng, BSci, and PhD degrees, all from the University of Melbourne. He is currently a lecturer at RMIT University and an honorary fellow at the University of Melbourne. He has more than 30 publications in graph mining, social network analysis, and data mining, and his research interests include data analytics, analyzing graphs and social networks, and learning about new and exciting research.

Tansu Alpcan received the BS degree in electrical engineering from Bogazici University, Istanbul, Turkey, in 1998, and the MS and PhD degrees in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC) in 2001 and 2006, respectively. His research involves applications of distributed decision making, game theory, communication, and control to various security and resource allocation problems in networked and energy systems. He has been a senior lecturer in the Department of Electrical and Electronic Engineering, University of Melbourne, Australia, since October 2011.

Christopher Leckie received the BSc degree in 1985, the BE degree in electrical and computer systems engineering (with first class honors) in 1987, and the PhD degree in computer science in 1992, all from Monash University, Australia. He joined Telstra Research Laboratories in 1988, where he conducted research and development into artificial intelligence techniques for various telecommunication applications. In 2000, he joined the University of Melbourne, Australia, where he is currently a professor with the Department of Computing and Information Systems. His research interests include using artificial intelligence for network management and intrusion detection, and data mining techniques such as clustering.
