
Cost-Aware Dynamic Multi-Workflow Scheduling in Cloud Data Center Using Evolutionary Reinforcement Learning⋆

Victoria Huang¹, Chen Wang¹, Hui Ma², Gang Chen², and Kameron Christopher¹

¹ National Institute of Water and Atmospheric Research, Wellington, New Zealand
  {victoria.huang, chen.wang, kameron.christopher}@niwa.co.nz
² Victoria University of Wellington, Wellington, New Zealand
  {hui.ma, aaron.chen}@ecs.vuw.ac.nz

Abstract. The Dynamic Multi-Workflow Scheduling (DMWS) problem aims to allocate highly complex tasks modeled as workflows to cloud resources while optimizing workflow brokers' interests. A workflow broker offers workflow execution services to end-users under agreed Service Level Agreements (SLAs) while reducing its total VM rental fees. Most existing DMWS-related research focuses on minimizing the workflow makespan using either heuristic or hyper-heuristic techniques. However, these techniques were designed for static workflow scheduling based on prior workflow information and/or for a simplified cloud environment. In this paper, the DMWS problem is formulated to collectively minimize VM rental fees and SLA violation penalties. Moreover, we introduce a novel priority-based deep neural network scheduling policy that can flexibly adapt to a changing number of VMs and workflows. To train the new policy, a new Evolutionary Strategy based Reinforcement Learning (ES-RL) approach is developed and implemented. Different from gradient-based deep reinforcement learning algorithms, ES-RL can effectively train population-based and generally applicable policies in parallel and is robust to hyper-parameter settings. Our experiments with real-world datasets show that ES-RL can effectively train scheduling policies that reduce costs by more than 90% compared to state-of-the-art scheduling policies.

Keywords: Dynamic workflow scheduling · Cloud computing · Reinforcement learning · SLA violation · Evolutionary strategy

⋆ This work is in part supported by the NZ Government's Strategic Science Investment Fund (SSIF) and the New Zealand Marsden Fund with contract number VUW1510, administered by the Royal Society of New Zealand.

1 Introduction
Large-scale and highly complex computational applications (e.g., weather forecasting and tsunami prediction) are usually modeled as workflows in the cloud [17, 18]. A workflow consists of a set of inter-dependent tasks connected by directed edges. These workflows are often outsourced to workflow brokers that offer workflow execution services to users [29, 31]. A workflow broker usually uses computation resources, e.g., Virtual Machines (VMs), leased from cloud providers [10, 28] to reduce the maintenance cost [28, 29]. Service Level Agreements (SLAs) are often established between the users and the workflow brokers [28]. Brokers are highly motivated to comply with the commitments in SLAs, e.g., deadline constraints, in order to avoid paying SLA violation penalties [28, 30].
The process of Workflow Scheduling (WS) starts with users dynamically submitting workflows to brokers along with specified deadlines. Upon receiving a workflow, the broker makes scheduling decisions in real time, which include the selection of VM resources (e.g., the number and types of VMs) and the allocation of workflow tasks to VMs. Often, a broker needs to schedule multiple workflows for a customer. The goal of the broker is to maximize its profit by minimizing the VM rental fees and SLA violation penalties.
The WS problem is known to be NP-hard [9, 15] and has been widely investigated [9, 10]. For example, GRP-HEFT [9] schedules a given workflow to minimize the makespan under a VM rental budget. ProLiS [29] proportionally assigns a sub-deadline to each task and allocates tasks to VMs that can meet the deadline constraint while minimizing the VM rental fees. However, existing methods [5, 9] were mostly designed for static WS, where the workflow information (e.g., the arrival time, the number, and types of workflows) is known in advance. Moreover, many methods have been proposed with different goals and constraints, such as minimizing the workflow makespan with a budget constraint in GRP-HEFT [9] or minimizing the budget while satisfying the deadline constraint in ProLiS [29]. Different from these works, in this paper a workflow broker aims to strike a desirable trade-off between reducing the SLA penalty and the total VM rental fee, such that the overall cost involving both can be minimized. Moreover, existing algorithms focus on developing heuristics [3, 5, 9, 15, 29] or hyper-heuristics [1, 8, 16, 31] based on a simplified cloud environment as well as prior information of all workflows to be scheduled. Some works only consider scheduling one workflow at a time [7].
In this paper, we consider a Cost-aware Dynamic Multi-Workflow Scheduling (DMWS) problem for workflow brokers, where different workflows are dynamically sent to brokers for execution. The broker needs to rent proper VMs and schedule workflows on the rented VMs in real time to minimize VM rental fees and SLA violation penalties. Note that the number of VMs also needs to accommodate the workflow dynamics. Therefore, DMWS involves decisions on dynamically adjusting the number of rented VMs and allocating workflow tasks to VMs. Since WS decisions at a given time are affected by previous decisions and the currently available VM resources, this is a sequential decision problem. Therefore, Deep Reinforcement Learning (DRL) is a promising direction towards tackling this challenging problem. However, existing Q-learning or gradient-based DRL approaches for WS [7, 13, 20, 25] have certain limitations: (1) they do not guarantee high scalability due to the assumption that the number of VMs is predetermined and remains fixed [13, 20, 25]; (2) their performance is sensitive to hyper-parameter settings while hyper-parameter search is difficult.
The aim of this paper is to propose an effective approach for training newly designed scheduling policies that handle a changing number of VMs and workflows, in order to address cost-aware DMWS. Specifically, we design a new Deep Neural Network (DNN) based scheduling policy to dynamically rent new VMs and simultaneously allocate any tasks ready for execution to the rented VMs. We show that the designed scheduling policies can be used to efficiently assign a priority value to each candidate VM for a given task. The VM with the highest priority will be selected for renting and/or task execution. In line with the new policy network design, a new training approach called Evolutionary Strategy based Reinforcement Learning (ES-RL) for DMWS is proposed. Our new training approach features the use of Evolutionary Strategy (ES), a deep neuroevolution algorithm, to achieve stable and effective training of our policy network. Meanwhile, ES-RL is not sensitive to hyper-parameter settings and offers significant performance gains and training-time reductions through its parallel training capabilities.
Specifically, the key contributions of this paper are as follows:
– A priority-based Deep Neural Network (DNN) scheduling policy design is proposed that flexibly adapts to a changing number of VMs and workflows.
– An evolutionary DRL approach called Evolutionary Strategy based Reinforcement Learning (ES-RL) is proposed to achieve robust and time-efficient training of the new policy networks so that they can solve the DMWS problem effectively.
– Extensive experiments with real-world datasets have been conducted to show that the scheduling policies trained by ES-RL can reduce the costs by more than 90% compared to state-of-the-art WS approaches.

2 Related Work

Problem formulation: Existing Workflow Scheduling (WS) studies can be divided into static and dynamic WS.
Static WS assumes the workflow information (e.g., arrival time, workflow type, and the number of workflows) is known in advance. Given the workflow information, the scheduling decisions are made offline and remain fixed during the workflow execution. Most of the existing works belong to this category [3, 5, 9, 29]. However, the assumption of prior knowledge of the workflow information in a cloud environment may not be practical, because users can submit their workflows at any time and the workflows from one user can also vary in structure and size from time to time [6, 15]. Although an alternative is to schedule the workflows periodically (e.g., batch scheduling) [2], deciding a suitable scheduling period is critical and challenging. For example, a short scheduling period can significantly increase VM rental fees due to low VM utilization (see GRP-HEFT performance in Sec. 6.2), while a long scheduling period introduces long workflow waiting times, potentially leading to high SLA violation penalties.

Dynamic WS makes scheduling decisions at run time according to the current cloud environment. Unlike static WS, dynamic WS has received only limited research interest [6, 15]. For example, existing studies [6, 15] considered the workflow scheduling problem with the goal of minimizing the VM rental costs while treating the workflow deadline as a hard constraint. However, it can sometimes be more cost-efficient to pay the SLA penalties rather than renting additional or more expensive VMs (see the comparison between ES-RL and ProLiS in Sec. 6.2). The trade-off between VM rental fees and SLA violations is not captured by the models proposed in existing works [9, 29]. Motivated by [11, 31], in this paper we study the dynamic WS problem with the aim of optimizing both VM rental fees and SLA violation penalties. That is, we consider the trade-off between rental fees and SLA violation fees to minimize the overall cost of brokers and therefore maximize their profits.

Algorithm design: Most existing works focus on developing heuristics or hyper-heuristics to generate approximate or near-optimal solutions for the NP-hard WS problem with constraints. For example, GRP-HEFT [9] selects VMs using a greedy heuristic under a budget constraint and allocates tasks using a modified HEFT [26]. To solve the deadline-constrained WS problem of a single workflow, ProLiS [29] distributes the user-assigned deadline to each task in the workflow and subsequently allocates the tasks to VMs in order to meet their sub-deadlines. Other heuristics can be found in [3, 5, 15]. However, most of them rely on a simplified cloud environment and full knowledge of all workflows to be scheduled, which potentially limits their practical applicability. Apart from that, many of them [3, 5, 9] are designed for static workflow scheduling. Alternatively, meta-heuristic (e.g., Particle Swarm Optimization) [19, 21] and hyper-heuristic methods (e.g., Genetic Programming) [8, 31] have been applied to WS. However, these approaches either assume the workflow information is known in advance (i.e., static WS) or generate heuristics based on historical data.
Deep Reinforcement Learning (DRL) has been applied to WS due to its ability to optimize a solution via interacting with an unknown environment [7, 13, 20, 25, 27]. For example, a deep-Q-network based DRL algorithm was proposed [27] to optimize the workflow makespan and the user's cost. However, existing works have certain limitations. First, they are usually designed with a given and fixed number of VMs, yet the given number of VMs may not be optimal for handling changing workloads [13, 20, 25, 27]. Second, the performance of most DRL algorithms [23, 24] is sensitive to the hyper-parameter setting while hyper-parameter search is difficult [14].
To cope with these limitations, Evolutionary Strategy (ES) is leveraged in this paper to train the scheduling policy. ES is a population-based approach that evolves DNNs by simulating the process of natural selection. Existing studies [22] have shown that ES can achieve performance competitive with DRL algorithms. Moreover, ES is highly parallelizable and has fewer hyper-parameters to tune compared to DRL algorithms.

3 Problem Formulation
In this paper we study the cost-aware Dynamic Multi-Workflow Scheduling
(DMWS) problem. This section presents a formal definition of the problem.

Cloud Environment: We consider a cloud data center equipped with a set of VM types. Thanks to the elasticity feature of the cloud, we assume that the number of VMs of each VM type available for renting is "unlimited". A VM v with type Type(v) can be described as:

v = ⟨Type(v), Capa(v), Price(v)⟩

where Capa(v) is the VM processing capacity measured in Compute Units [9] and Price(v) is the rental fee per time unit, which depends on Type(v). Following existing studies [9, 31], we consider the rental time unit to be one hour in this paper.

VM Rental Fee: For the DMWS problem, we consider a time interval T = (t_s, t_e), where t_s and t_e are the starting and ending times. Within T, the same VM type can be rented multiple times. We denote the set of rental periods for a VM v with type Type(v) within T as RT(v, T):

RT(v, T) = \{(VMST(v, k, T), VMFT(v, k, T)) \mid k = 1, \ldots\}

where (VMST(v, k, T), VMFT(v, k, T)) is the k-th rental period for v within time period T. VMST(v, k, T) is the rental start time, which begins when a workflow task is allocated to v, and VMFT(v, k, T) is the corresponding rental finish time. The VM rental fee under a scheduling policy π can be calculated as follows:

RentFee(\pi, T) = \sum_{v \in I(\pi, T)} Price(v) \times \sum_{(t_1, t_2) \in RT(v, T)} \left\lceil \frac{t_2 - t_1}{3600} \right\rceil

where I(π, T) is the set of VMs rented within the time period T.
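Under this formula, each rental period is billed in whole hourly units (a partial hour counts as a full hour). A minimal sketch of the fee computation, assuming rental periods are given as (start, end) pairs in seconds and using illustrative container names rather than the paper's implementation:

```python
import math

def rent_fee(rental_periods_by_vm, price_by_vm):
    """RentFee(pi, T): total VM rental fee over the interval T.

    rental_periods_by_vm: dict mapping a VM id to its list of (start, end)
        rental periods in seconds, i.e. RT(v, T).
    price_by_vm: dict mapping a VM id to its hourly price, i.e. Price(v).
    """
    total = 0.0
    for vm_id, periods in rental_periods_by_vm.items():
        for start, end in periods:
            # Each rental period is billed per started hour.
            total += price_by_vm[vm_id] * math.ceil((end - start) / 3600.0)
    return total
```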

Workflow Model: A workflow w is represented as a Directed Acyclic Graph (DAG) associated with its arrival time ArrT(w) and a user-specified deadline DL(w):

w = ⟨DAG(w), ArrT(w), DL(w)⟩

where DAG(w) includes a set of tasks {Task(w, i) | i ∈ {1, 2, ...}} and directed edges connecting the tasks to enforce their execution order. Note that Task(w, i) is associated with an execution time RefT(Task(w, i)) and can only be executed once all its predecessor tasks Pre(Task(w, i)) are completed. Task(w, i) is an entry task if Pre(Task(w, i)) = ∅. Similarly, the successors of Task(w, i) are denoted as Suc(Task(w, i)). A task with no successors is an exit task.
In this paper, we assume that the arrival time of any new workflow is not known in advance. To avoid SLA violations and flexibly utilize the VM resources, scheduling decisions are made in real time, e.g., whenever a task is ready. In particular, a task Task(w, i) is ready if it is either an entry task of a workflow (i.e., Pre(Task(w, i)) = ∅) or a task whose predecessors Pre(Task(w, i)) have all been completed. We define the set of candidate VMs CVM(t), which includes all leased VMs at time t and a set of VM options covering all VM types that can be created. Whenever Task(w, i) is ready at time t, π selects a VM v from CVM(t) for the allocation of Task(w, i).
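As a small illustration of the readiness rule above, the check reduces to testing whether a task's predecessor set is empty or fully completed (the task and workflow attributes below are assumed for illustration, not taken from the paper's implementation):

```python
def is_ready(task, completed_tasks):
    """Task(w, i) is ready if it is an entry task (no predecessors) or all of
    its predecessors Pre(Task(w, i)) have completed."""
    return all(p in completed_tasks for p in task.predecessors)

def ready_tasks(workflow, completed_tasks, scheduled_tasks):
    """Return the tasks of a workflow that can be scheduled at this moment."""
    return [t for t in workflow.tasks
            if t not in completed_tasks
            and t not in scheduled_tasks
            and is_ready(t, completed_tasks)]
```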
Following π, the start time for Task(w, i) is ST(Task(w, i), π) and its completion time is

CT(Task(w, i), \pi) = ST(Task(w, i), \pi) + \frac{RefT(Task(w, i))}{Capa(v(\pi))}

Thus, the completion time WCT of a workflow w is the maximum completion time among all of its tasks:

WCT(w, \pi) = \max_{Task(w, i) \in DAG(w)} CT(Task(w, i), \pi)

SLA Penalty: Following existing works [28, 31], the SLA violation penalty of workflow w is defined as follows:

Penalty(w, \pi) = \begin{cases} 0, & \text{if } WCT(w, \pi) \le DL(w) \\ \epsilon + \beta(w) \times (WCT(w, \pi) - DL(w)), & \text{otherwise} \end{cases}

where ϵ is a constant and β(w) is the penalty rate for w.


The goal of the cost-aware DMWS problem is to find a policy π that schedules the set of workflows W(T) = {w | ArrT(w) < t_e} arriving during T, so as to minimize the sum of SLA violation penalties and VM rental fees:

\arg\min_{\pi} \sum_{w \in W(T)} Penalty(w, \pi) + RentFee(\pi, T) \quad (1)
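The two cost terms can then be combined into the broker's objective of Eq. (1). A short sketch, reusing the rent_fee helper sketched earlier; the epsilon and beta arguments correspond to the constants in the SLA penalty definition, and the workflow attributes are illustrative:

```python
def sla_penalty(wct, deadline, epsilon, beta):
    """Penalty(w, pi): zero if the workflow finishes by its deadline, otherwise
    a fixed charge epsilon plus beta per unit of lateness."""
    if wct <= deadline:
        return 0.0
    return epsilon + beta * (wct - deadline)

def total_cost(workflows, completion_times, rental_fee, epsilon, beta):
    """Objective of Eq. (1): summed SLA penalties plus the VM rental fee."""
    penalties = sum(sla_penalty(completion_times[w.id], w.deadline, epsilon, beta)
                    for w in workflows)
    return penalties + rental_fee
```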

4 Priority-based DNN Policy Design


In the DMWS problem, the scheduling decision needs to be made in real time with minimum delay. Therefore, π must select a suitable VM quickly whenever a task is ready. Meanwhile, in order to handle environment dynamics and capture the most recent information (e.g., how close the workflow deadline is), the scheduling decision is made as soon as the task is ready and before it is assigned to a VM.
In this paper, a priority-based Deep Neural Network (DNN) policy design is proposed. As shown in Fig. 1, the policy π consists of three major components: the state extraction function O, the priority function fθ parameterized by θ, and the mapping function Φ. Whenever a task is ready at time t, the policy π examines the VM status z(v, t) extracted by O for every v ∈ CVM(t). Then π assigns a priority value p(v, t) to each VM v using fθ. Based on the priorities, a VM is selected using Φ.

Fig. 1: The dynamic workflow scheduling system (dynamic workflows submitted by users are scheduled by the workflow broker's policy π, composed of O, fθ, and Φ, onto VMs rented from the cloud provider, each with queuing and executing tasks).

State Extraction: At time t, we use S(t) to capture the state information of the current cloud environment, including static information (e.g., the VM rental price) and dynamic information (e.g., VM rental periods, availability, workflow processing information, etc.). To allow the policy to be applied to a varying number of VMs, a state extraction function O is proposed to extract the essential information regarding any given VM v from CVM(t) and the ready task to be scheduled:

z(v, t) = O(S(t), v)

Whenever a task is ready, only the information of one VM at a time, instead of all VMs, is fed into fθ. Therefore, fθ can be flexibly applied to a changing number of VMs. Intuitively, the priority value of a VM depends on both the ready task and the VM. Therefore, z(v, t) includes both workflow-related and VM-related information. Given a ready task Task(w, i) from a workflow w, we identify the following workflow-related information to estimate the workflow's remaining processing time and predict future workload:
– the number of its successors |Suc(Task(w, i))|
– the workflow completion ratio³
– the estimated workflow arrival rate
For the VM-related information, we estimate whether a VM is a good fit depending on whether it can satisfy the deadline, whether it introduces additional rental fees, and how much rental time remains:
– a Boolean value indicating whether the VM can satisfy the task deadline⁴
– the potential cost of using the VM⁵
– the VM's remaining rental time after allocating the ready task
– a Boolean value indicating whether the current VM is the one with the lowest cost that can satisfy the deadline
The state extraction process can be formulated as follows:

Z(t) = [z(v, t)]_{v \in CVM(t)} = [O(S(t), v)]_{v \in CVM(t)}, \quad t \in T \quad (2)
³ The workflow completion ratio is the ratio of the number of completed tasks to the total number of tasks in the workflow.
⁴ Motivated by ProLiS [29], we assign a deadline to a task based on the proportion of its computational time to the overall workflow computational time.
⁵ The potential cost is the sum of the new VM rental fee and the deadline violation penalty.
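To make the feature design above concrete, a sketch of the state extraction function O for one candidate VM is given below; the helper methods are assumptions standing in for the feature computations described in the list and footnotes above, not the paper's code:

```python
import numpy as np

def extract_state(task, workflow, vm, cloud_state):
    """z(v, t) = O(S(t), v): a fixed-length feature vector for one candidate VM,
    independent of how many VMs currently exist."""
    workflow_features = [
        len(task.successors),                         # |Suc(Task(w, i))|
        workflow.num_completed / workflow.num_tasks,  # workflow completion ratio
        cloud_state.estimated_arrival_rate,           # estimated workflow arrival rate
    ]
    vm_features = [
        float(vm.can_meet_deadline(task)),            # deadline feasibility flag
        vm.potential_cost(task),                      # new rental fee + deadline violation penalty
        vm.remaining_rental_time(task),               # remaining paid rental time after this task
        float(vm.is_cheapest_feasible(task)),         # cheapest VM that still meets the deadline?
    ]
    return np.asarray(workflow_features + vm_features, dtype=np.float32)
```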

Priority Mapping: Using the extracted state features z(v, t), the priority function fθ with trainable parameters θ calculates a priority value p(v, t) for every VM candidate v:

P(t) = [p(v, t)]_{v \in CVM(t)} = [f_\theta(z(v, t))]_{v \in CVM(t)}

In this paper, a DNN is adopted to implement the priority function. Meanwhile, neural engines and similar hardware technologies can quickly process our neural networks for priority mapping during practical use.

VM Selection: Given the priorities, the VM a(t) with the highest priority value is selected by Φ:

a(t) = \Phi(P(t)), \quad \text{i.e., } a(t) = \arg\max_{v \in CVM(t)} p(v, t)

The scheduling policy can therefore be represented as follows:

a(t) = \pi(S(t)) = \Phi\left([f_\theta(O(S(t), v))]_{v \in CVM(t)}\right) \quad (3)
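A compact sketch of the complete policy in Eq. (3): each candidate VM is scored independently with the same small network fθ, and the index of the highest-priority VM is returned. The network size below matches the architecture reported in Sec. 6.1 (two hidden layers of 64 tanh units); the plain NumPy implementation is only illustrative:

```python
import numpy as np

def init_policy(input_dim, hidden=64, rng=np.random.default_rng(0)):
    """Initialize theta as a list of (weights, bias) pairs: two hidden layers
    of 64 units and a scalar output."""
    sizes = [input_dim, hidden, hidden, 1]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def priority(theta, z):
    """f_theta: map one VM's feature vector z(v, t) to a scalar priority p(v, t)."""
    h = np.asarray(z, dtype=np.float64)
    for i, (W, b) in enumerate(theta):
        h = h @ W + b
        if i < len(theta) - 1:   # tanh on hidden layers, linear output
            h = np.tanh(h)
    return float(h[0])

def select_vm(theta, candidate_features):
    """Phi: return the index of the candidate VM with the highest priority."""
    return int(np.argmax([priority(theta, z) for z in candidate_features]))
```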

5 Evolutionary Reinforcement Learning


Training a policy can be considered a DRL task. However, as discussed in Sec. 2, existing DRL-based approaches for WS assume the number of VMs is predetermined and fixed. Therefore, they cannot scale to environments with a different number of VMs. Meanwhile, their performance relies heavily on hyper-parameter tuning.
To tackle these problems, we introduce a new Evolutionary Strategy based Reinforcement Learning (ES-RL) approach for DMWS. The pseudo-code of ES-RL is presented in Algorithm 1. In particular, ES-RL adopts the ES framework introduced by OpenAI in [22] for training scheduling policies. ES-RL is a population-based optimization method that runs iteratively, as shown in Fig. 2. At each iteration, given the current policy parameters θ̂, ES-RL samples a population of N individuals [θ_i]_{i=1,...,N} from an isotropic multivariate Gaussian with mean θ̂ and fixed covariance σ²I, i.e., θ_i ∼ N(θ̂, σ²I), which is equivalent to

\theta_i = \hat{\theta} + \sigma\epsilon_i, \quad \epsilon_i \sim N(0, I)
The fitness value F(θ̂ + σϵ_i) of each individual θ̂ + σϵ_i is evaluated by applying the perturbed policy π_{θ̂+σϵ_i} in the cloud environment as discussed in Sec. 4. In line with our objective function in Eq. (1), we define the fitness function F(θ̂ + σϵ_i) as the total cost incurred over T:

F(\hat{\theta} + \sigma\epsilon_i) = \sum_{w \in W(T)} Penalty(w, \pi) + RentFee(\pi, T) \quad (4)

The goal of ES-RL is to find θ that can minimize the total cost defined in
Eq. (4) through minimizing the expected objective value over the population
Cost-Aware DMWS in Cloud Data Center Using Evolutionary RL 9

Fig. 2: Scheduling policy training using ES-RL (Step 1: sample noise ϵ ∼ N(0, I); Step 2: update the individuals θ̂ + σϵ; Step 3: observe the runtime system state Z(t) and total cost F(t) from the cloud environment; Step 4: make scheduling decisions a(t); Step 5: calculate the fitness F(θ̂ + σϵ); Step 6: estimate the policy gradient ∇θ E_{θ∼pψ} F(θ) using Eq. (5); Step 7: update fθ).

Algorithm 1 ES-RL for DMWS
1: Input: population size N, initial policy parameters θ̂, learning rate α, Gaussian noise standard deviation σ
2: Output: scheduling policy
3: while the number of generations < the maximum number of generations do
4:   for each individual i = 1, ..., N do
5:     Sample ϵ_i ∼ N(0, I)
6:     Update the workflow scheduling policy π_i using θ_i = θ̂ + σϵ_i
7:     Evaluate the fitness F(θ_i) in the cloud environment using Eq. (4)
8:   end for
9:   Estimate the policy gradient ∇_θ E_{θ∼p_ψ} F(θ) using Eq. (5)
10:  Update θ̂ ← θ̂ + α ∇_θ E_{θ∼p_ψ} F(θ)
11: end while

To achieve this goal, ES updates θ using the following estimator:

\nabla_\theta E_{\theta \sim p_\psi} F(\theta) = \nabla_\theta E_{\epsilon \sim N(0, I)} F(\theta + \sigma\epsilon) = \frac{1}{\sigma} E_{\epsilon \sim N(0, I)}\left[F(\theta + \sigma\epsilon)\,\epsilon\right] \approx \frac{1}{N\sigma} \sum_{i=1}^{N} F(\theta + \sigma\epsilon_i)\,\epsilon_i \quad (5)
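The whole of Algorithm 1 then amounts to repeating three steps per generation: sample noise, evaluate the perturbed policies, and apply the estimator of Eq. (5). A simplified single-process sketch is shown below; the parallel evaluation and fitness shaping of the OpenAI ES implementation are only hinted at, evaluate_total_cost is a stand-in for a full DMWS simulation episode, and because the fitness here is a cost the sketch steps against the gradient estimate (equivalently, one can negate the fitness and ascend, as written in Algorithm 1):

```python
import numpy as np

def es_rl_train(theta, evaluate_total_cost, generations=3000, pop_size=40,
                sigma=0.05, alpha=1e-2, rng=np.random.default_rng(0)):
    """Evolution-Strategy training of the flattened policy parameters theta.
    evaluate_total_cost(theta) runs one scheduling episode and returns the
    total cost of Eq. (4), so lower is better."""
    theta = np.asarray(theta, dtype=np.float64).copy()
    for _ in range(generations):
        eps = rng.standard_normal((pop_size, theta.size))   # epsilon_i ~ N(0, I)
        costs = np.array([evaluate_total_cost(theta + sigma * e) for e in eps])
        # Normalizing the costs keeps the step scale stable across generations
        # (a common trick in the OpenAI ES baseline).
        shaped = (costs - costs.mean()) / (costs.std() + 1e-8)
        grad = (shaped[:, None] * eps).sum(axis=0) / (pop_size * sigma)  # Eq. (5)
        theta -= alpha * grad   # descend, since the fitness is a cost to minimize
    return theta
```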

Table 1: VM setting based on Amazon EC2

Type          vCPU  Compute Units  Memory (GB)  Cost ($ per hour)
m5.large         2             10            8              0.096
m5.xlarge        4             16           16              0.192
m5.2xlarge       8             37           32              0.384
m5.4xlarge      16             70           64              0.768
m5.8xlarge      32            128          128              1.536
m5.12xlarge     48            168          192              2.304

6 Performance Evaluation

To evaluate the performance of our proposed ES-RL approach, we conduct experimental evaluations using a simulator based on real-world data from cloud data centers and benchmark workflows, and compare ES-RL with two state-of-the-art methods.

6.1 Simulation Setting

Cloud Environment: We consider a cloud data center equipped with six different VM types. Following [3, 9], the VMs are configured based on the general-purpose on-demand VM group in the US East region provided by Amazon EC2. The VM details are summarized in Tab. 1 and were collected in September 2020 from Amazon⁶.

Workflows: Following existing studies [3, 9, 29], four classes of scientific workflows are used in our experiments: CyberShake, Inspiral, Montage, and Sipht. All workflows are computationally intensive and their communication time is trivial compared to the task execution time [9]. A detailed analysis of these workflows can be found in [4, 12]. Note that workflows in each class share similar structures but can differ in the number of tasks (e.g., ranging from 25 to 50). During training, we simulate small workflows from the four classes, each containing 25 to 30 tasks. Nevertheless, our trained scheduling policy generalizes well and can be directly applied to large workflows with 50 tasks without retraining. Following [6, 15], we simulate the dynamic arrival of workflow applications by randomly sampling 30 workflows from the four workflow classes in each simulation. The arrival time of workflows follows a Poisson distribution with λ = 0.01 [15]. The penalty rate β is set to $0.24 per hour for each workflow [32].
To compare the algorithm performance under different deadlines, a deadline relaxation factor η is used. Similar to [3], we vary η from 1 to 2.25. Given η, the deadline of a workflow w is set as:

DL(w) = ArrT(w) + η × MinMakespan(w)


where MinMakespan(w) is the shortest makespan of workflow w, i.e., its execution time when all its tasks are executed by the fastest VM type from Tab. 1 (m5.12xlarge).

⁶ https://aws.amazon.com/ec2/pricing/on-demand/
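For reference, a short sketch of how one simulation episode can be assembled under these settings: 30 workflows drawn from the four classes, Poisson arrivals (exponential inter-arrival times with rate λ = 0.01), and deadlines derived from η and the minimum makespan. The workflow pool and the min-makespan lookup are assumed inputs, not artifacts from the paper's simulator:

```python
import numpy as np

def sample_episode(workflow_pool, min_makespan, eta, n_workflows=30,
                   arrival_rate=0.01, rng=np.random.default_rng(0)):
    """Generate one DMWS episode: sampled workflow names, arrival times, and
    deadlines DL(w) = ArrT(w) + eta * MinMakespan(w)."""
    names = rng.choice(workflow_pool, size=n_workflows)
    # Poisson arrivals: inter-arrival times are exponential with mean 1/lambda.
    arrivals = np.cumsum(rng.exponential(1.0 / arrival_rate, size=n_workflows))
    deadlines = arrivals + eta * np.array([min_makespan[w] for w in names])
    return list(zip(names, arrivals, deadlines))
```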

Algorithm Implementation: We implement ES-RL based on the code released by OpenAI⁷. In terms of parameter settings, we set the Gaussian noise standard deviation σ = 0.05 and the learning rate α = 10⁻². The population size N is 40 and each individual is evaluated for one episode (i.e., N_E = 1). The maximum number of generations is 3000. For the priority function design, we follow the DNN architecture used by the OpenAI baselines. Specifically, it is a fully connected multilayer feed-forward neural network with two hidden layers. Each hidden layer consists of 64 nodes with the tanh activation function.

Baseline Algorithms: We compare ES-RL with two state-of-the-art scheduling algorithms, GRP-HEFT [9] and ProLiS [29], which have objectives similar to (1). As discussed in Sec. 2, both GRP-HEFT and ProLiS were designed for static WS of one single workflow, so we adapt them to the DMWS problem. To make GRP-HEFT comparable with ES-RL, we minimize the budget constraint for every newly arriving workflow by incrementally increasing the budget that is passed to GRP-HEFT as a constraint until the workflow satisfies its deadline, as sketched below. To make ProLiS applicable to dynamic workflow arrivals, ProLiS is triggered to assign deadlines to all tasks once a workflow arrives. Whenever a task is ready, the cheapest VM from CVM(t) that can meet the task deadline is selected.
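The GRP-HEFT adaptation just described is essentially a search over budgets: keep raising the budget passed to GRP-HEFT until the returned schedule meets the workflow's deadline. A hedged sketch, where grp_heft_schedule, its return values, and the step-size/fallback details are placeholders rather than code from this paper:

```python
def schedule_with_min_budget(workflow, grp_heft_schedule,
                             initial_budget, budget_step, max_budget):
    """Incrementally relax the budget constraint passed to GRP-HEFT until the
    resulting schedule satisfies the workflow deadline."""
    budget = initial_budget
    while budget <= max_budget:
        schedule, makespan = grp_heft_schedule(workflow, budget)
        if workflow.arrival_time + makespan <= workflow.deadline:
            return schedule, budget
        budget += budget_step
    # Fall back to the largest budget if no schedule meets the deadline.
    return grp_heft_schedule(workflow, max_budget)[0], max_budget
```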

6.2 Simulation Results

To demonstrate the effectiveness of ES-RL, we compare its performance (i.e., overall cost) with GRP-HEFT and ProLiS under different values of η (i.e., the deadline relaxation factor), as shown in Tab. 2. Note that a smaller η implies a tighter deadline. We also analyze the testing performance of all algorithms with respect to the VM cost ($), the SLA penalty ($), and the VM utilization (%), as shown in Fig. 3. Note that GRP-HEFT incurs a high VM cost, ranging from $830 to $1988. Thus, to make the differences in VM rental cost visible in Fig. 3(a), we cap the y-axis at an upper value (i.e., 420).
As shown in Tab. 2, we can observe that the overall cost of ProLiS and GRP-HEFT increases as η decreases (i.e., as the deadline tightens). This is mainly because both ProLiS and GRP-HEFT treat the workflow deadline as a hard constraint. In other words, they always select VMs that can satisfy the workflow deadlines. This explanation also matches our observation in Fig. 3(b), where the SLA penalty remains 0 regardless of the changes in η. With a tight deadline, more powerful VMs in terms of computational capacity are required. As a result, an increase in VM cost can be observed in Fig. 3(a) as η decreases.

⁷ https://github.com/openai

Fig. 3: Comparison of GRP-HEFT [9], ProLiS [29], and ES-RL on different factors with respect to different η with small workflows: (a) VM cost, (b) SLA penalty, (c) VM utilization.

Table 2: The average fitness values (i.e., total cost) tested over multiple runs for ProLiS [29], GRP-HEFT [9], and ES-RL with different η with small workflows. (Note: a lower value is better)

η     ProLiS                   GRP-HEFT                   ES-RL
1.00  395.5520 ± 11.633617     -                          74.490828 ± 7.060981
1.25  108.5184 ± 15.815567     1775.9232 ± 161.077450     71.817811 ± 7.664415
1.50   91.7376 ± 12.662934     1171.8144 ± 75.565892      73.737249 ± 6.339560
1.75   83.4304 ± 10.828026     1258.9056 ± 90.980289      71.436684 ± 5.725663
2.00   76.9088 ± 10.752586     1164.7488 ± 76.973755      68.036960 ± 5.993622
2.25   68.3456 ± 10.332963      969.9840 ± 102.602041     65.590513 ± 7.967546

In comparison, ES-RL consistently outperforms both ProLiS and GRP-HEFT with the lowest overall cost, as highlighted in Tab. 2. This is achieved by balancing the trade-off between VM rental fees and SLA penalties. As demonstrated in Fig. 3, ES-RL maintains low overall costs by renting cost-effective VMs (see Fig. 3(a)) as well as by fully utilizing existing VMs (see Fig. 3(c) for the high VM utilization). As a consequence, ES-RL may violate some workflow deadlines and therefore incur SLA penalties (see Fig. 3(b)). Meanwhile, as η increases, the SLA penalty decreases because when the deadline becomes looser, VM selection has less impact on the SLA penalty.
Another interesting observation is that GRP-HEFT has the highest overall cost among the three approaches. This is mainly because GRP-HEFT is designed for static WS. In a scenario where two tasks from the same workflow are assigned to the same VM, the idle time slot between the two tasks cannot be utilized by a different workflow, leading to low VM utilization. This also matches our observation in Fig. 3(c), where GRP-HEFT presents the lowest VM utilization.
We also investigate the generalization capability of our trained scheduling policy. In particular, we define the generalization capability as the ability of a policy trained on small workflows to still effectively schedule large workflows. The results are shown in Fig. 4. From the figures, we can see that ES-RL still manages to reduce the overall cost by balancing the trade-off between VM rental fees and SLA penalties. Meanwhile, compared to ProLiS, ES-RL achieves significantly higher VM utilization. In general, our observations of ES-RL with large workflows are consistent with the results for small workflows shown in Fig. 3. Our results demonstrate that ES-RL can effectively train a generalized scheduling policy that can flexibly adapt not only to a changing number of VMs but also to workflows of different sizes.

Fig. 4: Comparison of ProLiS [29] and ES-RL on different factors with respect to different η with large workflows: (a) total cost, (b) VM cost, (c) SLA penalty, (d) VM utilization.

7 Conclusions
In this paper, we proposed an effective ES-based approach for the cost-aware dynamic multi-workflow scheduling (DMWS) problem. In particular, we formulated a dynamic multi-workflow scheduling problem with the goal of minimizing both the VM rental cost and the SLA violation penalties. To effectively solve this problem, we proposed a new scheduling policy design based on a priority-based deep neural network that can be used in a dynamic environment with a changing number of VMs and workflows. Meanwhile, a new Evolutionary Strategy based RL (ES-RL) algorithm for DMWS was proposed to efficiently train a generally applicable scheduling policy in parallel. Our experiments with real-world datasets showed that the scheduling policies trained by ES-RL can effectively reduce the overall costs compared to two state-of-the-art algorithms.

References
1. Ahmad, S.G., Liew, C.S., Munir, E.U., Ang, T.F., Khan, S.U.: A hybrid genetic algorithm for optimization of scheduling workflow applications in heterogeneous computing systems. Journal of Parallel and Distributed Computing 87, 80–90 (2016)
2. Alsurdeh, R., Calheiros, R.N., Matawie, K.M., Javadi, B.: Hybrid workflow scheduling on edge cloud computing systems. IEEE Access 9, 134783–134799 (2021)
3. Arabnejad, V., Bubendorfer, K., Ng, B.: Budget and deadline aware e-science workflow scheduling in clouds. IEEE Transactions on Parallel and Distributed Systems 30(1), 29–44 (2019)
4. Bharathi, S., Chervenak, A., Deelman, E., Mehta, G., Su, M., Vahi, K.: Characterization of scientific workflows. In: 2008 Third Workshop on Workflows in Support of Large-Scale Science. pp. 1–10 (2008)
5. Byun, E.K., Kee, Y.S., Kim, J.S., Deelman, E., Maeng, S.: BTS: Resource capacity estimate for time-targeted science workflows. Journal of Parallel and Distributed Computing 71(6), 848–862 (2011)
6. Chen, H., Zhu, X., Liu, G., Pedrycz, W.: Uncertainty-aware online scheduling for real-time workflows in cloud service environment. IEEE Transactions on Services Computing pp. 1–1 (2018)
7. Dong, T., Xue, F., Xiao, C., Zhang, J.: Workflow scheduling based on deep reinforcement learning in the cloud environment. Journal of Ambient Intelligence and Humanized Computing 12(12), 10823–10835 (2021)
8. Escott, K.R., Ma, H., Chen, G.: Genetic programming based hyper heuristic approach for dynamic workflow scheduling in the cloud. In: International Conference on Database and Expert Systems Applications. pp. 76–90. Springer (2020)
9. Faragardi, H.R., Saleh Sedghpour, M.R., Fazliahmadi, S., Fahringer, T., Rasouli, N.: GRP-HEFT: A budget-constrained resource provisioning scheme for workflow scheduling in IaaS clouds. IEEE Transactions on Parallel and Distributed Systems 31(6), 1239–1254 (2020)
10. Genez, T.A.L., Bittencourt, L.F., Madeira, E.R.M.: Workflow scheduling for SaaS/PaaS cloud providers considering two SLA levels. In: 2012 IEEE Network Operations and Management Symposium. pp. 906–912 (2012)
11. Hoseiny, F., Azizi, S., Shojafar, M., Tafazolli, R.: Joint QoS-aware and cost-efficient task scheduling for fog-cloud resources in a volunteer computing system. ACM Transactions on Internet Technology (TOIT) 21(4), 1–21 (2021)
12. Juve, G., Chervenak, A., Deelman, E., Bharathi, S., Mehta, G., Vahi, K.: Characterizing and profiling scientific workflows. Future Generation Computer Systems 29(3), 682–692 (2013)
13. Li, H., Huang, J., Wang, B., Fan, Y.: Weighted double deep Q-network based reinforcement learning for bi-objective multi-workflow scheduling in the cloud. Cluster Computing 25(2), 751–768 (2022)
14. Liessner, R., Schmitt, J., Dietermann, A., Bäker, B.: Hyperparameter optimization for deep reinforcement learning in vehicle energy management. In: ICAART (2). pp. 134–144 (2019)
15. Liu, J., Ren, J., Dai, W., Zhang, D., Zhou, P., Zhang, Y., Min, G., Najjari, N.: Online multi-workflow scheduling under uncertain task execution time in IaaS clouds. IEEE Transactions on Cloud Computing pp. 1–1 (2019)
16. Lopez-Garcia, P., Onieva, E., Osaba, E., Masegosa, A.D., Perallos, A.: GACE: A meta-heuristic based in the hybridization of genetic algorithms and cross entropy methods for continuous optimization. Expert Systems with Applications 55, 508–519 (2016)
17. Masdari, M., ValiKardan, S., Shahi, Z., Azar, S.I.: Towards workflow scheduling in cloud computing: a comprehensive analysis. Journal of Network and Computer Applications 66, 64–82 (2016)
18. Oliver, H., Shin, M., Matthews, D., Sanders, O., Bartholomew, S., Clark, A., Fitzpatrick, B., van Haren, R., Hut, R., Drost, N.: Workflow automation for cycling systems. Computing in Science & Engineering 21(4), 7–21 (2019)
19. Pandey, S., Wu, L., Guru, S.M., Buyya, R.: A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In: 2010 24th IEEE International Conference on Advanced Information Networking and Applications. pp. 400–407 (2010)
20. Qin, Y., Wang, H., Yi, S., Li, X., Zhai, L.: An energy-aware scheduling algorithm for budget-constrained scientific workflows based on multi-objective reinforcement learning. The Journal of Supercomputing 76(1), 455–480 (2020)
21. Rodriguez, M.A., Buyya, R.: Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds. IEEE Transactions on Cloud Computing 2(2), 222–235 (2014)
22. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017)
23. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning (ICML). pp. 1889–1897 (2015)
24. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
25. Suresh Kumar, D., Jagadeesh Kannan, R.: Reinforcement learning-based controller for adaptive workflow scheduling in multi-tenant cloud computing. The International Journal of Electrical Engineering & Education p. 0020720919894199 (2020)
26. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems 13(3), 260–274 (2002)
27. Wang, Y., Liu, H., Zheng, W., Xia, Y., Li, Y., Chen, P., Guo, K., Xie, H.: Multi-objective workflow scheduling with deep-Q-network-based multi-agent reinforcement learning. IEEE Access 7, 39974–39982 (2019)
28. Wu, L., Garg, S.K., Versteeg, S., Buyya, R.: SLA-based resource provisioning for hosted software-as-a-service applications in cloud computing environments. IEEE Transactions on Services Computing 7(3), 465–485 (2014)
29. Wu, Q., Ishikawa, F., Zhu, Q., Xia, Y., Wen, J.: Deadline-constrained cost optimization approaches for workflow scheduling in clouds. IEEE Transactions on Parallel and Distributed Systems 28(12), 3401–3412 (2017)
30. Xiaoyong, Y., Ying, L., Tong, J., Tiancheng, L., Zhonghai, W.: An analysis on availability commitment and penalty in cloud SLA. In: 2015 IEEE 39th Annual Computer Software and Applications Conference. vol. 2, pp. 914–919 (2015)
31. Yang, Y., Chen, G., Ma, H., Zhang, M., Huang, V.: Budget and SLA aware dynamic workflow scheduling in cloud computing with heterogeneous resources. In: 2021 IEEE Congress on Evolutionary Computation (CEC). pp. 2141–2148. IEEE (2021)
32. Youn, C.H., Chen, M., Dazzi, P.: Cloud Broker and Cloudlet for Workflow Scheduling. Springer (2017)
