0% found this document useful (0 votes)

47 views15 pages

Data Driven EJOR

Uploaded by

mailofsoroush

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

47 views15 pages

Data Driven EJOR

Uploaded by

mailofsoroush

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

European Journal of Operational Research 292 (2021) 1004–1018

Contents lists available at ScienceDirect

European Journal of Operational Research

journal homepage: www.elsevier.com/locate/ejor

Decision Support

Multiple kernel learning-aided robust optimization: Learning

algorithm, computational tractability, and usage in multi-stage
decision-making
Biao Han, Chao Shang∗, Dexian Huang
Department of Automation, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China

a r t i c l e i n f o a b s t r a c t

Article history: Robust optimization (RO) has been broadly utilized for decision-making under uncertainty; however, as
Received 22 May 2020 a key issue in RO the design of the uncertainty set could exert significant influence on both the conser-
Accepted 16 November 2020
vatism of solutions and tractability of induced problems. In this paper, we propose a novel multiple kernel
Available online 23 November 2020
learning (MKL)-aided RO framework for data-driven decision-making, by developing an efficient approach
Keywords: for uncertainty set construction from data based on one-class support vector machine. The learnt poly-
Uncertainty modelling hedral uncertainty set not only achieves a compact encircling of empirical data, which alleviates the pes-
Robust optimization simism and reduces the gap between the model and real-world performance, but also ensures structural
Uncertainty set sparsity and computational tractability. The data-driven RO framework enables a handy adjustment of the
Multiple kernel learning conservatism and complexity by simply manipulating two hyper-parameters, thereby being user-friendly
Data-driven decision-making in practice. In addition, the proposed framework applies to adjustable RO (ARO) with the extended affine
decision rule adopted, which helps improving the optimization performance without too much additional
effort. Numerical and application case studies demonstrate the effectiveness of the proposed data-driven
RO framework.
© 2020 Elsevier B.V. All rights reserved.

1. Introduction tive risk evaluation and control, precise knowledge about distri-
bution of uncertainty is necessitated. Unfortunately, in most re-
Decision-making by solving a disciplined optimization problem alistic situations, even knowing the probability distribution is an
with some certain criterion optimized is a common demand aris- unattainable luxury. To alleviate such a concern, robust optimiza-
ing from diverse application areas in science and engineering. In a tion (RO) has been widely used as an effective non-probabilistic al-
real-world environment, parameters in optimization problems are ternative that only requires minimal distributional information. A
almost always influenced by uncertain factors, which make deter- comprehensive summary of developments and applications of RO
ministic models unreliable to some extent. It has been recognized can be found in the monograph (Ben-Tal et al. (2009)) and sev-
that even a small perturbation in nominal parameter may lead to eral review papers (Bertsimas, Brown, & Caramanis, 2011; Gabrel,
a strategy that is completely infeasible (Ben-Tal, El Ghaoui, & Ne- Murat, & Thiele, 2014). The general formulation of RO problems is
mirovski, 2009; Bertsimas & Thiele, 2006). Thus, modeling uncer- expressed as:
tainty has been of common interest across distinct fields. From
min f (x )
the celebrated definition by Camerer and Weber (1992), uncer- x∈R d (1)
tainty can be specified into risk that refers to probabilistic uncer- s.t. g( x ; u ) ∈ A , ∀u ∈ U ,
tainty with a known distribution, and ambiguity that refers to un-
certainty with an unknown distribution. The earliest optimization where some restrictions or constraints are denoted by g(x; u ) :
methods under uncertainty can be traced back to the pioneering Rd × Rn → belonging to A ⊂ for some space , u ∈ Rn is the
work of Dantzig (1955) on stochastic programming and Charnes uncertainty, and U ⊂ Rn is termed as the uncertainty set. Being im-
and Cooper (1959) on chance-constrained programming. For effec- mune against all probable realizations of u in D, the robust formu-
lation (1) has been widely adopted to ensure satisfaction of con-
straints on critical factors such timespan, capacity and safety under
∗
Corresponding author. all plausible scenarios of uncertainty (Balcik & Yanıkoğlu, 2020; Ca-
E-mail addresses: [email protected] (B. Han), c-shang@tsinghua. ballero, Lunday, & Uber, 2021; Dai et al., 2019; Jakubovskis, 2017;
edu.cn (C. Shang), [email protected] (D. Huang). Moret, Babonneau, Bierlaire, & Maréchal, 2020).

https://doi.org/10.1016/j.ejor.2020.11.027
0377-2217/© 2020 Elsevier B.V. All rights reserved.
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

However, (1) involves an infinite number of constraints, which (2016) proposed a correlated polyhedral uncertainty set on the
poses challenges in solving the problem. The design of the uncer- basis of estimated correlation matrix of uncertain coefficients.
tainty set U is a critical issue in RO that shall be guided by the In real-life applications, however, classical uncertainty sets still
following principles. First, an elaborate parameterization is neces- show prominent limitations in uncertainty description. Clearly,
sary to ensure the computational tractability, i.e. the possibility of their geometric structures are fixed a priori, resulting in a lim-
converting (1) into a tractable deterministic problem. On the other ited representation capability, while the underlying realistic dis-
hand, U shall cover all possible realizations of u accurately without tribution is diverse and may be extremely complicated, especially
unnecessary coverage, such that over-pessimistic solutions can be asymmetric. When faced with unknown complicated uncertain-
avoided. Concurrently considering both aspects has been a major ties, there is no effective and systematic guideline for the user to
streamline in RO community studied for decades. One of the ear- choose a suitable type of uncertainty set and determine parameters
liest attempts can be traced back to the box uncertainty set pro- thereof. Although the budget parameter can be specified based
posed by Soyster (1973), which assumes each uncertain parameter on partial distributional information to establish probabilistic guar-
to have independent perturbations within an interval: antees (Guzman, Matthews, & Floudas, 2016; Li, Tang, & Floudas,
2012), a poor specification of “geometry-related” parameters, such
ˆ ◦ ξ
U∞ = u = ū + u ξ ∞ , (2) as T in (5), can still lead to unsatisfactory performance.
where ū ∈ Rn represents the nominal value of uncertainty, u ˆ ∈ Rn Nowadays, we are witnessing a big data era where the data
denotes the “magnitude”, and ξ ∈ Rn is the normalized uncertainty. availability explodes and massive amounts of data are routinely
The element-wise product between two equally sized vectors is collected in many fields. This has spawned the paradigm of data-
denoted by ◦. is the so-called “budget” responsible for control- driven RO that straightforwardly injects data information into a ro-
ling the size of uncertainty set. Although U∞ secures tractability for bust decision-making schema (Bertsimas, Gupta, & Kallus, 2018;
a large class of RO problems, it has been recognized as being too Bertsimas & Thiele, 2006; Shang & You, 2019a). Data-driven RO
conservative for practical usage. Later on, ellipsoidal uncertainty seeks to build a data-driven uncertainty set U from historical
set has been developed independently by Ben-Tal and Nemirovski data D, such that distributional information of the uncertainty can
(1998, 1999), El-Ghaoui and Lebret (1997) and El-Ghaoui, Oustry, be seamlessly encompassed in U and over-conservatism can be
and Lebret (1998): reduced without tedious manual parameter tuning. A variety of
learning-based methods for uncertainty set construction have been
ˆ ◦ ξ
U2 = u = ū + u ξ 2 , (3) proposed, among which the simplest form is the hyper-rectangular
set constructed in a data-driven manner to cover all realizations
based on which the robust counterpart (RC) of a robust linear pro-
of uncertainty with high probability (Margellos, Goulart, & Lygeros,
gram (LP) can be translated into a second-order conic (SOC) pro-
2014). In Campbell and How (2015), a Bayesian nonparametric
gram. In comparison with the box set, the ellipsoidal uncertainty
method is presented under the assumption that the unknown dis-
set has better representation capability at the price of moderately
tribution belongs to a family of Dirichlet process Gaussian mix-
increased computational complexity. Alternatively, the polyhedral
tures, which was later developed and applied to data-driven adap-
uncertainty set is proposed by Bertsimas, Pachamanova, and Sim
tive robust optimization (Ning & You, 2017a; Ning & You, 2017b).
(2004) and Bertsimas and Sim (2003, 2004):
Other related works also develop alternative construction strate-

ˆ ◦ ξ
U1 = u = ū + u ξ 1 . (4) gies of data-driven uncertainty set, e.g. Zhang, Grossmann, Sun-
daramoorthy, and Pinto (2016a), Crespo, Colbert, Kenny, and Giesy
Uncertainty sets U1 , U2 , and U∞ altogether act as basic modeling (2019), etc. Based on these basic modeling tools, Hong, Huang, and
tools in RO, based on which intersections and unions thereof have Lam (2017) considered a class of uncertainty sets constructed as
been developed to promote the flexibility in uncertainty descrip- combinations of basic geometric shapes and investigated the prob-
tion, such as “interval + ellipsoidal” model, “polyhedral + ellip- ability guarantee in a data-driven scheme. An inclusive review on
soidal” model, etc (Ben-Tal & Nemirovski, 20 0 0; Bertsimas et al., the latest developments of data-driven RO can be found in Ning
2004; Li, Ding, & Floudas, 2011). and You (2019).
All aforesaid uncertainty sets are norm-based. Hence, an under- Amongst different formulations of U, polyhedral uncertainty
lying assumption is that uncertainty tends to spread symmetrically sets feature a desirable balance between the flexibility in leverag-
and radially from the center. Meanwhile, they also fail to account ing data information and the computational tractability of RC prob-
for correlations among different variables. To address this issue, lems. Zhang, Jin, Feng, and Rong (2018) proposed a heuristic to at-
Bertsimas and Sim (2004) developed an extended expression tain a data-based polytope by progressively adding cutting planes.
where correlated uncertainties are disentangled: Shang, Huang, and You (2017) proposed a systematic approach to
polyhedral set construction using support vector data description
u(ξ ) = ū + Tξ , (5)
(SVDD) with a tailored weighted generalized intersection kernel
where ξ ∈ Rmstands for the underlying independent uncertainties, (WGIK), which has found further applications in robust model pre-
and T ∈ Rn×m is the transformation matrix mapping m underlying dictive control (Shang & You, 2019b), energy system operations
uncertainty sources to n parameters. This approach was adopted (Shen, Zhao, Du, Zhong, & Qian, 2020), supply chain management
by Ferreira, Barroso, and Carvalho (2012) in a demand response (Mohseni & Pishvaee, 2020), and irrigation systems (Shang, Chen,
model, where {ū, T} are determined based on principal component Stroock, & You, 2020). In Ning and You (2018), a kernel density es-
analysis (PCA) and minimum power decomposition (MPD). Yuan, timation (KDE) approach was put forward. Notwithstanding their
Li, and Huang (2016) generalized classical symmetric uncertainty improved capabilities in dealing with data correlation and asym-
sets under (5) and established explicit formulations of associated metry, the performance of these non-parametric approaches criti-
RCs. A variety of alternatives have also been proposed to capture cally relies on projection directions, which are specified with PCA
the correlation and the asymmetry of uncertainties. For exam- as a common heuristic. However, whether such a heuristic yields
ple, Chen, Sim, and Sun (2007) proposed to construct U using the best performance is questionable.
forward and backward deviations. Natarajan, Pachamanova, and This work aims to address these issues by developing a novel
Sim (2008) developed an asymmetry-robust value-at-risk (VaR) non-parametric polytope learning approach that is capable of op-
measure to take into consideration asymmetries in the distribu- timally selecting projection directions while learning a compact
tion of portfolio returns. Jalilvand-Nejad, Shafaei, and Shahriari polyhedral set. In this way, it alleviates the need for tedious pa-

1005
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

rameter tuning and further assists decision-making by solving a space from the origin with a hyperplane by solving the following
data-driven RO problem. As an important ingredient in statistical problem1 :
learning theory, the multiple kernel learning (MKL) framework is
1
N
1
adopted in this work for support estimation, which allows for con- min w 2
−ρ + ξi
currently learning both an optimal combination of “candidate” ker- w,ρ ,ξ 2 Nν
i=1 (6)
nels and the induced uncertainty set. From a unified viewpoint s.t. wT φ (ui ) ρ − ξi , i = 1, . . . , N
of MKL, the aforesaid approaches based on SVDD (Shang et al., ξi 0 , i = 1 , . . . , N ,
2017) and KDE (Ning & You, 2018) can be regarded as utilizing all
where φ (· ) represents the feature map from input space X to
candidate kernels as a fixed combination, in the sense that they
the high dimensional feature space H, and the associated inner
are essentially single-kernel methods and have limited representa-
product can be calculated by evaluating its associated kernel func-
tion capabilities. Henceforth, the flexibility of MKL in learning op-
tion K (u, v ) = φ (u ), φ (v ) . Here, slack variables {ξi } allow some
timal kernels typically gives rise to better modeling power than
data points to be misclassified by the hyperplane (Huang, Shi, &
traditional single-kernel methods (Lanckriet, Cristianini, Bartlett,
Suykens, 2013). The regularization parameter ν ∈ (0, 1] controls
Ghaoui, & Jordan, 2004; Rakotomamonjy, Bach, Canu, & Grandvalet,
the tolerance of data points being “misclassified”, and its impli-
2008), and thus has the potential of yielding improved estima-
cations in uncertainty set constructions will be discussed in the
tion of the high-density region as a data-driven uncertainty set.
sequel. Upon solving (6), the decision function is explicitly written
In this paper, we propose a novel data-driven RO framework aided
as:
by MKL-based one-class SVM (OC-SVM), which has the following
advantages. y ( u ) = wT φ ( u ) − ρ f ( u ) − ρ , (7)
• By leveraging data information, a compactly enclosing polyhe- and the induced decision region as the data-driven uncertainty set
dral uncertainty set with optimal kernel functions, viz. the MKL can be expressed as:
uncertainty set, can be automatically learnt, which adapts to U ( D ) = {u|y ( u ) 0} = {u| f ( u ) − ρ 0}. (8)
data distribution with asymmetry without resorting to tedious
parameter tuning. Hence, the over-conservatism and the gap Problem (6) is essentially based on a single kernel function
between the model and the realistic performance can be re- K (·, · ), which shall be specified by the user prior to solving (6). Re-
duced. Besides, it yields a tractable low-complexity RC problem cent experiences in machine learning have shown that using mul-
since optimally selected kernel functions are sparse. tiple kernel functions can enhance the representation capability of
• The RO framework enables a convenient adjustment of both the model and the prediction performance (Lanckriet et al., 2004;
the conservatism and complexity, thereby being user-friendly Rakotomamonjy et al., 2008; Xu, Tsang, & Xu, 2013). Typically, the
in practice. Specifically, it bears clear statistical interpretations kernel K (u, v ) can be expressed as a convex combination of multi-
since the proportion of outliers can be manipulated, which not ple basis kernels {Km (·, · )}:
only helps rejecting extremal samples but also allows to adjust
M

the conservatism of RO based on user’s preference. K ( u, v ) = πm Km (u, v ) (9)

• As a fundamental approach to polytope construction, it enjoys m=1
wide generality and higher-level usability. By taking unions and with
intersections of sets complicated multi-modal and non-convex

M
distributions can be tackled, as recently advocated by Li et al. πm 0 , πm = 1 , (10)
(2011) and Hong et al. (2017). It can also benefit the decision- m=1
making paradigm of distributionally robust optimization (DRO), where M is the number of basis kernels. Each basis kernel can be
where the estimated support can be integrated into the design expressed as an inner-product Km (u, v ) = φm (u ), φm (v ) with its
of ambiguity set (Wiesemann, Kuhn, & Sim, 2014). own mapping φm (· ). In the light of Rakotomamonjy et al. (2008),
• By deliberately choosing candidate kernel functions, the MKL we need to find M functionals { fm } from M different kernel spaces
uncertainty set turns out to be compatible with multi-stage {Hm } whose direct sum forms the feature space H, such that f (u )
adjustable RO (ARO) problems and beneficial for reducing the in (7) can be equivalently expressed as:
conservatism, without paying too much additional efforts.

M
M
The rest of this paper is organized as follows. In Section 2, f ( u ) = wT φ ( u ) = f m (u ) = wTm φm (u ),
we first present the mathematical formulation of the MKL-based m=1 m=1
OC-SVM, based on which the data-driven polyhedral uncertainty where w and φ admit block decompositions:
set is constructed and an efficient learning algorithm is provided.
T
Section 3 focus on the computational tractability of the proposed 1 1 √ √ T
uncertainty set induced RO, including the tractability of static RO w= √ wT1 , . . . , √ wTM , φ= π1 φ1T , . . . , πM φMT .
π1 πM
and ARO. Some numerical studies covering various aspects are car-
ried out in Section 4. Section 5 presents the application of the pro- (11)
posed method to a production-inventory problem, followed by fi- Xu et al. (2013) put forward a unified framework of MKL by simply
nal concluding remarks. adding an upper bound on {πm } in constraint (10), which incorpo-
rates different kinds of kernel combination characteristics and is
2. MKL-based uncertainty set construction able to improve the effectiveness of MKL.
In the multiple kernel setting, πm can be deemed as the weight
2.1. Problem formulation of Km (·, · ). By simultaneously optimizing {πm }, {wm } and ρ , non-
decisive kernel functions will be revealed with {πm = 0}, and thus
OC-SVM has been extensively utilized to estimate the sup-
port for an unknown distribution based on historical observations 1
Whenever K (u, u ) = constant, OC-SVM is essentially equivalent to SVDD used
(Schölkopf, Smola, Bach et al., 2002). The idea is to first project in Shang et al. (2017). However, the multiple kernel extension of SVDD is much less
all uncertainty data samples D = {ui }N
i=1
into a high-dimensional straightforward than that of OC-SVM due to their different geometric interpreta-
feature space, and then separate the data samples in the feature tions. For better clarifications we opt for OC-SVM in this work.

1006
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

optimal kernel functions can be automatically selected, thereby Clearly, the dual formulation (17) of the proposed MKL-based
eventually leading to a compact representation of U. In this pa- OC-SVM has a neatly symmetrical structure with its primal formu-
per, we adopt the unified MKL framework proposed in Xu et al. lation (12). From the perspective of machine learning, the objective
(2013) to implement OC-SVM for uncertainty set construction. By of (17) is to maximize the “margin” γ while considering some “er-
replacing w and φ in (6) with (11) and considering the constraints rors” from M basis kernels, which are denoted by slack variables
on {πm }, we formulate the MKL-based OC-SVM as follows: {ζm } (also the Lagrangian multipliers). The regularization effect of
M N μ can also be evidenced from this dual formulation, because it
min 1
2
1
πm wm
2
− ρ + N1ν ξi gives a balance between two conflicting terms in the objective (Xu
{wm },π,ρ ,ξ m=1 i=1
M et al., 2013). Much as μ has a regularization effect on the spar-
s.t. wTm φm (ui ) ρ − ξi , i = 1, . . . , N sity of {πm }, ν is responsible for controlling the sparsity of {αi },
(12)
m=1 which can be observed from constraints on {αi } in (17). Note that
ξi 0 , i = 1 , . . . , N (17) is a quadratically constrained quadratic program (QCQP). If all
M
πm = 1, 0 πm 1
, m = 1, . . . , M. kernel functions {Km } are positive semi-definite, the dual problem
Mμ
m=1 is also convex, which can be handled by general-purpose convex
The regularization parameter μ on {πm } can be interpreted as fol- optimization softwares.
lows. To ensure the feasibility of the constraints on πm , there is Suppose we have attained both the primal
and dual optimal so-

an inherent requirement that μ ∈ (0, 1]. If μ > 1, feasible weights lutions, denoted as {wm }, π , ρ , ξ and α , γ , ζ , respectively.
{πm } no longer exist. If μ < 1/M, it follows that 1/Mμ > 1, render- Then the MKL uncertainty set can be expressed in terms of α , π
ing πm 1/Mμ redundant and thus the regularization effect on πm and kernel functions {Km } by substitution into (11) and (13):
vanishes. In this case, (12) reduces to the traditional formulation of
N
M
MKL-based OC-SVM (Han, Shang, Yang, & Huang, 2019; Rakotoma-
monjy et al., 2008), which tends to select the fewest kernels and Uν,μ (D ) = {u|y(u ) 0} = u α
π Km (u, ui ) ρ ,

i=1 i m=1 m
lead to over-fitting to some degree (Xu et al., 2013). If μ > 1/M,
sparsity within {πm } will be discouraged. This can be interpreted (18)
as that, when some {πm } are exactly zero, the remaining ones
tend to have large values to fulfill the equality m πm = 1, while where subscripts ν and μ highlight the dependence of U on hyper-
πm 1/Mμ penalizes “over-weighting” of decisive kernel functions parameters {ν, μ}. In the sequel, we will use U (D ) for brevity
and thus avoids undue sparsity. As an extreme case, when μ = 1, when no confusion is made.
all kernel weights are enforced to be 1/M, which amounts to tak-
ing the average of all kernels at hand. 2.2. The MKL-based piecewise linear uncertainty set
Note that (12) is a convex program since wm 2 /πm in the ob-
jective is a convex function, and it can be easily verified that strong Following the convention of classic SVM theory, the data sam-
duality is admitted (Boyd & Vandenberghe, 2004), so we can solve ples with nonzero dual variables {αi } are termed as support vec-
it from the dual. The Lagrangian of (12) can be written as: tors (SVs) because the others with {αi = 0} do not contribute to
1 1 1 the final expression of U (D ). The index set of SVs is denoted as
L= wm 2
−ρ + ξi
2 m πm Nν (Schölkopf et al., 2002):
i
SV = {i|αi > 0, ∀i }. (19)
− αi wTm φm (ui ) − ρ + ξi − βi ξi
i m i
Among the SVs, those data samples with nonzero dual vari-
ables {βi } (equivalently, {0 < αi < 1/Nν}) have m wm φm (ui ) =
T
1
−γ πm − 1 − η m πm + ζm πm − . ρ , ∀i because of the complementary slackness conditions αi

Mμ
m m m wmT φm (ui ) − ρ + ξi = 0, ∀i and βi ξi = 0, ∀i, and thus lie
m
By setting it to zero one obtains: right on the boundary of the uncertainty set U (D ), termed as
boundary support vectors (BSVs) (Schölkopf et al., 2002):
∂ L/∂ wm = 0 ⇒ wm = αi πm φm (ui ), ∀m (13)
i BSV = {i ∈ SV |βi > 0, ∀i } = {i|0 < αi < 1/Nν, ∀i }. (20)

∂ L/∂ ρ = 0 ⇒ αi = 1 (14) There is also a similar interpretation of complementary slack-
i ness for kernel functions. The complementary slackness condition
ηm π = 0 indicates that, for every basis kernel, either η = 0 or
m m
1 πm = 0 holds. Analogous to data samples with {αi = 0}, kernel
∂ L/∂ ξi = 0 ⇒ αi + βi = , ∀i (15)
Nν functions with {πm = 0} do not appear in U (D ), thereby giving

rise to a succinct expression. For this reason, the basis kernels

2
1 wm with {πm > 0} are termed support kernels (SKs), and SK = {m|π >
∂ L/∂ πm = 0 ⇒ + γ + ηm − ζm = 0, ∀m. (16) m
2 πm2 0, ∀m} is deﬁned as the associated index set.
Plugging these optimality conditions into the Lagrangian, one ob- Substituting (13) into (16), we have:
tains the dual problem: 1
− αi α j Km (ui , u j ) = γ + ηm − ζm , ∀m.
1
M
2
min −γ + ζm i, j
α ,γ ,ζ Mμ
m=1
It implies that any m for which ηm
= 0 must satisfy:
1
N
s.t. − αi α j Km (ui , u j ) γ − ζm , m = 1, . . . , M 1
2 (17) − αi α j Km (ui , u j ) = γ − ζm .
i, j=1
2
ζm 0, m = 1, . . . , M i, j

N
1
αi = 1, 0 αi , i = 1, . . . , N. In other words, the mth quadratic constraint in the dual problem
Nν (17) is activated. So the index set of the support kernels can be
i=1

1007
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

obtained by Proof. The proof of Proposition 1 in Shang et al. (2017) applies mu-
(i )
= qTm ui /cm ·
tatis mutandis to the present setup. We denote by zm
1 κ , ∀ui ∈ D, m = 1, . . . , M the scaled data projections in shorthand,
SK = m− α α Km (ui , u j ) = γ − ζm , ∀m .
2 i, j i j and define the residual

1 T
Upon training the model, a significant proportion of data sam- m = 1 − max qm ( ui − u j )
ui ,u j ∈D cm · κ
ples and basis kernels can be discarded, with only SVs and SKs re-
sm /cm
tained necessarily. This property is pivotal to ensure the practical > 1− 0, ∀m
applicability of MKL-based OC-SVM, since the MKL uncertainty set maxm =1,··· ,M sm /cm
(18) simplifies to According to (23) and (24). Then we could construct valid upper
(i )
, ∀i as zm = cm1·κ maxui ∈D qTm ui + m /2 and
and lower bounds for zm

Uν,μ (D ) = u α
π Km (u, ui ) ρ .

(21) zm = 1
c m ·κ minui ∈D qTm ui − m /2. It is easy to see that Km = K+
m+
i∈SV i m∈SK m (i ) ( j)
m , where Km has entries Km (ui , u j ) = min{zm − zm , zm − zm } and
K− + +

(i ) ( j)
As a matter of fact, some off-the-shelf optimization toolboxes, K−
m has entries −
Km ( ui , u j ) = min{zm − zm , zm − zm }. Denoting by
such as cvx (Grant, Boyd, & Ye, 2008), can simultaneously solve (i ) (i )
wm = zm − zm > 0, it turns out that K+
m is a kernel matrix of con-
the primal and dual problems, and yield values of {αi } and {πm }.
ventional intersection kernel in one dimension and hence satisfies
In this case, we only need to calculate ρ by using the following
K+
m 0 (Odone, Barla, & Verri, 2005). The positive-definiteness of
property of all the BSVs
K−
m can be established in a similar fashion. Therefore, it holds that
Km = K+m + Km 0, ∀m.
−
αi πm Km (ui , u j ) − ρ = 0, ∀ j ∈ BSV
i∈SV m∈SK
Following the spirit of MKL, we intend to choose as many di-
to compute rections {qm }M as possible in the data space to yield sufficient
m=1
1 representative basis kernels for further selection. As each basis ker-
ρ = αi πm Km (ui , u j ).
|BSV | j∈BSV i∈SV m∈SK
nel (22) is designed to capture the information of D along a spe-
cific direction qm , we enforce MKL to “observe” the dataset in a
There are a variety of popular kernel functions in the MKL set- comprehensive manner in the hope that the most useful directions
ting, such as the Gaussian kernel, polynomial kernel, etc., which in- can be automatically selected to yield a desirable combined ker-
duce fairly good decision boundaries. Nonetheless, they cannot be nel. Based on such a motivation, there are several ways to choose
used to construct the uncertainty set for RO owing to their lack of M representative directions {qm }. A trivial idea is to sample {qm }
computational tractability. To this end, we propose a novel piece- randomly from the n-dimensional input space and then normalize
wise linear concave kernel structure for MKL to serve as the can- them to unit length. An alternative deterministic way is to let them
didate basis kernels in (9), i.e., be equispaced on the n-dimensional sphere, based on the follow-

qTm (u − v ) ing polar coordinate expression in n dimensions:
Km (u, v ) = 1 − . (22) ⎧
cm · κ ⎪qm,1 = cos(ψm,n−1 ) · · · cos(ψm,2 ) cos(ψm,1 )
⎪
⎪
⎪
Each basis kernel (22) is parameterized by qm , cm and κ . Intuitively ⎨qm,2 = cos(ψm,n−1 ) · · · cos(ψm,2 ) sin(ψm,1 )
.. ,
speaking, Km (u, v ) evaluates the negative “distance” between two .
⎪
⎪
data points along a particular projection direction qm , which is a ⎪
⎪qm,n−1 = cos(ψm,n−1 ) sin(ψm,n−2 )
⎩
unit direction vector of the mth basis kernel. The normalization qm,n = sin(ψm,n−1 )
factor cm describes the “dispersion” degree of D along qm , and is
where arguments {ψm, j } are evenly distributed in [0, π ) for a
thus used to “normalize” data along qm , which can be taken as the
given j ∈ {1, . . . , n − 1}. If we choose P directions along each di-
span, i.e.,
mension j, there will be M = P n−1 basis kernels in total2 . This de-
cm := max qTm (ui − u j ). (23) terministic method is adopted in this work. Note that, although our
ui ,u j ∈D
approach also requires user-defined projection directions, it has
However, the maximum operator is known to be sensitive to out- better flexibility in learning a good sparse combination of projec-
liers. Instead, a robust estimation of the span can be obtained tion directions from sufficiently representative candidates.
based on quantiles: We point out that the combination of our proposed basis kernel
cm := maxi {qTm ui } − max1i − {qTm ui }, (22) generalizes the weighted generalized intersection kernel (WGIK)
proposed by Shang et al. (2017), which can be expressed as:
where maxi {·} stands for the -upper quantile of the set {·}, and
can be set as a small number, for example, 0.01. Note that the
n
K ( u, v ) = κ − W (u − v )
positive-definiteness of Km (u, v ) remains a concern, and a scaling 1
m=1
factor κ > 0 is introduced such that the positive-definiteness of all
basis kernels can be ensured provided that κ is sufficiently large.

n qT ( u − v )
= κ 1 − m , (25)
Note that, to ensure positive-definiteness of all basis kernels, the cm · κ
m=1
scaling factor κ can be set as
where W is a weighting matrix that can be constructed as the
sm
κ > max , sm = max qTm (ui − u j ), (24) inverse of covariance matrix , {qm } are defined by eigenvector
m=1,...,M cm ui ,u j ∈D
of , and cm is the normalization coefficient along qm . Obviously,
according to the following proposition. WGIK (25) is identical to the average of n basis kernels in the form
of (22), and hence the single WGIK-based uncertainty set construc-
Proposition 1 (Positive-Definiteness of Piecewise Linear Ker- tion method in Shang et al. (2017) can be considered as a spe-
nels). Assume the scaling factor κ in (22) is set according to cial case of our proposed MKL-based formulation (12) with μ = 1.
(24), then all the basis kernel matrices {Km } with elements
Km (ui , u j ), ∀ui , u j ∈ D, m = 1, . . . , M satisfy 2
To avoid duplicate directions, it is recommended to let P be an odd number in
Km 0, ∀m = 1, . . . , M. high-dimensional situations.

1008
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

(a) Single WGIK-based SVDD

(b) MKL-based OC-SVM

Fig. 1. Uncertainty sets learnt by single WGIK-based SVDD and MKL-based OC-SVM (ν = 0.01, μ = 0.05). Data points marked as pentagrams are the SVs. Green line segments
in the right diagram of (a) represent the eigen-directions of the data covariance matrix, and their lengths represent standard deviations of the data projection on the
corresponding directions. Red line segments in the right diagram of (b) represent the projection directions of SKs selected by MKL, and their lengths stand for kernel
weights. (For interpretation of the references to color in this ﬁgure legend, the reader is referred to the web version of this article.)

As a matter of fact, WGIK can be viewed as “observing” an n- Obviously, U (D ) is a polytope, which warrants the computa-
dimensional uncertainty dataset from at most n( M) orthogonal tional tractability of a broad class of RO problems, as to be dis-
directions defined by n basis kernels that are fixed a priori. This cussed in detail in the next section. The hyper-parameter ν inherits
leads to considerable restrictiveness in learning the uncertainty set the desirable interpretation of single-kernel OC-SVM, as described
from data, especially when the uncertainty distribution is strongly by the following proposition.
correlated and asymmetrical (As to be shown in Fig. 1(a), it may
produce much unnecessary coverage). In contrast, the MKL scheme Proposition 2 (Relationship between ν and the percentage of out-
for uncertainty set construction possesses significantly enhanced liers). Assuming the solution to (12) with 0 < ν 1 exists. Then ν is
flexibility. an upper bound on the percentage of outliers.
With a huge number of candidate piecewise linear kernels used,
only a fraction of them will be retained to build the combined ker- Proof. See Schölkopf, Platt, Shawe-Taylor, Smola, and Williamson
nel, with the majority discarded. In other words, most coefficients (2001) and Schölkopf et al. (2002).
{πm } of basis kernels will be exactly zero, and it turns out that
the MKL uncertainty set U (D ) can be represented by only SVs and Proposition 2 indicates that, the MKL uncertainty set Uν,μ (D )
SKs, which admits a concise representation and brings computa- encloses at least 100 × (1 − ν )% of N training samples. Henceforth,
tional convenience. Substituting (22) into (21) yields the following ν can be interpreted as the empirical confidence level of Uν,μ (D ).
RO-compatible MKL uncertainty set: The “volume” of Uν,μ (D ) can be controlled by ν with clear sta-
tistical implications. This not only endows Uν,μ (D ) with desirable
robustness against extremal outliers, but also renders the number

U ( D ) = u α
π Km (u, ui ) ρ

of outward points explicitly adjustable, which is intimately related
i∈SV i m∈SK m to the conservatism of RO. By contrast, the sizes of classical uncer-

tainty sets U1 , U2 and U∞ are controlled through the budget pa-
1 rameter , which has no explicit connection with the fraction of
= u α π qT ( u − ui ) 1 − ρ . (26)
i∈SV i m∈SK m cm · κ m data coverage.

1009
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

Remark 1. In fact, in the expression (26) of the uncertainty Combining (29) and (30), we arrive at the following linear system
set U (D ), ρ can also be regarded as a set parameter. Since
K(BSV,BSV ) 1 ∂ αBSV ∂ρ (BSV,SV )
−Km αSV ,
the LHS of the final expression of U (D ), i.e., i∈SV αi

m∈SK ,− T
= ∀m.
πm cm1·κ qTm (u − ui ), is a convex function in u, by adjusting ρ 1T 0 ∂πm ∂πm 0

one obtains different (1 − ρ )-sublevel sets. These sublevel sets are =:
convex and form a nested structure. That is, for all ρ1 and ρ2
satisfying ρ ρ1 ρ2 < 1, it holds that U (D; ρ1 ) ⊆ U (D; ρ2 ) ⊆ Defining as the sub-matrix of −1 with the last row and last
U (D; ρ ). This provides a practical method of constructing ambigu- column removed, we obtain:
ity sets satisfying the nesting condition in Wiesemann et al. (2014), ∂ αBSV (BSV,SV )
thereby being useful in convex DRO problems in literature. = −Km αSV , ∀m.
∂πm
2.3. An efficient learning algorithm Then the entries of the Hessian H are given by:

:= ∂ π∂m ∂J πn = −(α )T Km ∂π
∂α
2

In the proposed MKL uncertainty set construction scheme, we Hm,n n

need to set in advance a plenty of basis kernels for learning. = (α )T Km Kn(BSV,SV ) αSV (31)
Specifically, massive candidate basis kernels are required under
high-dimensional uncertainty. Meanwhile, a sufficient amount of
= (α )T K(SV,BSV ) K(BSV,SV ) α
SV m n SV , ∀m, n.
historical data are necessary for good description performance in Hence the Newton step s can be found by solving the following
order to accurately characterize the uncertainty distribution. These quadratic program (QP):
issues altogether lend the RO problem to a large-scale QCQP, which
1 T
cannot be efficiently solved by general-purpose solvers including min s Hs + gT s
cvx. Therefore, an efficient learning algorithm is indispensable. s 2
M
Many algorithms have been proposed for accelerating MKL- (32)
s.t. sm = 0,
based SVM (Aiolli & Donini, 2015; Jain, Vishwanathan, & Varma, m=1
2012; Lanckriet et al., 2004; Sonnenburg, Rätsch, Schäfer, & 0 πm + s m 1
Mμ
, ∀m
Schölkopf, 2006; Suzuki & Tomioka, 2011), but none of them ap-
where g is the gradient of J (π ):
plies to the present formulation (12) or (17). We develop herein
a new efficient HessianMKL algorithm, which is motivated by ∂J 1
gm := = − (α )T Km α , ∀m. (33)
Chapelle and Rakotomamonjy (2008). The idea is to alternately op- ∂πm 2
timize the following constrained optimization problem:
The whole procedure of the efficient learning algorithm for
min J (π ) MKL-based OC-SVM is summarized in Algorithm 1.
π

M
1
s.t. πm = 1 , 0 π m , ∀m
Mμ Algorithm 1: HessianMKL for MKL-Based OC-SVM (12).
m=1

using Newton’s method and solve the following standard single Input: M Kernel matrices {Km }, regularization parameters ν
kernel (kernel K = m πm Km ) OC-SVM problem J (π ) (or its dual) and μ.
through off-the-shelf SVM solvers such as libsvm (Chang & Lin, Output: Lagrangian multipliers α, kernel weights π .
2011): Set αi = 1/N, dm = 1/M;
⎧ while stopping criteriona not met do
⎪ 1 1 1
⎪
⎪ min wm 2 − ρ + ξi Solve (27) using existing solver to obtain the dual variable
⎨ {wm },ρ ,ξ 2 m πm Nν α;
i
J (π ) = s.t. wTm φm (ui ) ρ − ξi , ∀i (27) Compute the gradient g according to (33);
⎪
⎪
⎪
⎩ m Compute the Hessian H according to (31);
ξi 0, ∀i. Solve (32) using QP solver to obtain the Newton step s;
Choose step size τ via the exact or backtracking line
The crux in implementing this algorithm is to calculate the Hes-
search;
sian matrix and the Newton step s. According to complementary
Update π : π = π + τ · s.
slackness conditions, ∀ j ∈ BSV, we have
end
y (u j ) = αi πm Km (u j , ui ) − ρ = 0, ∀ j ∈ BSV,
i m a
The stopping criterion adopted in this paper is J 10−5 J.
⇐⇒ K(BSV,·) α − ρ 1 = 0,
where K(BSV,· ) denotes the submatrix of K where only the rows cor- 3. Computational tractability
responding to the BSVs have been preserved. Differentiating with
respect to πm , we have 3.1. The case of static RO

(BSV,· ) ∂α
∂ρ
Km α + K(BSV,·) − = 0, ∀m. (28) To illustrate the tractability of the MKL uncertainty set-induced
∂πm ∂πm
RO, we consider the following static robust linear programming
We have already known that α j = 0, ∀ j ∈ / SV and α j = 1/Nν, ∀ j ∈ (LP) problem:
SV \BSV, so ∂ α j /∂ πm = 0, ∀ j ∈
/ BSV, and hence (28) becomes
min cT x : A(u )x b, ∀u ∈ U ( D ) , (34)
(BSV,BSV ) ∂ α ∂ρ
x
(BSV,SV )
Km αSV + K BSV
− = 0, ∀m. (29) where c ∈ Rn1 , b ∈ Rm , and A(u ) ∈ Rm×n1 is the left-hand side
∂πm ∂πm
(LHS) coeﬃcient matrix affected by uncertainty u ∈ Rn . It suﬃces
On the other hand, according to the equality i αi = 1, we have
to consider the following single constraint (Ben-Tal et al., 2009):
∂α
∂ αBSV
i
=0 ⇒ 1T = 0, ∀m. (30) a(u )T x b, ∀u ∈ U ( D ). (35)
i
∂πm ∂πm

1010
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

For the sake of exposition, we assume a(u ) to be affine in u: construction of a high-complexity uncertainty set with moderate
sizes of SVs and SKs. Meanwhile, (37) involves (2|SV ||SK | + 1 ) ad-
a(u ) = ā + Pu, (36) ditional variables in total. However, the overall complexity tends to
be benign thanks to the sparsity of SVs and SKs in MKL, even if a
where ā ∈ Rn1
is constant and P ∈ Rn1 ×n . Then the tractability of plenty of training data and candidate basis kernels are provided for
(35) can be established as follows. constructing U (D ). As a matter of fact, ν and μ are informative in
Theorem 1. (35) is equivalent to the following system of linear con- revealing proportions of SVs and SKs that are eventually selected.
straints Proposition 3 (Relationship between ν and the percentage of
⎧
⎪
T
(μim − λim ) qcmm·uκi + η (1 − ρ ) b − āT x SVs). Assume the solution of (12) with ν ∈ (0, 1] exists, then ν is an
⎪
⎪
⎪i∈SV
⎪ m ∈ SK lower bound on the percentage of SVs.
⎨ (μim − λim ) cqmm·κ = PT x
i∈SV m∈SK (37) Proof. See Schölkopf et al. (2002).
⎪
⎪λim + μim = ηαi πm , ∀i ∈ SV, m ∈ SK
⎪
⎪ Proposition 4 (Relationship between μ and the percentage of
⎩λim 0, μim 0, ∀i ∈ SV, m ∈ SK
⎪ SKs). Assume the solution of (12) with μ ∈ (0, 1] exists, then μ is
η 0.
a lower bound on the percentage of SKs.
Proof. (35) can be rewritten in a worst-case sense:
Proof. Similar to the proof of Proposition 3, SKs can contribute
max uT PT x b − āT x. (38) at most 1/Mμ to constraint m πm = 1 of (12) due to constraint
u∈U (D )
0 πm 1/Mμ, hence there must be at least Mμ SKs, which es-
To eliminate 1 -norms in U (D ), auxiliary variables = tablishes μ as a lower bound of the fraction.
{θim }i∈SV,m∈SK are introduced together with the primitive un-
The above results imply that, in principle, the decision-maker
certainty u, giving rise to the following extended uncertainty set:
becomes aware of the “minimal” complexity of the RC problem
based on ν and μ. In addition, it is known that the number of
⎧ ⎫
⎨ −θim qTm (u−ui )
θim , ∀i ∈ SV, m ∈ SK ⎬ SVs (SKs) is non-decreasing in ν and μ. These guidelines can assist
c m ·κ
U˜ν,μ (D ) = (u, ) , the user to flexibly adjust the complexity of the induced RC prob-
⎩ αi

πm θim 1 − ρ ⎭ lem with ν and μ, thereby rendering the data-driven RO scheme
i∈SV m∈SK easy-to-use in practice.
(39)
Remark 2. Based on the tractability result, a further connection
which is cast as a series of linear inequalities. Obviously, U (D ) is can be established with the well-known sample average approxi-
the projection of U˜ (D ) into the space of primitive uncertainty u, mation (SAA) for the chance constraint (Ben-Tal et al., 2009):
and thus (35) becomes: P { g( x ; u ) ∈ A } 1 − ν , (42)
max uT PT x b − āT x, (40) where the uncertainty u is approximated by its empirical distri-
(u,)∈U˜ν,μ (D )
bution Pˆ with discrete and finite support, i.e., Pˆ {u = ui } = 1/N, i =
whose LHS can be expressed as the following LP: 1, . . . , N. A natural corollary can be established that a solution x
feasible for the robust constraint in (1) induced by U (D ) must
max uT PT x
u, be also feasible for the SAA-based chance constraint (42) under
qT ( u − ui ) risk level ν . A conventional way to handle SAA-based chance con-
s.t. −θim m θim , ∀i ∈ SV, m ∈ SK (41)
cm · κ
strained linear programs is to resort to MIP formulations using the

αi

πm θim 1 − ρ .
“big-M” technique (Luedtke, Ahmed, & Nemhauser, 2010). With the
i∈SV m∈SK
MKL uncertainty set-aided RO adopted as a high-fidelity safe ap-
proximation to (42), the usage of auxiliary integer variables be-
Because the feasible region of (41) is obviously bounded and comes no longer necessary, thereby avoiding heavy computational
nonempty, (41) must have a bounded optimal value. According cost.
to the strong duality of LP, the dual problem is also feasible and
bounded, and hence optimal values of the primal and dual coin- To sum up, the MKL uncertainty set lends itself to a system-
cide. Then, by deriving the dual problem, the infinite-dimensional atic approach to uncertainty set construction as well as a power-
constraint (35) can be translated to (37), where {λim }, {μim } and η ful data-driven alternative to classical ones U1 , U2 , and U∞ . With
are dual variables of (41). conservatism and complexity conveniently adjusted, the MKL un-
certainty set acts as a user-friendly modeling tool for uncertainty.
The above LP reformulation of the robust constraint sheds light Beyond its capability of characterizing unimodal uncertainty dis-
on the desirable computational tractability secured by the pro- tributions, the MKL uncertainty set is also useful for dealing with
posed MKL-based uncertainty set in solving a class of static ro- multi-modal distributions and non-convex support. In principle,
bust optimization problems. For example, when the original deter- one can first carry out clustering and then construct uncertainty
ministic problem is an MILP, the RC problem remains an MILP as set for each data cluster individually, or partition a non-convex set
well, which can be conveniently handled by off-the-shelf solvers. into several convex regions (Zhang et al., 2016a). The overall uncer-
In a nutshell, the RC problem is of the same type as the deter- tainty set can be formed by their union, with the computational
ministic problem without robustification, provided that the uncer- tractability of the induced RO still preserved. For example, it can
tainty penetrates into constraints multiplicatively. Moreover, based serve as a basic modeling strategy in some learning procedures
on the primal-dual saddle dynamics approach that has recently where various types of uncertainty sets are integrated, e.g. Hong
emerged (Ebrahimi, Vaidya, & Elia, 2019), the proposed MKL uncer- et al. (2017), Zhang et al. (2018), and Alexeenko and Bitar (2020).
tainty set can be used to tackle the general robustified constraint
g(x, u ) 0, ∀u ∈ U where g is convex in x and strictly concave in 3.2. The case of adaptive RO
u.
Notice that, (39) indicates that the uncertainty set has Next we discuss the particular usage of the proposed MKL
(2|SV ||SK | + 1 ) facets approximately. Hence, it enables an efficient uncertainty set in multi-stage decision-making under sequentially

1011
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

emerging uncertainties. We first consider the role of the MKL un-
min cT x + max min dT1 y1 + · · · + max min dTT yT ,
certainty set in the following two-stage ARO problem (Ben-Tal, x∈X u1 ∈U1 y1 ∈1 (x,u1 ) uT ∈UT yT ∈T (x,y1:T −1 ,u1:T )
Goryashko, Guslitzer, & Nemirovski, 2004; Yanıkoğlu, Gorissen, &
(48)
den Hertog, 2019):
where u1 , . . . , uT are uncertainties revealed over T stages, and
min cT x + max min dT y , (43) y1 , . . . , yT are recourse decisions made from stage 1 to stage T .
x∈X u∈U (D ) y∈(x,u )
The uncertainty set U for the concatenation of stage-wise uncer-
where x ∈ Rn1 is a vector of here-and-now decisions that are made tainties u = [u1 ; u2 ; . . . ; uT ] is the Cartesian product of all stage-
prior to realization of uncertainty, and y ∈ Rn2 denotes wait-and- wise sets, i.e., U = U1 × U2 × · · · × UT . The feasible region t :=
see decisions that can be made adaptively after seeing u ∈ Rn . The t (x, y1:t−1 , u1:t ) in stage t is dependent on here-and-now deci-
set X represents the feasible region of x, while the feasible re- sions x and all preceding recourse decisions and realizations of un-
gion (x, u ) of y is reliant on both here-and-now decisions and certainty:
past uncertainty, which can be typically described using linear con- t (x, y1:t−1 , u1:t ) = {yt |At (u )x + Bt y ht (u ) }.
straints:
In process systems engineering, multi-stage ARO has found
(x, u ) = {y|A(u )x + By h(u ) },
widespread applications in process scheduling and planning
where A(u ) and h(u ) are uncertain coefficients that are affine in (Lappas & Gounaris, 2016; Ning & You, 2017b; Zhang, Morari,
u. Coefficient matrix B is assumed to be constant representing Grossmann, Sundaramoorthy, & Pinto, 2016b). In the multi-stage
fixed recourse, which is a standard assumption in literature (Chen setting, there are two intuitively plausible ways of using the MKL
& Zhang, 2009). In general, the two-stage ARO problem (43) is uncertainty set. The first is to model each stage-wise uncertainty
intractable because one has to optimize over the policy y(u ) in set Ut as an MKL-based one, which may help capturing from data
a functional space. To circumvent this conundrum, Ben-Tal et al. the individual distributional geometry of stage-wise uncertainty.
(2004) proposed a simple affine decision rule (ADR) assuming that Similar to the two-stage case, EADR can still be applied with stage-
wait-and-see variables have simple linear dependence on u, i.e., wise auxiliary variables t utilized:

y(u ) = y0 + Gu, (44) yt (u1:t , 1:t ) = yt0 + Gt u1:t + Ht vec(1:t ), t = 1, . . . , T , (49)

where {y0 , G} become finite-dimensional decision variables. Insert- which can be uniformly written as (46) where some entries of G
ing the ADR (44) into (43) yields a tractable static RO problem as and H are enforced to be zero to respect non-anticipativity con-
an approximation of (43): straints. Then one can still obtains a static RO similar to (47), albeit
min cT x + z with additional non-anticipativity constraints on G and H. How-
x∈X ,y0 ,G,z ever, such a strategy may fail to capture the correlation amongst
s.t. dT y0 + dT Gu z, ∀u ∈ U (D ) (45)
uncertainties in different stages, since uncertainty sets associated
A(u )x + By0 + BGu h(u ), ∀u ∈ U (D ), with different stages are constructed independently. As an alterna-
whose RC can be obtained using Theorem 1. However, the ADR is tive strategy, one may certainly use MKL to derive an uncertainty
known to be quite restrictive in approximating the optimal policy set U for the concatenated uncertainty u over all stages, where
y (u ), which may lead to significant suboptimality. Thereafter, a Ut is essentially the projection of U into the ut space with pre-
variety of extensions have been developed, such as the extended ceding uncertainties u1:t−1 given. The use of ADR in this case is
ADR (EADR) (Chen & Zhang, 2009), deflected ADR (DADR) (Chen, obvious. However, it no longer allows to leverage the power of
Sim, Sun, & Zhang, 2008), and segregated ADR (SADR) (Goh & EADR since auxiliary variables basically carry information over
Sim, 2010), all of which could lead to more flexible characteriza- all stages, thereby disrupting the non-anticipativity.
tions than ADR. Next, we show that the proposed MKL uncertainty In what follows, an effective strategy is put forward to render
set can be seamlessly integrated with EADR in Chen and Zhang the use of EADR viable, which pertains to the training phase of
(2009) without paying too much effort. The idea is to parameter- MKL-based uncertainty set. In Section 2.2, we have discussed the
ize the EADR with primitive uncertainty u and auxiliary variable issue of selecting projection directions {qm } that parameterize ba-
in the extended uncertainty set (39) simultaneously: sis kernels, and here a different strategy is devised. For compatibil-
ity with multi-stage ARO, we divide all basis kernels into T groups,
y(u, ) = y0 + Gu + Hvec(), (46) where basis kernels in the tth group are defined by evaluating two
where H is a coefficient matrix with compatible dimension. The in- truncated uncertainty samples:

troduction of is helpful for capturing the potential nonlinearity q˜ Tm (u1:t − v1:t )

Km (u, v ) = 1 − , m ∈ It , (50)
of optimal policy and leads to a better adaptation of uncertainty.
cm · κ
Under EADR, the induced approximation as a tractable static RO
problem becomes: where the set It includes indices of basis kernels in the tth group.
That is to say, Km (·, · ) ignores variables from stage (t + 1 ) to stage
min c x+z
T
T , and (50) can be equivalently expressed as (22) by augmenting
x∈X ,y0 ,G,H,z
s.t. dT y0 + dT Gu + dT Hvec() z, ∀(u, ) ∈ U˜ (D ) q˜ m with a zero vector with appropriate size. For a specific group
A(u )x + By0 + BGu + BHvec() h(u ), t, all related projection directions {q˜ m }m∈It are essentially defined
∀(u, ) ∈ U˜ (D ). on a reduced-dimensional data space, which can be specified ei-
ther randomly or deterministically on the unit sphere, as afore-
(47)
mentioned. After deriving the MKL uncertainty set U (D ), SKs will
It can be readily verified that the optimal value of (47) is always be automatically selected from each group, which define auxiliary
no higher than that of (45) (Chen & Zhang, 2009), which show- variables ˜ along with SVs. For a certain m ∈ It ∩ SK, it immedi-
cases the potential of using EADR in reducing the suboptimality of ately follows that the resultant auxiliary variables ˜ 1:t := {θim , i ∈
approximation. SV, m ∈ It ∩ SK } in the extended set U˜ (D ) become stage-specific
Next, we proceed with the following multi-stage ARO problem, since they only carry information from stage 1 to stage t and obey
which has a nested architecture: non-anticipativity. Henceforth, we are free to use these auxiliary

1012
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

0.6 1

0.5 Percentage of SVs Percentage of SKs

Percentage of Outliers 0.8

0.4
Percentage

Percentage
0.6
0.3
0.4
0.2

0.2
0.1

0 0
0 0.1 0.2 0.3 0.4 0.5 0 0.2 0.4 0.6 0.8 1

Fig. 2. Percentages of outliers, SVs, and SKs under varying ν and μ.

variables in a multi-stage setup: Table 1

Computational costs on different datasets [Mean CPU
yt (u1:t ,
˜ 1:t ) = y0 + Gt u1:t + Ht vec(
t
˜ 1:t ), t = 1, . . . , T , (51) time ± Standard deviation (seconds)].

which eventually yields a tractable RC of the multi-stage ARO prob- Dataset cvx HessianMKL
lem (48). As to be illustrated in Section 5, such a strategy is helpful 1 23.7438 ± 0.0454 0.3046 ± 0.0012
for alleviating the solution’s conservatism at the price of slightly 2 26.3713 ± 0.0771 0.3058 ± 0.0031
increased computational cost. 3 42.0645 ± 0.0649 0.3479 ± 0.0009
4 25.6820 ± 0.0694 0.3068 ± 0.0020
5 105.3523 ± 1.7732 0.8140 ± 0.0045
4. Computational studies 6 24.5734 ± 0.0514 0.6510 ± 0.0018
7 29.1698 ± 0.0498 0.3042 ± 0.0017
4.1. Uncertainty set constructions 8 27.8394 ± 0.0430 0.5381 ± 0.0022
9 27.9536 ± 0.0559 0.4208 ± 0.0017

In order to demonstrate the learning capability of the pro-

posed method, we first construct uncertainty sets for 2-D uncer-
tainty u ∈ R2 . There exists significant asymmetric correlation be- outliers, SVs and SKs are computed and depicted in Fig. 2. It can
tween two dimensions of uncertain vector u, where u1 is a random be seen that ν always yields an upper-bound of the percentage of
variable uniformly distributed within [0.5, 4.5], and u2 = 2/u1 + w outliers as well as a lower-bound of the percentage of SVs, while μ
with w ∼ N (0, 0.09 ) being Gaussian distributed. 300 samples are bounds the percentage of SKs from below. What’s more, the gaps
generated to construct the dataset D. The method used for com- between empirical values and theoretical bounds turn out to be
parison is the single WGIK-based SVDD proposed by Shang et al. quite small, thereby indicating that the decision-making can con-
(2017), which is known to perform better than norm-based sets veniently adjust the conservatism and the complexity of the MKL
in capturing correlation and asymmetry. For MKL-based OC-SVM, uncertainty set.
we adopt M = 100 basis kernels with ν = 0.01, μ = 0.05, while for We further carry out comparisons on eight 2-D datasets with
single-kernel SVDD, the same hyper-parameter ν = 0.01 is used. various distributions. The results are summarized in Fig. 3, where
Hence, such a choice ensures that both methods have at most 3 two methods yield identical data coverage. It can be seen that,
outliers residing outside uncertainty set U, thereby yielding nearly in five cases where data distributions exhibit significant asymme-
the same data coverage and ensuring similar conservatism a pri- try, our method yields more compact encircling than single-kernel
ori. As profiled in Fig. 1, the single-kernel SVDD has obviously un- SVDD. In other cases where data distributions are close to symmet-
necessary coverage, while the MKL uncertainty set encircles data ric, the MKL uncertainty set is nearly the same as the single-kernel
more compactly and thus alleviates over-conservatism. This is be- SVDD. This showcases the effectiveness of MKL in reducing super-
cause MKL allows for a flexible selection of optimal kernels out fluous spaces in uncertainty set constructions.
of 100 candidates, whilst single-kernel SVDD essentially utilizes a Next, we investigate the computational efficiency of our learn-
combination of only two fixed kernels, as shown in right diagrams ing method for MKL-based OC-SVM, i.e., Algorithm 1. As a popular
of Fig. 1. general-purpose convex programming solver, cvx is used to solve
Upon modelling uncertainty sets, we then pay attention to real- the dual (17) for comparison. Both algorithms are implemented
istic quantities of outliers, SKs and SVs. Both the MKL uncertainty on a desktop computer with Intel(R) Core(TM) i7-8700K CPU @
set and single-kernel SVDD yield zero ( 3 ) outlier, while in the 3.70 gigahertz and 16.0 gigabytes RAM. The average CPU times and
MKL set there are 11( 3 ) SVs. Note that μ = 0.05 indicates that standard deviations on various datasets are summarized in Table 1,
there are at least 5 SKs, and 7( 5 ) SKs have been found. These where datasets 1 ∼ 8 correspond to eight diagrams in Fig. 3, and
facts show consistency with statistical implications of ν and μ. dataset 9 pertains to Fig. 1. As can be seen, the general-purpose
Hence, although massive data samples and basis kernel functions solver cvx commonly entails more than 20 seconds to solve (17),
may be used for uncertainty set learning, it turns out that only which is essentially a large-scale QCQP. By contrast, the computa-
a small fraction of them are selected, which leads to structural tional cost of Algorithm 1 turns out to be two orders of magnitude
sparsity and benign complexity. We further examine the validity lower than that of cvx, which highlights the efficiency of the tai-
of Propositions 2–4 by varying ν and μ. The true percentages of lored learning algorithm.

1013
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

5 4 4.5
4
4
4 3
3.5
3
3 2 3
2
2.5
2 1
2
1
1 0 1.5
1 2 3 4 1 2 3 4 5 1 2 3 4 5 2 3 4

5 4 5
5

3 4 4
4
3
2 3
3 2
1 2
1
2
0 0 1

-1
1 -1 0
0 1 2 3 4 0 1 2 3 4 5 0 2 4 0 1 2 3 4 5

Fig. 3. Uncertainty sets constructed on different datasets (ν = 0.01, μ = 0.05). Polytopes learnt by MKL-based OC-SVM and single-kernel SVDD are marked in red and green,
respectively. (For interpretation of the references to color in this ﬁgure legend, the reader is referred to the web version of this article.)

4.2. Optimization performance

In this subsection, we investigate the improvement in the op-

timization performance brought by the proposed MKL uncertainty
set based on the following robust LP problem:
max cT x
x
s.t. |(1 + u˜1 )x1 + u˜2 x2 + u˜3 x3 | 10 (52)
|u˜1 x1 + (1 + u˜2 )x2 + u˜3 x3 | 10
|u˜1 x1 + u˜2 x2 + (1 + u˜3 )x3 | 10,
˜ is governed by the following Gaussian
where the uncertainty u
mixture distribution:
! "#
0.25 0 0
˜ ∼ 0.6 N 0,
u 0 0.5 0
0 0 0.75
! "#
0.5 0.25 0
+ 0.4N 0, 0.25 0.5 0 ,
0 0 0.5
based on which 500 samples are collected. The data-driven uncer-
tainty is modelled as the MKL uncertainty set with M = 441 ba-
sis kernels and WGIK-based SVDD, and then the robust LP (52) is
solved based on the associated RC problem. For a fair and compre-
hensive evaluation, we still use the same regularization parameter Fig. 4. Comparison of robust optimization results based on both methods (ν = 0.05,
ν in both methods that controls the percentage of data covered, μ = 0.1).
and let the coefficient vector c be a unit vector uniformly sam-
pled from a 3D space. There are 625 instances of different vec-
tors c generated in total, and optimal values of (52) under two depicted in Fig. 5. It is known that when μ ∈ (0, 1/M], the most
uncertainty sets are further obtained, which are summarized in sparse model is obtained with the fewest SKs selected. However,
Fig. 4. Since (52) is a maximization problem, the higher objective, this does not always yield the best performance, as evidenced from
the better performance. It can be seen from histograms that the the optimal value with the smallest μ = 1/M ≈ 0.0023. With μ in-
MKL uncertainty set tends to induce higher objective values than creasing, the MKL uncertainty set tends to select more SKs, and
the single-kernel SVDD. In particular, the highest optimal value of the performance becomes commonly better than the most sparse
WGIK-induced problems is 3.7173, while beyond that there are 89 model. This is in line with the general understanding of trivial
out of 625 instances in the MKL-aided RO. This unveils the effi- MKL models that simply pursuing sparsity may be rather risky,
cacy of a compact enclosing uncertainty set in providing suitable and further demonstrates the necessity of introducing regulariza-
safeguards with less sacrifice of robustness. tion on {πm } to avoid excessive sparsity. When μ ≈ 0.01, a desir-
Next, we investigate the influence of μ on the performance of able performance is obtained. Thereafter, the performance tends
RO. For a particular choice of c, we vary μ and solve the MKL- to degrade, and finally reduces to the extreme case with μ = 1
aided RO problem. The optimal values under varying μ is then where there is no room for kernel selection and an average of all

1014
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

3.6 180
Predicted demand
Demand with error
160
3.5
Optimal Value

Demand
140
3.4
120

3.3
100

3.2 80
0 0.2 0.4 0.6 0.8 1 1 2 3 4 5 6
Planning horizon

Fig. 5. RO performance under varying μ. Fig. 6. Predicted demands and prediction error data.

Table 2
Parameter setup in production-inventory problem.
basis kernels has to be used. This clearly demonstrates the effect
of kernel selection in reducing the conservatism of kernel-based Parameter Value
uncertainty sets. In addition, the performance of the MKL uncer- P 200
tainty set can be further tuned by adjusting μ. Recall that when Q 960
0 μ 1/M, the proposed formulation reduces to traditional MKL Vmin 100
Vmax 300
OC-SVM (Han et al., 2019; Rakotomamonjy et al., 2008), which se-
c [4.11, 2.69, 1.57, 1.89, 3.31, 4.43]T
lects the fewest kernels and leads to the risk of over-ﬁtting (Xu
et al., 2013). It is hence recommended to set the value of μ slightly
higher than 1/M to attain a desirable performance and a moderate Table 3
Optimal values and solution times of the different models.
computational complexity.
Model Optimal value Solution time (seconds)
5. Multi-stage production-inventory management Box + ADR 1944.27 0.27
SVDD + ADR 1954.85 0.31
In this section, we investigate the utilization of our proposed MKL + ADR 1859.31 0.48
MKL + EADR 1843.92 0.86
MKL uncertainty set in a production-inventory problem, which can
be cast in a multi-stage robust optimization setting (Ben-Tal et al.,
2004). Consider a factory with a warehouse producing a single
product. The goal is to determine its production plan for a planning help capturing correlation and asymmetry of distribution of uncer-
horizon of T periods, by minimizing the production cost while sat- tainty
over horizon, and we denote by U (D ) =
the entire planning
isfying the market demand and inventory level constraints in each δδ = δ + w, ∀w ∈ W (D ) the uncertainty set of δ. In a multi-
period. Mathematically, the deterministic problem can be expressed stage decision-making procedure, the decision maker is allowed
to adjust pt after knowing market demands δ1:t−1 = [δ1 , . . . , δt−1 ]
T
as the following LP:
observed prior to period t. To minimize the worst-case production
T
min ct pt cost, a multi-stage ARO formulation is given by:
{ pt },{vt } t=1
s.t. 0 pt P, t = 1, . . . , T max min c1 p1 + max min
T δ1 ∈U1 p1 ∈1 δ2 ∈U2 p2 ∈2 (v1 )
pt Q
t=1
vt = vt−1 + pt − δt , t = 1, . . . , T c2 p2 + · · · + max min
δT ∈UT pT ∈T (vT −1 )
cT pT , (54)
Vmin vt Vmax , t = 1, . . . , T ,
where t (vr−1 ) is the feasible region of pt deﬁned by the inven-
where ct and pt are the production cost and production quantity
tory level in the preceding period. Ut is the projection of U (D ) onto
in period t. P and Q are maximal production limits in each period
the space of δt . The above problem instantiates the general multi-
and over the entire planning horizon. δt is the market demand in
stage ARO formulation (48), and thus can be approximated as an
period t, and vt is the inventory level in period t, with the minimal
LP problem by means of ADR and EADR.
allowed level Vmin and the maximal storage capacity Vmax .
In this case study, we consider a planning horizon with T =
Under demand uncertainty, it is assumed that a prediction
6 periods in total. A dataset of prediction error {wi }500 i=1
has al-
model is available such that the uncertain demand decomposes
ready been collected from past experience. The predicted demands
into:
and those added with empirical prediction errors are shown in
δt = δt + wt , t = 1, . . . , T , (53) Fig. 6, where the variance of prediction error increases over stages.
T Other parameters in the optimization problem are summarized in
where the nominal demand δ = δ1 , . . . , δT can be predicted by Table 2. The initial inventory level is set as v0 = 150.
the model at hand and the prediction error w = [w1 , . . . , wT ]T is We adopt three different uncertainty sets to learn from data
assumed to reside in a time-invariant uncertainty set W. Histor- and then formulate the multi-stage ARO problem, including the
ical data of prediction errors can be accumulated as a dataset classical box uncertainty set U∞ , the single-kernel SVDD set (Shang
D, based on which the proposed MKL uncertainty set W (D ) can et al., 2017), and the proposed MKL uncertainty set. We use ν =
be established. The use of a data-driven uncertainty set may 0.01 for the last two kernel-based sets, which ensures that both

1015
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

Table 4
Computational complexity with increasing T of the “MKL + EADR” model.

T #BK Com. mem. Lea. time (seconds) #Con. #Var. Sol. time (seconds)

4 53 122.0 kilobytes 2.0276 2861 5787 0.23

5 54 3.0 megabytes 11.6101 5455 11,102 0.61
6 55 74.5 megabytes 193.3916 12,115 24,723 0.70
7 46 128.0 megabytes 202.3154 21,629 44,178 1.59
56 1.8 gigabytes 8485.4009 25,439 52,167 5.91
8 47 2.0 gigabytes 6521.5613 37,297 76,671 3.66
57 45.5 gigabytes – – – –

sets contain at least 99% of historical data. For a fair comparison, method is suitable for uncertainty with dimensions lower than 8.
the box set is calibrated to include 99% of historical data. To con- Meanwhile, the sizes of constraints and variables in the robust
struct the MKL uncertainty set, kernel functions are specified with counterpart problem grow with n. In spite of growing complexity,
truncated uncertainty samples based on (50) to be compatible with it is quite computationally thrifty to solve the robust counterpart
the non-anticipativity requirement such that the usage of EADR is problem to obtain the decision, which is due to an LP formulation
plausible. In contrast, the box set and the single-kernel SVDD set of the robust counterpart problem.
only enable the use of ADR. All induced RC problems can be cast Despite such challenges, in face of high-dimensional uncer-
as LPs. The optimal values and solution times of different models tainty one can resort to some useful statistical methods together
are summarized in Table 3. with the proposed approach to circumvent the curse of dimen-
It can be observed that, the model with box uncertainty set and sionality. For example, if all uncertain parameters are closely cor-
ADR yields an optimal value of 1944.27 as well as the lowest solu- related, one could perform dimension reduction (e.g. based on
tion time. The single-kernel SVDD model with ADR yields a slightly PCA) first, and then construct an uncertainty set in the reduced-
higher optimal value. This may be due to the curse of dimension- dimensional subspace (Shang & You, 2018). In this way, the curse
ality that traditional single-kernel methods typically suffer from, of dimensionality can be much alleviated. Conversely, if all uncer-
because in this case δ has six dimensions. By contrast, the use of tain parameters are not closely correlated, then one can split them
the proposed MKL uncertainty set with ADR leads to a significantly into several independent groups of smaller sizes, and then utilize
reduction of production cost. This sheds light on the effectiveness the proposed MKL uncertainty set to handle each group individ-
of learning with multiple kernel selected in alleviating the curse of ually. Henceforth, these two basic strategies can also be jointly
dimensionality and the conservatism of learning-based RO. Beyond adopted to handle general cases. In this way, high-dimensional un-
that, without paying too much additional effort, one can attain an certainty can still be tackled with our MKL approach serving as
improved performance using the MKL set along with EADR, which a basic modeling tool. All above-mentioned facts draw a compre-
owes to both the modeling power of MKL and the expressiveness hensive picture about the effectiveness and limitations of the pro-
of EADR. Note that computational burdens of using MKL sets are posed MKL-based uncertainty set, thereby providing insights into
only slightly higher than those associated with box set and SVDD its practical usage.
set, mainly because of the use of multiple basis kernels in the un-
certainty set. Nevertheless, the resultant LP problems can still be 6. Concluding remarks
tackled efficiently with off-the-shelf solvers. Therefore, integrating
the MKL uncertainty set with EADR can be an appealing choice to In this work, we propose a novel data-driven RO framework –
approximately solve multi-stage ARO problems. MKL-aided RO – to cope with static RO as well as ARO. With our
MKL-based OC-SVM learning approach, a compactly enclosing un-
5.1. Scalability of the proposed approach and discussions certainty set can be obtained, especially when the distribution of
the uncertainty has evident asymmetry, which alleviates the over-
In order to investigate the practical scalability of the ap- conservatism of the induced RO and narrows the gap between the
proach proposed in this paper, we still take the above multi-stage optimization model and the real-world situation. Thanks to the
production-inventory management problem as an example. Using inherent sparsity of the MKL-based OC-SVM, the uncertainty set
the “MKL + EADR” model, we investigate the number of candi- turns out to be a polytope with succinct expression, which ensures
date basis kernels (#BK), the minimal computer memory (Com. the computational tractability of the induced RO. The MKL-aided
Mem.) and the learning time (Lea. time) required for the uncer- RO framework is user-friendly because two hyper-parameters have
tainty set learning, the number of constraints (#Con.), variables explicit statistical meaning, which allow the user to conveniently
(#Var.) and the solution time (Sol. time) of the induced tractable balance between the conservativeness and the computational cost.
RC with progressively increasing T (see Table 4). The uncertain This data-driven RO framework is also compatible with EADR in
data D = {wi }N in this experiment follow a mixture Gaussian dis- the multi-stage ARO problems, which further improves optimiza-
i=1
tribution of varying dimensions, and the data size is N = 500. The tion performance. Finally, numerical and application case studies
demonstrate the capability of the proposed framework in further
predicted demands δ are set as random, and the other settings are
the same as the previous case. improving the RO performance without bringing excessive compu-
Table 4 shows that the proposed approach can model uncer- tational burden.
tainties of dimension n 7 at a computational cost less than 5 min
on a personal computer, with P = 4 directions specified along each Acknowledgments
dimension. When one proceeds with n = 7, P = 5 or n = 8, P = 4,
about two hours are needed for modeling, which exhibit the curse This work is supported in part by National Science and Technol-
of dimensionality obviously. When n continuous to grow, the com- ogy Innovation 2030 Major Project of the Ministry of Science and
putational cost becomes prohibitively unaffordable since both the Technology of China under Grant 2018AAA0101604, and National
solution time and memory needed by HessianMKL algorithm go Natural Science Foundation of China (Nos. 61673236, 61433001,
beyond limits that are practically acceptable. Hence, the proposed and 61873142).

1016
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

References Han, B., Shang, C., Yang, F., & Huang, D. (2019). Multiple kernel learning-based un-
certainty set construction for robust optimization. In Proceedings of the 15th
Aiolli, F., & Donini, M. (2015). EasyMKL: A scalable multiple kernel learning algo- IEEE international conference on control and automation (ICCA) (pp. 1417–1422).
rithm. Neurocomputing, 169, 215–224. IEEE.
Alexeenko, P., & Bitar, E. (2020). Nonparametric estimation of uncertainty sets for Hong, L. J., Huang, Z., & Lam, H. (2017). Learning-based robust optimization: Proce-
robust optimization. arXiv preprint arXiv:2004.03069. dures and statistical guarantees. arXiv preprint arXiv:1704.04342.
Balcik, B., & Yanıkoğlu, İ. (2020). A robust optimization approach for humanitarian Huang, X., Shi, L., & Suykens, J. A. K. (2013). Support vector machine classifier with
needs assessment planning under travel time uncertainty. European Journal of pinball loss. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(5),
Operational Research, 282(1), 40–57. 984–997.
Ben-Tal, A., El Ghaoui, L., & Nemirovski, A. (2009). Robust optimization. Princeton Jain, A., Vishwanathan, S. V., & Varma, M. (2012). SPG-GMKL: generalized multiple
University Press. kernel learning with a million kernels. In Proceedings of the 18th ACM SIGKDD
Ben-Tal, A., Goryashko, A., Guslitzer, E., & Nemirovski, A. (2004). Adjustable ro- international conference on knowledge discovery and data mining (pp. 750–758).
bust solutions of uncertain linear programs. Mathematical Programming, 99(2), ACM.
351–376. Jakubovskis, A. (2017). Strategic facility location, capacity acquisition, and tech-
Ben-Tal, A., & Nemirovski, A. (1998). Robust convex optimization. Mathematics of nology choice decisions under demand uncertainty: Robust vs. non-robust
Operations Research, 23(4), 769–805. optimization approaches. European Journal of Operational Research, 260(3),
Ben-Tal, A., & Nemirovski, A. (1999). Robust solutions of uncertain linear programs. 1095–1104.
Operations Research Letters, 25(1), 1–13. Jalilvand-Nejad, A., Shafaei, R., & Shahriari, H. (2016). Robust optimization under
Ben-Tal, A., & Nemirovski, A. (20 0 0). Robust solutions of linear programming correlated polyhedral uncertainty set. Computers & Industrial Engineering, 92,
problems contaminated with uncertain data. Mathematical Programming, 88(3), 82–94.
411–424. Lanckriet, G. R., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learn-
Bertsimas, D., Brown, D. B., & Caramanis, C. (2011). Theory and applications of ro- ing the kernel matrix with semidefinite programming. Journal of Machine learn-
bust optimization. SIAM Review, 53(3), 464–501. ing research, 5(Jan), 27–72.
Bertsimas, D., Gupta, V., & Kallus, N. (2018). Data-driven robust optimization. Math- Lappas, N. H., & Gounaris, C. E. (2016). Multi-stage adjustable robust optimization
ematical Programming, 167(2), 235–292. for process scheduling under uncertainty. AIChE Journal, 62(5), 1646–1667.
Bertsimas, D., Pachamanova, D., & Sim, M. (2004). Robust linear optimization under Li, Z., Ding, R., & Floudas, C. A. (2011). A comparative theoretical and computational
general norms. Operations Research Letters, 32(6), 510–516. study on robust counterpart optimization: I. Robust linear optimization and ro-
Bertsimas, D., & Sim, M. (2003). Robust discrete optimization and network flows. bust mixed integer linear optimization. Industrial & Engineering Chemistry Re-
Mathematical Programming, 98(1–3), 49–71. search, 50(18), 10567.
Bertsimas, D., & Sim, M. (2004). The price of robustness. Operations Research, 52(1), Li, Z., Tang, Q., & Floudas, C. A. (2012). A comparative theoretical and compu-
35–53. tational study on robust counterpart optimization: II. Probabilistic guarantees
Bertsimas, D., & Thiele, A. (2006). Robust and data-driven optimization: modern de- on constraint satisfaction. Industrial & Engineering Chemistry Research, 51(19),
cision making under uncertainty. In Models, methods, and applications for inno- 6769–6788.
vative decision making (pp. 95–122). INFORMS. Luedtke, J., Ahmed, S., & Nemhauser, G. L. (2010). An integer programming approach
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University for linear programs with probabilistic constraints. Mathematical Programming,
Press. 122(2), 247–272.
Caballero, W. N., Lunday, B. J., & Uber, R. P. (2021). Identifying behaviorally robust Margellos, K., Goulart, P., & Lygeros, J. (2014). On the road between robust optimiza-
strategies for normal form games under varying forms of uncertainty. European tion and the scenario approach for chance constrained optimization problems.
Journal of Operational Research, 288(3), 971–982. IEEE Transactions on Automatic Control, 59(8), 2258–2263.
Camerer, C., & Weber, M. (1992). Recent developments in modeling preferences: Un- Mohseni, S., & Pishvaee, M. S. (2020). Data-driven robust optimization for wastewa-
certainty and ambiguity. Journal of Risk and Uncertainty, 5(4), 325–370. ter sludge-to-biodiesel supply chain design. Computers & Industrial Engineering,
Campbell, T., & How, J. P. (2015). Bayesian nonparametric set construction for ro- 139, 105944.
bust optimization. In Proceedings of the American control conference (ACC), 2015 Moret, S., Babonneau, F., Bierlaire, M., & Maréchal, F. (2020). Decision support for
(pp. 4216–4221). IEEE. strategic energy planning: A robust optimization framework. European Journal
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM of Operational Research, 280(2), 539–554.
Transactions on Intelligent Systems and Technology, 2(3), 27. Natarajan, K., Pachamanova, D., & Sim, M. (2008). Incorporating asymmetric distri-
Chapelle, O., & Rakotomamonjy, A. (2008). Second order optimization of kernel pa- butional information in robust value-at-risk optimization. Management Science,
rameters. In Proceedings of the NIPS workshop on kernel learning: Automatic se- 54(3), 573–585.
lection of optimal kernels: 19 (p. 87). Ning, C., & You, F. (2017a). Data-driven adaptive nested robust optimization: General
Charnes, A., & Cooper, W. W. (1959). Chance-constrained programming. Management modeling framework and efficient computational algorithm for decision making
Science, 6(1), 73–79. under uncertainty. AIChE Journal, 63(9), 3790–3817.
Chen, X., Sim, M., & Sun, P. (2007). A robust optimization perspective on stochastic Ning, C., & You, F. (2017b). A data-driven multistage adaptive robust optimization
programming. Operations Research, 55(6), 1058–1071. framework for planning and scheduling under uncertainty. AIChE Journal, 63(10),
Chen, X., Sim, M., Sun, P., & Zhang, J. (2008). A linear decision-based approximation 4343–4369.
approach to stochastic programming. Operations Research, 56(2), 344–357. Ning, C., & You, F. (2018). Data-driven decision making under uncertainty integrating
Chen, X., & Zhang, Y. (2009). Uncertain linear programs: Extended affinely ad- robust optimization with principal component analysis and kernel smoothing
justable robust counterparts. Operations Research, 57(6), 1469–1482. methods. Computers & Chemical Engineering, 112, 190–210.
Crespo, L. G., Colbert, B. K., Kenny, S. P., & Giesy, D. P. (2019). On the quantification Ning, C., & You, F. (2019). Optimization under uncertainty in the era of big data
of aleatory and epistemic uncertainty using sliced-normal distributions. Systems and deep learning: When machine learning meets mathematical programming.
& Control Letters, 134, 104560. Computers & Chemical Engineering, 125, 434–448.
Dai, X., Wang, X., He, R., Du, W., Zhong, W., Zhao, L., & Qian, F. (2019). Data– Odone, F., Barla, A., & Verri, A. (2005). Building kernels from binary strings for im-
driven robust optimization for crude oil blending under uncertainty. Computers age matching. IEEE Transactions on Image Processing, 14(2), 169–180.
& Chemical Engineering, 106595. Rakotomamonjy, A., Bach, F. R., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. Journal
Dantzig, G. B. (1955). Linear programming under uncertainty. Management Science, of Machine Learning Research, 9(Nov), 2491–2521.
1(3–4), 197–206. Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (2001).
Ebrahimi, K., Vaidya, U., & Elia, N. (2019). Robust optimization via discrete-time sad- Estimating the support of a high-dimensional distribution. Neural Computation,
dle point algorithm. In Proceedings of the 58th IEEE conference on decision and 13(7), 1443–1471.
control (CDC) (pp. 2473–2478). IEEE. Schölkopf, B., Smola, A. J., Bach, F., et al. (2002). Learning with kernels: Support vector
El-Ghaoui, L., & Lebret, H. (1997). Robust solutions to least-square problems to machines, regularization, optimization, and beyond. MIT Press.
uncertain data matrices. SIAM Journal on Matrix Analysis and Applications, 18, Shang, C., Chen, W. H., Stroock, A. D., & You, F. (2020). Robust model predictive
1035–1064. control of irrigation systems with active uncertainty learning and data analytics.
El-Ghaoui, L., Oustry, F., & Lebret, H. (1998). Robust solutions to uncertain semidef- IEEE Transactions on Control Systems Technology, 28, 1493–1504.
inite programs. SIAM Journal on Optimization, 9(1), 33–52. Shang, C., Huang, X., & You, F. (2017). Data-driven robust optimization based on
Ferreira, R. d. S., Barroso, L., & Carvalho, M. M. (2012). Demand response models kernel learning. Computers & Chemical Engineering, 106(2), 464–479.
with correlated price data: A robust optimization approach. Applied Energy, 96, Shang, C., & You, F. (2018). Robust optimization in high-dimensional data space with
133–149. support vector clustering. IFAC-PapersOnLine, 51(18), 19–24.
Gabrel, V., Murat, C., & Thiele, A. (2014). Recent advances in robust optimization: Shang, C., & You, F. (2019a). Data analytics and machine learning for smart process
An overview. European Journal of Operational Research, 235(3), 471–483. manufacturing: Recent advances and perspectives in the big data era. Engineer-
Goh, J., & Sim, M. (2010). Distributionally robust optimization and its tractable ap- ing, 5(6), 1010–1016.
proximations. Operations Research, 58(4-part-1), 902–917. Shang, C., & You, F. (2019b). A data-driven robust optimization approach to sce-
Grant, M., Boyd, S., & Ye, Y. (2008). CVX: Matlab software for disciplined convex nario-based stochastic model predictive control. Journal of Process Control, 75,
programming. 24–39.
Guzman, Y. A., Matthews, L. R., & Floudas, C. A. (2016). New a priori and a Shen, F., Zhao, L., Du, W., Zhong, W., & Qian, F. (2020). Large-scale industrial energy
posteriori probabilistic bounds for robust counterpart optimization: I. Un- systems optimization under uncertainty: A data-driven robust optimization ap-
known probability distributions. Computers & Chemical Engineering, 84, 568– proach. Applied Energy, 259, 114199.
598.

1017
B. Han, C. Shang and D. Huang European Journal of Operational Research 292 (2021) 1004–1018

Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large scale mul- Yuan, Y., Li, Z., & Huang, B. (2016). Robust optimization under correlated uncer-
tiple kernel learning. Journal of Machine Learning Research, 7(Jul), 1531– tainty: Formulations and computational study. Computers & Chemical Engineer-
1565. ing, 85, 58–71.
Soyster, A. L. (1973). Convex programming with set-inclusive constraints and appli- Zhang, Q., Grossmann, I. E., Sundaramoorthy, A., & Pinto, J. M. (2016a). Data-driven
cations to inexact linear programming. Operations Research, 21(5), 1154–1157. construction of convex region surrogate models. Optimization and Engineering,
Suzuki, T., & Tomioka, R. (2011). SpicyMKL: A fast algorithm for multiple kernel 17(2), 289–332.
learning with thousands of kernels. Machine Learning, 85(1–2), 77–108. Zhang, Q., Morari, M. F., Grossmann, I. E., Sundaramoorthy, A., & Pinto, J. M. (2016b).
Wiesemann, W., Kuhn, D., & Sim, M. (2014). Distributionally robust convex opti- An adjustable robust optimization approach to scheduling of continuous indus-
mization. Operations Research, 62(6), 1358–1376. trial processes providing interruptible load. Computers & Chemical Engineering,
Xu, X., Tsang, I. W., & Xu, D. (2013). Soft margin multiple kernel learning. IEEE Trans- 86, 106–119.
actions on Neural Networks and Learning Systems, 24(5), 749–761. Zhang, Y., Jin, X., Feng, Y., & Rong, G. (2018). Data-driven robust optimization under
Yanıkoğlu, İ., Gorissen, B. L., & den Hertog, D. (2019). A survey of adjustable robust correlated uncertainty: A case study of production scheduling in ethylene plant.
optimization. European Journal of Operational Research, 277(3), 799–813. Computers & Chemical Engineering, 109, 48–67.

1018

Babak Robust 1
No ratings yet
Babak Robust 1
20 pages
Babak Robust 2
No ratings yet
Babak Robust 2
16 pages
Theory and Applications of Robust Optimization: SIAM Review October 2010
No ratings yet
Theory and Applications of Robust Optimization: SIAM Review October 2010
51 pages
A Practical Guide To Robust Optimization11
No ratings yet
A Practical Guide To Robust Optimization11
29 pages
Robust Discrete Optimization and Network Flows
No ratings yet
Robust Discrete Optimization and Network Flows
26 pages
Mean Robust Optimization: Series A
No ratings yet
Mean Robust Optimization: Series A
43 pages
RobustOptimizationPaper PDF
No ratings yet
RobustOptimizationPaper PDF
38 pages
Zhen 2018
No ratings yet
Zhen 2018
16 pages
Zhang - A General Robust-Optimization Formulation For Nonlinear Programming
No ratings yet
Zhang - A General Robust-Optimization Formulation For Nonlinear Programming
14 pages
Uncertainty Reduction v5
No ratings yet
Uncertainty Reduction v5
9 pages
1 s2.0 S0377221722008773 Main
No ratings yet
1 s2.0 S0377221722008773 Main
28 pages
Practical Guide to Robust Optimization
No ratings yet
Practical Guide to Robust Optimization
36 pages
The Impact of Modeling On Robust Inventory Management Under Demand Uncertainty
No ratings yet
The Impact of Modeling On Robust Inventory Management Under Demand Uncertainty
15 pages
Adaptive Distributionally Robust Optimization
No ratings yet
Adaptive Distributionally Robust Optimization
16 pages
Robust Linear Optimization With Recourse: Aur Elie Thiele, Tara Terry, and Marina Epelman March, 2010
No ratings yet
Robust Linear Optimization With Recourse: Aur Elie Thiele, Tara Terry, and Marina Epelman March, 2010
29 pages
Robust Linear Optimization With Recourse: Aur Elie Thiele, Tara Terry, and Marina Epelman March, 2010
No ratings yet
Robust Linear Optimization With Recourse: Aur Elie Thiele, Tara Terry, and Marina Epelman March, 2010
29 pages
3 Programación Lineal 2021
No ratings yet
3 Programación Lineal 2021
16 pages
A Survey of Contextual Optimization Methods For Decision-Making Under Uncertainty
No ratings yet
A Survey of Contextual Optimization Methods For Decision-Making Under Uncertainty
52 pages
Adaptative For Lineare
No ratings yet
Adaptative For Lineare
23 pages
Distributionally Robust Optimization
No ratings yet
Distributionally Robust Optimization
221 pages
IntroRobustOptim Python
No ratings yet
IntroRobustOptim Python
28 pages
Robustness
No ratings yet
Robustness
23 pages
Robust Counterpart
No ratings yet
Robust Counterpart
6 pages
Robust Counterpart Optimization: Uncertainty Sets, Formulations and Probabilistic Guarantees
No ratings yet
Robust Counterpart Optimization: Uncertainty Sets, Formulations and Probabilistic Guarantees
6 pages
Adaptive Distributionally Robust Optimization Bertsimas
No ratings yet
Adaptive Distributionally Robust Optimization Bertsimas
41 pages
General Paper Rev2-11
No ratings yet
General Paper Rev2-11
25 pages
Robust Transmission Expansion Planning With Uncertain Generations and Loads Using Full Probabilistic Information
No ratings yet
Robust Transmission Expansion Planning With Uncertain Generations and Loads Using Full Probabilistic Information
8 pages
Nonparametric Adaptive Bayesian Stochastic Control Under Model Uncertainty
No ratings yet
Nonparametric Adaptive Bayesian Stochastic Control Under Model Uncertainty
28 pages
Achieving Robust Data-Driven Contextual Decision Making in A Data Augmentation Way
No ratings yet
Achieving Robust Data-Driven Contextual Decision Making in A Data Augmentation Way
30 pages
Effective Robust Formulation of Right Ha
No ratings yet
Effective Robust Formulation of Right Ha
32 pages
Robust Optimization in Business Analytics
No ratings yet
Robust Optimization in Business Analytics
24 pages
1 s2.0 S0167637711000022 Main
No ratings yet
1 s2.0 S0167637711000022 Main
5 pages
Distributionally Robust Optimization With Decision
No ratings yet
Distributionally Robust Optimization With Decision
30 pages
1 s2.0 S0377221724002200 Main
No ratings yet
1 s2.0 S0377221724002200 Main
19 pages
Powell UnifiedFrameworkforOUU ECSO Tutorial Sept222017 PDF
No ratings yet
Powell UnifiedFrameworkforOUU ECSO Tutorial Sept222017 PDF
177 pages
Computers and Operations Research: Babak Abbasi, Toktam Babaei, Zahra Hosseinifard, Kate Smith-Miles, Maryam Dehghani
No ratings yet
Computers and Operations Research: Babak Abbasi, Toktam Babaei, Zahra Hosseinifard, Kate Smith-Miles, Maryam Dehghani
20 pages
2003 - Cce - Sahinidis - Optimization Under Uncertainty State of The Art and Opportunities
No ratings yet
2003 - Cce - Sahinidis - Optimization Under Uncertainty State of The Art and Opportunities
13 pages
2831548138kannan Et Al 2025 Technical Note Data Driven Sample Average Approximation With Covariate Information
No ratings yet
2831548138kannan Et Al 2025 Technical Note Data Driven Sample Average Approximation With Covariate Information
16 pages
Distributionally Favorable Optimization: A Framework For Data-Driven Decision-Making With Endogenous Outliers
No ratings yet
Distributionally Favorable Optimization: A Framework For Data-Driven Decision-Making With Endogenous Outliers
40 pages
ACC23 Tutorial Paulson
No ratings yet
ACC23 Tutorial Paulson
12 pages
Opre 2020 2069
No ratings yet
Opre 2020 2069
19 pages
Robust Linear Algebra
No ratings yet
Robust Linear Algebra
11 pages
8832 1 Art File 68003 p72w5l Convrt
No ratings yet
8832 1 Art File 68003 p72w5l Convrt
33 pages
NetOpt2020 2020 01 20 Arie
No ratings yet
NetOpt2020 2020 01 20 Arie
101 pages
Robust Dynamic Programming
No ratings yet
Robust Dynamic Programming
30 pages
W Pg#s
No ratings yet
W Pg#s
85 pages
A-A Regularized Robust Design Criterion For Uncertain Data
No ratings yet
A-A Regularized Robust Design Criterion For Uncertain Data
23 pages
CILAMCE2015 Renatha PDF
No ratings yet
CILAMCE2015 Renatha PDF
15 pages
Reinforcement Learning Optimization
No ratings yet
Reinforcement Learning Optimization
6 pages
Robust Model Calibration for NASA UQ Challenge
No ratings yet
Robust Model Calibration for NASA UQ Challenge
19 pages
Control and Reinforcement Learning
No ratings yet
Control and Reinforcement Learning
6 pages
An Improved Three-Stage Algorithm With Bender's Decomposition For Relative Robust Optimization Under Full Factorial Scenario Design of Data Uncertainty
No ratings yet
An Improved Three-Stage Algorithm With Bender's Decomposition For Relative Robust Optimization Under Full Factorial Scenario Design of Data Uncertainty
18 pages
A Unified Theory of Robust and Distributionally Robust Optimization Via The Primal-Worst-Equals-Dual-Best Principle
No ratings yet
A Unified Theory of Robust and Distributionally Robust Optimization Via The Primal-Worst-Equals-Dual-Best Principle
74 pages
Robust Control Scheme For A Class of Uncertain Nonlinear Systems With Completely Unknown Dynamics Using Data-Driven Reinforcement Learning Method
No ratings yet
Robust Control Scheme For A Class of Uncertain Nonlinear Systems With Completely Unknown Dynamics Using Data-Driven Reinforcement Learning Method
34 pages
Mathematics 11 04451
No ratings yet
Mathematics 11 04451
16 pages
Recent Advances in Bayesian Optimization: And, and
No ratings yet
Recent Advances in Bayesian Optimization: And, and
36 pages
Workshop CNAM
No ratings yet
Workshop CNAM
66 pages
Forensic Certificate for Police Use
No ratings yet
Forensic Certificate for Police Use
1 page
Class 12th Strings and List
No ratings yet
Class 12th Strings and List
10 pages
PhD Literature Review Guide
100% (1)
PhD Literature Review Guide
4 pages
Sinclair Coefficients 2017
No ratings yet
Sinclair Coefficients 2017
6 pages
Abhay - New Synopsis - AI in Banking
No ratings yet
Abhay - New Synopsis - AI in Banking
16 pages
Pen Book
No ratings yet
Pen Book
1 page
Alahi Et Al - 2016 - Social LSTM
No ratings yet
Alahi Et Al - 2016 - Social LSTM
11 pages
Bec (BJT Transistor)
No ratings yet
Bec (BJT Transistor)
23 pages
02 Performance Task 1
No ratings yet
02 Performance Task 1
1 page
SWOT Analysis Theory+Task
No ratings yet
SWOT Analysis Theory+Task
1 page
Essential Statistics For The Behavioral Sciences Gregory J. Privitera Full Chapters Instanly
No ratings yet
Essential Statistics For The Behavioral Sciences Gregory J. Privitera Full Chapters Instanly
77 pages
The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) - Jay Alammar - Visualizing Machine Learning One Concept at A Time
No ratings yet
The Illustrated BERT, ELMo, and Co. (How NLP Cracked Transfer Learning) - Jay Alammar - Visualizing Machine Learning One Concept at A Time
20 pages
AI-Powered Online Job Portal Study
No ratings yet
AI-Powered Online Job Portal Study
8 pages
Ejemplo de Ensayo de Ilustración
100% (1)
Ejemplo de Ensayo de Ilustración
8 pages
EGE Flow Sensors
No ratings yet
EGE Flow Sensors
120 pages
ĐỀ SỐ 23
No ratings yet
ĐỀ SỐ 23
22 pages
Compact NSX 630a Lv429390
No ratings yet
Compact NSX 630a Lv429390
2 pages
Social Media's Impact on STI College Students
No ratings yet
Social Media's Impact on STI College Students
18 pages
CAD Poly Object Creation Guide
No ratings yet
CAD Poly Object Creation Guide
5 pages
Sahib Singh SAULT Letter-of-Acceptance PDF
100% (2)
Sahib Singh SAULT Letter-of-Acceptance PDF
2 pages
Case 13 - ING
No ratings yet
Case 13 - ING
6 pages
Sample of Welding Quality Manual
0% (1)
Sample of Welding Quality Manual
5 pages
5473 - 10 - Cadastral Mapping - Bhunaksha - 45 - 30
No ratings yet
5473 - 10 - Cadastral Mapping - Bhunaksha - 45 - 30
39 pages
A 1 BE00013887 G: Hydraulic Circuit DE712
No ratings yet
A 1 BE00013887 G: Hydraulic Circuit DE712
1 page
Sonomètre Svantek
No ratings yet
Sonomètre Svantek
4 pages
Yarn Lab Report
No ratings yet
Yarn Lab Report
6 pages
Co Generation
No ratings yet
Co Generation
5 pages
Geberit Sanitary Catalogue 2017 2018
No ratings yet
Geberit Sanitary Catalogue 2017 2018
348 pages
014 - SAS Overall Interlocking
100% (1)
014 - SAS Overall Interlocking
14 pages
Chapter 9
No ratings yet
Chapter 9
46 pages

Data Driven EJOR

Uploaded by

Data Driven EJOR

Uploaded by

European Journal of Operational Research 292 (2021) 1004–1018

Contents lists available at ScienceDirect

European Journal of Operational Research

Multiple kernel learning-aided robust optimization: Learning

the conservatism of RO based on user’s preference. K ( u, v ) = πm Km (u, v ) (9)

rise to a succinct expression. For this reason, the basis kernels

(a) Single WGIK-based SVDD

(b) MKL-based OC-SVM

In the proposed MKL uncertainty set construction scheme, we Hm,n n

y(u ) = y0 + Gu, (44) yt (u1:t , 1:t ) = yt0 + Gt u1:t + Ht vec(1:t ), t = 1, . . . , T , (49)

0.5 Percentage of SVs Percentage of SKs

Fig. 2. Percentages of outliers, SVs, and SKs under varying ν and μ.

variables in a multi-stage setup: Table 1

In order to demonstrate the learning capability of the pro-

4.2. Optimization performance

In this subsection, we investigate the improvement in the op-

4 53 122.0 kilobytes 2.0276 2861 5787 0.23

You might also like

y(u ) = y0 + Gu, (44) yt (u1:t , 1:t ) = yt0 + Gt u1:t + Ht vec(1:t ), t = 1, . . . , T , (49)