Advances in Stochastic Mixed Integer Programming
Lecture at the INFORMS Optimization Section Conference in Miami, February 26, 2012
Suvrajeet Sen
Data Driven Decisions Lab, Integrated Systems Engineering, Ohio State University
Data Driven Decisions @ The Ohio State University Industrial and Systems Engineering
Overview of this Lecture
- Some Historical Remarks
- Classification of SMIP
- SMIP Models: Risk, Recourse, Resilience
- Structural Properties
- Decomposition: Benders and Beyond
- Illustrative Computational Results
Historical Remarks: IOS
Age of the INFORMS Optimization Section is
a) 0 < age ≤ 10
b) 10 < age ≤ 15
c) 15 < age ≤ 20
d) 20 < age ≤ 25
e) age > 25
INFORMS OS was founded at the Spring ORSA/TIMS Meeting in Los Angeles, April 1995.
Historical Remarks: My assessment of SIP/SMIP
Figure: Linear Programming and Integer Programming (Discrete Choice), each combined with Uncertainty, lead to Stochastic Linear Programming and Stochastic Integer Programming (timeline labels: 1950s-Present, 1960s-Present). Major Hurdles Still Remain!!
Why model uncertainty? For most people, that's just reality.
In the real world: Risk is everywhere. Certainty: Risk is merely a 4-letter word.
In the real world: There is a market for information. Certainty: Information has no value.
In the real world: Hindsight is 20/20. Certainty: Foresight is 20/20.
More Historical Remarks: Walk Before You Can Run
Some data for [1980, 2000), i.e., prior to 2000, from the annotated bibliography (Stougie and van der Vlerk):
Theory: (7+) papers
- Simple Integer Recourse: 2
- Structural Properties of Expected Recourse Function: 4
- Complexity: (1+)
- Benders-type methods: 5
- Gröbner-basis methods: 2
- Convex Approximations for Simple Integer Recourse: 2
- Other: 8 (sampling with first-stage integers, disjunctive cuts)
General Purpose Algorithms: 17 papers
More Historical Remarks: Walk Before You Can Run [1980-2000)
Books/Surveys: 6 altogether
- Dissertations: 3-4 (1 prior to 2000 in North America)
- Habilitation: 1
- Published surveys: 3 (includes hierarchical planning)
Special Purpose Models/Algorithms: 25 papers
- Production Planning/Scheduling: 3
- Network and Routing: 11
- Location: 7
- Other: 4
In the 12 years since 2000: more than 350 articles, listed in http://mally.eco.rug.nl/index.html?BIBLIO/SIP.HTML
Now we are running (Survey Articles 2000-)
Schultz, R. (2003). Stochastic Programming with Integer Variables. Mathematical Programming B, 285-309.
Stougie, L. and M.H. van der Vlerk (2005). Approximation in Stochastic Integer Programming.
Sen, S. (2005). Stochastic Mixed-Integer Programming Algorithms. Handbook of Discrete Optimization (Aardal, Nemhauser, Weismantel, eds.).
Some newer surveys are also available.
And you know we're serious because of applications with realistic data:
- Manufacturing Supply Chain: Two-stage Design (IBM, Intel)
- Biofuel Supply Chain: Multi-stage Design (Fan et al)
- Homeland Security: Defender/Attacker/Defender (Wood et al NPS, Ordonez/Tambe, Smith)
- Electric Power: Unit Commitment (Birge/Takriti, Philpott, Guan/Zhang), Fuel Price Hedging (Sen et al)
- Military: Prioritizing Choices (Morton), UAV/MAV (Evers et al)
- Fighting Forest Fires (Ntaimo)
SMIP Classification: A (B-C-D-E) Notation for SMIP
Two-Stage Stochastic Linear Programming:
$\min_x \; c^\top x + \mathbb{E}[f(x, \tilde{\omega})]$ s.t. $Ax = b,\; x \ge 0$,
where $f(x, \omega) = \min_y \{ g^\top y : Wy \ge r(\omega) - T(\omega)x,\; y \ge 0 \}$.
Variations depend on where the randomness appears.
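A minimal sketch of the expected recourse function for finitely many scenarios, assuming a hypothetical scalar instance of the recourse problem above: W = I, r(ω) = (ω, -ω), T = (1, -1), and g = (4, 1), so y1 pays for shortfall and y2 for surplus. The first-stage constraints are taken to be just x ≥ 0; the data and function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical scenario data: (probability p_s, value omega_s).
SCENARIOS = [(Fraction(1, 4), 10), (Fraction(1, 2), 20), (Fraction(1, 4), 30)]
C_COST = 1   # first-stage unit cost c (hypothetical)
G = (4, 1)   # second-stage costs g = (shortfall, surplus) (hypothetical)

def recourse(x, omega):
    """f(x, omega) = min g.y s.t. y1 >= omega - x, y2 >= x - omega, y >= 0.

    This is Wy >= r(omega) - T x with W = I, r(omega) = (omega, -omega), T = (1, -1).
    With g >= 0 the optimal y is the positive part of the right-hand side.
    """
    y1, y2 = max(omega - x, 0), max(x - omega, 0)
    return G[0] * y1 + G[1] * y2

def expected_recourse(x):
    """E[f(x, omega~)] over finitely many scenarios: a probability-weighted sum."""
    return sum(p * recourse(x, omega) for p, omega in SCENARIOS)

def total_cost(x):
    """First-stage cost plus expected recourse cost."""
    return C_COST * x + expected_recourse(x)

best_x = min(range(0, 41), key=total_cost)
print(best_x, total_cost(best_x))  # prints: 20 65/2
```

Exact `Fraction` arithmetic keeps the probability-weighted sums free of rounding, which makes the small example easy to check by hand.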
Stochastic MIP with First Stage Integers
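$\min_x \; c^\top x + \mathbb{E}[f(x, \tilde{\omega})]$ s.t. $Ax \ge b,\; x \in \mathbb{R}^{n_1} \times \mathbb{Z}^{n_2}$,
where $f(x, \omega) = \min_y \{ g^\top y : Wy \ge r(\omega) - T(\omega)x,\; y \in \mathbb{R}^{n_3} \}$.
$\mathbb{Z}^n$ denotes integer vectors of length n. With second-stage integers, extremely difficult!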
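One way to see why second-stage integers are so much harder: in a hypothetical scalar instance, min y s.t. y ≥ ω - x, the recourse function is convex when y is continuous but becomes a staircase (a ceiling function) when y must be a nonnegative integer. The sketch below checks midpoint convexity for both versions; the function names are illustrative only.

```python
import math

def f_continuous(x, omega, g=1.0):
    """f(x, omega) = min g*y s.t. y >= omega - x, y >= 0, with y continuous."""
    return g * max(omega - x, 0.0)

def f_integer(x, omega, g=1.0):
    """Same recourse problem with y a nonnegative integer: y = ceil(shortfall)."""
    return g * math.ceil(max(omega - x, 0.0))

def violates_midpoint_convexity(f, a, b, omega):
    """True if f((a + b) / 2) > (f(a) + f(b)) / 2, which rules out convexity of f in x."""
    mid = (a + b) / 2
    return f(mid, omega) > (f(a, omega) + f(b, omega)) / 2

print(violates_midpoint_convexity(f_integer, 0.0, 0.5, 0.5))     # True
print(violates_midpoint_convexity(f_continuous, 0.0, 0.5, 0.5))  # False
```

Because the integer version is not convex in x, a single Benders-style affine cut can overestimate it, which is the bottleneck the decomposition methods below must overcome.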
Stochastic Combinatorial Optimization
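$\min_x \; c^\top x + \mathbb{E}[f(x, \tilde{\omega})]$ s.t. $Ax \ge b,\; x \in \mathbb{B}^{n_1}$,
where $f(x, \omega) = \min_y \{ g^\top y : Wy \ge r(\omega) - T(\omega)x,\; y \in \mathbb{R}^{n_2} \times \mathbb{B}^{n_3} \}$.
Here $\mathbb{B}^n$ denotes binary vectors of length n. Many different structures for SMIP!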
Describing SMIP Problems
B = set of stages with binary variables
C = set of stages with continuous variables
D = set of stages with discrete variables (arbitrary integers, not just binary)
E = endogenous uncertainty (Y/N)
Louveaux has proposed a notation that covers all SP problems (e.g., it records whether the random variables are continuous or discrete). The notation above helps clarify the domain of applicability of results, algorithms, etc.
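The B-C-D-E descriptor is essentially a small record, so a sketch of it as a data structure may help; the class name `SMIPClass` and the helper `integer_recourse` are hypothetical, and the E flag for the two example classes (both N) is an assumption.

```python
from dataclasses import dataclass

def _fmt(stages):
    """Render a set of stage indices as {1,2} (the empty set as {})."""
    return "{" + ",".join(str(t) for t in sorted(stages)) + "}"

@dataclass(frozen=True)
class SMIPClass:
    """B-C-D-E descriptor of an SMIP.

    B, C, D: stages with binary, continuous, and general-integer variables.
    E: True if the uncertainty is endogenous (decision-dependent).
    """
    B: frozenset = frozenset()
    C: frozenset = frozenset()
    D: frozenset = frozenset()
    E: bool = False

    def __str__(self):
        return (f"B = {_fmt(self.B)}, C = {_fmt(self.C)}, D = {_fmt(self.D)}, "
                f"E = {'Y' if self.E else 'N'}")

    def integer_recourse(self):
        """True if some stage after the first has binary or general-integer variables."""
        return any(t > 1 for t in self.B | self.D)

# Two classes listed in this section: traditional SLP, and the disjunctive (D2) setting.
SLP = SMIPClass(C=frozenset({1, 2}))
D2_CLASS = SMIPClass(B=frozenset({1, 2}), C=frozenset({2}))
print(SLP)       # B = {}, C = {1,2}, D = {}, E = N
print(D2_CLASS)  # B = {1,2}, C = {2}, D = {}, E = N
```

Using `frozenset` keeps each descriptor immutable and hashable, so classes can serve as dictionary keys when cataloguing which algorithms apply to which problem types.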
Traditional Benders Decomposition for SLP: B = {}, C = {1,2}, D = {}
Wollmer, Norkin et al, Poojari/Mitra: B = {1}, C = {1,2}, D = {1}
Special Structure: Simple Integer Recourse: B = {2}, C = {1}, D = {2}, plus structure of the second stage
Global Optimization and IP: Ahmed, Tawarmalani, Sahinidis: B = {2}, C = {1,2}, D = {2}; plus Fixed Tenders
Grossmann & Co.: E = Y
Disjunctive Programming for Two-Stage
Caroe/Tind, Sherali/Fraticelli, Sen/Higle, Sen/Sherali: B = {1,2}, C = {2}, D = {}
Ntaimo/Sen: B = {1,2}, C = {1,2}, D = {}
Lagrangian-based Methods for Multi-stage
Multi-stage SMIPs: Caroe/Schultz, Roemisch et al, Alonso-Ayuso et al, Lulli/Sen, Guan et al: B = {1, 2, ..., N}, C = {1, 2, ..., N}, D = {1, 2, ..., N}
Stochastic MIP Models: Risk, Recourse, and Resilience
SMIP Models
- Modeling Risk
- Modeling Recourse
- Modeling Resilience
- Multi-stage Models
Models not Covered: Chance Constraints with Discrete Distributions
Special Structured IP (Knapsack, Mixing, etc.): see Prékopa, Dentcheva, Ruszczyński; Luedtke et al (2010); Küçükyavuz (2010); Saxena et al (2009); Shen et al (2010)
Risk in SMIP
We have only stated models via expected values. Is the reliance on expectation a handicap? Of course! But many risk measures (e.g., downside risk, mean absolute deviation, CVaR) can be reformulated as the expectation of a slightly modified, though mathematically similar, function.
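As one standard instance of such a reformulation (the Rockafellar-Uryasev representation of CVaR, named here as an example rather than as the lecture's own construction): CVaR_α(Z) = min over η of { η + E[(Z - η)_+] / (1 - α) }. The risk measure becomes an expectation of a modified function plus one extra scalar η; inside an optimization model the positive part is written with inequalities s ≥ Z - η, s ≥ 0. The sketch below, with hypothetical loss data, evaluates the formula for a discrete loss distribution.

```python
from fractions import Fraction

def cvar(outcomes, alpha):
    """CVaR_alpha of a discrete loss via min over eta of eta + E[(Z - eta)_+] / (1 - alpha).

    outcomes: list of (probability, loss) pairs; requires 0 < alpha < 1.
    The objective is convex and piecewise linear in eta with breakpoints at the
    loss values, so evaluating it at those values finds the minimum.
    """
    def objective(eta):
        excess = sum(p * max(z - eta, 0) for p, z in outcomes)
        return eta + excess / (1 - alpha)
    return min(objective(z) for _, z in outcomes)

# Hypothetical losses: 0 w.p. 1/2, 10 w.p. 2/5, 100 w.p. 1/10.
LOSSES = [(Fraction(1, 2), 0), (Fraction(2, 5), 10), (Fraction(1, 10), 100)]
print(cvar(LOSSES, Fraction(9, 10)))  # 100 (mean of the worst 10%)
print(cvar(LOSSES, Fraction(1, 2)))   # 28 (mean of the worst 50%)
```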
SCO for Modeling Risk
Important: inequalities are indispensable for risk modeling.
SCO for Modeling Risk
Example: the Kahneman/Tversky S-curve for risk aversion can be linearized using 0-1 variables.
This is similar to non-convex piecewise linear programming: each piece requires a binary (switch) variable.
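A sketch of the "one binary per piece" idea, using the multiple-choice formulation (one of several standard MILP encodings, chosen here for illustration): with breakpoints a_0 < ... < a_K, piece k has slope s_k and intercept b_k, and the model uses z_k ∈ {0,1}, Σ z_k = 1, a_k z_k ≤ w_k ≤ a_{k+1} z_k, x = Σ w_k, value = Σ (s_k w_k + b_k z_k). The S-shaped breakpoint data below is hypothetical (convex for losses, concave for gains, steeper for losses); the code builds a feasible (z, w) for a given x and checks that the linear objective reproduces the curve.

```python
from fractions import Fraction as F

# Hypothetical S-shaped value function on [-2, 2]: slopes 1, 2 (losses), then 1, 1/2 (gains).
BREAKS = [F(-2), F(-1), F(0), F(1), F(2)]
VALUES = [F(-3), F(-2), F(0), F(1), F(3, 2)]
K = len(BREAKS) - 1

def pieces():
    """Slope s_k and intercept b_k of each linear piece on [a_k, a_{k+1}]."""
    out = []
    for k in range(K):
        s = (VALUES[k + 1] - VALUES[k]) / (BREAKS[k + 1] - BREAKS[k])
        out.append((s, VALUES[k] - s * BREAKS[k]))
    return out

def encode(x):
    """Return (z, w) for x in [BREAKS[0], BREAKS[-1]]: switch on the first piece containing x."""
    k = next(i for i in range(K) if BREAKS[i] <= x <= BREAKS[i + 1])
    z = [1 if i == k else 0 for i in range(K)]
    w = [x if i == k else F(0) for i in range(K)]
    return z, w

def is_feasible(x, z, w):
    """Check the MILP constraints: one binary switched on, w_k inside its piece, sum(w) = x."""
    return (sum(z) == 1 and all(zi in (0, 1) for zi in z) and sum(w) == x
            and all(BREAKS[i] * z[i] <= w[i] <= BREAKS[i + 1] * z[i] for i in range(K)))

def milp_value(z, w):
    """Objective term sum_k (s_k * w_k + b_k * z_k), linear in (z, w)."""
    return sum(s * wi + b * zi for (s, b), zi, wi in zip(pieces(), z, w))

z, w = encode(F(3, 2))
print(is_feasible(F(3, 2), z, w), milp_value(z, w))  # True 5/4
```

In a real model a solver chooses z and w; `encode` only demonstrates that a feasible choice exists and that the objective then equals the interpolated curve.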
SCO for Modeling Recourse: Stochastic Server Location Problem
SCO for Modeling Recourse: SSLP
This SCO has two sets of decisions:
1. Choose server locations (e.g., bases).
2. Once demand nodes (e.g., threats) appear, assign servers to demand nodes.
Our Stochastic Server Location Problem (SSLP) also includes some policy constraints: each customer receives service from exactly one site; each service site must be located within a prescribed zone z; and at most v servers may be located in total, with no more than w_z servers in zone z.
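The SSLP model objective: minimize Cost - Expected Revenue Potential (the last term inside the expectation is the penalty for lost demand).
$\min \; \sum_j c_j x_j + \mathbb{E}\big[-\sum_{i,j} q_{ij} y_{ij}(\tilde{\omega}) + \sum_j Q_j Y_j(\tilde{\omega})\big]$
subject to:
supply side: $\sum_j x_j \le v$, $\sum_{j \in J(z)} x_j \le w_z\ \forall z$
demand side: $\sum_j y_{ij}(\omega) = \xi_i(\omega)\ \forall i$
supply/demand: $\sum_i y_{ij}(\omega) - Y_j(\omega) \le u_j x_j\ \forall j$
Plus: all variables are binary.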
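To make the model concrete, here is a brute-force sketch of a tiny SSLP under stated assumptions: 2 sites, 2 customers, 2 scenarios, all data hypothetical; ξ_i(ω) is a 0/1 presence indicator, each present customer is assigned to exactly one site, and Y_j is taken as the overflow beyond u_j x_j (allowed to exceed 1 here, relaxing the binary requirement). Enumerating every x and every assignment is exactly the deterministic equivalent, which is why it only works at toy sizes.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tiny SSLP instance: 2 candidate sites, 2 customers.
C = [3, 2]                # c_j: cost of opening site j
U = [1, 1]                # u_j: capacity (customers) of site j if open
Q_REV = [[5, 4], [4, 5]]  # q_ij: revenue when customer i is served by site j
PENALTY = [10, 10]        # Q_j: penalty per unit of overflow at site j
V = 2                     # at most v servers in total
ZONE = [0, 1]             # zone z of each site
W_ZONE = {0: 1, 1: 1}     # w_z: at most w_z servers per zone
# Scenarios: (probability, customer presence vector xi(omega)).
SCENARIOS = [(Fraction(1, 2), (1, 1)), (Fraction(1, 2), (1, 0))]

def second_stage(x, xi):
    """Brute-force f(x, omega): each present customer goes to exactly one site;
    Y_j = overflow beyond u_j * x_j (assumption: integer, not binary)."""
    present = [i for i, h in enumerate(xi) if h]
    best = None
    for sites in product(range(len(C)), repeat=len(present)):
        load = [0] * len(C)
        revenue = 0
        for i, j in zip(present, sites):
            load[j] += 1
            revenue += Q_REV[i][j]
        overflow = [max(load[j] - U[j] * x[j], 0) for j in range(len(C))]
        cost = -revenue + sum(PENALTY[j] * overflow[j] for j in range(len(C)))
        best = cost if best is None else min(best, cost)
    return best

def first_stage_feasible(x):
    """Supply-side policy constraints: total count and per-zone limits."""
    if sum(x) > V:
        return False
    return all(sum(x[j] for j in range(len(C)) if ZONE[j] == z) <= w
               for z, w in W_ZONE.items())

def total_cost(x):
    return sum(c * xj for c, xj in zip(C, x)) + sum(p * second_stage(x, xi) for p, xi in SCENARIOS)

def solve():
    """Enumerate all binary x: the deterministic equivalent, by brute force."""
    candidates = [x for x in product((0, 1), repeat=len(C)) if first_stage_feasible(x)]
    best = min(candidates, key=total_cost)
    return best, total_cost(best)

print(solve())  # ((1, 1), Fraction(-5, 2))
```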
Modeling Resilience
Logical conditions are as follows:
$0 \le y^0_{jk} \le 1 - x_j$
$y^1_{jk} \le x_j$
Multi-Stage SMIP Models
Non-anticipativity in the Two Stage Model
(*) is the non-anticipativity constraint: all scenarios must agree on the first-stage decision.
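A small numeric sketch, with hypothetical data, of what the non-anticipativity constraint does: give each scenario its own copy x_s of an integer first-stage decision, then either force the copies to agree (x_s = x for every s) or drop that requirement. Dropping it yields the wait-and-see value, a lower bound; the gap between the two is the expected value of perfect information (EVPI).

```python
from fractions import Fraction

SCENARIOS = [(Fraction(1, 2), 2), (Fraction(1, 2), 5)]  # hypothetical (probability, demand)
X_RANGE = range(0, 7)                                   # hypothetical integer first-stage choices

def scenario_cost(x, d, c=1, q=3):
    """First-stage cost c*x plus shortage cost q per unit of unmet demand."""
    return c * x + q * max(d - x, 0)

def with_nonanticipativity():
    """All scenario copies share one x (x_s = x for every s)."""
    return min(sum(p * scenario_cost(x, d) for p, d in SCENARIOS) for x in X_RANGE)

def without_nonanticipativity():
    """Each scenario picks its own x_s (wait-and-see): constraint (*) dropped."""
    return sum(p * min(scenario_cost(x, d) for x in X_RANGE) for p, d in SCENARIOS)

rp, ws = with_nonanticipativity(), without_nonanticipativity()
print(rp, ws, rp - ws)  # 5 7/2 3/2  (EVPI = 3/2)
```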
Two-stage: NA only on Root Node
Multi-stage: Difficult, unless Ocotillo-type Trees
Recursive Formulation using State Variables
Challenge of Coniferous Trees
SMIP with Recursive Formulation
Structural Properties
Most structural properties and algorithms for SMIP assume relatively complete and sufficiently expensive recourse.
$-\infty < f(x, \tilde{\omega}) < +\infty$ with probability 1.
Under the above assumption, the expected recourse function is real-valued and lower semi-continuous.
Complexity of Two-Stage SMIP
Two-stage stochastic programs with recourse and finitely many scenarios are #P-hard.
(Problems in #P ask for a count, i.e., "how many?" rather than "are there any?") The proof reduces the graph reliability problem to a two-stage stochastic combinatorial optimization problem.
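To make the counting flavor of graph reliability concrete, a brute-force sketch of two-terminal (s-t) reliability: each edge survives independently with a given probability, and we sum the probabilities of all edge states in which s and t stay connected. This illustrates the quantity being counted, not the reduction used in the proof; the function names are hypothetical.

```python
from itertools import product

def connected(n, edges, s, t):
    """Depth-first search: is t reachable from s using the given undirected edges?"""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return t in seen

def two_terminal_reliability(n, edges, probs, s, t):
    """P(s and t connected) when edge e survives independently with prob probs[e].
    Enumerates all 2^m edge states, so the work doubles with every extra edge."""
    total = 0.0
    for state in product((0, 1), repeat=len(edges)):
        weight = 1.0
        for up, p in zip(state, probs):
            weight *= p if up else 1 - p
        alive = [e for e, up in zip(edges, state) if up]
        if connected(n, alive, s, t):
            total += weight
    return total

print(two_terminal_reliability(3, [(0, 1), (1, 2)], [0.5, 0.5], 0, 2))  # 0.25 (series)
print(two_terminal_reliability(2, [(0, 1), (0, 1)], [0.5, 0.5], 0, 1))  # 0.75 (parallel)
```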
What Do We Need?
A Potent Brew! Decomposition (SP) and Convexification (IP)
Two Issues in Algorithm Design:
- Cuts for Second Stage IP
- Approximation of f (also convexification)
Beyond Benders Decomposition: Second-stage IP
Gomory Cuts for SIP Decomposition (hot off the printer!): first stage 0-1, second stage general integer
Disjunctive Decomposition (D2): first stage 0-1, second stage mixed 0-1
Disjunctive Decomposition with Branch-and-Cut (D2-BAC): first stage 0-1, second stage mixed-integer
Recall --- SLP
Recall --- Benders Decomposition or L-shaped Method
Standard Benders Master (OR 501)
where the cut coefficients are built from non-negative second-stage dual multipliers.
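A minimal sketch of the Benders (L-shaped) loop, assuming a hypothetical one-dimensional instance so the master can be solved exactly without an LP solver: min c x + E[q max(r(ω) - x, 0)], 0 ≤ x ≤ 10. Each subproblem min q y s.t. y ≥ r - x, y ≥ 0 has the non-negative dual multiplier π = q if r > x, else 0, and the optimality cut is θ ≥ Σ_s p_s π_s (r_s - x). All names and data are illustrative.

```python
from fractions import Fraction as F

C_COST, Q_COST, X_MAX = F(1), F(3), F(10)        # hypothetical data
SCENARIOS = [(F(1, 2), F(2)), (F(1, 2), F(5))]  # (probability, r(omega))

def solve_master(cuts):
    """min c*x + theta s.t. theta >= a + b*x for every cut, 0 <= x <= X_MAX.
    One-dimensional and convex, so the optimum is at an endpoint or a cut intersection."""
    cand = {F(0), X_MAX}
    for i, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[i + 1:]:
            if b1 != b2:
                x = (a2 - a1) / (b1 - b2)
                if 0 <= x <= X_MAX:
                    cand.add(x)

    def value(x):
        return C_COST * x + max(a + b * x for a, b in cuts)

    x = min(sorted(cand), key=value)
    return x, value(x)

def subproblem_duals(x):
    """Optimal dual of min q*y s.t. y >= r - x, y >= 0: pi = q if r > x else 0."""
    return [Q_COST if r > x else F(0) for _, r in SCENARIOS]

def benders(tol=F(0), max_iter=50):
    cuts = [(F(0), F(0))]  # theta >= 0 is valid because recourse costs are nonnegative
    for it in range(1, max_iter + 1):
        x, lower = solve_master(cuts)
        pis = subproblem_duals(x)
        # By strong duality, pi * (r - x) equals the subproblem value at x.
        upper = C_COST * x + sum(p * pi * (r - x) for (p, r), pi in zip(SCENARIOS, pis))
        if upper - lower <= tol:
            return x, upper, it
        # Optimality cut: theta >= sum_s p_s * pi_s * (r_s - x) = alpha + beta * x
        alpha = sum(p * pi * r for (p, r), pi in zip(SCENARIOS, pis))
        beta = -sum(p * pi for (p, _), pi in zip(SCENARIOS, pis))
        cuts.append((alpha, beta))
    raise RuntimeError("no convergence within max_iter")

print(benders())  # (Fraction(5, 1), Fraction(5, 1), 3)
```

The iterates are x = 0, 7/2, 5; the lower bound from the master and the upper bound from the subproblems meet at 5 on the third iteration.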
Caroe-Tind extension of Benders or L-shaped Method for Second-stage SIP
Gomory Cuts to represent Subadditive Value Functions
where the cuts use non-decreasing second-stage value function approximations.
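As a reminder of the basic mechanism behind Gomory cuts (the textbook fractional cut, not the parametric version of Gade, Küçükyavuz and Sen): from a simplex tableau row x_B + Σ_j ā_j x_j = b̄ in which all variables are integer and nonnegative and b̄ is fractional, the inequality Σ_j frac(ā_j) x_j ≥ frac(b̄) is valid and cuts off the current LP vertex. The helper names are hypothetical.

```python
import math
from fractions import Fraction as F

def frac(v):
    """Fractional part v - floor(v), always in [0, 1) (also for negative v)."""
    return v - math.floor(v)

def gomory_fractional_cut(row_coeffs, rhs):
    """Row: x_B + sum_j row_coeffs[j] * x_j = rhs over nonbasic j (all vars integer, >= 0).
    Returns (coeffs, r) for the cut sum_j coeffs[j] * x_j >= r."""
    if frac(rhs) == 0:
        raise ValueError("right-hand side is integral; this row gives no cut")
    return {j: frac(a) for j, a in row_coeffs.items()}, frac(rhs)

# Hypothetical row: x1 + (1/2) x3 - (3/2) x4 = 7/2
coeffs, r = gomory_fractional_cut({3: F(1, 2), 4: F(-3, 2)}, F(7, 2))
print(coeffs, r)
# The current vertex has nonbasic x3 = x4 = 0, so the left side is 0 < 1/2: it is cut off.
print(sum(c * 0 for c in coeffs.values()) >= r)  # False
```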
Structure Similar to Benders
And a more general framework, but we need to overcome bottlenecks:
- Subproblems are integer programs
- Master problems are required to optimize nonconvex functions
Our Recommendation: maintain Benders piecewise linear approximations
Notice the change below!
Notice that the RHS r and the matrix T have both changed.
where the cut coefficients are built from non-negative second-stage dual multipliers.
Our Suggestion: Solve Second stage using Updated LP approximations
Each iteration involves only LP solutions in the second stage. Solve the LP relaxation TWICE:
once with the old convexification, then derive a cut to update the convexification and solve again.
The first stage is the same as in Benders' original proposal; second-stage problems are LPs.
We will have
But can this be achieved? Yes, under certain assumptions!
First stage pure binary (B = {1,2}):
- If C = {}, D = {2}: use Gomory cuts (Gade, Küçükyavuz, Sen)
- If C = {2}, B = {2}: use disjunctive set convexification (Sen and Higle)
- If C = {2}, D = {2}: use disjunctive value approximations for branch-and-cut (Sen and Sherali)
First stage general MILP: global optimization
Second-stage set convexification: original constraints plus valid inequalities as functions of x
Parametric Gomory Cuts: Affine
Parametric Disjunctive Cuts: Piecewise Linear Concave
Parametric Gomory Cuts
Finiteness with Lexicographic Dual Simplex (Gade, Küçükyavuz, Sen)
Obj | Scen | Vars | Cons | GDD-S | GDD-R | B&B Nodes | B&B + Gom Nodes
-63.50 | 4 | 22 | 24 | 7 (13) | 7 (32) | 54 | 2 (6)
-66.17 | 9 | 47 | 54 | 7 (39) | 6 (76) | 306 | 8 (13)
-67.33 | 36 | 182 | 216 | 10 (183) | 6 (384) | 1.55E7 | 52 (50)
-67.67 | 121 | 607 | 726 | 9 (526) | 6 (1032) | 7.60E6 | 13224 (167)
Parametric Disjunctive Cuts
Convexify the second-stage value function of (x, ω) by viewing its epigraph as a disjunctive set, such as the one shown in the slide figure (value plotted against a first-stage binary variable).
Convergence for Disjunctive Decomposition (Set Convexification)
Assumptions:
- Complete recourse
- All integer variables are 0-1
- Maintain all cuts in $W^k$
- Certain rules of order hold (à la the lexicographic dual simplex in Gomory's proof)
Under these assumptions, the D2 method results in a convergent algorithm (Sen and Higle).
Value Approximations for Branch-and-Cut in Second Stage (Sen and Sherali)
The approximation has one piece per node of a truncated BAC tree in the second stage. Disjunctive programming lets us convexify the function for each outcome ω.
Illustrative Computational Results with D2 and D2-BAC
Computational Results for Problem Instance SSLP_10_50
Scenarios | Binaries | Constraints | % ZIP Gap | Iterations | D2 Cuts | D2 CPU (secs.) | L2 CPU (secs.) | DEP CPU (secs.) | DEP % Gap
5 | 2,510 | 301 | 10.49 | 209 | 189 | 78.25 | 997.74 | 80.53 | -
10 | 5,010 | 601 | 11.38 | 264 | 257 | 171.49 | 1284.47 | Failed | 0.19
25 | 12,510 | 1,501 | 10.81 | 286 | 281 | 248.81 | 1339.24 | Failed | 0.34
50 | 25,010 | 3,001 | 10.89 | 252 | 250 | 295.95 | 1982.60 | Failed | 0.44
100 | 50,010 | 6,001 | 11.07 | 300 | 299 | 480.46 | 2782.88 | Failed | 9.02
500 | 250,010 | 30,001 | 10.75 | 309 | 307 | 1902.20 | Failed | Failed | 38.17
1,000 | 500,010 | 60,001 | 11.07 | 322 | 321 | 5410.10 | Failed | Failed | 99.60
2,000 | 1,000,010 | 120,001 | 11.01 | 308 | 307 | 9055.29 | Failed | Failed | 46.24
Computational Results for Problem Instance SSLP_15_45
Scenarios | Binaries | Constraints | % ZIP Gap | Iterations | D2 Cuts | D2 CPU (secs.) | L2 CPU (secs.) | DEP CPU (secs.) | DEP % Gap
5 | 3,390 | 301 | 6.88 | 146 | 145 | 110.34 | Failed | Failed | 1.19
10 | 6,765 | 601 | 6.53 | 454 | 453 | 1,494.89 | Failed | Failed | 0.27
15 | 10,140 | 901 | 5.62 | 814 | 813 | 7,210.63 | Failed | Failed | 0.72
SCALABILITY: D2 scales well (linearly) as the number of scenarios increases; D2 does not scale well as the size of the master program (x) increases.
Computational Results Cont
The D2 Algorithm Solves Some of the Largest (0-1) Instances
Scalability: Linear in the number of scenarios
Figure: D2 CPU time for SSLP_10_50, CPU time T (s) vs. number of scenarios S, with fitted line T = 4.6631 S (R² = 0.9888).
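Assuming the trend line in the figure is a least-squares fit through the origin to the D2 CPU times in the SSLP_10_50 table, its slope can be reproduced from that table's data with the closed form a = Σ S·T / Σ S². This is a sketch of that one assumption, not a description of how the original plot was produced.

```python
# Scenarios S and D2 CPU seconds T for SSLP_10_50, copied from the results table.
S = [5, 10, 25, 50, 100, 500, 1000, 2000]
T = [78.25, 171.49, 248.81, 295.95, 480.46, 1902.20, 5410.10, 9055.29]

def slope_through_origin(xs, ys):
    """Least-squares slope a minimizing sum (y - a*x)^2: a = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

a = slope_through_origin(S, T)
print(round(a, 4))  # prints 4.6631
```

The agreement with the plotted slope supports reading "linear in the number of scenarios" as roughly 4.7 CPU seconds per additional scenario on this instance.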
Computational Results (with Y. Yuan)
Treating Cut Generation as a Specialized Two-Stage LP
Instance | D2 | D2-BAC | D2-BAC++
5.25.50 | 1.64 | 0.70 | 0.36
5.25.100 | 2.15 | 1.73 | 0.89
5.50.100 | 7.10 | 3.70 | 1.56
5.50.500 | 34.50 | 23.05 | 12.36
5.50.1000 | 140.47 | 64.17 | 22.77
5.50.2000 | 603.37 | 274.40 | 42.74
10.50.50 | 295.95 | 373.98 | 262.13
10.50.100 | 396.76 | 452.31 | 486.99
10.50.500 | 1902.2 | 2772.22 | 1313.38
10.50.1000 | 5410.1 | 5677.80 | 2139.47
10.50.2000 | 9055.29 | > 10800 | 3916.47
15.45.5 | 110.34 | 232.30 | 211.79
15.45.10 | 1494.89 | 222.41 | 153.41
15.45.15 | 7210.63 | 1988.26 | 803.56
Conclusions
Decomposition (SP) + Valid Inequalities (IP) provide a potent potion! But Stochastic MIP still needs a lot of work:
- Specially structured cuts (already at play in chance-constrained SP)
- Multi-stage extensions (very rich area)
- Real-world applications