IP Lecture Notes
School of Mathematics
University of Birmingham
Birmingham B15 2TT, UK
Contents

1 Introduction
1.1 Scope of the course
10 Lagrangian relaxation
10.1 Solving the Lagrangian Dual
10.2 Lagrangian dual as non-differentiable convex optimization
10.3 Solving (LD) by the subgradient method
1 – Introduction
In design, construction, maintenance and many other engineering tasks, decisions have to be taken. The goal of all such decisions is either to minimise effort or to maximise benefit. Sometimes it is important to restrict the possible solutions to integer values, as for instance it is not profitable to produce half a car. In such situations we speak of integer programming. Indeed, many practical problems such as train and airline scheduling, vehicle routing, production planning, resource management, telecommunications and network design can be modeled as integer or mixed-integer programs.
Before we delve deeper into the particulars of integer programming let us recall the steps
to be taken to model practical problems mathematically. For modeling, the following
three questions need to be addressed.
1.) What precisely shall be decided? This yields variables (also called design or decision variables), which typically are stored in a vector x ∈ Rn. Every assignment of values to the vector x is a solution to the problem.
2.) Which conditions need to be satisfied? This leads to formulating constraints in
the form of equalities and inequalities. One needs to check which values of the
design variables satisfy these equalities and inequalities. If a solution x satisfies all
the constraints, then it is called a feasible solution.
3.) Which feasible solution is the best one? For this one needs to come up with a rating of the different feasible solutions (e.g. with regard to effort or benefit) so that one can compare them and choose the best among them. Such an optimisation criterion is called the objective, and a feasible solution which is optimal with respect to the objective is called an optimal solution.
The result of the above process is a mathematical model or program. The objective can usually be expressed as a function of the design variables, which leads to the following definition.
(NLP) min f(x)
s.t. gi(x) ≤ 0 ∀i ∈ I1,
     gi(x) = 0 ∀i ∈ I2,
     x ∈ B,
where
• I = I1 ∪ I2 is a finite index set with I1 ∩ I2 = ∅,
• B ⊆ Rn,
• f, gi : B → R for all i ∈ I.
Remark 1.2. (i) In general the minimum might not exist and thus strictly speaking
we need to consider the infimum.
(ii) There are applications, in which B ⊆ Rn does not hold or in which I is not finite.
(iii) It suffices to consider minimization, as max f (x) = − min(−f (x)).
(iv) Similarly, it suffices to consider constraints of the form gi (x) ≤ 0, as gi (x) ≥ 0 is
equivalent to −gi (x) ≤ 0.
A binary integer program takes the form
(BIP) max c⊤x
s. t. Ax ≤ b,
x ∈ {0, 1}n.
Remark 1.4. Any (BIP) is a combinatorial optimization problem, as {0, 1}n is of cardinality 2^n. Can you formulate a corresponding COP as defined in the course Combinatorial Optimisation?
In the present course we will examine how some fundamental examples of optimisation
problems can be modeled as IPs, BIPs and MIPs. Further, we will discuss methods
and algorithms for solving these. The methods and algorithms that we study build on
investigations concerning optimality criteria and finding upper and lower bounds on the
objective value.
This module presents a comprehensive theory as well as exact and approximate algorithms for integer programming problems and a wide variety of their applications. More precisely, the module starts with modeling, formulations and illustrative examples of integer programming problems. We then discuss some first methods for solving certain IPs and BIPs, namely brute force, the graphical method, divide and conquer and dynamic programming, before turning to optimality criteria and bounds (including relaxation and total unimodularity). After that we treat further important computational methods of integer programming, such as branch and bound, valid inequalities and the cutting plane method. Finally, we investigate the computational complexity of the problems and consider some further applications.
Acknowledgments. These notes are based on the lecture notes Integer Programming by Dr. Yunbin Zhao, whom I wish to thank for sharing his notes.
Part I – Modeling & Some Methods
2 – Modeling Part I: Fundamental Integer Programs
In this chapter we will introduce several well-known and fundamental optimization prob-
lems, which can be modeled as integer programs. We will focus on the modeling process
here and will return to these examples in later chapters to learn about algorithms and
methods to solve them.
Suppose we have a budget of 19 000 and may choose among four investments, with outlays of 6 700, 10 000, 5 500 and 3 400 and present values of 8 000, 11 000, 6 000 and 4 000, respectively. Into which investments should we place our money so as to maximize our total present value? Each investment is a 'take it or leave it' opportunity: it is not allowed to invest partially in any of the investments.
Recall from the introduction that we need to determine the decision variables, the con-
straints and the objective in order to arrive at a mathematical program.
Definition of Variables: We use binary variables xj for each investment, i. e. xj ∈ {0, 1}
for j ∈ {1, 2, 3, 4}. If xj = 1 then we will make investment j. If xj = 0, we will not
make the investment.
There are a number of additional constraints we might want to add. For instance, consider
the following additional constraints:
• Only make two investments, i. e. x1 + x2 + x3 + x4 ≤ 2.
• If investment 2 is made, then investment 4 must also be made, i. e. x2 ≤ x4 .
• If investment 1 is made, then investment 3 cannot be made, i. e. x1 + x3 ≤ 1.
This leads to the 0-1 integer programming problem

(2.1)  max 8 000x1 + 11 000x2 + 6 000x3 + 4 000x4
       s. t. 6 700x1 + 10 000x2 + 5 500x3 + 3 400x4 ≤ 19 000,
       x1 + x2 + x3 + x4 ≤ 2,
       x2 − x4 ≤ 0,
       x1 + x3 ≤ 1,
       x1, . . . , x4 ∈ {0, 1}.
Constraints:
∑_{i=1}^n ai xi ≤ b,
xi ∈ {0, 1}, i ∈ {1, . . . , n}.
Objective:
max ∑_{i=1}^n ci xi.
Figure 2.1: Assignment problem, where 4 persons are assigned to 4 different jobs.
Conditions:
(i) n people carry out n jobs.
(ii) Each person carries out exactly one job.
(iii) If person i is assigned to job j, a cost cij is incurred.
Which assignment minimizes the total cost?
Thus, the assignment problem can be formulated as the following binary program:
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s. t. ∑_{j=1}^n xij = 1, i ∈ {1, . . . , n},
∑_{i=1}^n xij = 1, j ∈ {1, . . . , n},
xij ∈ {0, 1}, i, j ∈ {1, . . . , n}.
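As an aside, small assignment instances can be solved directly; the following is a minimal sketch using SciPy's linear_sum_assignment (a Hungarian-type method), with an invented 4 × 4 cost matrix:

import numpy as np
from scipy.optimize import linear_sum_assignment

# Invented cost matrix: c[i][j] = cost of assigning person i to job j.
c = np.array([[9, 2, 7, 8],
              [6, 4, 3, 7],
              [5, 8, 1, 8],
              [7, 6, 9, 4]])

# Solves min sum_ij c_ij x_ij over all assignments.
rows, cols = linear_sum_assignment(c)
print(list(zip(rows, cols)))  # optimal person-to-job pairs
print(c[rows, cols].sum())    # total cost 2 + 6 + 1 + 4 = 13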
Constraints: Define
aij = 1 if i ∈ Sj, and aij = 0 otherwise.
(We may collect these into a 0-1 incidence matrix A = (aij). This is nothing but processing of the data.) With this notation we can describe the covering constraint (at least one fire station must serve neighborhood i) as follows:
∑_{j=1}^n aij xj ≥ 1 for each i ∈ {1, . . . , m}.
Therefore, the set covering problem (which we had formulated as a combinatorial problem before) can likewise be formulated as a BIP:
min ∑_{j=1}^n cj xj
s. t. ∑_{j=1}^n aij xj ≥ 1, i ∈ {1, . . . , m},
xj ∈ {0, 1} for all j ∈ {1, . . . , n}.
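For small instances this BIP can be solved by enumerating all 2^n binary vectors; a brute-force sketch with an invented incidence matrix and costs:

from itertools import product

# Invented data: 4 neighborhoods (rows), 3 candidate station sites (columns).
a = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 0],
     [0, 1, 1]]
c = [3, 5, 4]  # cost c_j of opening site j
m, n = len(a), len(c)

best, best_x = None, None
for x in product([0, 1], repeat=n):  # all 2^n binary vectors
    if all(sum(a[i][j] * x[j] for j in range(n)) >= 1 for i in range(m)):
        cost = sum(c[j] * x[j] for j in range(n))
        if best is None or cost < best:
            best, best_x = cost, x
print(best_x, best)  # (1, 1, 0) with cost 8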
Remark 2.3. It may be the case that cij ̸= cji. If cij = cji for all i, j ∈ {1, ..., n}, then we speak of the symmetric traveling salesman problem (STSP).
The TSP can be stated in many guises: a truck driver has a list of clients he must visit on a given day, a machine must place modules on printed circuit boards, or a stacker crane must pick up and deposit crates.
The two types of constraints above are related to the constraints of the assignment
problem. A solution to the assignment problem (with additional condition xii = 0)
might give a solution of the form shown as in Figure 2.2 (i. e., a set of disconnected
sub-tours).
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s.t. ∑_{j:j̸=i} xij = 1, i = 1, ..., n,
∑_{i:i̸=j} xij = 1, j = 1, ..., n,
∑_{i∈S} ∑_{j∉S} xij ≥ 1, ∀S ⊊ N, S ̸= ∅,
xij ∈ {0, 1} for all i, j = 1, ..., n.
Alternatively, using subtour-elimination constraints:
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s.t. ∑_{j:j̸=i} xij = 1, i = 1, ..., n,
∑_{i:i̸=j} xij = 1, j = 1, ..., n,
∑_{i∈S} ∑_{j∈S, j̸=i} xij ≤ |S| − 1 for S ⊆ N, 2 ≤ |S| ≤ n − 1,
xij ∈ {0, 1} for all i, j = 1, ..., n.
We can solve this relatively small problem by using brute force, i.e. consider all sixteen binary vectors (x1, x2, x3, x4) ∈ {0, 1}4, determine which of these vectors satisfy the constraints, calculate the objective value for all feasible vectors and choose the one with maximal objective value. The set of feasible solutions is

{(1, 0, 0, 1)⊤, (1, 0, 0, 0)⊤, (0, 1, 0, 1)⊤, (0, 0, 1, 1)⊤, (0, 0, 1, 0)⊤, (0, 0, 0, 1)⊤, (0, 0, 0, 0)⊤}.
It is not difficult to check that (0, 1, 0, 1)⊤ gives the optimal solution. The associated
optimal objective value is 15 000.
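This enumeration is easily mirrored in code; a sketch that checks all sixteen vectors against the constraints of (2.1):

from itertools import product

values = [8000, 11000, 6000, 4000]   # present values
outlays = [6700, 10000, 5500, 3400]  # outlays; the budget is 19000

best, best_x = None, None
for x in product([0, 1], repeat=4):
    if (sum(o * xi for o, xi in zip(outlays, x)) <= 19000
            and sum(x) <= 2            # only make two investments
            and x[1] <= x[3]           # investment 2 requires investment 4
            and x[0] + x[2] <= 1):     # investments 1 and 3 exclude each other
        value = sum(v * xi for v, xi in zip(values, x))
        if best is None or value > best:
            best, best_x = value, x
print(best_x, best)  # (0, 1, 0, 1) with value 15000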
3.1.1 BIP. The idea described in the above example can be generalised to general BIP problems. If we have n binary design variables then there are 2^n possible solutions. For each of these we check if all constraints are satisfied, eliminate those which do not satisfy all constraints, and determine the objective value of the remaining ones. Note that the time needed to find a solution following this strategy grows exponentially in n.
3.1.3 The Knapsack and Covering Problems. In both cases the number of subsets is 2^n. For the knapsack problem with b = ∑_{j=1}^n aj/2, at least half of the subsets are feasible, and thus there are at least 2^(n−1) feasible solutions to compare.
3.1.4 The Traveling Salesman Problem. Starting at city 1, the salesman has n − 1 choices. For the next choice n − 2 cities are possible, and so on. Thus, there are (n − 1)! feasible tours.
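A brute-force TSP solver mirrors this count: fix city 1 and enumerate the (n − 1)! orders of the remaining cities. A sketch with an invented symmetric distance matrix:

from itertools import permutations

# Invented symmetric distance matrix for n = 4 cities (city 1 is index 0).
c = [[0, 3, 8, 5],
     [3, 0, 4, 7],
     [8, 4, 0, 2],
     [5, 7, 2, 0]]
n = len(c)

best, best_tour = None, None
for perm in permutations(range(1, n)):  # the (n-1)! tours starting at city 0
    tour = (0,) + perm
    length = sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))
    if best is None or length < best:
        best, best_tour = length, tour
print(best_tour, best)  # (0, 1, 2, 3) with length 3 + 4 + 2 + 5 = 14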
The conclusion drawn from the above classes of examples is that it is only sensible to use
brute-force for solving such problems for very small values of n, as in all instances the
problem size grows exponentially with n.
Remark 3.2. Every programming problem whose set of feasible solutions is of finite
cardinality can be formulated as a combinatorial optimisation problem. Having the brute-
force approach in mind one can consider the following independence system (X, F), where
X is the set of all feasible solutions and where F = {Y ⊆ X : |Y | ≤ 1}.
Let us consider BIPs: Assume that m, n ∈ N and A ∈ Rm×n . Further, let c, x ∈ Rn and
b ∈ Rm .
(BIP ) max c⊤ x
s. t. Ax ≤ b
x ∈ {0, 1}n
can be formulated as a COP in the following way. For the seed set X we can take
X = {x ∈ {0, 1}n | Ax ≤ b}. Further, we let F = {Y ⊆ X : |Y | ≤ 1} as above and
w(x) = c⊤ x for x ∈ X. Then the above BIP is the COP associated with the independence
system (X, F) with weight function given by w. Note that using this independence system
is not very convenient and indeed corresponds to the brute-force approach.
Example 3.3. We consider the following (IP) and its so-called relaxed (LP).
Figure 3.1: Graphical solution of the (IP) and its associated relaxed (LP) from Example 3.3; panel (a) shows the LP optimum (20/7, 3)⊤, panel (b) the IP optimum (2, 1)⊤.
Let us now turn to (IP). We can proceed in a similar manner. But notice that the feasible
region for (LP) is a polyhedron, whereas the feasible region of (IP) is a discrete set of
lattice points. Thus, we need to restrict to integer lattice points:
1.) First we determine the feasible region by intersecting Z^2_{≥0} with the simplex which we determined for (LP). We end up with a finite set of points, which are visualised by black balls in Figure 3.1b.
2.) As before, we choose an arbitrary point in the feasible region (i. e. from the finitely
many points), evaluate its objective value and plot the associated contour line of the
objective function. In Figure 3.1b we again chose the point (0, 0)⊤ with objective
value 0. The associated contour line is plotted in green.
3.) Again we shift the green line as far as we can but only consider those possibilities
which pass through feasible points. We conclude from Figure 3.1b that the optimal
solution to (IP) is (2, 1)⊤ with objective value 7.
Figure 3.2: (a) Rounding leaves the feasible region. (b) The rounded solution is far away from the optimal integer solution.
2.) Even if one obtains a feasible solution by rounding up or down, the optimal objective
value of the (IP) can be arbitrarily far away from the rounded optimal objective
value of the (LP). This is visualized by the following example in R2 that is displayed
in Figure 3.2b.
max x1 + 0.64x2
s. t. 50x1 + 31x2 ≤ 250,
3x1 − 2x2 ≥ −4,
x1 , x2 ≥ 0 and integer.
The optimal LP solution (376/193, 950/193)⊤ is a long way from the optimal integer
solution (5, 0)⊤ , as we see in Figure 3.2b.
3.) Especially when considering boolean integer problems rounding will not be helpful.
Suppose that item n fits into the knapsack, i. e. that an ≤ b. If we pack item n, then the
remaining space which is left in the knapsack is b − an and we have one less item to choose
from when wanting to pack more into the knapsack. This leads to solving the smaller
problem of maximizing the value of a knapsack of volume b − an when choosing from the
n − 1 items 1, . . . , n − 1. Let us call this problem A and its optimal objective value x∗A .
We shall also solve the smaller problem B, which corresponds to the case that item n is
not packed into the knapsack. So, here, we want to maximize the value of a knapsack
of volume b by choosing from the n − 1 items 1, . . . , n − 1. Let us denote the optimal
objective value of B by x∗B .
If x∗A +cn > x∗B then we know that we should pack item n. If x∗A +cn = x∗B then there is an
optimal solution with and without item n. If x∗A +cn < x∗B then we should not pack item n.
The above-described approach is known as a divide and conquer approach: One divides
the problem into smaller subproblems, conquers the subproblems recursively and combines
the solutions to give a solution to the original problem.
Dynamic programming is a related technique, which is more efficient when solving problems with overlapping subproblems. In dynamic programming one starts by solving the smallest subproblems and then recursively finds solutions to the bigger problems. The benefit of DP is that each subproblem is solved only once. In DP the result of each subproblem is stored in a table (generally implemented as an array or a hash table) for future reference. These subsolutions may be used to obtain the original solution, and the technique of storing the subproblem solutions is known as memoization.
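The effect of memoization can be demonstrated on any recursion with overlapping subproblems; a minimal sketch using Python's functools.lru_cache as the 'table':

from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)  # delete this line to see the exponential blow-up
def fib(n):
    # Overlapping subproblems: fib(n-1) and fib(n-2) share most of their work.
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30), calls)  # 832040, computed with 31 calls instead of ~2.7 million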
3.4.1 0-1 Knapsack Problems. Let n ∈ N and aj, b ∈ Z≥0 for j ∈ {1, . . . , n}. Consider the 0-1 knapsack problem

(P) max ∑_{j=1}^n cj xj
s. t. ∑_{j=1}^n aj xj ≤ b,
x ∈ {0, 1}n.
For r ∈ {1, . . . , n} and λ ∈ {0, . . . , b}, let (Pr(λ)) denote the subproblem that uses only the items 1, . . . , r and capacity λ, and let fr(λ) be its optimal value. Then z = fn(b) gives us the optimal value of the knapsack problem, and furthermore, the problems (Pr(λ)) are knapsack problems of the same type but of smaller size.
To arrive at a recursive relationship between the values fr (λ), we let x∗ (λ) = (x∗1 (λ), . . . , x∗r (λ))⊤
denote an optimal solution of (Pr (λ)) and distinguish two cases:
• If x∗r(λ) = 0, then the choice of the remaining variables x∗1(λ), . . . , x∗r−1(λ) must be optimal for the problem (Pr−1(λ)). In other words, we then have fr(λ) = fr−1(λ).
• If x∗r(λ) = 1, then necessarily λ ≥ ar and the choice of the remaining variables x∗1(λ), . . . , x∗r−1(λ) must be optimal for (Pr−1(λ − ar)), and hence
fr(λ) = cr + fr−1(λ − ar).
Note that if λ < ar then necessarily x∗r(λ) = 0 and hence fr(λ) = fr−1(λ). However, if λ ≥ ar then both x∗r(λ) = 0 and x∗r(λ) = 1 are possible. In this case, since we are looking for the maximum objective value, we simply compare the two function values above and pick the larger one; this gives the recursion

fr(λ) = max{fr−1(λ), cr + fr−1(λ − ar)} for λ ≥ ar. (3.1)
Thus, we need to solve the two subproblems P3 (7) and P3 (2), which have a number of
subproblems themselves. We solve these recursively by using (3.1):
[Tree: the root f4(7) branches into f3(7) (x4 = 0) and f3(2) (x4 = 1, edge label c4 = 24); the node f3(7) branches further with edge label c3 = 25, and so on.]
We can determine the entries in the tree in the following way. It is not difficult to
determine the entries in the bottom row. For each entry in the second to last row we
consider its two children. We add the value on the edge to the value in the node of each
child (no value on the edge means 0), and compare which of the two children gives a larger
sum. The larger value is the one that we insert into the parental node.
[Figure 3.5: the completed tree with root value 34, second level 32 and 10, third level 17, 7 and 10, and leaves 10, 10, 0, 0, 10, 0; the labels 24, 25 and 7 sit on the edges along which the corresponding item is packed.]
The outcome of this procedure is shown in Figure 3.5. We can conclude that the optimal
objective value is 34. How can we determine the associated optimal solution? For this we
need to recall which route we have taken to arrive at the optimal objective value of 34.
In Figure 3.5 this route is displayed in bold. Starting from the root of the tree, we see
that along the bold path we first take a labeled edge. This means x∗4 = 1. Then we take
an unlabeled edge, implying x∗3 = 0. The next edge likewise is unlabeled, yielding x∗2 = 0.
For determining x∗1 we check whether the entry in the bottom node of the bold path is
zero or non-zero. As it is non-zero in our case, we conclude that x∗1 = 1.
Thus an optimal solution is x∗ = (1, 0, 0, 1)⊤ with objective value 34.
In the above example we have considered and solved the subproblem f1(1) twice. In this simple example this does not matter too much. However, when considering problems with many more variables this phenomenon can occur often and should be avoided. Note also that with n items the tree can have more than 2^(n−1) − 1 nodes in the bottom row (compare with Section 3.1.3), so the tree size, and hence the computational complexity of the algorithm, grows exponentially in n.
We now turn to a dynamic programming approach, which, as mentioned before, avoids repeatedly solving the same subproblems.
Dynamic Programming
fr (0) = 0, r ∈ {1, . . . , n}
f0 (λ) = 0, λ ∈ {0, . . . , b}.
Here pr(λ) ∈ {0, 1} records whether item r is packed in an optimal solution of (Pr(λ)).
• If pn(b) = 0, we set x∗n = 0 and continue by checking the value pn−1(b).
• If pn(b) = 1, we set x∗n = 1 and then continue by checking the value pn−1(b − an).
Therefore, the DP algorithm for 0-1 knapsack problem can be stated as follows:
Algorithm 3.6 (DP for 0-1 Knapsack with integer coefficients (maximization))
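The listing of Algorithm 3.6 is not reproduced in this excerpt; the following is a minimal bottom-up sketch consistent with the recursion (3.1), the boundary values and the back-tracking rule above (the data reproduces Example 3.7 below):

def knapsack_01(c, a, b):
    # f[r][lam] = optimal value using items 1..r and capacity lam;
    # p[r][lam] = 1 iff item r is packed in that optimal solution.
    n = len(c)
    f = [[0] * (b + 1) for _ in range(n + 1)]
    p = [[0] * (b + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        for lam in range(b + 1):
            f[r][lam] = f[r - 1][lam]  # case x_r = 0
            if a[r - 1] <= lam and c[r - 1] + f[r - 1][lam - a[r - 1]] > f[r][lam]:
                f[r][lam] = c[r - 1] + f[r - 1][lam - a[r - 1]]  # case x_r = 1
                p[r][lam] = 1
    x, lam = [0] * n, b  # back-tracking as described above
    for r in range(n, 0, -1):
        if p[r][lam]:
            x[r - 1], lam = 1, lam - a[r - 1]
    return f[n][b], x

print(knapsack_01([10, 7, 25, 24], [2, 1, 6, 5], 7))  # (34, [1, 0, 0, 1])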
Counting the number of calculations required to arrive at fn(b), we see that each value fr(λ) for λ ∈ {0, 1, . . . , b} and r ∈ {1, . . . , n} requires a constant number of additions, subtractions and comparisons. Calculating the optimal solution requires at most the same amount of work. Thus, the DP algorithm for 0-1 knapsack problems is O(nb), where O denotes the Landau big-O notation, i.e. f(n) = O(nb) if and only if there exist constants c, C > 0 such that c ≤ f(n)/(nb) ≤ C for all n ∈ N.
Example 3.7. Let us apply the above algorithm to the 0-1 knapsack problem with c = (10, 7, 25, 24), a = (2, 1, 6, 5) and b = 7 (the data of the divide and conquer example above):
f1 f2 f3 f4 p1 p2 p3 p4
λ=0 0 0 0 0 0 0 0 0
1 0 7 7 7 0 1 0 0
2 10 10 10 10 1 0 0 0
3 10 17 17 17 1 1 0 0
4 10 17 17 17 1 1 0 0
5 10 17 17 24 1 1 0 1
6 10 17 25 31 1 1 1 1
7 10 17 32 34 1 1 1 1
Example 3.8. Solve the following 0-1 knapsack problem by dynamic programming
Clearly, our original knapsack problem is (Pn (b)), so that its optimal value is given by
gn (b).
The following boundary (starting) values are obvious,
If x∗(λ) is an optimal solution to (Pr(λ)) giving value gr(λ), then we consider the value of x∗r(λ), which can be any integer in {0, . . . , ⌊λ/ar⌋}.
If x∗r(λ) = t, then necessarily t ar ≤ λ. Using the principle of optimality, we have that
gr(λ) = max{t cr + gr−1(λ − t ar) | t ∈ {0, . . . , ⌊λ/ar⌋}}.
This is the recursion to use for the divide and conquer approach that we introduced for 0-1 knapsack problems. Note that in the setting of integer knapsack problems the tree might be much larger than in the 0-1 knapsack setting: a node with entry gr(λ) has ⌊λ/ar⌋ + 1 children. When considering a dynamic programming approach based on the above recursion, note the following: as ⌊λ/ar⌋ = b in the worst case, the above recursion gives an algorithm of complexity O(nb^2). Can we do better?
Observe the following.
• If x∗r = 0, then the vector (x∗1 , . . . , x∗r−1 )⊤ must be optimal for the problem (Pr−1 (λ)),
so that gr (λ) = gr−1 (λ).
• If x∗r ≥ 1, then necessarily ar ≤ x∗r ar ≤ λ and the vector (x∗1 , . . . , x∗r−1 , x∗r − 1)⊤ must
be optimal for (Pr (λ − ar )), so that gr (λ) = cr + gr (λ − ar ).
Therefore, we arrive at the recursion

gr(λ) = gr−1(λ) if ar > λ,
gr(λ) = max{gr−1(λ), cr + gr(λ − ar)} if ar ≤ λ.
This now gives an algorithm of complexity O(nb), which is the same as that of the 0-1
knapsack problem.
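A sketch of this O(nb) recursion with back-tracking flags pr(λ) = 1 iff x∗r ≥ 1; the data below matches the first three items of the table that follows (the data of item 4 is not recoverable from this excerpt):

def knapsack_int(c, a, b):
    # g[r][lam] = optimal value of the integer knapsack with items 1..r and
    # capacity lam, via g_r(lam) = max{g_{r-1}(lam), c_r + g_r(lam - a_r)}.
    n = len(c)
    g = [[0] * (b + 1) for _ in range(n + 1)]
    p = [[0] * (b + 1) for _ in range(n + 1)]  # p[r][lam] = 1 iff x_r >= 1
    for r in range(1, n + 1):
        for lam in range(b + 1):
            g[r][lam] = g[r - 1][lam]
            if a[r - 1] <= lam and c[r - 1] + g[r][lam - a[r - 1]] > g[r][lam]:
                g[r][lam] = c[r - 1] + g[r][lam - a[r - 1]]
                p[r][lam] = 1
    # back-tracking: while p[r][lam] = 1, pack another copy of item r
    x, lam, r = [0] * n, b, n
    while r >= 1:
        if p[r][lam]:
            x[r - 1] += 1
            lam -= a[r - 1]
        else:
            r -= 1
    return g[n][b], x

print(knapsack_int([7, 9, 2], [3, 4, 1], 10))  # (23, [2, 1, 0])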
Applying Algorithm 3.9, we find the following table of values for gr (λ) and pr (λ) :
g1 g2 g3 g4 p1 p2 p3 p4
λ=0 0 0 0 0 0 0 0 0
1 0 0 2 2 0 0 1 0
2 0 0 4 4 0 0 1 0
3 7 7 7 7 1 0 0 0
4 7 9 9 9 1 1 0/1 0
5 7 9 11 11 1 1 1 0
6 14 14 14 14 1 0 0 0
7 14 16 16 16 1 1 0/1 0
8 14 18 18 18 1 1 0/1 0
9 21 21 21 21 1 0 0 0
10 21 23 23 23 1 1 0/1 0
Back-tracking:
• p4 (10) = 0, so x∗4 = 0.
• p3 (10) = 0, so x∗3 = 0.
• p2 (10) = 1, so x∗2 ≥ 1.
• p2 (10 − a2 ) = p2 (6) = 0, so x∗2 ̸≥ 2, and hence, x∗2 = 1.
• p1 (6) = 1, so x∗1 ≥ 1.
• p1 (6 − a1 ) = p1 (3) = 1, so x∗1 ≥ 2.
• p1 (3 − a1 ) = p1 (0) = 0, so x∗1 ̸≥ 3, and hence, x∗1 = 2.
We have found that x∗ = (2, 1, 0, 0)⊤ is an optimal solution.
Example 3.11. Solve the following integer programming problem by dynamic program-
ming
• yj = 1 if and only if there exists i ∈ M such that xij > 0. Instead of putting this equivalence into a constraint, it suffices to require that yj ≥ xij for all i ∈ {1, . . . , m}, since we are minimizing over the yj (hence if xij = 0 for all i ∈ {1, . . . , m}, then although yj = 1 satisfies the constraint yj ≥ xij for all i ∈ {1, . . . , m}, yj = 1 will not belong to an optimal solution unless fj = 0). With the same argument we see that it indeed suffices to require

∑_{i=1}^m xij ≤ m yj for j ∈ {1, . . . , n}.

Notice that the yj and xij for j ∈ {1, . . . , n}, i ∈ {1, . . . , m} are the decision variables for this problem. The xij are allowed to take any value in [0, 1], while the yj are binary. Thus, this problem is an example of a mixed integer program.
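The complete facility location model is not reproduced in this excerpt; the following PuLP sketch of the standard uncapacitated model is consistent with the constraints just discussed (all data invented):

import pulp

m, n = 3, 2                   # m clients, n candidate facilities
f = [10, 14]                  # fixed opening costs f_j
c = [[4, 7], [6, 3], [5, 5]]  # service costs c_ij

prob = pulp.LpProblem("facility_location", pulp.LpMinimize)
y = [pulp.LpVariable(f"y{j}", cat="Binary") for j in range(n)]
x = [[pulp.LpVariable(f"x{i}_{j}", lowBound=0, upBound=1) for j in range(n)]
     for i in range(m)]

# fixed costs plus service costs
prob += (pulp.lpSum(f[j] * y[j] for j in range(n))
         + pulp.lpSum(c[i][j] * x[i][j] for i in range(m) for j in range(n)))
for i in range(m):
    prob += pulp.lpSum(x[i][j] for j in range(n)) == 1         # client i served
for j in range(n):
    prob += pulp.lpSum(x[i][j] for i in range(m)) <= m * y[j]  # only if open

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([v.value() for v in y], pulp.value(prob.objective))  # [1.0, 0.0], 25.0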
[Diagram: in period t, the stock St−1 plus the production xt covers the demand dt, leaving the stock St.]
4.3 Alternatives
4.3.1 Discrete Alternatives (disjunctions)
Example 4.1. Suppose that two jobs must be processed on the same machine and cannot
be processed simultaneously. If pi (i = 1, 2) are the processing times, and the variables ti
the start times for i = 1, 2, then either job 1 precedes job 2 and so t2 ≥ t1 + p1 , or job 2
comes first and t1 ≥ t2 + p2 . Note that these two conditions are mutually exclusive, i. e.
exactly one of them is true, if pi > 0.
The above example demonstrates that in scheduling and other applications, problems of
the following type occur:
minx∈Rn c⊤ x
s.t. 0 ≤ x ≤ u and
either w⊤ x ≤ b1 (4.1)
or v ⊤ x ≤ b2 . (4.2)
Here, ’either or’ is to be understood as follows. We impose exactly one of the conditions
(4.1) and (4.2). If we impose Condition (4.1) then we do not impose (4.2) and vice
versa. In other words we require that at least one of the conditions (4.1) and (4.2) must
hold. (Notice the difference between ’a condition is imposed’ and ’a condition holds’: If a
condition is imposed, then it must hold. If a condition is not imposed it may still hold.)
Here c, u, w, v are given vectors and b1, b2 are two real numbers. (In Example 4.1, x = (t1, t2)⊤, w⊤ = (1, −1), b1 = −p1, v⊤ = (−1, 1), b2 = −p2.)
Remark 4.2. Note that we can alternatively state the above problem in the following
way:
minx∈Rn c⊤ x
s.t. x ∈ P1 ∪ P2 ,
where
P1 = {x ∈ Rn | w⊤ x ≤ b1 , 0 ≤ x ≤ u}
P2 = {x ∈ Rn | v ⊤ x ≤ b2 , 0 ≤ x ≤ u}.
Recall that if a condition is not imposed, it may still hold, but when it is imposed,
it must hold. Since exactly one of the two conditions must be imposed, we have
y1 + y2 = 1.
• Let
M ≥ max_{x∈[0,u]} max{w⊤x − b1, v⊤x − b2}.
4.3.2 More General Alternative Constraints. Consider a situation with the alternative constraints
f1(x1, . . . , xn) ≤ b1, f2(x1, . . . , xn) ≤ b2,
where x = (x1, ..., xn) ∈ C for a bounded region C in Rn, f1, f2 : C → R and b1, b2 ∈ R. In this situation, at least one (but not necessarily both) of these constraints must be satisfied. By introducing binary variables, we have seen that this restriction can be modelled as follows:
f1 (x1 , . . . , xn ) − b1 ≤ M y1 ,
f2 (x1 , . . . , xn ) − b2 ≤ M y2 ,
y1 + y2 = 1,
y1 , y2 are binary.
The constant M is chosen such that
M ≥ max_{x∈C} max{f1(x1, . . . , xn) − b1, f2(x1, . . . , xn) − b2}.
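A sketch of Example 4.1 in this big-M form, using PuLP as in the sketch above (the processing times and the time horizon are invented; M is chosen to bound the left-hand sides over the region):

import pulp

p1, p2 = 3, 5  # invented processing times
M = 110        # valid, since start times below are bounded by 100

prob = pulp.LpProblem("one_machine", pulp.LpMinimize)
t1 = pulp.LpVariable("t1", lowBound=0, upBound=100)
t2 = pulp.LpVariable("t2", lowBound=0, upBound=100)
y1 = pulp.LpVariable("y1", cat="Binary")
y2 = pulp.LpVariable("y2", cat="Binary")

prob += t1 + t2                       # e.g. minimise the sum of start times
prob += (t1 + p1) - t2 <= M * y1      # imposed (job 1 first) when y1 = 0
prob += (t2 + p2) - t1 <= M * y2      # imposed (job 2 first) when y2 = 0
prob += y1 + y2 == 1                  # exactly one condition is imposed

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(t1.value(), t2.value())  # 0.0 3.0: job 1 runs first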
Conditional Constraints
k-Fold Alternatives
P = conv(x1 , ..., xk ).
(M IP ) maxx∈Rn ,y∈Rp c⊤ x + w⊤ y
s. t. Ax + By ≤ b
x ∈ Zn ,
where A and B are matrices, and c, w are two vectors. We can represent the feasible set by

S = {(x, y) ∈ Rn+p | Ax + By ≤ b, x ∈ Zn}.

If we drop the constraint x ∈ Zn, the set of points that satisfy the remaining constraints is a polyhedron:

P = {(x, y) ∈ Rn+p | Ax + By ≤ b}.

Clearly, we have S = P ∩ (Zn × Rp), which shows that P is a formulation for S.
max x + y
s.t. −3x + 2y ≤ 2
x+y ≤5
1≤x≤3
1≤y≤3
x, y ∈ Z
Clearly, y ≤ 3 is actually a superfluous constraint. Notice that we obtain the same feasible
set if we add the constraint x − y ≥ −1. So we get the convex hull of all feasible solutions
for the IP, which is described by the following system:
−3x + 2y ≤ 2
x+y ≤5
1≤x≤3
1≤y≤3
−x + y ≤ 1
x, y ∈ Z.
Example 5.7 (Equivalent formulations for a 0-1 knapsack set). Consider the set of points

X = {(1, 0, 0, 0)⊤, (0, 1, 0, 1)⊤, (0, 1, 0, 0)⊤, (0, 0, 1, 1)⊤, (0, 0, 1, 0)⊤, (0, 0, 0, 1)⊤, (0, 0, 0, 0)⊤}.

It is not difficult to check that the three polyhedra below are formulations for X.
Example 5.8 (Traveling Salesman). Recall the binary programming model of the trav-
eling salesman problem:
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s. t. ∑_{j:j̸=i} xij = 1, i = 1, ..., n,
∑_{i:i̸=j} xij = 1, j = 1, ..., n,
∑_{i∈S} ∑_{j∉S} xij ≥ 1, ∀S ⊊ N, S ̸= ∅,
xij ∈ {0, 1} for all i, j = 1, ..., n.
The cut-set constraints
∑_{i∈S} ∑_{j∉S} xij ≥ 1, ∀S ⊊ N, S ̸= ∅
Figure 5.2: Ideal formulation for the set X from Example 5.5.
for all formulations P whenever X is of finite cardinality. This suggests the following definition.
Definition 5.9. (i) Given a set X ⊆ Zn , and two formulations P1 and P2 for X, P1 is
a better formulation than P2 if P1 ⊆ P2 .
(ii) If X ⊆ Zn is of finite cardinality, the formulations with the property P = conv(X)
are called ideal formulations.
max{c⊤ x | x ∈ X}
max{c⊤ x | x ∈ P }
Thus, using an ideal formulation, an integer programming instance can become easy to
solve! However, often, this is only a theoretical solution, because in most cases there
is such an enormous (exponential) number of inequalities needed to describe conv(X),
and there is no simple characterization for them. Further, it might be very tedious to
determine conv(X).
Example 5.11 (Example 5.7 ctd.). It can be checked that P3 = conv(X) and thus, P3
is an ideal formulation.
The algorithms we will develop later in the course are based on approaches that can be
seen as attempting to successively improve the formulation until it is good enough to be
solved to optimality.
Part II – Optimality, Bounds and Relaxation
Suppose that we have guessed a feasible solution to an (IP) or (MIP). How can we find
out how good it is? Further, if we are convinced that our solution is good, how can we
determine and prove that it is optimal?
c⊤ x∗ ≥ 7.
We can solve (LP) by the simplex method, which yields the optimal solution (20/7, 3)⊤ with optimal objective value 59/7. This gives an upper bound for the objective value of
Figure 6.1: Graphical solution of the (IP) and its associated relaxed (LP) from Example 3.3; see Figure 3.1.
(IP), namely
c⊤x∗ ≤ 59/7.
This upper bound can even be improved, since we know that the objective value associated with an integer solution (when the cost vector c is integer-valued) has to be an integer. Therefore, we can round down and obtain
c⊤x∗ ≤ ⌊59/7⌋ = 8.
Thus, we conclude that c⊤x∗ ∈ {7, 8} for any optimal solution x∗ of (IP). We do not yet know whether our guessed solution is optimal, but we know that it is not far off from being so (its objective value is at most 1 less than the optimal one).
Let us address the following question for general IPs: How to prove that a given point x∗ is
optimal? Put differently, we are looking for some optimality conditions that will provide
stopping criteria in an algorithm for IPs. Note that the general strategy for estimating
the quality of a solution of an IP is as demonstrated in Example 6.1.
Consider the problem
(IP ) max{c⊤ x | x ∈ X = P ∩ Zn }
Thus, we need to find ways of deriving such upper and lower bounds.
Definition 6.2 (Primal and dual bounds). A primal bound for a maximization problem
(IP) is a lower bound z ≤ z ∗ . A dual bound is an upper bound z ∗ ≤ z̄.
In the above example we have determined an upper and a lower bound for the objective
value and from these bounds concluded how good the guessed feasible solution is. More-
over, we need to investigate how to find a feasible solution. This is in particular crucial
to arrive at a lower bound. We will look into this for general IPs below, and also estimate
the quality of a given feasible solution of a general IP.
Example 6.3.
z ∗ = max x1 − 2x2
s. t. 2x1 + x2 ≤ 4,
x1 , x2 ≥ 0 and integer.
Clearly, (1, 1)⊤ is a feasible solution, hence −1 is a lower bound on z ∗ ; and (2, 0)⊤ is also
a feasible solution, and hence 2 is a lower bound on z ∗ . It is better than (1, 1)⊤ in the
sense that it provides a better lower bound.
In general, finding a feasible solution is not always as easy as in Example 6.3. Heuristics
are typically used to overcome this problem. One important such heuristic is the greedy
heuristic. Greedy approaches construct a solution from scratch, and at each step choose
the item which brings the ’best’ immediate result. In other words, the idea of a greedy
algorithm is to take the best element and run. It is very shortsighted. It just chooses
one after the other whichever element gives the maximum profit and still gives a feasible
solution.
This is a 0-1 knapsack problem for which a feasible solution can be found using the greedy heuristic: Notice that
c2/a2 = 8/3 > c3/a3 = 17/7 > c1/a1 = 5/4.
This means that relative to the cost (i. e. constraint restriction) the second variable gives
the largest contribution to the objective value. As the bound on the right-hand side of
the constraint is 9, we may set x2 = 1.
After this, the residual capacity on the right-hand side is 9 − a2 = 6. So we have to set x3 = 0, and then we may set x1 = 1. Therefore, by the greedy heuristic we obtain the feasible point (1, 1, 0)⊤, at which the objective value is 5 + 8 = 13.
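The greedy heuristic in code, applied to the data above (c = (5, 8, 17), a = (4, 3, 7), b = 9):

def greedy_knapsack(c, a, b):
    # Consider items by decreasing ratio c_j / a_j and pack each one that fits.
    order = sorted(range(len(c)), key=lambda j: c[j] / a[j], reverse=True)
    x, room = [0] * len(c), b
    for j in order:
        if a[j] <= room:
            x[j], room = 1, room - a[j]
    return x, sum(cj * xj for cj, xj in zip(c, x))

print(greedy_knapsack([5, 8, 17], [4, 3, 7], 9))  # ([1, 1, 0], 13), a primal bound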
Example 6.5. Find a lower bound for the optimal objective value of the problem below
by using the greedy heuristic method
z ∗ = max 5.5x1 + 8.2x2 + 13.4x3 + 21.2x4
s.t. 3x1 + 3x2 + 7.1x3 + 10.8x4 ≤ 30.9
xi ≥ 0 and integer, i ∈ {1, . . . , 4}.
The next result shows that if a relaxation is tight enough it can either provide a certificate
of infeasibility or optimality for the original problem:
Proposition 6.8. Let X ⊆ Zn and T ⊆ Rn . Let
(RP ) z R = max{f (x) | x ∈ T }
be a relaxation of the following IP
(IP ) z ∗ = max{c⊤ x | x ∈ X}.
Then the following hold.
(i) If x∗ is an optimal solution of (RP) for which x∗ ∈ X and f (x∗ ) = c⊤ x∗ , then x∗ is
an optimal solution for (IP).
(ii) If (RP) is infeasible, then (IP) is infeasible.
Proof. (i) Any feasible solution z ∈ X of (IP ) is a feasible solution of (RP). Thus, as x∗
is an optimal solution of (RP), we know that f (z) ≤ f (x∗ ). As (RP) is a relaxation
of (IP), we know that c⊤ z ≤ f (z). Moreover, by assumption, f (x∗ ) = c⊤ x∗ and
x∗ ∈ X. Hence, c⊤z ≤ f(z) ≤ f(x∗) = c⊤x∗ for any z ∈ X, and so x∗ is optimal for (IP).
(ii) Clear.
(IP ) max{c⊤ x | x ∈ P ∩ Zn }
z∗ ≤ z̄,
where z̄ is the optimal objective value of the LP relaxation (LP) and z∗ the corresponding value for the original problem (IP). Even without Proposition 6.7, this is intuitively clear, since dropping the integrality constraints enlarges the feasible set.
The LP can be solved by the simplex method or interior-point method.
We already saw that under an ideal formulation the two values z̄ and z ∗ coincide. For
other formulations this clearly is not the case. (Recall the warning from Section 3.3 that
also rounding does not help here: 1.) It can happen that the rounded solution is infeasible.
2.) Even if one obtains a feasible solution by rounding up or down, the optimal objective
value of the (IP) can be arbitrarily far away from the rounded optimal objective value of
the (LP). 3.) Especially when considering boolean integer problems rounding will not be
helpful.)
The next result shows that better formulations of integer programming problems give
tighter dual bounds.
(IP) max{c⊤x | x ∈ X ⊆ Zn}.
Let P1 ⊆ P2 be two formulations for X, and let z1LP, z2LP be the optimal objective values of the LP relaxations corresponding to these two formulations. Then
z1LP ≤ z2LP,
i.e. formulation P1 produces a tighter (dual) bound.
Proof. This is immediate, as P1 ⊆ P2 and the maximum taken over a larger set is at least as big.
We have seen in Section 3.4 that this relaxed problem can be solved efficiently by dynamic programming.
Example 6.12 (The traveling salesman problem).
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s. t. ∑_{j:j̸=i} xij = 1, i ∈ {1, . . . , n},
∑_{i:i̸=j} xij = 1, j ∈ {1, . . . , n},
∑_{i∈S} ∑_{j∉S} xij ≥ 1, ∀S ⊊ N, S ̸= ∅,
xij ∈ {0, 1} for all i, j ∈ {1, . . . , n}.
Notice that the salesman tours are precisely the assignments containing no subtours.
Thus, if we remove the cut-set constraints the feasible set becomes larger, and we arrive
at the following assignment problem
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s. t. ∑_{j:j̸=i} xij = 1, i ∈ {1, . . . , n},
∑_{i:i̸=j} xij = 1, j ∈ {1, . . . , n},
xij ∈ {0, 1} for all i, j ∈ {1, . . . , n},
which can be solved efficiently.
Example 6.13 (The quadratic 0-1 problem). Many combinatorial optimization problems
can be formulated as the maximization of a 0–1 quadratic function subject to linear
constraints.
max{ ∑_{1≤i<j≤n} qij xi xj − ∑_{j=1}^n pj xj : x ̸= 0, x ∈ {0, 1}n }.
6.3.3 Duality Relaxation. For LP, duality provides a standard way to obtain upper bounds. It is therefore natural to ask: is it possible to find duals for integer programs? The important property of a dual is that the value of any feasible solution provides an upper bound on the objective value. Note that an LP or combinatorial relaxation first needs to be solved to optimality in order to be certain to have found an upper bound for the IP.
Definition 6.14. The two problems
(IP ) z ∗ = max{c⊤ x | x ∈ X ⊆ Zn },
(D) s∗ = min{w(u) | u ∈ U ⊆ Rm }
form a weak dual pair if
c⊤ x ≤ w(u) for all x ∈ X, u ∈ U.
In this case (D) is called a weak dual of (IP). If z ∗ = s∗ , then (IP) and (D) are said to
form a strong dual pair and (D) is called a strong dual of (IP).
Note that for a given weak dual pair, strong duality does not always hold.
Dual problems can sometimes allow us to prove optimality, and provide a certificate of
infeasibility for the original problem.
Proposition 6.15. Let (D) be a weak dual of (IP).
(i) If x∗ ∈ X and u∗ ∈ U are such that c⊤x∗ = w(u∗), then x∗ is optimal for (IP) and u∗ is optimal for (D).
(ii) If (D) is unbounded (i.e., it has "optimal" value −∞), then (IP) is infeasible.
Example 6.16 (IP and the dual of its LP relaxation form a weak dual pair). Consider
an (IP) and its LP relaxation (LP):
(IP ) max{c⊤ x | Ax ≤ b, x ∈ Zn+ }
(LP ) max{c⊤ x | Ax ≤ b, x ≥ 0}.
From the weak duality theorem for linear programs it follows that (IP) and (D) form a weak dual pair.
Example 6.17 (Lagrangian relaxation). Consider the following IP
(IP ) max{c⊤ x | Ax ≤ b, x ∈ X ⊆ Zn }.
Let
z(u) = max{c⊤ x + u⊤ (b − Ax) | x ∈ X}. (6.1)
Then the following problem
(LD) min{z(u) | u ≥ 0}
is a weak dual of (IP). In fact, for any feasible point x of (IP), and any feasible point u
of (LD), we have
c⊤ x ≤ c⊤ x + u⊤ (b − Ax) ≤ z(u).
The first inequality above follows from the fact u⊤ (b − Ax) ≥ 0 and the second inequality
follows from (6.1).
Such a duality is called Lagrangian duality which will be dealt with in much greater
detail in later lectures, see Chapter 10.
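As a toy illustration of this weak duality, the following sketch evaluates z(u) by enumerating X = {0, 1}^n for a small invented instance and scans u over a grid; every value z(u) is an upper bound on the IP optimum:

from itertools import product

c = [5, 4, 3]
A = [[4, 3, 2]]  # one complicating constraint Ax <= b
b = [6]          # IP optimum is 8, attained at x = (1, 0, 1)

def z(u):
    # z(u) = max over x in {0,1}^n of c'x + u'(b - Ax)
    best = float("-inf")
    for x in product([0, 1], repeat=len(c)):
        val = sum(cj * xj for cj, xj in zip(c, x))
        for ui, bi, Ai in zip(u, b, A):
            val += ui * (bi - sum(aij * xj for aij, xj in zip(Ai, x)))
        best = max(best, val)
    return best

print(min(z((k / 10,)) for k in range(31)))  # ~8.3 (at u = 1.3), upper bound on 8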
Example 6.18 (Matching and Covering form a weak dual pair). Given an undirected
graph G = (V, E) where V is the set of nodes and E the set of edges,
• a matching M ⊆ E is a set of disjoint edges in E (in the sense that no two edges
share a common endpoint);
• a covering by nodes of G is a subset R ⊆ V of nodes such that every edge in E
has at least one endpoint in R.
Now consider the problem of finding a maximum cardinality matching
(M) max_{M⊆E} {|M| : M is a matching}
and the problem of finding a minimum cardinality covering by nodes
(C) min_{R⊆V} {|R| : R is a covering by nodes}.
Then (M) and (C) form a weak dual pair, which can be seen as follows.
If M is a matching, say
M = {(i1 , j1 ), . . . , (ik , jk )},
then the 2k nodes
{i1 , j1 , . . . , ik , jk }
are distinct, and any covering by nodes R must contain at least one node from each pair
{is , js } for s ∈ {1, . . . , k}. Therefore,
|R| ≥ k = |M |.
Questions:
• When will the LP relaxation have an optimal solution that is integral?
• Does such a class of well-solved integer programs exist?
Note that if the LP relaxation has an optimal solution that is integral, then Proposition 6.8
yields that IP has an optimal solution and that these two coincide.
From LP theory, we know that basic feasible solutions take the form
x∗ = (x∗B, x∗N) = (B⁻¹b, 0),
where B is an m × m nonsingular sub-matrix of [A, I] and I is the m × m identity matrix.
Theorem 7.1. Suppose A and b are integral. If the optimal basis B has
|det(B)| = 1,
then the LP relaxation has an integral optimal solution, and therefore the LP relaxation
solves IP.
A particularly lucky situation occurs when every nonsingular square submatrix B of [A, I] has |det(B)| = 1, as then this holds in particular for any possible basis B. The following definition guarantees that this is the case:
Definition 7.2. A matrix A is called totally unimodular (TU) if every square submatrix of A has determinant 0, +1 or −1.
Proposition 7.3. All entries of a TU matrix A must lie in {0, +1, −1}, i.e. aij ∈ {0, +1, −1} for all i, j.
Remark 7.5. Clearly, total unimodularity is merely a sufficient criterion for when the
integer program
(IP ) max{c⊤ x | Ax ≤ b, x ∈ Zn+ }
is solved by its LP relaxation
max{c⊤ x | Ax ≤ b, x ≥ 0}.
The following result shows that in a certain sense the converse is also true.
Theorem 7.6. (i) If A is TU, then (LP) solves (IP) for all integer vectors b for which
it has a finite optimal value.
(ii) If (LP) solves (IP) for all integer vectors b for which it has a finite optimal value,
then A is TU.
Proof. Omitted.
When A is TU, strong duality holds and an ideal formulation can be obtained explicitly,
as the following result shows.
Let (D) be the linear programming dual of the LP relaxation (LP) of (IP), i. e.
(LP ) z LP = max{c⊤ x | Ax ≤ b, x ≥ 0}
(D) w∗ = min{b⊤ y | A⊤ y ≥ c⊤ , y ≥ 0}.
Proof. (i) From the strong duality theorem from LP theory and the relaxation rela-
tionship, we know that w∗ = z LP ≥ z ∗ , and by total unimodularity we infer from
Theorem 7.6 that z LP = z ∗ . Thus, w∗ = z ∗ .
(ii) Every extreme point of P is a basic feasible solution of (LP), and hence every
extreme point (vertex) of P can be represented as x = (B −1 b, 0) for some nonsingular
submatrix B of (A, I).
Since A is TU and b ∈ Zm , Theorem 7.6 yields that all extreme points of P are
integral, and hence they all lie in the feasible set X = P ∩ Zn of (IP). Thus P ⊆
conv(X).
On the other hand, it follows from X ⊆ P that conv(X) ⊆ P . Thus, P = conv(X),
as desired.
(i) certain sufficient criteria that can easily be checked and allow us to identify some
important families of TU matrices, or
(ii) rules by which small TU matrices can be assembled into larger ones, and conversely,
we also need the rules about how to decompose a matrix into smaller parts for which
the TU property can be easily verified.
Theorem 7.8. An m × n matrix A is TU if and only if any of the following matrices are TU:
(i) A⊤,
(ii) [A, I],
(iii) [A, −A],
(iv) PA or AQ, where P, Q are m × m and n × n permutation matrices, respectively,
(v) the block matrix
[ A  J1 ]
[ J2  0 ]
with Ji = Pi [ I 0 ; 0 0 ] Qi, where I are identity matrices, 0 blocks of zeros, and Pi, Qi permutation matrices of appropriate size,
(vi) [A, ei], where ei denotes a column of the identity matrix.
Example 7.9. Verify that the matrix

A = [ 1  0   0  −1
      1  1  −1  −1
      0  0   1   0 ]

is TU.
• Notice that [ 1 0 ; 1 1 ] is TU.
• By Theorem 7.8 (iii), the following is TU:
[ 1  0  −1   0
  1  1  −1  −1 ]
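For small matrices, total unimodularity can also be verified by brute force, checking the determinant of every square submatrix; a sketch applied to the matrix A of Example 7.9:

import numpy as np
from itertools import combinations

def is_totally_unimodular(A):
    # Check det(B) in {0, +1, -1} for every square submatrix B (exponential!).
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if round(np.linalg.det(A[np.ix_(rows, cols)])) not in (-1, 0, 1):
                    return False
    return True

A = np.array([[1, 0, 0, -1],
              [1, 1, -1, -1],
              [0, 0, 1, 0]])
print(is_totally_unimodular(A))  # True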
(iii) The set M of rows can be partitioned into (M1, M2) such that each column j containing two nonzero coefficients satisfies
∑_{i∈M1} aij − ∑_{i∈M2} aij = 0.
Proof. Assume that (i)–(iii) are satisfied but that A is not TU. Let B be a smallest square submatrix of A such that det(B) ∉ {0, +1, −1}. Then all columns of B contain exactly two nonzero coefficients (why? if B contained a column with a single nonzero entry, we could expand the determinant along that column, and B would not be minimal). Because of (iii), adding the rows of B with indices in M1 and subtracting the rows with indices in M2 yields the zero vector, showing that the rows of B are linearly dependent and det(B) = 0, which contradicts the choice of B.
Remark 7.11. Condition (iii) means that if the nonzero elements of column j are in rows i and k, and if aij = −akj, then {i, k} ⊆ M1 or {i, k} ⊆ M2, whereas if aij = akj, then i ∈ M1 and k ∈ M2, or vice versa.
Example 7.12. (i) We can use Theorem 7.10 to verify that the following matrix is
TU:
1 0 0 1 0 0 1 0 0
0 1 0 0 1 0 0 1 0
0 0 1 0 0 1 0 0 1
.
1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 1 1 1
(i) and (ii) are apparent and (iii) holds with M1 = {1, 2, 3} and M2 = {4, 5, 6}.
(ii) Consider the LP-relaxation of the assignment problem
min ∑_{i=1}^n ∑_{j=1}^n cij xij
s.t. ∑_{i=1}^n xij = 1 for j = 1, ..., n, (7.1)
∑_{j=1}^n xij = 1 for i = 1, ..., n, (7.2)
0 ≤ x ≤ 1. (7.3)
We can write the constraints (7.1) and (7.2) as Ax = e where e = (1, 1, ..., 1)⊤ , and
A is the node-edge incidence matrix. Then A is totally unimodular.
Corollary 7.13. Let A be any m × n matrix with entries taken from {0, +1, −1} with
the property that any column contains at most one +1 and at most one −1. Then A is
totally unimodular.
Proof. Assume first that A contains exactly two nonzero entries per column. Then the fact that A is TU follows from Theorem 7.10 with M1 = {1, . . . , m} and M2 = ∅. For the general case, observe that a column with at most one nonzero entry from {+1, −1} cannot destroy total unimodularity, since we can expand the determinant of any square submatrix along that column.
Theorem 7.14 (A general sufficient condition). A matrix A is totally unimodular if
(i) aij ∈ {0, +1, −1} for all i, j, and
(ii) for any subset M of the rows of A, there exists a partition (M1, M2) of M such that each column j satisfies
|∑_{i∈M1} aij − ∑_{i∈M2} aij| ≤ 1.
Let
• xij be the flow along arc (i, j),
• V + (i) = {k | (i, k) ∈ E} denote the set of nodes to which there is an arc from i
(successor nodes),
• V − (i) = {k | (k, i) ∈ E} be the set of nodes from which there is an arc into i
(predecessor nodes).
min c⊤x
s.t. Ax = h, (7.5)
0 ≤ x ≤ b,

or, equivalently,

min c⊤x
s.t. (A; −A; I) x ≤ (h; −h; b), x ≥ 0, (7.6)

where the three blocks of rows are stacked.
E = {(1, 2), (1, 3), (2, 1), (2, 3), (3, 2)}
with capacity
b = (b12 , b13 , b21 , b23 , b32 )⊤ = (3, 2, 4, 3, 6)⊤ ,
transportation costs
Theorem 7.16. The constraint matrix arising in a minimum cost network flow problem
is totally unimodular.
It suffices therefore to show that A is TU. Each column of A has exactly two nonzero
entries, one +1, the other one −1, because each arc leaves one node as an outgoing arc, and
enters in another as an incoming arc. Therefore, the sufficient criterion of Theorem 7.10
is satisfied with M1 = M and M2 = ∅.
Corollary 7.17. In a minimum cost network flow problem, if the production rate vector h and the capacity vector b are integral, then the following hold.
(i) Each extreme point of the feasible polyhedron is integral.
(ii) If there exists an optimal flow, then there exists an integral optimal flow.
(iii) The constraints of the problem describe the convex hull of the set of integral feasible
flows.
We can model the shortest path problem as a special case of the min-cost network flow
problem with additional integrality constraints:
(SP) z = min ∑_{(i,j)∈E} cij xij
s.t. ∑_{j∈V+(s)} xsj − ∑_{j∈V−(s)} xjs = 1,
∑_{j∈V+(t)} xtj − ∑_{j∈V−(t)} xjt = −1,
∑_{j∈V+(i)} xij − ∑_{j∈V−(i)} xji = 0, i ∈ V \ {s, t},
Because the constraint matrix is TU, and the data b is integral (binary), the LP relaxation
of (SP) is integral and hence solves the shortest path problem. Also we have the following
strong duality theorem for the shortest path problem.
Notice that replacing πj by πj + α for all j ∈ V does not change the dual, so we can set πs = 0 without loss of generality.
Example 7.19. One can use LP to find the length of a shortest path from node s to
node t in the directed graph displayed in Figure 7.2. Note that this LP problem has 16
variables and 8+16 constraints (not counting the non-negativity constraints).
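The network of Figure 7.2 is not reproduced here; the following sketch sets up the flow-balance LP for a small invented digraph and solves it with scipy.optimize.linprog. Since the constraint matrix is TU, the LP optimum is integral:

import numpy as np
from scipy.optimize import linprog

# Invented digraph on nodes {0 = s, 1, 2, 3 = t} with arc costs.
arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
cost = [2, 5, 1, 6, 2]

# Node-arc incidence matrix: +1 where an arc leaves a node, -1 where it enters.
A_eq = np.zeros((4, len(arcs)))
for k, (i, j) in enumerate(arcs):
    A_eq[i, k], A_eq[j, k] = 1, -1
b_eq = [1, 0, 0, -1]  # send one unit from s = 0 to t = 3

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(arcs))
print(res.fun, res.x)  # 5.0; the arcs (0,1), (1,2), (2,3) carry flow 1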
max xts
s.t. ∑_{j∈V+(i)} xij − ∑_{j∈V−(i)} xji = 0, i ∈ V,
The constraint matrix is TU, and the following is the strong dual problem for the max flow problem:
min ∑_{(i,j)∈E} bij wij
Example 7.20. Use LP to find a maximum s−t flow in the following capacitated network
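The capacitated network of the example is not reproduced here; for an invented network, networkx computes the maximum flow (and, by strong duality, its value equals the capacity of a minimum cut):

import networkx as nx

G = nx.DiGraph()  # invented capacities b_ij on the arcs
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, flow = nx.maximum_flow(G, "s", "t")
print(flow_value)  # 5, matching the minimum cut {(a,t), (b,t)}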
Part III – Further Methods and Algorithms
In the course of such an algorithm one produces a decreasing sequence of dual (upper) bounds
z̄1 > z̄2 > · · · > z̄s ≥ z∗
and an increasing sequence of primal (lower) bounds
z1 < z2 < · · · < zt ≤ z∗.
A typical way to represent such a divide and conquer approach is via an enumeration tree.
Example 8.2. Let S ⊆ {0, 1}3 . How do we break the set into smaller sets, and construct
the enumeration tree?
Example 8.3. Let S be the set of feasible tours of the traveling salesman problem on a
network of 4 cities. Let node 1 be the departure city.
• S can be subdivided into the disjoint sets of tours that start with an arc (12), (13)
or (14) respectively, i. e.
S = S(12) ∪ S(13) ∪ S(14),
where S(1i) means the tour starting with arc (1i).
• Each of the sets S(12), S(13) and S(14) can be further subdivided according to the
choice of the second arc, S(12) = S(12)(23) ∪ S(12)(24) etc.
• Each of these sets corresponds to a specific TSP tour and cannot be further subdi-
vided. We have found an enumeration tree of the TSP tours.
We see that S was decomposed on a first level, and then each of the constituent parts
was decomposed on a further level and so on.
Thus, Proposition 8.1 allows us to decompose a hard problem into a possibly large number
of easier branch problems, and to find an optimal solution of S by comparing the solutions
found for the branch problems.
However, for even quite moderately sized problems such a tree can no longer be explicitly
enumerated, as the number of leaves grows exponentially in the problem size.
The idea of implicit enumeration is based on building up the enumeration tree as we
explore it, and to prune certain parts that are not worth looking at before those parts are
even generated.
Let
z = max{c⊤x | x ∈ S},
and let
S = S1 ∪ · · · ∪ Sk
be a decomposition of its feasible domain into smaller sets. For j ∈ {1, . . . , k} let
z^j = max{c⊤x | x ∈ Sj}
and let z_j ≤ z^j ≤ z̄^j be a lower and an upper bound for z^j. Then max_j z̄^j is an upper bound and max_j z_j a lower bound for z.
We speak of pruning a branch when we detect that we need no longer explore it further.
This can happen for a variety of reasons.
[Figure: a decomposition S = S1 ∪ S2 with upper bounds z̄1 = 33, z̄2 = 24 and lower bounds 25 and 21; combining them tightens the root bounds from (20, 36) to (25, 33), and the branch S2 can be pruned by bounding, since z̄2 = 24 < 25 = z.]
Remember that we do not set up the decomposition tree explicitly but use an implicit
enumeration, typically by introducing more and more extra constraints as we trickle down
towards the leaves of the enumeration tree.
As we proceed, the constraints may become incompatible and correspond to an empty set
Sj , but this may not be a-priori obvious and has to be detected.
8.2.3 Pruning by Optimality. When z_j = z̄_j for some j, then the branch corresponding to Sj no longer has to be considered further, as an optimal solution with value z^j = z_j = z̄_j for this branch is already available. However, we will not throw this solution away, as it may later turn out to be optimal for the parent problem S.
[Figure: here z̄1 = 33 and z_2 = z̄2 = 27, so S2 is solved to optimality and pruned by optimality; the root bounds tighten from (20, 36) to (27, 33).]
A branch-and-bound scheme combines branching with primal and dual bounds:
• Use Proposition 8.4 to tighten bounds at the root.
• Prune branches that need not be explored.
If the LP relaxation is used for generating dual bounds in a branch-and-bound scheme, we speak of LP-based branch-and-bound. We will now explore this framework in more detail.
x∗ ← xj (update incumbent),
List ← List \ {Sj} (prune by optimality),
else branch Sj into two subproblems Sj^[1], Sj^[2],
List ← (List \ {Sj}) ∪ {Sj^[1], Sj^[2]}.
end.
3.) Stop with incumbent x∗ optimal (x∗ = ∅ is a certificate of infeasibility of the problem).
A variety of modifications can be made to the above algorithm. For instance, the efficiency of the algorithm can be improved if in step 2.i) we also compute a primal bound for Sj and use this to update z as well as an upper bound z̄.
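The whole scheme fits in a short program; a sketch of LP-based branch and bound for max{c⊤x | Ax ≤ b, x ≥ 0 integer}, using scipy's linprog for the relaxations, depth-first node choice and branching on a fractional variable (run here on the data of Example 8.7 below):

import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b):
    # Maximise c'x s.t. Ax <= b, x >= 0 and integer.
    n = len(c)
    best_val, best_x = -math.inf, None
    node_list = [([0] * n, [None] * n)]  # per-variable (lower, upper) bounds
    while node_list:
        lo, hi = node_list.pop()  # depth-first: take the newest node
        res = linprog(-np.array(c), A_ub=A, b_ub=b, bounds=list(zip(lo, hi)))
        if not res.success:       # prune by infeasibility
            continue
        if -res.fun <= best_val:  # prune by bounding (dual bound <= incumbent)
            continue
        frac = [j for j in range(n) if abs(res.x[j] - round(res.x[j])) > 1e-6]
        if not frac:              # integral solution: update the incumbent
            best_val, best_x = -res.fun, [round(v) for v in res.x]
            continue
        j = frac[0]               # branch on a fractional variable x_j
        down = ([*lo], [*hi]); down[1][j] = math.floor(res.x[j])
        up = ([*lo], [*hi]); up[0][j] = math.ceil(res.x[j])
        node_list += [down, up]
    return best_val, best_x

print(branch_and_bound([4, -1], [[7, -2], [0, 1], [2, -2]], [14, 3, 3]))
# (7.0, [2, 1])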
Four Questions
1.) How are the bounds to be obtained? This question has been answered already. The
primal (lower) bounds are provided by feasible solutions, and dual (upper) bounds
by relaxation or duality.
2.) How should the feasible region be separated into smaller regions? One simple idea is to choose an integer variable that is fractional in the linear programming solution, and split the problem into two about this fractional value. If xj = x̃j ∉ Z, one can take
S1 = S ∩ {x | xj ≤ ⌊x̃j⌋},
S2 = S ∩ {x | xj ≥ ⌈x̃j⌉}.
For instance, if x̃j = 4/3, then
S1 = S ∩ {x | xj ≤ 4/3} = S ∩ {x | xj ≤ 1},
S2 = S ∩ {x | xj ≥ 4/3} = S ∩ {x | xj ≥ 2}.
3.) Consider the primal-dual pair
(P) max{c⊤x | Ax ≤ b, x ≥ 0} and
(D) min{b⊤y | A⊤y ≥ c, y ≥ 0}.
Adding a branching constraint to (P) gives
(P′) max{c⊤x | Ax ≤ b, a⊤m+1 x ≤ bm+1, x ≥ 0}.
This corresponds to the situation where a new variable appears in the dual problem.
4.) In what order should the subproblems (nodes in enumerate tree) be examined? There
is no single answer that is best for all instances.
8.4 An Example
Example 8.7. Solve the following IP:
z = max 4x1 − x2
s.t. 7x1 − 2x2 ≤ 14,
x2 ≤ 3,
2x1 − 2x2 ≤ 3,
x ∈ Z2+.
(i) Bounding: To obtain the first upper bound, solve the LP relaxation. Let x3, x4, x5 be slack variables. By the simplex method, the resulting optimal basis representation is

z̄ = max −(4/7)x3 − (1/7)x4 + 59/7
x1 + (1/7)x3 + (2/7)x4 = 20/7
x2 + x4 = 3
−(2/7)x3 + (10/7)x4 + x5 = 23/7
x1, x2, x3, x4, x5 ≥ 0.
(ii) Branching: Since x̄1 = 20/7 is fractional, we branch on x1 and set
S1 = S ∩ {x | x1 ≤ 2}, S2 = S ∩ {x | x1 ≥ 3}.
We now have the tree shown in Figure 8.4. The subproblems (nodes) S1, S2 that must still be examined are called active. The node S, on the other hand, has been processed and is inactive.
(iii) Choosing an active node: During the run of the algorithm, we maintain a list of active nodes, which currently consists of S1, S2. Later we will discuss breadth-first versus depth-first choices of the next active node to be processed. For now we arbitrarily select S1.
[Figure 8.4: root S with bounds (−∞, 59/7), branching via x1 ≤ 2 to S1 and via x1 ≥ 3 to S2.]
(iv) Bounding: Adding the constraint x1 ≤ 2 (with slack s) to the optimal basis representation above yields

z̄′ = max −(4/7)x3 − (1/7)x4 + 59/7
x1 + (1/7)x3 + (2/7)x4 = 20/7
x2 + x4 = 3
−(2/7)x3 + (10/7)x4 + x5 = 23/7
−(1/7)x3 − (2/7)x4 + s = −6/7
x1, x2, x3, x4, x5, s ≥ 0.
After two simplex pivots, the linear program is reoptimized, giving z̄^1 = 15/2 and (x̄^1_1, x̄^1_2) = (2, 1/2).
(v) Branching: S1 is not solved to optimality and cannot be pruned, so using the same
branching rule as before, we have two new nodes
[Figure 8.5: S branches into S1 (x1 ≤ 2, bound 15/2) and S2 (x1 ≥ 3); S1 branches into S11 (x2 = 0) and S12 (x2 ≥ 1).]
and add them to the node list. The tree is now as shown in Figure 8.5 and the new
list of active nodes is S11 , S12 , S2 .
(vi) Choosing an active node: We arbitrarily choose S2 for processing.
(vii) Bounding: We compute a bound z̄^2 by solving the LP relaxation
(LP (S2 )) z = max 4x1 − x2
s.t. 7x1 − 2x2 ≤ 14
x2 ≤ 3
2x1 − 2x2 ≤ 3
x1 ≥ 3
x ∈ R2+
of the problem
(IP2) z^2 = max{c⊤x | x ∈ S2}.
To solve LP(S2), we use the dual simplex algorithm in the same way as above. The constraint x1 ≥ 3 is first written as x1 − t = 3, t ≥ 0, which expressed in terms of the nonbasic variables becomes
(1/7)x3 + (2/7)x4 + t = −1/7.
The resulting linear program is

z̄ = max −(4/7)x3 − (1/7)x4 + 59/7
x1 + (1/7)x3 + (2/7)x4 = 20/7
x2 + x4 = 3
−(2/7)x3 + (10/7)x4 + x5 = 23/7
(1/7)x3 + (2/7)x4 + t = −1/7
x1, x2, x3, x4, x5, t ≥ 0.
[Figure: the completed tree. S2 is pruned by infeasibility; branching S1 on x2 gives z̄11 = 6 and z̄12 = 7, and S12 yields the integral solution (2, 1)⊤ with optimal value z = 7.]
LP or IP models can often be simplified by reducing the number of variables and constraints (e.g. by eliminating redundant constraints), and IP models can be tightened before any actual branch-and-bound computations are performed. All commercial branch-and-bound systems carry out such a check, called preprocessing.
Example 8.8. Consider the linear program
max 2x1 + x2 − x3
s. t. 5x1 − 2x2 + 8x3 ≤ 15,
8x1 + 3x2 − x3 ≥ 9,
x1 + x2 + x3 ≤ 6,
0 ≤ x1 ≤ 3,
0 ≤ x2 ≤ 1,
1 ≤ x3 .
• As some of the bounds have changed after the first sweep, we may now go back to the first constraint and tighten the bounds yet further.
Using the improved bounds, we obtain
x1 + x2 + x3 ≤ 9/5 + 1 + 101/64 < 6,
which shows that the constraint x1 + x2 + x3 ≤ 6 is redundant and can be omitted from the problem. The remaining problem is
max 2x1 + x2 − x3
5x1 − 2x2 + 8x3 ≤ 15,
8x1 + 3x2 − x3 ≥ 9,
7/8 ≤ x1 ≤ 9/5,
0 ≤ x2 ≤ 1,
1 ≤ x3 ≤ 101/64.
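One sweep of this bound tightening is easy to automate: isolate a variable in a ≤ constraint and bound it using the extreme values of the other variables. A sketch (assuming the bounds needed on the right-hand side are finite):

def tighten_upper(a, b, lo, hi, j):
    # From sum_k a_k x_k <= b with a_j > 0, derive
    # x_j <= (b - minimal value of the remaining terms) / a_j.
    rest = 0.0
    for k, ak in enumerate(a):
        if k != j:
            rest += ak * (lo[k] if ak > 0 else hi[k])
    new_hi = (b - rest) / a[j]
    return new_hi if hi[j] is None else min(hi[j], new_hi)

# First constraint of Example 8.8: 5x1 - 2x2 + 8x3 <= 15,
# with current bounds 0 <= x1 <= 3, 0 <= x2 <= 1, 1 <= x3.
print(tighten_upper([5, -2, 8], 15, [0, 0, 1], [3, 1, None], 0))
# 1.8 = 9/5, the improved upper bound on x1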
Example 8.8 shows how to simplify linear programming instances. In the preprocessing
of IPs we have further possibilities:
(i) For all xj with an integrality constraint xj ∈ Z any bounds lj ≤ xj ≤ uj can be
tightened to ⌈lj ⌉ ≤ xj ≤ ⌊uj ⌋.
(ii) For binary variables new logical or Boolean constraints can be derived that tighten
the formulation and hence lead to fewer branching nodes in a branch-and-bound
procedure.
Example 8.9. Consider a binary IP instance whose feasible set is defined by the following
constraints,
8.5.4 Generating logical inequalities. The first constraint shows that x1 = 1 implies x3 = 1, which can be written as
x1 ≤ x3.
x1 ≤ x4 .
x1 + x2 ≤ 1.
8.5.5 Combining pairs of logical inequalities. We now consider pairs of inequalities involving the same variables.
x1 ≤ x3 and x1 ≥ x3 =⇒ x1 = x3 .
x1 + x2 ≤ 1 and x2 ≤ x1 =⇒ x2 = 0,
and then
x2 + x4 ≥ 1 =⇒ x4 = 1.
Indeed, both points satisfy all four constraints and are binary. Thus,
8.6.2 Breadth-First Search Strategy. In the breadth-first strategy, one ensures that the associated tree is as short as possible, by branching all problems on the current level of the tree first, before turning to the next level of the tree.
8.6.3 Best-Node-First

To minimise the total number of nodes processed during the run of the algorithm, the
optimal strategy is to always choose the node with the largest upper bound, i.e., Sj
such that

z̄ j = max{z̄ i : Si ∈ List}.

With such a rule, we will never branch on a node St whose upper bound z̄ t is smaller
than the optimal value z of S. This is called a best-node-first strategy. The depth-first
and best-node-first strategies are usually mutually contradictory, so a compromise has to
be reached. Usually, depth-first is used initially until a feasible solution is found and a
lower bound z is established.
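A natural way to implement the best-node-first rule is a priority queue keyed by the upper bounds. A minimal sketch follows (Python's heapq is a min-heap, so bounds are negated; the node names are illustrative, not from the notes).

import heapq

active = []                                  # the List of active nodes

def push(node, upper_bound):
    heapq.heappush(active, (-upper_bound, node))

def pop_best():
    neg_bound, node = heapq.heappop(active)  # node with the largest upper bound
    return node, -neg_bound

push("S1", 15 / 2)
push("S2", 59 / 7)
print(pop_best())                            # ('S2', 8.43...), since 59/7 > 15/2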
8.7.1 Most Fractional Variable

Let C be the set of fractional variables of the solution x∗ to an LP relaxation. The most
fractional variable approach is to branch on the variable that corresponds to the index

j ∈ arg max_{j∈C} min{x∗j − ⌊x∗j ⌋, ⌈x∗j ⌉ − x∗j },

i.e., on a variable whose fractional part is closest to 1/2.
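As a small illustration (the helper is hypothetical, not from the notes), this rule reads directly off the LP solution:

import math

def most_fractional(x, tol=1e-6):
    # Return the index whose fractional part is closest to 1/2.
    best_j, best_score = None, -1.0
    for j, xj in enumerate(x):
        f = xj - math.floor(xj)              # fractional part of x_j
        if tol < f < 1 - tol:                # only genuinely fractional entries
            score = min(f, 1 - f)            # distance to the nearest integer
            if score > best_score:
                best_j, best_score = j, score
    return best_j

print(most_fractional([2.0, 0.3, 1.0, 2.5]))  # 3: frac(2.5) = 0.5 is closest to 1/2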
8.7.2 Branching by Priorities

In this approach the user can indicate a priority of importance for the decision variables
to be integer. The system will then branch on the fractional variable with the highest
priority.
Suppose the underlying optimization problem is a minimization problem. Since rounding
up the variables yj corresponding to large fixed costs fj changes the objective function
more severely than rounding yj corresponding to small fixed costs, we prioritise the yj in
order of decreasing fixed costs fj .
• Finding feasible solutions via heuristics. This is hard, but when it succeeds, it
generates lower bounds z that allow pruning by bounding.
• Finding better dual bounds by using
– combinatorial relaxation (see Section 6.3.2),
– duality (see Section 6.3.3),
• Tightening the formulation of the IP, having the effect that fewer branching nodes
are needed until an optimal solution is found. This can be achieved
– by cutting plane methods
– or by Lagrangian relaxation.
Consider the set

X = P ∩ Z4 ,

where

P = {x ∈ R4+ | 13x1 + 20x2 + 11x3 + 6x4 ≥ 72}.
Dividing by 11 gives the valid inequality for P ,

(13/11)x1 + (20/11)x2 + x3 + (6/11)x4 ≥ 72/11.

Since x is nonnegative, rounding up the coefficients on the left to the nearest integer leads
to

2x1 + 2x2 + x3 + x4 ≥ 72/11,

which is a weaker valid inequality for P .
Notice that x ∈ X is integer, and that the coefficients are integer. Thus, the left-hand
side of the above inequality must be integer. An integer that is greater than or equal
to 72/11 must be at least 7. So rounding the right-hand side up to the nearest integer
gives the valid inequality for X:

2x1 + 2x2 + x3 + x4 ≥ 7.
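The rounding argument just used is entirely mechanical. A minimal sketch for a covering constraint ∑j aj xj ≥ b with integer x ≥ 0 and divisor λ > 0 (the helper name is hypothetical; exact fractions avoid rounding errors):

import math
from fractions import Fraction

def cg_cut_geq(a, b, lam):
    # Divide by lam, round coefficients up (valid since x >= 0),
    # then round the right-hand side up (valid since the LHS is integer).
    coeffs = [math.ceil(Fraction(aj, lam)) for aj in a]
    rhs = math.ceil(Fraction(b, lam))
    return coeffs, rhs

print(cg_cut_geq([13, 20, 11, 6], 72, 11))  # ([2, 2, 1, 1], 7), as derived above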
Similarly, we can verify that all the following inequalities are valid inequalities for X:
x1 + x2 + x3 + x4 ≥ 4,
x1 + 2x2 + x3 + x4 ≥ 6,
3x1 + 4x2 + 2x3 + x4 ≥ 12.
Clearly,
• there is no point in X satisfying x2 = x4 = 0. So all feasible solutions satisfy
x2 + x4 ≥ 1
Since
X = {(0, 0), (x, 1) with 0 ≤ x ≤ 5},
it is easily checked that the inequality
x ≤ 5y
is valid.
Example 9.5 (Mixed integer set). Consider the set

X = {(x, y) | x ≤ Cy, 0 ≤ x ≤ b, y ∈ Z+ },

where C, b > 0. Clearly,

x ≤ b − α(K − y),

where

K = ⌈b/C⌉, α = b − (⌈b/C⌉ − 1)C.

Note that K − 1 < b/C ≤ K, i. e.,

C(K − 1) < b ≤ KC.

For y ∈ {0, 1, . . . , K − 1}, define
Further, define

SK = {(x, y) | y ≥ K, y ∈ Z, 0 ≤ x ≤ b}.

It is easy to see that all feasible points of X lie on or below the straight line through the
two points (K − 1, C(K − 1)) and (K, b) in the (y, x)-plane,

x = αy + β, where α = b − (K − 1)C, β = b − αK.

Hence

x ≤ αy + β = αy + b − αK = b − α(K − y).
Consider now the set

X = P ∩ (Z4 × R1 ),

where

P = {(y, s) ∈ R4+ × R1+ | 13y1 + 20y2 + 11y3 + 6y4 + s ≥ 72}.

It is not difficult to prove that

(72 − s)/11 ≥ 7 − αs for some α.

Dividing by 11 yields

(13/11)y1 + (20/11)y2 + y3 + (6/11)y4 ≥ (72 − s)/11,

which suggests that

2y1 + 2y2 + y3 + y4 ≥ (72 − s)/11 ≥ 7 − (1/6)s

is a valid inequality for X.
• The inequality

∑_{j=1}^n waj xj ≤ wb

is valid for P , as w ≥ 0 and ∑_{j=1}^n aj xj ≤ b.
• The inequality

∑_{j=1}^n ⌊waj ⌋xj ≤ wb

is valid for P , as x ≥ 0.

• The inequality

∑_{j=1}^n ⌊waj ⌋xj ≤ ⌊wb⌋

is valid for X, as x is integer, and thus ∑_{j=1}^n ⌊waj ⌋xj is integer.
This simple procedure is sufficient to generate all valid inequalities for an integer program.
Theorem 9.10. Every valid inequality for X (the feasible set of integer programs) can be
obtained by applying the Chvátal-Gomory procedure a finite number of times.
Given a formulation

P = {x | Ax ≤ b, x ≥ 0}

of

X = P ∩ Zn ,

find a set of valid inequalities Qx ≤ q for X and add these to the formulation, immediately
giving a new formulation

P ′ = {x | Ax ≤ b, Qx ≤ q, x ≥ 0}

with X = P ′ ∩ Zn . Since P ′ ⊆ P , P ′ is a better formulation than P . One can then apply
the branch-and-bound method, or any other solution method, to the formulation P ′ .
Example 9.11. Consider the set X = {(x, y) ∈ P | y ∈ Z}, where

P = {(x, y) | 0 ≤ xi ≤ 1 (i ∈ {1, . . . , n}), ∑_{i=1}^n xi ≤ ny, 0 ≤ y ≤ 1}.

For a point (x, y) ∈ X notice that ∑_{i=1}^n xi ≤ ny and x ≥ 0. Thus, if there exists an
i ∈ {1, . . . , n} with xi > 0, then necessarily y = 1. Thus, the inequalities

xi ≤ y, i ∈ {1, . . . , n},

are valid inequalities for X. Adding all these valid inequalities to P yields the formulation

P ′ = {(x, y) | 0 ≤ xi ≤ 1, xi ≤ y (i ∈ {1, . . . , n}), ∑_{i=1}^n xi ≤ ny, 0 ≤ y ≤ 1}.

Since the inequalities xi ≤ y, i ∈ {1, . . . , n}, imply that

∑_{i=1}^n xi ≤ ny,

the latter inequality becomes redundant and can be removed from the system. Thus, we obtain

P ′′ = {(x, y) | 0 ≤ xi ≤ 1, xi ≤ y (i ∈ {1, . . . , n}), 0 ≤ y ≤ 1}.
9.3.3 Gomory’s Fractional Cutting Plane Algorithm

The fundamental idea behind the cutting plane method is to add constraints to a linear
program, cutting off its optimal solutions, until the optimal basic feasible solution takes
on integer values.
Of course, we have to be careful about which constraints we add: we would not want to
change the problem by adding additional constraints. Here, we will add a special type of
constraint called a cut.
A cut relative to a current fractional solution satisfies the following criteria:
• Every feasible integer point is feasible for the cut (therefore, a cut must be a valid
inequality for the integer feasible set), and
• the current fractional solution is not feasible for the cut.
Two ways to generate cuts:
• The first, called Gomory cuts, generates cuts from any linear programming tableau.
This has the advantage of “solving” any problem but has the disadvantage that the
method can be very slow.
• The second approach is to use the structure of the problem to generate very good
cuts. This approach needs a problem-by-problem analysis, but can provide very
efficient solution techniques.
In what follows, we focus our attention on the first approach.
Example 9.12. Consider the integer set
X = {(x, y) | x ≤ 10y, 0 ≤ x ≤ 14, x ∈ Z1+ , y ∈ {0, 1, 2, 3}}.
Find a cut cutting off the point (14, 1.4).
Consider the IP problem:
max{c⊤ x | Ax = b, x ≥ 0 and integer}.
The idea is to first solve the associated linear programming relaxation and find an optimal
basis, choose a basic variable that is not integer, and then generate a Chvátal-Gomory
inequality on the constraint associated with this basic variable so as to cut off the linear
programming solution. We suppose, given an optimal basis, that the problem is rewritten
in the form:
Maximize f0 + ∑_{j∈N} c̄j xj
subject to xB + ∑_{j∈N} āj xj = b̄,
x ≥ 0 and integer,

where b̄ = B −1 b, āj is a column vector, xj (j ∈ N ) are the nonbasic variables, and xB is
the vector of basic variables.
If the basic optimal solution x∗ is not integer, there exists some row i with b̄i ∉ Z.
Choosing such a row, the Chvátal-Gomory inequality for row i is

(xB )i + ∑_{j∈N} ⌊āji ⌋xj ≤ ⌊b̄i ⌋,

where ⌊āj ⌋ = (⌊āj,1 ⌋, . . . , ⌊āj,k ⌋) for āj = (āj,1 , . . . , āj,k ). Combining this inequality with
the previous equation leads to

∑_{j∈N} (āj − ⌊āj ⌋)i xj ≥ b̄i − ⌊b̄i ⌋,
i. e.,

∑_{j∈N} (āji − ⌊āji ⌋)xj ≥ b̄i − ⌊b̄i ⌋.

As x∗j = 0 for all nonbasic variables j ∈ N in the optimal LP solution, while b̄i − ⌊b̄i ⌋ > 0,
x∗ does not satisfy the Chvátal-Gomory cut, and hence x∗ will be cut off.
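As a small sketch (the names are illustrative, not from the notes), this cut can be read off a tableau row stored with exact fractions:

from fractions import Fraction

def gomory_cut(row, rhs):
    # Row i reads (x_B)_i + sum_{j in N} abar_ij x_j = bbar_i with bbar_i
    # fractional; the cut keeps only the fractional parts:
    # sum_{j in N} frac(abar_ij) x_j >= frac(bbar_i).
    frac = lambda q: q - (q.numerator // q.denominator)
    return ({j: frac(a) for j, a in row.items() if frac(a) != 0}, frac(rhs))

# e.g. the row x1 + (1/7)x3 + (2/7)x4 = 20/7 yields (1/7)x3 + (2/7)x4 >= 6/7:
print(gomory_cut({"x3": Fraction(1, 7), "x4": Fraction(2, 7)}, Fraction(20, 7)))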
Based on these considerations, we now describe a basic cutting plane algorithm for (IP):

max{c⊤ x | x ∈ X = P ∩ Zn }.

Initialisation: Set k = 0 and P 0 = P .
Iteration k: Solve the linear program

z̄ k = max{c⊤ x | x ∈ P k }

and let xk be an optimal solution. If xk is integer, stop: xk is optimal for (IP).
Otherwise, find a valid inequality (uk )⊤ x ≤ γ k for X that cuts off xk . If such an
inequality is found, set

P k+1 = P k ∩ {x | (uk )⊤ x ≤ γ k },

and augment k. Otherwise: Stop.
The general idea of a cutting-plane algorithm is as follows:
(i) We find an optimal solution x∗ for the linear program max{c⊤ x | x ∈ P }. This can
be done by any linear programming algorithm (possibly a solver that is available
only as a black-box).
(ii) If x∗ is integral, we already have an optimal solution to the IP and we can terminate.
(iii) Otherwise, we search our family (or families) of valid inequalities for inequalities
which are violated by x∗ , that is, w⊤ x∗ > d where w⊤ x ≤ d is valid for X.
(iv) We add the inequalities found to our LP-relaxation and resolve to find a new optimal
solution x∗∗ of the improved formulation. This procedure is continued.
(v) If we are fortunate, we terminate with an optimal integral solution.
(vi) If we are not so lucky, we have still gained something. Namely, we have found a new
formulation for our initial problem which is better than the original one (since we
have cut off some non-integral points). The formulation obtained upon termination
gives an upper bound z̄ for the optimal objective function value z ∗ which is no worse
than the initial one (and usually is much better). We can now use z̄ in a branch
and bound algorithm.
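Steps (i)–(v) can be summarised in pseudocode. In the sketch below, solve_lp and find_violated_cut are hypothetical placeholders for an LP solver and a separation routine; neither is part of the notes.

def cutting_plane(c, P):
    # P is the current list of inequalities describing the formulation P^k.
    while True:
        x_star, z_bar = solve_lp(c, P)        # (i) optimise over P^k
        if all(float(v).is_integer() for v in x_star):
            return x_star, z_bar              # (ii) integral, hence optimal
        cut = find_violated_cut(P, x_star)    # (iii) separation
        if cut is None:
            return None, z_bar                # no cut found: keep the bound z_bar
        P = P + [cut]                         # (iv) tighten the formulation, resolve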
Remark 9.13. If the algorithm terminates without finding an integer solution for (IP),
we still obtain a tightened formulation P k together with its upper bound z̄ k , which can
be passed on to a branch-and-bound procedure (cf. point (vi) above).
The next two examples show how Gomory’s cutting plane algorithm works.
Example 9.14 (Gomory method). Consider the integer program
z = max 4x1 − x2
s.t. 7x1 − 2x2 ≤ 14,
x2 ≤ 3,
2x1 − 2x2 ≤ 3,
x1 , x2 ≥ 0 and integer.
Adding slack variables x3 , x4 , x5 , observe that as the constraint data is integer, the slack
variables must also take integer values. Now solving as a linear program gives:
z = max 59/7 − (4/7)x3 − (1/7)x4
x1 + (1/7)x3 + (2/7)x4 = 20/7,
x2 + x4 = 3,
−(2/7)x3 + (10/7)x4 + x5 = 23/7,
x1 , x2 , x3 , x4 , x5 ≥ 0,
so we use the first row, in which the basic variable x1 is fractional, to generate the Gomory
cut

(1/7)x3 + (2/7)x4 ≥ 6/7.

By adding a slack variable s ≥ 0, this can be written as

s = (1/7)x3 + (2/7)x4 − 6/7.
Adding this cut to the above LP problem and reoptimizing leads to the new optimal
tableau:

z = max 15/2 − (1/2)x5 − 3s
x1 + s = 2,
x2 − (1/2)x5 + s = 1/2,
x3 − x5 − 5s = 1,
x4 + (1/2)x5 + 6s = 5/2,
x1 , x2 , x3 , x4 , x5 , s ≥ 0.
Now the new optimal linear programming solution

x = (2, 1/2, 1, 5/2, 0)⊤
is still not integer, as the variable x2 is fractional. The Gomory fractional cut on row 2 is
(1/2)x5 ≥ 1/2,

or

−(1/2)x5 + t = −1/2
with t ≥ 0. Adding the constraint and reoptimizing, we obtain
z = max 7 − 3s − t
x1 + s = 2,
x2 + s − t = 1,
x3 − 5s − 2t = 2,
x4 + 6s + t = 2,
x5 − t = 1,
x1 , x2 , x3 , x4 , x5 , s, t ≥ 0.
Now the linear programming solution is integer and thus optimal, hence x∗ = (2, 1)⊤
solves the original integer program.
Example 9.15 (Gomory method – understanding the Gomory cut in a different way).
Consider the following integer program:
z = max 7x1 + 9x2
s.t. −x1 + 3x2 ≤ 6,
7x1 + x2 ≤ 35,
x1 , x2 ≥ 0 and integer.
First ignoring the integrality condition and solving its LP relaxation by the simplex
method yields the following optimal tableau, where s1 and s2 are slack variables
x1 x2 s1 s2 RHS
x2 0 1 7/22 1/22 7/2
x1 1 0 −1/22 3/22 9/2
−z 0 0 28/11 15/11 63
From the row for x1 we have x1 − (1/22)s1 + (3/22)s2 = 9/2, and thus

x1 − s1 − 4 = 1/2 − (21/22)s1 − (3/22)s2 .

For every feasible integer point the left-hand side is integer, while the right-hand side is
at most 1/2 since s1 , s2 ≥ 0; an integer that is at most 1/2 is at most 0, giving the
constraint

1/2 − (21/22)s1 − (3/22)s2 ≤ 0,

which is a Gomory cut.
IV – More on Relaxation and Applications
10 – Lagrangian relaxation
Lagrangian relaxation provides an improvement to LP-relaxation for certain IP and MIP
problems. Consider integer programming problems of the following type:
(IP ) z = max c⊤ x
s. t. Ax ≤ a,
Dx ≤ d,
x ∈ Zn+ .
Suppose that the constraints Ax ≤ a are “nice” (e.g., A is totally unimodular) in the sense
that an IP with just these constraints is easy to solve. Thus if we drop the complicating
constraints “Dx ≤ d”, we have the following relaxation
max c⊤ x
s. t. Ax ≤ a
x ∈ Zn+ .
For such problems we will now derive a family of relaxations that generate stronger bounds
than LP relaxations. Consequently, branch-and-bound systems built around these relax-
ations are often more efficient than an LP-based approach.
We write (IP) in the slightly more general form

(IP ) z = max c⊤ x
s. t. Dx ≤ d (or Dx = d),
x ∈ X,

where Dx ≤ d are the complicating constraints and X collects the “nice” constraints,
here X = {x ∈ Zn+ | Ax ≤ a}.
Proof. Assume that the constraints are all of inequality “≤” type. The equality case “=”
can be proved in a similar way.
• The feasible region of (IP (u)) is larger than that of (IP ),
{x ∈ X | Dx ≤ d} ⊆ X.
• For all x feasible to (IP ), the objective function of (IP (u)) is at least as large as
that of (IP), in fact,
c⊤ x + u⊤ (d − Dx) ≥ c⊤ x
due to u⊤ (d − Dx) ≥ 0.
We have infinitely many Lagrangian relaxations (IP (u)) to choose from. So, how should
we fix the vector u of Lagrange multipliers?
Since (IP (u)) is a relaxation of (IP ), we have z ≤ z(u) for every u ≥ 0. Therefore, to
find the least upper bound of this kind, we have to solve the Lagrangian dual problem

(LD) wLD = min{z(u) | u ≥ 0}.
Proof. Note that the conditions in both (i) and (ii) imply that z(u) = c⊤ x(u). Since
(IP (u)) is a relaxation of (IP ) we can apply Proposition 6.8, which tells us that x(u) is
optimal for (IP ).
Example 10.4 (UFL). Consider the uncapacitated facility location problem from Chap-
ter 4,

(IP ) z = max ∑_{i∈M} ∑_{j∈N} cij xij − ∑_{j∈N} fj yj
s. t. ∑_{j∈N} xij = 1 (i ∈ M ),
xij − yj ≤ 0 (i ∈ M, j ∈ N ),
x ∈ R|M |×|N |+ , y ∈ {0, 1}|N | ,
where
• M is the set of customer locations,
• N is the set of potential facility locations,
• fj are the fixed costs for opening facility j,
• and where we replaced the original servicing costs cij with −cij to turn the problem
into a maximisation problem.
Dualising the demand constraints ∑_{j∈N} xij = 1, we find the Lagrangian relaxation

(IP (u)) z(u) = max ∑_{i∈M} ∑_{j∈N} (cij − ui )xij − ∑_{j∈N} fj yj + ∑_{i∈M} ui
s.t. xij − yj ≤ 0 (i ∈ M, j ∈ N ),
x ∈ R|M |×|N |+ , y ∈ {0, 1}|N | .
Now note that the constraints that linked the different facility locations to one another
have been subsumed in the objective function, so that (IP (u)) decouples,

z(u) = ∑_{j∈N} zj (u) + ∑_{i∈M} ui .
(IP ) z = max c⊤ x
s. t. Dx ≤ d,
x ∈ X,
where Dx ≤ d are m constraints that make the problem difficult, while optimising over
X is easy.
The Lagrangian dual of (IP) is

(LD) wLD = min{z(u) | u ∈ Rm+ },

where

(IP (u)) z(u) = max{c⊤ x + u⊤ (d − Dx) | x ∈ X}.
We saw that (LD) is a dual of (IP), so that z ≤ wLD . Solving (LD) is thus a good way of
generating a hopefully quite tight bound on z, but note that z(u) is generally a nonlinear
function, and we do not know yet how to solve such problems efficiently.
We would like to answer the questions:
(i) How good is the upper bound obtained by solving the Lagrangian dual?
(ii) More specifically, does the Lagrangian dual give a stronger bound on (IP) than the
respective LP relaxation?
(iii) How can one solve the Lagrangian dual?
The strength of the Lagrangian dual. To answer the aforementioned questions, we
must understand the structure of (LD) better.
Let us assume that X contains a large but finite number of points X = {x[1] , . . . , x[T ] }.
Then
wLD = min η
s. t. η ≥ c⊤ x[t] + u⊤ (d − Dx[t] ) (for t ∈ {1, . . . , T })
u ∈ Rm+ , η ∈ R.
The result still holds true for arbitrary X = {x ∈ Zn+ | Ax ≤ b}, not only when X is
finite.
Theorem 10.5.
wLD = max{c⊤ x | Dx ≤ d, x ∈ conv(X)}.
Corollary 10.6. (i) If {x ∈ Rn+ | Ax ≤ b} is an ideal formulation of X, then

conv(X) = {x ∈ Rn | Ax ≤ b, x ≥ 0},

and hence wLD coincides with the bound produced by the LP relaxation of (IP).
(ii) Otherwise,

conv(X) ⊂ {x ∈ Rn+ | Ax ≤ b},

and wLD is generally a tighter bound than the one given by the LP relaxation of
(IP).
Some remarks:
• Situation (i) of Corollary 10.6 happens quite often, because
max{c⊤ x | x ∈ X}
Let us first assume again that X is of finite cardinality and write X = {x[1] , . . . , x[T ] }.
• It is not very difficult to prove that z(u) is the piecewise linear function

u ↦ max_{t=1,...,T} ( c⊤ x[t] + u⊤ (d − Dx[t] ) ). (10.1)
• Furthermore, this function is convex, since the linear functions u 7−→ c⊤ x[t] +u⊤ (d−
Dx[t] ) are convex and the pointwise maximum of a set of convex functions is again
convex.
• The Lagrangian dual function z(u) is not differentiable at the breakpoints where
the maximum in (10.1) is achieved by more than one index.
To solve (LD) we need to minimise z(u) over u ≥ 0.
Convex nondifferentiable functions can often be minimised reasonably well by the
subgradient algorithm, which we will introduce next.
Let u ≥ 0 be given and let t∗ attain the maximum in (10.1) at u, so that
z(u) = c⊤ x[t∗] + u⊤ (d − Dx[t∗] ). Then, for any v,

z(v) = max_{t=1,...,T} ( c⊤ x[t] + v ⊤ (d − Dx[t] ) )
≥ c⊤ x[t∗] + v ⊤ (d − Dx[t∗] )
= c⊤ x[t∗] + u⊤ (d − Dx[t∗] ) + (v − u)⊤ (d − Dx[t∗] )
= z(u) + (v − u)⊤ (d − Dx[t∗] ),

which implies that d − Dx[t∗] is a subgradient of z(u) at u.
End.
Some remarks:
• The motivation of the algorithm is very simple: in each iteration of the main loop
the Lagrange multiplier vector is improved by correcting it in a direction that makes
the objective function z(u) decrease.
• Note that we built in a mechanism that prevents individual components of u[k+1]
from becoming negative. For Lagrange multipliers corresponding to equality constraints
Dx = d, this safeguard is of course unnecessary.
• The implementation of the subgradient algorithm is also very simple, but the diffi-
culty lies in choosing the step lengths µk , as the following theorem shows.
(ii) If µk = µ0 ρk for some fixed ρ ∈ (0, 1), then z(u[k] ) → wLD if µ0 and ρ are sufficiently
large.
(iii) If w̄ ≥ wLD and

µk = εk (z(u[k] ) − w̄) / ∥d − Dx(u[k] )∥² ,

where εk ∈ (0, 2) for all k, then either z(u[k] ) → wLD for k → ∞, or else w̄ ≥
z(u[k] ) ≥ wLD occurs for some finite k.
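A minimal sketch of the resulting method, using the step size from case (iii); here solve_lagrangian is a hypothetical oracle returning z(u) together with an optimal solution x(u) of (IP (u)), and it is not part of the notes.

import numpy as np

def subgradient_method(D, d, solve_lagrangian, w_bar, u0, eps=1.0, iters=100):
    u = np.maximum(np.asarray(u0, dtype=float), 0.0)
    best = np.inf                             # best (smallest) dual bound so far
    for _ in range(iters):
        z_u, x_u = solve_lagrangian(u)        # evaluate z(u) and recover x(u)
        best = min(best, z_u)
        if z_u <= w_bar:
            break                             # second alternative of case (iii)
        g = d - D @ x_u                       # subgradient of z at u
        if np.linalg.norm(g) == 0:
            break                             # u minimises z: stop
        mu = eps * (z_u - w_bar) / float(g @ g)   # step size from case (iii)
        u = np.maximum(u - mu * g, 0.0)       # step and project back onto u >= 0
    return best, u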
the cost of producing x units of a specific product might consist of a fixed cost of setting
up the equipment and a variable cost per unit produced on the equipment.
Assume that the equipment has a capacity of N units. Define y to be a binary variable
that indicates when the fixed cost is incurred, so that y = 1 when x > 0 and y = 0 when
x = 0. Then the contribution to cost due to x may be written as
Ky + cx,

together with the linking constraint x ≤ N y, which forces y = 1 whenever x > 0.
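A minimal sketch of this fixed-charge construction using the PuLP modelling library (the data values are illustrative, not from the notes):

import pulp

K, c, N = 100, 3, 50                         # fixed cost, unit cost, capacity
prob = pulp.LpProblem("fixed_charge", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=0)
y = pulp.LpVariable("y", cat="Binary")
prob += K * y + c * x                        # cost contribution Ky + cx
prob += x <= N * y                           # x > 0 forces y = 1
prob += x >= 20                              # an illustrative demand constraint
prob.solve()
print(pulp.value(x), pulp.value(y))          # 20.0 1.0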
x = δ1 + δ2 + δ3 ,
where
0 ≤ δ1 ≤ 4, 0 ≤ δ2 ≤ 6, 0 ≤ δ3 ≤ 5; (11.1)
and the total variable cost is given by:
If we let

w1 = 1 if δ1 is at its upper bound and w1 = 0 otherwise,
w2 = 1 if δ2 is at its upper bound and w2 = 0 otherwise,

then we can impose the constraints

4w1 ≤ δ1 ≤ 4, 6w2 ≤ δ2 ≤ 6w1 , 0 ≤ δ3 ≤ 5w2 , w1 , w2 ∈ {0, 1}, (11.2)

to ensure that the proper conditional constraints hold. Note that if w1 = 0, then w2 = 0,
to maintain feasibility for the constraint imposed upon δ2 , and (11.2) reduces to

0 ≤ δ1 ≤ 4, δ2 = 0, and δ3 = 0.

If w1 = 1 and w2 = 0, then (11.2) reduces to

δ1 = 4, 0 ≤ δ2 ≤ 6, and δ3 = 0.

If w1 = w2 = 1, then (11.2) reduces to

δ1 = 4, δ2 = 6, and 0 ≤ δ3 ≤ 5.
Hence, we observe that there are three feasible combinations for the values of w1 and w2 :
w1 = 0, w2 = 0 corresponding to 0 ≤ x ≤ 4 since δ2 = δ3 = 0;
w1 = 1, w2 = 0 corresponding to 4 ≤ x ≤ 10 since δ1 = 4 and δ3 = 0;
w1 = 1, w2 = 1 corresponding to 10 ≤ x ≤ 15 since δ1 = 4 and δ2 = 6.
The same general technique can be applied to piecewise linear curves with any number of
segments. The general constraint imposed upon the variable δj for the jth segment will
read:
Lj wj ≤ δj ≤ Lj wj−1 ,

where Lj is the length of the jth segment and, by convention, w0 = 1.
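A quick enumeration confirms that the encoding produces exactly the three x-ranges listed above. This is a sketch assuming the segment constraints (11.2) as reconstructed, with L = (4, 6, 5) and w0 = 1:

for w1 in (0, 1):
    for w2 in (0, 1):
        if w2 > w1:                          # 6*w2 <= delta2 <= 6*w1 is empty
            continue
        x_min = 4 * w1 + 6 * w2              # every delta at its lower bound
        x_max = 4 + 6 * w1 + 5 * w2          # every delta at its upper bound
        print(f"w1={w1}, w2={w2}: {x_min} <= x <= {x_max}")
# Prints 0 <= x <= 4, 4 <= x <= 10 and 10 <= x <= 15, matching the three cases.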