
Optimization

Felix Fischer
[email protected]

April 30, 2014


Contents

1 Introduction and Preliminaries
  1.1 Constrained Optimization
  1.2 Linear Programs
  1.3 Review: Unconstrained Optimization and Convexity

2 The Method of Lagrange Multipliers
  2.1 Lagrangian Sufficiency
  2.2 Using Lagrangian Sufficiency
  2.3 Complementary Slackness

3 Shadow Prices and Lagrangian Duality
  3.1 Shadow Prices
  3.2 Lagrangian Duality

4 Conditions for Strong Duality
  4.1 Supporting Hyperplanes and Convexity
  4.2 A Sufficient Condition for Convexity

5 Solutions of Linear Programs
  5.1 Basic Solutions
  5.2 Extreme Points and Optimal Solutions
  5.3 A Naive Approach to Solving Linear Programs

6 Linear Programming Duality
  6.1 The Relationship between Primal and Dual
  6.2 Necessary and Sufficient Conditions for Optimality

7 The Simplex Method
  7.1 The Simplex Tableau
  7.2 Using The Tableau

8 The Two-Phase Simplex Method

9 Non-Cooperative Games
  9.1 Games and Solutions
  9.2 The Minimax Theorem

10 The Minimum-Cost Flow Problem
  10.1 Networks
  10.2 Minimum-Cost Flows
  10.3 Sufficient Conditions for Optimality
  10.4 The Transportation Problem

11 The Transportation Algorithm
  11.1 Optimality Conditions
  11.2 The Simplex Method for the Transportation Problem

12 The Maximum Flow Problem
  12.1 The Max-Flow Min-Cut Theorem
  12.2 The Ford-Fulkerson Algorithm
  12.3 The Bipartite Matching Problem

List of Figures

1.1 Geometric interpretation of a linear program
1.2 Example of a convex set and a non-convex set
1.3 Example of a convex function and a concave function
4.1 Geometric interpretation of the dual
5.1 Illustration of linear programs with one optimal solution and an infinite number of optimal solutions
6.1 Geometric interpretation of primal and dual linear programs
9.1 Representation of the prisoner's dilemma as a normal-form game
9.2 The game of chicken, where players can chicken out or dare
10.1 Representation of flow conservation constraints by an instance of the transportation problem
11.1 Initial basic feasible solution of an instance of the transportation problem and a cycle along which the overall cost can be decreased
12.1 An instance of the maximum flow problem
1 Introduction and Preliminaries

1.1 Constrained Optimization


We consider constrained optimization problems of the form

minimize f(x)
subject to h(x) = b (1.1)
x ∈ X.

Such a problem is given by a vector x ∈ Rn of decision variables, an objective function
f : Rn → R, a functional constraint h(x) = b where h : Rn → Rm and b ∈ Rm , and a
regional constraint x ∈ X where X ⊆ Rn . The set X(b) = { x ∈ X : h(x) = b } is called
the feasible set; a problem is called feasible if X(b) is non-empty, and bounded if
f(x) is bounded from below on X(b). A solution x∗ is called optimal if it is feasible and
minimizes f among all feasible solutions.
The assumption that the functional constraint holds with equality is without loss
of generality: an inequality constraint like h(x) ≤ b can be re-written as h(x) + z = b,
where z is a new slack variable with the additional regional constraint z ≥ 0.
Since minimization of f(x) and maximization of −f(x) are equivalent, we will often
concentrate on one of the two.

1.2 Linear Programs


The special case where the objective function and constraints are linear is called a linear
program (LP). In matrix-vector notation we can write an LP as

minimize   cT x
subject to aTi x ≥ bi ,  i ∈ M1
           aTi x ≤ bi ,  i ∈ M2
           aTi x = bi ,  i ∈ M3
           xj ≥ 0,  j ∈ N1
           xj ≤ 0,  j ∈ N2

where c ∈ Rn is a cost vector, x ∈ Rn is a vector of decision variables, and constraints
are given by ai ∈ Rn and bi ∈ R for i ∈ {1, . . . , m}. Index sets M1 , M2 , M3 ⊆ {1, . . . , m}
and N1 , N2 ⊆ {1, . . . , n} are used to distinguish between different types of constraints.
An equality constraint aTi x = bi is equivalent to the pair of constraints aTi x ≤ bi and
aTi x ≥ bi , and a constraint of the form aTi x ≤ bi can be rewritten as (−ai )T x ≥ −bi .
Each occurrence of an unconstrained variable xj can be replaced by xj⁺ + xj⁻ , where
xj⁺ and xj⁻ are two new variables with xj⁺ ≥ 0 and xj⁻ ≤ 0.

Figure 1.1: Geometric interpretation of the linear program in Example 1.1

We can thus write every linear
program in the general form
program in the general form

min { cT x : Ax ≥ b, x ≥ 0 } (1.2)

where x, c ∈ Rn , b ∈ Rm , and A ∈ Rm×n . Observe that constraints of the form xj ≥ 0
and xj ≤ 0 are just special cases of constraints of the form aTi x ≥ bi , but we often
choose to make them explicit.
A linear program of the form

min { cT x : Ax = b, x ≥ 0 } (1.3)

is said to be in standard form. The standard form is of course a special case of the
general form. On the other hand, we can also bring every general form problem into the
standard form by replacing each inequality constraint of the form aTi x ≤ bi or aTi x ≥ bi
by a constraint aTi x + si = bi or aTi x − si = bi , where si is a new slack variable, and
an additional constraint si ≥ 0.
The general form is typically used to discuss the theory of linear programming,
while the standard form is often more convenient when designing algorithms.
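The slack-variable conversion between the two forms is entirely mechanical. As an illustration (this sketch and its function name are my own, not part of the notes), the following converts a general-form problem min { cT x : Ax ≥ b, x ≥ 0 } into standard form by subtracting one surplus variable per row:

```python
def general_to_standard(c, A, b):
    # Convert min { c^T x : A x >= b, x >= 0 } into the standard form
    # min { c'^T x' : A' x' = b, x' >= 0 } by subtracting one surplus
    # variable s_i >= 0 per row, so that a_i^T x - s_i = b_i.
    m = len(A)
    A_std = [row + [-1.0 if i == j else 0.0 for j in range(m)]
             for i, row in enumerate(A)]
    c_std = c + [0.0] * m      # surplus variables carry zero cost
    return c_std, A_std, b

# Example 1.1 rewritten as a minimization with ">=" constraints:
# minimize -(x1 + x2) s.t. -x1 - 2x2 >= -6, -x1 + x2 >= -3, x >= 0.
c_std, A_std, b_std = general_to_standard(
    [-1.0, -1.0],
    [[-1.0, -2.0], [-1.0, 1.0]],
    [-6.0, -3.0],
)
print(A_std)   # each row gains exactly one surplus column
```

Each constraint row is extended by its own column of the negated identity, exactly as in the transformation described above.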

Example 1.1. Consider the following linear program, which is illustrated in Figure 1.1:

minimize   −(x1 + x2 )
subject to x1 + 2x2 ≤ 6
           x1 − x2 ≤ 3
           x1 , x2 ≥ 0

Figure 1.2: A convex set S and a non-convex set T

Figure 1.3: A convex function f and a concave function g

Solid lines indicate sets of points for which one of the constraints is satisfied with
equality. The feasible set is shaded. Dashed lines, orthogonal to the cost vector c,
indicate sets of points for which the value of the objective function is constant.

1.3 Review: Unconstrained Optimization and Convexity


Consider a function f : Rn → R, and let x ∈ Rn . A necessary condition for x to
minimize f over Rn is that ∇f(x) = 0, where

∇f = (∂f/∂x1 , . . . , ∂f/∂xn )T

is the gradient of f. A general function f may have many local minima on the feasible
set X, which makes it difficult to find a global minimum. However, if X is convex, and
f is convex on X, then any local minimum of f on X is also a global minimum on X.
Let S ⊆ Rn . S is called a convex set if for all δ ∈ [0, 1], x, y ∈ S implies that
δx + (1 − δ)y ∈ S. An illustration is shown in Figure 1.2. A function f : S → R is called
a convex function if the set of points above its graph is convex, i.e., if for all x, y ∈ S
and δ ∈ [0, 1], δf(x) + (1 − δ)f(y) ≥ f(δx + (1 − δ)y). Function f is concave if −f is
convex. An illustration is shown in Figure 1.3.
If f is twice differentiable, it is convex on a convex set S if its Hessian

Hf = (∂²f/∂xi ∂xj )ij

is positive semidefinite on S. A symmetric n×n matrix A is called positive semidefinite
if vT Av ≥ 0 for all v ∈ Rn , or equivalently, if all eigenvalues of A are non-negative.
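For a symmetric 2×2 matrix the eigenvalue test can be carried out in closed form. The following sketch (the helper names are my own, not from the notes) checks positive semidefiniteness this way:

```python
import math

def eigenvalues_sym2x2(a, b, d):
    # Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first.
    mean = (a + d) / 2.0
    r = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return mean - r, mean + r

def is_psd_sym2x2(a, b, d):
    # Positive semidefinite iff the smallest eigenvalue is non-negative.
    smallest, _ = eigenvalues_sym2x2(a, b, d)
    return smallest >= -1e-12   # small tolerance for rounding

# Hessian of f(x) = x1^2 + x2^2 is [[2, 0], [0, 2]]: PSD, so f is convex.
print(is_psd_sym2x2(2.0, 0.0, 2.0))   # True
# [[1, 2], [2, 1]] has eigenvalues -1 and 3, so it is not PSD.
print(is_psd_sym2x2(1.0, 2.0, 1.0))   # False
```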

Theorem 1.2. Let X ⊆ Rn be convex, f : Rn → R twice differentiable on X. Let


∇f(x∗ ) = 0 for x∗ ∈ X and Hf(x) positive semidefinite for all x ∈ X. Then x∗
minimizes f(x) subject to x ∈ X.

It is easy to see that in the case of LPs, the feasible set is convex and the objective
function is both convex and concave. But even when these two conditions are satisfied,
the above theorem cannot generally be used to solve constrained optimization problems,
because the gradient might not be zero anywhere on the feasible set.
2 The Method of Lagrange Multipliers
A well-known method for solving constrained optimization problems is the method
of Lagrange multipliers. The idea behind this method is to reduce constrained opti-
mization to unconstrained optimization, and to take the (functional) constraints into
account by augmenting the objective function with a weighted sum of them. To this
end, define the Lagrangian associated with (1.1) as

L(x, λ) = f(x) − λT (h(x) − b), (2.1)

where λ ∈ Rm is a vector of Lagrange multipliers.

2.1 Lagrangian Sufficiency


The following result provides a condition under which minimizing the Lagrangian,
subject only to the regional constraints, yields a solution to the original constrained
problem. The result is easy to prove, yet extremely useful in practice.

Theorem 2.1 (Lagrangian Sufficiency Theorem). Let x ∈ X and λ ∈ Rm such that
L(x, λ) = inf{L(x′, λ) : x′ ∈ X} and h(x) = b. Then x is an optimal solution of (1.1).

Proof. We have that

min{f(x′) : x′ ∈ X(b)} = min{f(x′) − λT (h(x′) − b) : x′ ∈ X(b)}
                       ≥ min{f(x′) − λT (h(x′) − b) : x′ ∈ X}
                       = f(x) − λT (h(x) − b) = f(x).

Equality in the first line holds because h(x′) − b = 0 when x′ ∈ X(b). The inequality
on the second line holds because the minimum is taken over a larger set. In the third
line we finally use that x minimizes L and that h(x) = b.

Two remarks are in order. First, a vector λ of Lagrange multipliers satisfying the
conditions of the theorem is not guaranteed to exist in general, but it does exist for a
large class of problems. Second, the theorem appears to be useful mainly for showing
that a given solution x is optimal. In certain cases, however, it can also be used to find
an optimal solution. Our general strategy in these cases will be to minimize L(x, λ) for
all values of λ, in order to obtain a minimizer x∗ (λ) that depends on λ, and then find
λ∗ such that x∗ (λ∗ ) satisfies the constraints.


2.2 Using Lagrangian Sufficiency


We begin by applying Theorem 2.1 to a concrete example.

Example 2.2. Assume that we want to

minimize   x1 − x2 − 2x3
subject to x1 + x2 + x3 = 5
           x1² + x2² = 4.

The Lagrangian of this problem is

L(x, λ) = x1 − x2 − 2x3 − λ1 (x1 + x2 + x3 − 5) − λ2 (x1² + x2² − 4)
        = [(1 − λ1 )x1 − λ2 x1²] + [(−1 − λ1 )x2 − λ2 x2²] + (−2 − λ1 )x3 + 5λ1 + 4λ2 .

For a given value of λ, we can minimize L(x, λ) by independently minimizing the
terms in x1 , x2 , and x3 , and we will only be interested in values of λ for which the
minimum is finite.
The term (−2 − λ1 )x3 does not have a finite minimum unless λ1 = −2. The terms
in x1 and x2 then have a finite minimum only if λ2 < 0, in which case an optimum
occurs when

∂L/∂x1 = 1 − λ1 − 2λ2 x1 = 3 − 2λ2 x1 = 0 and
∂L/∂x2 = −1 − λ1 − 2λ2 x2 = 1 − 2λ2 x2 = 0,

i.e., when x1 = 3/(2λ2 ) and x2 = 1/(2λ2 ). The optimum is indeed a minimum, because

HL = ( −2λ2     0   )
     (    0   −2λ2  )

is positive semidefinite when λ2 < 0.


Let Y be the set of values of λ such that L(x, λ) has a finite minimum, i.e.,

Y = {λ ∈ R2 : λ1 = −2, λ2 < 0}.

For every λ ∈ Y, the unique optimum of L(x, λ) occurs at x∗ (λ) = (3/(2λ2 ), 1/(2λ2 ), x3 )T ,
and we need to find λ ∈ Y such that x∗ (λ) is feasible to be able to apply Theorem 2.1.
Therefore,

x1² + x2² = 9/(4λ2²) + 1/(4λ2²) = 4

and thus λ2 = −√(5/8). We can now use Theorem 2.1 to conclude that the minimization
problem has an optimal solution at x1 = −3√(2/5), x2 = −√(2/5), and x3 = 5 − x1 − x2 =
5 + 4√(2/5).
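The solution can be checked numerically. The sketch below (mine, not part of the notes) plugs λ2 = −√(5/8) into the formulas for x∗ (λ) and verifies feasibility; the resulting objective value, −2(√10 + 5), will reappear in Example 3.3:

```python
import math

lam2 = -math.sqrt(5.0 / 8.0)   # multiplier of the constraint x1^2 + x2^2 = 4
x1 = 3.0 / (2.0 * lam2)        # minimizer of the x1 term of the Lagrangian
x2 = 1.0 / (2.0 * lam2)        # minimizer of the x2 term
x3 = 5.0 - x1 - x2             # chosen so that x1 + x2 + x3 = 5

assert abs(x1 ** 2 + x2 ** 2 - 4.0) < 1e-9   # second constraint is satisfied
f = x1 - x2 - 2.0 * x3                       # objective value at the optimum
print(round(f, 4))                           # -16.3246, i.e. -2*(sqrt(10) + 5)
```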

Let us formalize the strategy we have used to find x and λ satisfying the conditions
of Theorem 2.1 for a more general problem. To

minimize f(x) subject to h(x) ≤ b, x ∈ X (2.2)

we proceed as follows:
1. Introduce a vector z of slack variables to obtain the equivalent problem

   minimize f(x) subject to h(x) + z = b, x ∈ X, z ≥ 0.

2. Compute the Lagrangian L(x, z, λ) = f(x) − λT (h(x) + z − b).

3. Define the set

   Y = {λ ∈ Rm : inf{L(x, z, λ) : x ∈ X, z ≥ 0} > −∞}.

4. For each λ ∈ Y, minimize L(x, z, λ) subject only to the regional constraints, i.e.,
   find x∗ (λ), z∗ (λ) satisfying

   L(x∗ (λ), z∗ (λ), λ) = inf{L(x, z, λ) : x ∈ X, z ≥ 0}. (2.3)

5. Find λ∗ ∈ Y such that (x∗ (λ∗ ), z∗ (λ∗ )) is feasible, i.e., such that x∗ (λ∗ ) ∈ X,
   z∗ (λ∗ ) ≥ 0, and h(x∗ (λ∗ )) + z∗ (λ∗ ) = b. By Theorem 2.1, x∗ (λ∗ ) is optimal
   for (2.2).
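The five steps can be traced on the smallest possible example (a toy problem of my own, not from the notes): minimize (x − 2)² subject to x ≤ 1, with X = R.

```python
# Toy problem (mine): minimize (x - 2)^2 subject to x <= 1, X = R.
# Step 1: add a slack variable z >= 0 so that x + z = 1.
# Step 2: L(x, z, lam) = (x - 2)^2 - lam * (x + z - 1).
# Step 3: inf over z >= 0 of -lam * z is finite only if lam <= 0, so Y = {lam <= 0}.
# Step 4: minimizing in x gives 2 * (x - 2) - lam = 0, i.e. x*(lam) = 2 + lam / 2.
# Step 5: feasibility with lam < 0 forces z = 0 (complementary slackness), hence x = 1.

lam = -2.0                  # the multiplier that makes x*(lam) feasible: 2 * (1 - 2)
x_star = 2.0 + lam / 2.0    # = 1.0, on the boundary of the constraint
z_star = 1.0 - x_star       # = 0.0, complementary to lam != 0

assert lam <= 0.0 and z_star >= 0.0 and abs(x_star + z_star - 1.0) < 1e-12
print(x_star, (x_star - 2.0) ** 2)   # the constrained minimizer and its value
```

Unconstrained, the minimum of (x − 2)² would sit at x = 2; the multiplier λ∗ = −2 prices the constraint that pulls it back to x = 1.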

2.3 Complementary Slackness


It is worth pointing out a property known as complementary slackness, which follows
directly from (2.3): for every λ ∈ Y and i = 1, . . . , m,

(z∗ (λ))i ≠ 0 implies λi = 0 and
λi ≠ 0 implies (z∗ (λ))i = 0.

Indeed, if the conditions were violated for some i, then the value of the Lagrangian
could be reduced by reducing (z∗ (λ))i , while maintaining that (z∗ (λ))i > 0. This
would contradict (2.3). Further note that λ ∈ Y requires for each i = 1, . . . , m either
that λi ≤ 0 or that λi ≥ 0, depending on the direction of the ith inequality constraint.
In the case where λi ≤ 0, we for example get that

(h(x∗ (λ∗ )))i < bi implies λ∗i = 0 and
λ∗i < 0 implies (h(x∗ (λ∗ )))i = bi .

Slack in the corresponding inequalities (h(x∗ (λ∗ )))i ≤ bi and λ∗i ≤ 0 has to be comple-
mentary, in the sense that it cannot occur simultaneously in both of them.

Example 2.3. Consider the problem to

minimize   x1 − 3x2
subject to x1² + x2² ≤ 4
           x1 + x2 ≤ 2.

By adding slack variables, we obtain the following equivalent problem:

minimize   x1 − 3x2
subject to x1² + x2² + z1 = 4
           x1 + x2 + z2 = 2
           z1 ≥ 0, z2 ≥ 0.

The Lagrangian of this problem is

L(x, z, λ) = x1 − 3x2 − λ1 (x1² + x2² + z1 − 4) − λ2 (x1 + x2 + z2 − 2)
           = [(1 − λ2 )x1 − λ1 x1²] + [(−3 − λ2 )x2 − λ1 x2²] − λ1 z1 − λ2 z2 + 4λ1 + 2λ2 .

Since z1 ≥ 0 and z2 ≥ 0, the terms −λ1 z1 and −λ2 z2 have a finite minimum only if
λ1 ≤ 0 and λ2 ≤ 0. In addition, the complementary slackness conditions λ1 z1 = 0 and
λ2 z2 = 0 must hold at the optimum.
Minimizing L(x, z, λ) in x1 and x2 yields

∂L/∂x1 = 1 − λ2 − 2λ1 x1 = 0 and
∂L/∂x2 = −3 − λ2 − 2λ1 x2 = 0,

and we indeed obtain a minimum, because

HL = ( −2λ1     0   )
     (    0   −2λ1  )

is positive semidefinite when λ1 ≤ 0.


Setting λ1 = 0 leads to inconsistent values for λ2 , so we must have λ1 < 0, and,
by complementary slackness, z1 = 0. Also by complementary slackness, there are now
two more cases to consider: the one where λ2 < 0 and z2 = 0, and the one where
λ2 = 0.p The former case pleads to a contradiction, the latter to the unique minimum at
x1 = − 2/5 and x2 = 3 2/5.
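The surviving case can again be verified numerically (an illustrative sketch of mine): with λ2 = 0 and z1 = 0, stationarity gives x1 = 1/(2λ1 ) and x2 = −3/(2λ1 ), and the tight constraint x1² + x2² = 4 forces λ1 = −√10/4.

```python
import math

lam1 = -math.sqrt(10.0) / 4.0   # from x1^2 + x2^2 = (1 + 9)/(4*lam1^2) = 4, lam1 < 0
x1 = 1.0 / (2.0 * lam1)         # stationarity in x1 (with lam2 = 0)
x2 = -3.0 / (2.0 * lam1)        # stationarity in x2 (with lam2 = 0)

assert abs(x1 ** 2 + x2 ** 2 - 4.0) < 1e-9   # first constraint tight (z1 = 0)
assert x1 + x2 < 2.0                         # second constraint slack (z2 > 0)
print(round(x1 - 3.0 * x2, 4))               # objective value, -2*sqrt(10)
```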
3 Shadow Prices and Lagrangian Duality

3.1 Shadow Prices


A more intuitive understanding of Lagrange multipliers can be obtained by view-
ing (1.1) as a family of problems parameterized by b ∈ Rm , the right hand side of
the functional constraints. To this end, let φ(b) = inf{f(x) : h(x) = b, x ∈ Rn }. It
turns out that at the optimum, the Lagrange multipliers equal the partial derivatives
of φ with respect to its parameters.

Theorem 3.1. Suppose that f and h are continuously differentiable on Rn , and that
there exist unique functions x∗ : Rm → Rn and λ∗ : Rm → Rm such that for each
b ∈ Rm , h(x∗ (b)) = b, λ∗ (b) ≤ 0 and f(x∗ (b)) = φ(b) = inf{f(x) − λ∗ (b)T (h(x) − b) :
x ∈ Rn }. If x∗ and λ∗ are continuously differentiable, then

∂φ/∂bi (b) = λ∗i (b).
Proof. We have that

φ(b) = f(x∗ (b)) − λ∗ (b)T (h(x∗ (b)) − b)
     = f(x∗ (b)) − λ∗ (b)T h(x∗ (b)) + λ∗ (b)T b.

Taking partial derivatives of each term,

∂f(x∗ (b))/∂bi = Σj=1..n (∂f/∂xj )(x∗ (b)) · (∂x∗j /∂bi )(b),

∂[λ∗ (b)T h(x∗ (b))]/∂bi = λ∗ (b)T ∂h(x∗ (b))/∂bi + (∂λ∗ (b)T /∂bi ) h(x∗ (b))
  = λ∗ (b)T ( Σj=1..n (∂h/∂xj )(x∗ (b)) (∂x∗j /∂bi )(b) ) + (∂λ∗ (b)T /∂bi ) h(x∗ (b)),

∂[λ∗ (b)T b]/∂bi = λ∗ (b)T ∂b/∂bi + bT ∂λ∗ (b)/∂bi .

By summing and re-arranging,

∂φ(b)/∂bi = Σj=1..n [ (∂f/∂xj )(x∗ (b)) − λ∗ (b)T (∂h/∂xj )(x∗ (b)) ] (∂x∗j /∂bi )(b)
            − (∂λ∗ (b)T /∂bi )(h(x∗ (b)) − b) + λ∗ (b)T ∂b/∂bi .


The first term on the right-hand side is zero, because x∗ (b) minimizes L(x, λ∗ (b)) and
thus

∂L(x∗ (b), λ∗ (b))/∂xj = (∂f/∂xj )(x∗ (b)) − λ∗ (b)T (∂h/∂xj )(x∗ (b)) = 0

for j = 1, . . . , n. The second term is zero as well, because x∗ (b) is feasible and thus
(h(x∗ (b)) − b)k = 0 for k = 1, . . . , m, and the claim follows.
It should be noted that the result also holds when the functional constraints are
inequalities: if the ith constraint does not hold with equality, then λ∗i = 0 by
complementary slackness, and therefore also ∂λ∗i /∂bi = 0.
The Lagrange multipliers are also known as shadow prices, due to an economic
interpretation of the problem to

maximize   f(x)
subject to h(x) ≤ b
           x ∈ X.

Consider a firm that produces n different goods from m different raw materials. Vector
b ∈ Rm describes the amount of each raw material available to the firm, vector x ∈ Rn
the quantity produced of each good. Functions h : Rn → Rm and f : Rn → R finally
describe the amounts of raw material required to produce, and the profit derived from
producing, particular quantities of the goods. The goal of the above problem thus is to
maximize the profit of the firm for given amounts of raw materials available to it. The
shadow price of raw material i then is the price the firm would be willing to pay per
additional unit of this raw material, which of course should be equal to the additional
profit derived from it, i.e., to ∂φ/∂bi (b).
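A one-dimensional sanity check (a toy example of my own, not from the notes): for min{x² : x ≤ b} with b < 0 the optimum is x = b, so φ(b) = b², and stationarity of the Lagrangian gives the multiplier λ∗ (b) = 2b ≤ 0, which indeed matches the slope of φ:

```python
def phi(b):
    # Optimal value of min x^2 subject to x <= b; for b < 0 the optimum is x = b.
    return b * b if b < 0 else 0.0

b = -1.0
lam = 2.0 * b                 # multiplier at the optimum: 2x - lam = 0 with x = b
eps = 1e-6
slope = (phi(b + eps) - phi(b - eps)) / (2.0 * eps)   # central difference for dphi/db

assert abs(slope - lam) < 1e-5   # shadow price = sensitivity of the optimal value
print(round(slope, 4))           # -2.0
```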

3.2 Lagrangian Duality


Another useful concept that arises from Lagrange multipliers is that of a dual problem.
Again consider the optimization problem (1.1) and the Lagrangian (2.1), and define
the (Lagrange) dual function g : Rm → R as the minimum value of the Lagrangian
over X, i.e.,

g(λ) = inf{L(x, λ) : x ∈ X}. (3.1)

As before, let Y be the set of vectors of Lagrange multipliers for which the Lagrangian
has a finite minimum, i.e., Y = {λ ∈ Rm : inf{L(x, λ) : x ∈ X} > −∞}.
It is easy to see that the maximum value of the dual function provides a lower bound
on the minimum value of the original objective function. This property is known as
weak duality.

Theorem 3.2 (Weak duality). If x ∈ X(b) and λ ∈ Y, then g(λ) ≤ f(x), and in
particular,

sup{g(λ) : λ ∈ Y} ≤ inf{f(x) : x ∈ X(b)}. (3.2)

Proof. Let x ∈ X(b) and λ ∈ Y. Then,

g(λ) = inf{L(x′, λ) : x′ ∈ X}
     ≤ L(x, λ)
     = f(x) − λT (h(x) − b)
     = f(x).

Equality on the first and third line holds by definition of g and L, the inequality on
the second line because x ∈ X. The last equality holds because x ∈ X(b) and therefore
h(x) − b = 0.
In light of this result, it is interesting to choose λ in order to make this lower bound
as large as possible, i.e., to

maximize g(λ) subject to λ ∈ Y.

This problem is known as the dual problem, and (1.1) is in this context referred to
as the primal problem. If (3.2) holds with equality, i.e., if there exists λ ∈ Y such
that g(λ) = inf{f(x) : x ∈ X(b)}, the problem is said to satisfy strong duality. The cases
where strong duality holds are those that can be solved using the method of Lagrange
multipliers.
Example 3.3. Again consider the minimization problem of Example 2.2, and recall
that Y = {λ ∈ R2 : λ1 = −2, λ2 < 0} and that for each λ ∈ Y the minimum occurred at
x∗ (λ) = (3/(2λ2 ), 1/(2λ2 ), x3 ). Thus,

g(λ) = inf{L(x, λ) : x ∈ X} = L(x∗ (λ), λ) = 10/(4λ2 ) + 4λ2 − 10,

so the dual problem is to

maximize 10/(4λ2 ) + 4λ2 − 10 subject to λ2 < 0.

It should not come as a surprise that the maximum is attained for λ2 = −√(5/8), and
that the primal and dual have the same optimal value, namely −2(√10 + 5). Note that
it is not actually necessary to solve the dual to see that λ2 = −√(5/8) is an optimizer; it
suffices that the value of the dual function at this point equals the value of the objective
function of the primal at some point in the feasible set of the primal.
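Numerically (an illustrative check of mine), the dual function can be evaluated on a grid of negative λ2 to confirm that λ2 = −√(5/8) is the maximizer and that the dual optimum equals the primal optimum:

```python
import math

def g(lam2):
    # Dual function of Examples 2.2 and 3.3, defined for lam2 < 0.
    return 10.0 / (4.0 * lam2) + 4.0 * lam2 - 10.0

lam2_star = -math.sqrt(5.0 / 8.0)
primal_opt = -2.0 * (math.sqrt(10.0) + 5.0)

grid = [-k / 1000.0 for k in range(1, 5001)]      # lam2 from -0.001 down to -5
assert max(g(l) for l in grid) <= g(lam2_star)    # no grid point beats lam2*
assert abs(g(lam2_star) - primal_opt) < 1e-9      # strong duality: dual value = primal value
print(round(g(lam2_star), 4))
```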
There are several reasons why the dual is interesting. Any feasible solution of the
dual provides a succinct certificate that the optimal solution of the primal is bounded
by a certain value. In particular, a pair of solutions of the primal and dual that yield
the same value must be optimal. If strong duality holds, the optimal value of the primal
can be determined by solving the dual, which in some cases may be easier than solving
the primal. In a later lecture we will express two quantities as the optimal solutions of
a pair of a primal and a dual that satisfy strong duality, thereby showing that the two
quantities are equal.
4 Conditions for Strong Duality
While we have already solved a few optimization problems using the method of La-
grange multipliers, it was not clear a priori whether each individual problem satisfied
strong duality and whether our attempt to solve it would ultimately be successful. Our
goal in this lecture will be to identify general conditions that guarantee strong duality,
and classes of problems that satisfy these conditions.

4.1 Supporting Hyperplanes and Convexity


To this end, we again consider the function φ that describes how the optimal value
behaves as we vary the right-hand side of the constraints. Fix a particular b ∈ Rm
and consider φ(c) as a function of c ∈ Rm . Further consider the hyperplane given by
α : Rm → R with

α(c) = β + λT (c − b).

This hyperplane has intercept β at b and slope λ. We can now try to find φ(b) as
follows:
1. For each λ, find βλ = sup{β : β + λT (c − b) ≤ φ(c) for all c ∈ Rm }.
2. Choose λ to maximize βλ .
This approach is illustrated in Figure 4.1. We always have that βλ ≤ φ(b). In the
situation on the left of Figure 4.1, this condition holds with equality because there is a
tangent to φ at b. In fact,

g(λ) = inf{L(x, λ) : x ∈ X}
     = inf{f(x) − λT (h(x) − b) : x ∈ X}
     = inf{ inf{f(x) − λT (h(x) − b) : x ∈ X(c)} : c ∈ Rm }
     = inf{φ(c) − λT (c − b) : c ∈ Rm }
     = sup{β : β + λT (c − b) ≤ φ(c) for all c ∈ Rm }
     = βλ .

We again see the weak duality result as maxλ βλ ≤ φ(b), but we also obtain a
condition for strong duality. Call α : Rm → R a supporting hyperplane to φ at b if
α(c) = φ(b) + λT (c − b) and φ(c) ≥ φ(b) + λT (c − b) for all c ∈ Rm .

Theorem 4.1. Problem (1.1) satisfies strong duality if and only if there exists a
(non-vertical) supporting hyperplane to φ at b.


Figure 4.1: Geometric interpretation of the dual with optimal value g(λ) = βλ . In the
situation on the left strong duality holds, and βλ = φ(b). In the situation on the right,
strong duality does not hold, and βλ < φ(b).

Proof. Suppose there exists a (non-vertical) supporting hyperplane to φ at b. This
means that there exists λ ∈ Rm such that for all c ∈ Rm ,

φ(b) + λT (c − b) ≤ φ(c).

This implies that

φ(b) ≤ inf{φ(c) − λT (c − b) : c ∈ Rm }
     = inf{ inf{f(x) − λT (h(x) − b) : x ∈ X(c)} : c ∈ Rm }
     = inf{L(x, λ) : x ∈ X}
     = g(λ).

On the other hand, φ(b) ≥ g(λ) by Theorem 3.2, so φ(b) = g(λ) and strong duality
holds.
Now suppose that the problem satisfies strong duality. Then there exists λ ∈ Rm
such that for all c ∈ Rm ,

φ(b) = g(λ) = inf{L(x, λ) : x ∈ X}
            ≤ inf{L(x, λ) : x ∈ X(c)}
            = inf{f(x) − λT (h(x) − b) : x ∈ X(c)}
            = φ(c) − λT (c − b),

and thus

φ(b) + λT (c − b) ≤ φ(c).

This describes a (non-vertical) supporting hyperplane to φ at b.

A sufficient condition for the existence of a supporting hyperplane is provided by


the following basic result, which we state without proof.

Theorem 4.2 (Supporting Hyperplane Theorem). Suppose that φ : Rm → R is convex


and b ∈ Rm lies in the interior of the set of points where φ is finite. Then there
exists a (non-vertical) supporting hyperplane to φ at b.

4.2 A Sufficient Condition for Convexity


We now know that convexity of φ guarantees strong duality for every constraint vec-
tor b, but it is not clear how to recognize optimization problems that have this property.
The following result identifies a sufficient condition.

Theorem 4.3. Consider the problem to

minimize   f(x)
subject to h(x) ≤ b
           x ∈ X,

and let φ be given by φ(b) = inf{f(x) : x ∈ X(b)}. Then, φ is convex if X, f, and h are
convex.

Proof. Consider b1 , b2 ∈ Rm such that φ(b1 ) and φ(b2 ) are defined, and let δ ∈ [0, 1]
and b = δb1 + (1 − δ)b2 . Further consider x1 ∈ X(b1 ), x2 ∈ X(b2 ), and let x =
δx1 + (1 − δ)x2 . Then convexity of X implies that x ∈ X, and convexity of h that

h(x) = h(δx1 + (1 − δ)x2 )
     ≤ δh(x1 ) + (1 − δ)h(x2 )
     ≤ δb1 + (1 − δ)b2
     = b.

Thus x ∈ X(b), and by convexity of f,

φ(b) ≤ f(x) = f(δx1 + (1 − δ)x2 ) ≤ δf(x1 ) + (1 − δ)f(x2 ).

This holds for all x1 ∈ X(b1 ) and x2 ∈ X(b2 ), so taking infima on the right hand
side yields

φ(b) ≤ δφ(b1 ) + (1 − δ)φ(b2 ).

Note that an equality constraint h(x) = b is equivalent to the pair of constraints
h(x) ≤ b and −h(x) ≤ −b. In this case, the above result requires that X, f, h, and −h
are all convex, which in particular requires that h is linear. Indeed, in the case with
equality constraints, convexity of f and h does not suffice for convexity of φ. To see
this, consider the problem to

minimize f(x) = x² subject to h(x) = x³ = b

for some b > 0. Then φ(b) = b^(2/3) , which is not convex. The Lagrangian is L(x, λ) =
x² − λ(x³ − b) = (x² − λx³) + λb, and has a finite minimum if and only if λ = 0. The
dual thus has an optimal value of 0, which is strictly smaller than φ(b) if b > 0.
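The failure of convexity is easy to exhibit numerically (an illustrative check): the value of φ(b) = b^(2/3) at the midpoint of [0, 2] exceeds the average of the endpoint values, which is exactly what a convex function must not do.

```python
def phi(b):
    # Optimal value of min x^2 subject to x^3 = b, for b >= 0.
    return b ** (2.0 / 3.0)

b1, b2 = 0.0, 2.0
midpoint_value = phi((b1 + b2) / 2.0)     # phi(1) = 1
chord_value = (phi(b1) + phi(b2)) / 2.0   # (0 + 2^(2/3)) / 2, roughly 0.794

assert midpoint_value > chord_value   # convexity would require phi(1) <= chord_value
assert phi(1.0) > 0.0                 # duality gap at b = 1: dual optimum is 0, phi(1) = 1
```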
Linear programs satisfy the conditions, both for equality and inequality constraints.
We thus have the following.

Theorem 4.4. If a linear program is feasible and bounded, it satisfies strong du-
ality.
5 Solutions of Linear Programs
In the remaining lectures, we will concentrate on linear programs. We begin by studying
the special structure of the feasible set and the objective function in this case, and how
it affects the set of optimal solutions.

5.1 Basic Solutions


In the LP of Example 1.1, the optimal solution happened to lie at an extreme point of
the feasible set. This was not a coincidence. Consider an LP in general form,

maximize cT x subject to Ax ≤ b, x ≥ 0. (5.1)

The feasible set of this LP is a convex polytope in Rn , i.e., an intersection of half-spaces.
Each level set of the objective function cT x, i.e., each set Lα = {x ∈ Rn : cT x = α} of
points for which the value of the objective function is equal to some constant α ∈ R, is
a k-dimensional flat for some k ≤ n. The goal is to find the largest value of α for which
Lα intersects with the feasible set. If such a value exists, the intersection contains
either a single point or an infinite number of points, and it is guaranteed to contain an
extreme point of the feasible set. This fact is illustrated in Figure 5.1, and we will give
a proof momentarily.
Formally, x ∈ S is an extreme point of a convex set S if it cannot be written as a
convex combination of two distinct points in S, i.e., if for all y, z ∈ S and δ ∈ (0, 1),
x = δy + (1 − δ)z implies that x = y = z. Since this geometric characterization of extreme
points is hard to work with, we consider an alternative, algebraic characterization. To
this end, consider the following LP in standard form, which can be obtained from (5.1)
by introducing slack variables:

maximize cT x subject to Ax = b, x ≥ 0, (5.2)

where A ∈ Rm×n and b ∈ Rm . Call a solution x ∈ Rn of the equation Ax = b basic
if at most m of its entries are non-zero, i.e., if there exists a set B ⊆ {1, . . . , n} with
|B| = m such that xi = 0 if i ∉ B. The set B is then called a basis, and variable xi is
called basic if i ∈ B and non-basic if i ∉ B. A basic solution x that also satisfies x ≥ 0
is a basic feasible solution (BFS) of (5.2).
We will henceforth make the following assumptions:
(i) the rows of A are linearly independent,
(ii) every set of m columns of A is linearly independent, and
(iii) every basic solution is non-degenerate, i.e., has exactly m non-zero variables.


Figure 5.1: Illustration of linear programs with one optimal solution (left) and an
infinite number of optimal solutions (right)

Assumptions (i) and (ii) are without loss of generality: if a set of rows are linearly
dependent, one of the corresponding constraints can be removed without changing the
feasible set; similarly, if a set of columns are linearly dependent, one of the correspond-
ing variables can be removed. Extra care needs to be taken to handle degeneracies, but
this is beyond the scope of this course.
If the above assumptions are satisfied, setting any subset of n − m variables to zero
uniquely determines the value of the remaining, basic variables. Computing the set of
basic feasible solutions is thus straightforward.

Example 5.1. Again consider the LP of Example 1.1. By adding slack variables x3 ≥ 0
and x4 ≥ 0, the functional constraint can be written as

( 1   2   1   0 )        ( 6 )
( 1  −1   0   1 ) x  =   ( 3 ),    where x = (x1 , x2 , x3 , x4 )T.

The problem has the following six basic solutions corresponding to the (4 choose 2) = 6
possible ways to choose a basis, which are labeled A through F in Figure 1.1:

     x1   x2   x3   x4   f(x)
A     0    0    6    3     0
B     0    3    0    6     3
C     4    1    0    0     5
D     3    0    3    0     3
E     6    0    0   −3     6
F     0   −3   12    0    −3
5.2 Extreme Points and Optimal Solutions

It turns out that the basic feasible solutions are precisely the extreme points of the
feasible set.

Theorem 5.2. A vector is a basic feasible solution of Ax = b if and only if it is
an extreme point of the set X(b) = {x : Ax = b, x ≥ 0}.
Proof. Consider a BFS x and suppose that x = δy + (1 − δ)z for y, z ∈ X(b) and
δ ∈ (0, 1). Since y > 0 and z > 0, x = δy + (1 − δ)z implies that yi = zi whenever
xi = 0. By (iii), y and z are basic solutions with the same basis, i.e., both have exactly
m non-zero entries, which occur in the same rows. Moreover, Ay = b = Az and thus
A(y − z) = 0. This yields a linear combination of m columns of A that is equal to zero,
which by (ii) implies that y = z. Thus x is an extreme point of X(b).
Now consider a feasible solution x ∈ X(b) that is not a BFS. Let i1 , . . . , ir be the
rows of x that are non-zero, and observe that r > m. This means that the columns
ai1 , . . . , air , where ai = (a1i , . . . , ami )T , have to be linearly dependent, i.e., there has to
exist a collection of r non-zero numbers yi1 , . . . , yir such that yi1 ai1 + · · · + yir air = 0.
Extending y to a vector in Rn by setting yi = 0 if i ∉ {i1 , . . . , ir }, we have
Ay = yi1 ai1 + · · · + yir air = 0 and thus A(x ± εy) = b for every ε ∈ R. By choosing
ε > 0 small enough, x ± εy ≥ 0 and thus x ± εy ∈ X(b). Moreover
x = (1/2)(x − εy) + (1/2)(x + εy), so x is not an extreme point of X(b).

We are now ready to show that an optimum occurs at an extreme point of the
feasible set.

Theorem 5.3. If the linear program (5.2) is feasible and bounded, then it has an
optimal solution that is a basic feasible solution.

Proof. Let x be an optimal solution of (5.2). If x has exactly m non-zero entries, then
it is a BFS and we are done. So suppose that x has r non-zero entries for r > m, and
that it is not an extreme point of X(b), i.e., that x = δy + (1 − δ)z for y, z ∈ X(b)
with y ≠ z and δ ∈ (0, 1). We will show that there must exist an optimal solution with
strictly fewer than r non-zero entries; the claim then follows by induction.
Since cT x ≥ cT y and cT x ≥ cT z by optimality of x, and since cT x = δcT y + (1 − δ)cT z,
we must have that cT x = cT y = cT z, so y and z are optimal as well. As in the proof
of Theorem 5.2, xi = 0 implies that yi = zi = 0, so y and z have at most r non-zero
entries, which must occur in the same rows as in x. If y or z has strictly fewer than
r non-zero entries, we are done. Otherwise let x′ = δ′y + (1 − δ′)z = z + δ′(y − z),
and observe that x′ is optimal for every δ′ ∈ R. Moreover, y − z ≠ 0, and all non-zero
entries of y − z occur in rows where x is non-zero as well. We can thus choose δ′ ∈ R
such that x′ ≥ 0 and such that x′ has strictly fewer than r non-zero entries.
The result can in fact be extended to show that the maximum of a convex function
f over a compact convex set X occurs at an extreme point of X. In this case any
point x ∈ X can be written as a convex combination x = Σ_{i=1}^k δi xi of extreme
points x1 , . . . , xk ∈ X, where δ ∈ Rk with δ ≥ 0 and Σ_{i=1}^k δi = 1. Convexity of f
then implies that

    f(x) ≤ Σ_{i=1}^k δi f(xi ) ≤ max_{1≤i≤k} f(xi ).
5.3 A Naive Approach to Solving Linear Programs

Since there are only finitely many basic solutions, a naive approach to solving an LP
would be to go over all basic solutions and pick one that optimizes the objective. The
problem with this approach is that it would not in general be efficient, as the number
of basic solutions may grow exponentially in the number of variables. By contrast,
a large body of work on the theory of computational complexity typically associates
efficient computation with methods that for every problem instance can be executed
in a number of steps that is at most polynomial in the size of that instance.
In one of the following lectures we will study a well-known method for solving linear
programs, the so-called simplex method, which explores the set of basic solutions in
a more organized way. It is usually very efficient in practice, but may still require
an exponential number of steps for some contrived instances. In fact, no approach
is currently known that solves linear programs by inspecting only the boundary of the
feasible set and is efficient for every conceivable instance of the problem. There are,
however, so-called interior-point methods that traverse the interior of the feasible set in
search of an optimal solution and are very efficient both in theory and in practice.
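In practice one would hand an LP to an off-the-shelf solver rather than enumerate basic solutions. As a sketch (assuming SciPy is available; `method="highs"` selects the HiGHS simplex and interior-point codes), here is the LP of Example 1.1:

```python
# Solve the LP of Example 1.1 with SciPy's linprog. linprog minimizes,
# so we negate the objective x1 + x2.
from scipy.optimize import linprog

res = linprog(c=[-1.0, -1.0],
              A_ub=[[1.0, 2.0],     # x1 + 2 x2 <= 6
                    [1.0, -1.0]],   # x1 -   x2 <= 3
              b_ub=[6.0, 3.0],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)   # optimal solution (4, 1) with value 5
```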
6 Linear Programming Duality
Consider the linear program (1.2) and introduce slack variables z to turn it into

    min { cT x : Ax − z = b, x, z ≥ 0 }.

We have X = {(x, z) : x ≥ 0, z ≥ 0} ⊆ Rm+n . The Lagrangian is given by

    L((x, z), λ) = cT x − λT (Ax − z − b) = (cT − λT A)x + λT z + λT b

and has a finite minimum over X if and only if

    λ ∈ Y = { µ : cT − µT A ≥ 0, µ ≥ 0 }.

For λ ∈ Y, the minimum of L((x, z), λ) is attained when both (cT − λT A)x = 0 and
λT z = 0, and thus

    g(λ) = inf_{(x,z)∈X} L((x, z), λ) = λT b.

We obtain the dual


max { bT λ : AT λ 6 c, λ > 0 }. (6.1)
The dual of (1.3) can be determined analogously as

max { bT λ : AT λ 6 c }.

The dual is itself a linear program, and its dual is in fact equivalent to the primal.
Theorem 6.1. In the case of linear programming, the dual of the dual is the primal.
Proof. The dual can be written equivalently as

    min { −bT λ : −AT λ ≥ −c, λ ≥ 0 }.

This problem has the same form as the primal (1.2), with −b taking the role of c, −c
taking the role of b, and −AT the role of A. Taking the dual again we thus return to
the original problem.
6.1 The Relationship between Primal and Dual

Example 6.2. Consider the following pair of a primal and dual LP, with slack variables
z1 and z2 for the primal and µ1 and µ2 for the dual.
    maximize   3x1 + 2x2                  minimize   4λ1 + 6λ2
    subject to 2x1 + x2 + z1 = 4          subject to 2λ1 + 2λ2 − µ1 = 3
               2x1 + 3x2 + z2 = 6                    λ1 + 3λ2 − µ2 = 2
               x1 , x2 , z1 , z2 ≥ 0                 λ1 , λ2 , µ1 , µ2 ≥ 0

Figure 6.1: Geometric interpretation of primal and dual linear programs in Example 6.2

To see that these LPs are indeed dual to each other, observe that the primal has the
form (1.2), and the dual the form (6.1), with

! ! !
3 2 1 4
c=− , A=− , b=− .
2 2 3 6

As before, we can compute all basic solutions of the primal by setting any set of
n − m = 2 variables to zero in turn, and solving for the values of the remaining m = 2
variables. Given a particular basic solution of the primal, the corresponding dual
solution can be found using the complementary slackness conditions λ1 z1 = 0 = λ2 z2
and µ1 x1 = 0 = µ2 x2 . These conditions identify, for each non-zero variable of the
primal, a dual variable whose value has to be equal to zero. By solving for the remaining
variables, we obtain a solution for the dual, which is in fact a basic solution. Repeating
this procedure for every basic solution of the primal, we obtain the following pairs of
basic solutions of the primal and dual:
         x1    x2    z1    z2    f(x)      λ1    λ2    µ1     µ2    g(λ)
    A     0     0     4     6     0         0     0    −3     −2     0
    B     2     0     0     2     6        3/2    0     0    −1/2    6
    C     3     0    −2     0     9         0    3/2    0     5/2    9
    D    3/2    1     0     0    13/2      5/4   1/4    0      0    13/2
    E     0     2     2     0     4         0    2/3   −5/3    0     4
    F     0     4     0    −6     8         2     0     1      0     8

Labels A through F refer to Figure 6.1, which illustrates the feasible regions of the
primal and the dual. Observe that there is only one pair such that both the primal and
the dual solution are feasible, the one labeled D, and that these solutions are optimal.
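The table can be checked mechanically (our own sketch, with the fractional entries written as floats): for every pair the objective values f(x) and g(λ) agree, but only pair D is feasible for both problems.

```python
# For each labeled pair, check f(x) = g(lambda) and record which pairs are
# feasible (non-negative) for both the primal and the dual.
pairs = {  # label: (x1, x2, z1, z2), (lam1, lam2, mu1, mu2)
    "A": ((0, 0, 4, 6), (0, 0, -3, -2)),
    "B": ((2, 0, 0, 2), (1.5, 0, 0, -0.5)),
    "C": ((3, 0, -2, 0), (0, 1.5, 0, 2.5)),
    "D": ((1.5, 1, 0, 0), (1.25, 0.25, 0, 0)),
    "E": ((0, 2, 2, 0), (0, 2/3, -5/3, 0)),
    "F": ((0, 4, 0, -6), (2, 0, 1, 0)),
}

both_feasible = []
for label, (x, lam) in pairs.items():
    f = 3 * x[0] + 2 * x[1]          # primal objective
    g = 4 * lam[0] + 6 * lam[1]      # dual objective
    assert abs(f - g) < 1e-9         # objective values always agree
    if min(x) >= 0 and min(lam) >= 0:
        both_feasible.append(label)

print(both_feasible)   # only pair D survives
```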
6.2 Necessary and Sufficient Conditions for Optimality

In the above example, primal feasibility, dual feasibility, and complementary slackness
together imply optimality. It turns out that this is true in general, and the condition
is in fact both necessary and sufficient for optimality.
Theorem 6.3. Let x and λ be feasible solutions for the primal (1.2) and the
dual (6.1), respectively. Then x and λ are optimal if and only if they satisfy
complementary slackness, i.e., if

(cT − λT A)x = 0 and λT (Ax − b) = 0.

Proof. If x and λ are optimal, then

    cT x = λT b
         = inf_{x′∈X} ( cT x′ − λT (Ax′ − b) )
         ≤ cT x − λT (Ax − b)
         ≤ cT x.

Since the first and last term are the same, the two inequalities must hold with equality.
Therefore, λT b = cT x − λT (Ax − b) = (cT − λT A)x + λT b, and thus (cT − λT A)x = 0.
Furthermore, cT x − λT (Ax − b) = cT x, and thus λT (Ax − b) = 0.
If on the other hand (cT − λT A)x = 0 and λT (Ax − b) = 0, then

cT x = cT x − λT (Ax − b) = (cT − λT A)x + λT b = λT b,

and by weak duality x and λ must be optimal.

While the result has been formulated here for the primal LP in general form and
the corresponding dual, it is true, with the appropriate complementary slackness con-
ditions, for any pair of a primal and dual LP.
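The optimality test of Theorem 6.3 is easy to carry out numerically. A sketch (the function name is ours, and NumPy is assumed): check feasibility of x and λ together with the two complementary slackness conditions.

```python
# Check primal feasibility, dual feasibility, and complementary slackness
# for the pair (x, lam), for the primal (1.2) and dual (6.1).
import numpy as np

def optimal_pair(A, b, c, x, lam, tol=1e-9):
    """True iff x and lam are feasible and satisfy complementary slackness."""
    primal_ok = np.all(A @ x >= b - tol) and np.all(x >= -tol)
    dual_ok = np.all(A.T @ lam <= c + tol) and np.all(lam >= -tol)
    cs1 = abs((c - A.T @ lam) @ x) < tol      # (c^T - lam^T A) x = 0
    cs2 = abs(lam @ (A @ x - b)) < tol        # lam^T (A x - b) = 0
    return bool(primal_ok and dual_ok and cs1 and cs2)

# Example 6.2 in the form (1.2): c = -(3, 2)^T, A = -[[2, 1], [2, 3]], b = -(4, 6)^T,
# with the optimal pair D: x = (3/2, 1), lam = (5/4, 1/4).
c = -np.array([3.0, 2.0])
A = -np.array([[2.0, 1.0], [2.0, 3.0]])
b = -np.array([4.0, 6.0])
print(optimal_pair(A, b, c, np.array([1.5, 1.0]), np.array([1.25, 0.25])))
```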
7 The Simplex Method
Let A ∈ Rm×n , b ∈ Rm . Further let B be a basis, i.e., a set B ⊆ {1, . . . , n} with |B| = m,
corresponding to a choice of m non-zero variables. Let x ∈ Rn such that Ax = b. Then
we have
AB xB + AN xN = b,
where AB ∈ Rm×m and AN ∈ Rm×(n−m) respectively consist of the columns of A
indexed by B and those not indexed by B, and xB and xN respectively consist of the
rows of x indexed by B and those not indexed by B. Moreover, if x is a basic solution,
then there is a basis B such that xN = 0 and AB xB = b, and if x is a basic feasible
solution, there is a basis B such that xN = 0, AB xB = b, and xB > 0.

7.1 The Simplex Tableau

For every x with Ax = b and every basis B, we have that xB = AB^{-1}(b − AN xN ),
and thus

    f(x) = cT x = cB^T xB + cN^T xN
         = cB^T AB^{-1}(b − AN xN ) + cN^T xN
         = cB^T AB^{-1} b + (cN^T − cB^T AB^{-1} AN )xN .
Suppose that we want to maximize cT x and find that

    cN^T − cB^T AB^{-1} AN ≤ 0    and    AB^{-1} b ≥ 0.              (7.1)

Then, for any feasible x ∈ Rn , it holds that xN ≥ 0 and therefore f(x) ≤ cB^T AB^{-1} b.
The basic solution x∗ with x∗B = AB^{-1} b and x∗N = 0, on the other hand, is feasible
and satisfies f(x∗ ) = cB^T AB^{-1} b. It must therefore be optimal.
If alternatively (cN^T − cB^T AB^{-1} AN )i > 0 for some i, then we can increase the value
of the objective by increasing (xN )i . Either this can be done indefinitely, which means
that the maximum is unbounded, or the constraints force some of the variables in the
basis to become smaller and we have to stop when the first one reaches zero. In that
case we have found a new BFS with a larger value and can repeat the process.
Assuming that the LP is feasible and has a bounded optimal solution, there exists
a basis B∗ for which (7.1) is satisfied. The basic idea behind the simplex method is to
start from an initial BFS and then move from basis to basis until B∗ is found. The in-
formation required for this procedure can conveniently be represented by the so-called
simplex tableau. For a given basis B, it takes the following form:1
1
The columns of the tableau have been permuted such that those corresponding to the basis appear
on the left. This has been done just for convenience: in practice we will always be able to identify the
columns corresponding to the basis by the embedded identity matrix.


                 B (m columns)                  N (n − m columns)            (1 column)

    m rows       AB^{-1} AB = I                 AB^{-1} AN                   AB^{-1} b

    1 row        cB^T − cB^T AB^{-1} AB = 0     cN^T − cB^T AB^{-1} AN      −cB^T AB^{-1} b

The first m rows consist of the matrix A and the column vector b, multiplied by the
inverse of AB . It is worth pointing out that for any basis B, the LP with constraints
A−1 −1
B Ax = AB b is equivalent to the one with constraints Ax = b. The first n columns
of the last row are equal to cT − λT A for λT = cTB A−1
B . The vector λ can be interpreted
as a solution, not necessarily feasible, to the dual problem. In the last column of the
last row we finally have the value −f(x), where x is the BFS given by xB = A−1 B b and
xN = 0.
We will see later that the simplex method always maintains feasibility of this so-
lution x. As a consequence it also maintains complementary slackness for x and
λT = cTB A−1 T
B : since we work with an LP in standard form, λ (Ax − b) = 0 follows
automatically from the feasibility condition, Ax = b; the condition (cT − λT A)x = 0
holds because xN = 0 and cTB − λT AB = cTB − cTB A−1 B AB = 0. What it then means
T T
for (7.1) to become satisfied is that c − λ A 6 0, i.e., that λ is a feasible solution for
the dual. Optimality of x is thus actually a consequence of Theorem 6.3.

7.2 Using The Tableau

Consider a tableau of the following form, where the basis can be identified by the
identity matrix embedded in (aij ):

    (aij )    ai0
    a0j       a00
The simplex method then takes the following steps:

1. Find an initial BFS with basis B.
2. Check whether a0j ≤ 0 for every j. If yes, the current solution is optimal, so stop.
3. Choose j such that a0j > 0, and choose i ∈ {i′ : ai′j > 0} to minimize ai0 /aij .
   If aij ≤ 0 for all i, then the problem is unbounded, so stop. If multiple rows
   minimize ai0 /aij , the problem is degenerate.
4. Update the tableau by multiplying row i by 1/aij and adding a −(akj /aij )
   multiple of row i to each row k ≠ i. Then return to Step 2.
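Steps 2 to 4 can be sketched directly on the tableau (our own illustration, not the notes' code; `basis` records which variable each constraint row represents, and NumPy is assumed):

```python
# A minimal tableau simplex: `tableau` holds the constraint rows [A | b]
# followed by the objective row [c^T | -f]; the basic solution must be feasible.
import numpy as np

def simplex(tableau, basis):
    T = np.asarray(tableau, dtype=float)
    m = T.shape[0] - 1                     # number of constraint rows
    while True:
        obj = T[-1, :-1]
        if np.all(obj <= 1e-9):            # Step 2: optimal, stop
            return T, basis
        j = int(np.argmax(obj))            # Step 3: pivot column
        col = T[:m, j]
        if np.all(col <= 1e-9):
            raise ValueError("problem is unbounded")
        ratios = np.where(col > 1e-9, T[:m, -1] / col, np.inf)
        i = int(np.argmin(ratios))         # pivot row minimizes a_i0 / a_ij
        T[i] /= T[i, j]                    # Step 4: pivot
        for k in range(m + 1):
            if k != i:
                T[k] -= T[k, j] * T[i]
        basis[i] = j                       # variable j enters the basis

# Example 1.1: maximize x1 + x2 s.t. x1 + 2x2 + z1 = 6, x1 - x2 + z2 = 3.
T, basis = simplex([[1, 2, 1, 0, 6],
                    [1, -1, 0, 1, 3],
                    [1, 1, 0, 0, 0]], {0: 2, 1: 3})
x = np.zeros(4)
for row, var in basis.items():
    x[var] = T[row, -1]
print(x[:2], -T[-1, -1])   # optimal BFS (4, 1) with value 5
```

Running it on the LP of Example 1.1 reproduces the optimal BFS x1 = 4, x2 = 1 with value 5, exactly as in the worked tableau below.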

We will now describe the different steps of the simplex method in more detail and
illustrate them using the LP of Example 1.1.

Finding an initial BFS


Finding an initial BFS is very easy when the constraints are of the form Ax ≤ b for
b ≥ 0. We can then add a vector z of slack variables and write the constraints as
Ax + z = b, z ≥ 0 and get a BFS by setting x = 0 and z = b. This can alternatively
be thought of as extending x to (x, z) and setting (xB , xN ) = (z, x) = (b, 0). We then
have AB^{-1} = I and cB = 0, and the entries in the tableau become AN and cN^T for
the variables x1 and x2 that are not in the basis, and b and 0 in the last column. For
the LP of Example 1.1 we obtain the following tableau, where rows and columns have
been labeled with the names of the corresponding variables:

x1 x2 z1 z2 ai0
z1 1 2 1 0 6
z2 1 −1 0 1 3
a0j 1 1 0 0 0

If the constraints do not have this convenient form, finding an initial BFS requires
more work. We will discuss this case in the next lecture.

Choosing a pivot column


If a0j ≤ 0 for all j ≥ 1, the current solution is optimal. Otherwise we can choose a
column j such that a0j > 0 as the pivot column and let the corresponding variable enter
the basis. If multiple candidate columns exist, choosing any one of them will work, but
we could for example break ties toward the one that maximizes a0j or the one with
the smallest index. The candidate variables in our example are x1 and x2 , so let us
choose x1 .

Choosing the pivot row


If aij ≤ 0 for all i, then the problem is unbounded and the objective can be increased
by an arbitrary amount. Otherwise we choose a row i ∈ {i′ : ai′j > 0} that minimizes
ai0 /aij . This row is called the pivot row, and aij is called the pivot. If multiple rows
minimize ai0 /aij , the problem has a degenerate BFS. In our example there is a unique
choice, corresponding to variable z2 .

Pivoting
The purpose of the pivoting step is to get the tableau into the appropriate form for the
new BFS. For this, we multiply row i by 1/aij and add a −(akj /aij ) multiple of row i
to each row k ≠ i, including the last one. Our choice of the pivot row as a row that
minimizes ai0 /aij turns out to be crucial, as it guarantees that the solution remains
feasible after pivoting. In our example, we need to subtract the second row from both
the first and the last row, after which the tableau looks as follows:

x1 x2 z1 z2 ai0
z1 0 3 1 −1 3
x1 1 −1 0 1 3
a0j 0 2 0 −1 −3

Note that the second row now corresponds to variable x1 , which has replaced z2 in the
basis.
We are now ready to choose a new pivot column. In our example, one further
iteration yields the following tableau:
         x1    x2    z1     z2    ai0
    x2    0     1    1/3   −1/3    1
    x1    1     0    1/3    2/3    4
    a0j   0     0   −2/3   −1/3   −5

This corresponds to the BFS where x1 = 4, x2 = 1, and z1 = z2 = 0, with an objective
value of 5 (the entry in the bottom-right corner is −f(x) = −5). All entries in the last
row are non-positive, so this solution is optimal.
8 The Two-Phase Simplex Method
The LP we solved in the previous lecture allowed us to find an initial BFS very easily.
In cases where such an obvious candidate for an initial BFS does not exist, we can solve
a different LP to find an initial BFS. We will refer to this as phase I. In phase II we
then proceed as in the previous lecture.
Consider the LP to

    minimize   6x1 + 3x2
    subject to x1 + x2 ≥ 1
               2x1 − x2 ≥ 1
               3x2 ≤ 2
               x1 , x2 ≥ 0.
We change from minimization to maximization and introduce slack variables to obtain
the following equivalent problem:

    maximize   −6x1 − 3x2
    subject to x1 + x2 − z1 = 1
               2x1 − x2 − z2 = 1
               3x2 + z3 = 2
               x1 , x2 , z1 , z2 , z3 ≥ 0.

Unfortunately, the basic solution with x1 = x2 = 0, z1 = z2 = −1, and z3 = 2 is


not feasible. We can, however, add an artificial variable to the left-hand side of each
constraint where the slack variable and the right-hand side have opposite signs, and
then minimize the sum of the artificial variables starting from the obvious BFS where
the artificial variables are non-zero instead of the corresponding slack variables. In the
example, we

    minimize   y1 + y2
    subject to x1 + x2 − z1 + y1 = 1
               2x1 − x2 − z2 + y2 = 1
               3x2 + z3 = 2
               x1 , x2 , z1 , z2 , z3 , y1 , y2 ≥ 0,
and the goal of phase I is to solve this LP starting from the BFS where x1 = x2 = z1 =
z2 = 0, y1 = y2 = 1, and z3 = 2. If the original problem is feasible, we will be able
to find a BFS where y1 = y2 = 0. This automatically gives us an initial BFS for the
original problem.
In summary, the two-phase simplex method proceeds as follows:


1. Bring the constraints into equality form. For each constraint in which the slack
variable and the right-hand side have opposite signs, or in which there is no slack
variable, add a new artificial variable that has the same sign as the right-hand
side.
2. Phase I: minimize the sum of the artificial variables, starting from the BFS where
the absolute value of the artificial variable for each constraint, or of the slack
variable in case there is no artificial variable, is equal to that of the right-hand
side.
3. If some artificial variable has a positive value in the optimal solution, the original
problem is infeasible; stop.
4. Phase II: solve the original problem, starting from the BFS found in phase I.
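The result of the two phases can be cross-checked with a library solver (a sketch assuming SciPy; linprog accepts the ≥ constraints after negation):

```python
# Solve the example LP of this lecture: minimize 6x1 + 3x2 subject to
# x1 + x2 >= 1, 2x1 - x2 >= 1, 3x2 <= 2, x >= 0.
from scipy.optimize import linprog

res = linprog(c=[6.0, 3.0],
              A_ub=[[-1.0, -1.0],    # x1 + x2 >= 1
                    [-2.0, 1.0],     # 2x1 - x2 >= 1
                    [0.0, 3.0]],     # 3x2 <= 2
              b_ub=[-1.0, -1.0, 2.0],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, res.fun)   # x = (2/3, 1/3) with 6x1 + 3x2 = 5
```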

While the original objective is not needed for phase I, it is useful to carry it along
as an extra row in the tableau, because it will then be in the appropriate form at
the beginning of phase II. In the example, phase I therefore starts with the following
tableau:

x1 x2 z1 z2 z3 y1 y2
y1 1 1 −1 0 0 1 0 1
y2 2 −1 0 −1 0 0 1 1
z3 0 3 0 0 1 0 0 2
II −6 −3 0 0 0 0 0 0
I 3 0 −1 −1 0 0 0 2

Note that the objective for phase I is written in terms of the non-basic variables. This
can be achieved by first writing it in terms of y1 and y2 , such that we have −1 in the
columns for y1 and y2 and 0 in all other columns because we are maximizing −y1 −y2 ,
and then adding the first and second row to make the entries for all variables in the
basis equal to zero.
Phase I now proceeds by pivoting on a21 to get

         x1    x2    z1    z2    z3    y1    y2
    y1    0    3/2   −1    1/2    0     1   −1/2   1/2
    x1    1   −1/2    0   −1/2    0     0    1/2   1/2
    z3    0     3     0     0     1     0     0     2
    II    0    −6     0    −3     0     0     3     3
    I     0    3/2   −1    1/2    0     0   −3/2   1/2
and on a14 to get

         x1    x2    z1    z2    z3    y1    y2
    z2    0     3    −2     1     0     2    −1     1
    x1    1     1    −1     0     0     1     0     1
    z3    0     3     0     0     1     0     0     2
    II    0     3    −6     0     0     6     0     6
    I     0     0     0     0     0    −1    −1     0

Note that we could have chosen a12 as the pivot element in the second step, and would
have obtained the same result.
This ends phase I as y1 = y2 = 0, and we have found a BFS for the original problem
with x1 = z2 = 1, z3 = 2, and x2 = z1 = 0. After dropping the columns for y1 and y2
and the row corresponding to the objective for phase I, the tableau is in the right form
for phase II:
x1 x2 z1 z2 z3
0 3 −2 1 0 1
1 1 −1 0 0 1
0 3 0 0 1 2
0 3 −6 0 0 6
By pivoting on a12 we obtain the following tableau, corresponding to an optimal
solution of the original problem with x1 = 2/3 and x2 = 1/3, where the maximization
objective has value −6x1 − 3x2 = −5:

         x1    x2    z1     z2    z3
    x2    0     1   −2/3   1/3    0    1/3
    x1    1     0   −1/3  −1/3    0    2/3
    z3    0     0     2     −1    1     1
          0     0    −4     −1    0     5

It is worth noting that the problem we have just solved is the dual of the LP in
Example 1.1, which we solved in the previous lecture, augmented by the constraint
3x2 ≤ 2. Ignoring the column and row corresponding to z3 , the slack variable for this
new constraint, the final tableau is essentially the negative of the transpose of the final
tableau we obtained in the previous lecture. This makes sense because the additional
constraint is not tight in the optimal solution, as we can see from the fact that z3 6= 0.
9 Non-Cooperative Games
The theory of non-cooperative games studies situations in which multiple self-interested
entities, or players, simultaneously and independently optimize different objectives and
outcomes must therefore be self-enforcing.

9.1 Games and Solutions


The central object of study in non-cooperative game theory is the normal-form game.
We restrict our attention to two-player games, but note that most concepts extend in a
straightforward way to games with more than two players. A two-player game with m
actions for player 1 and n actions for player 2 can be represented by a pair of matrices
P, Q ∈ Rm×n , where pij and qij are the payoffs of players 1 and 2 when player 1 plays
action i and player 2 plays action j. Two-player games are therefore sometimes referred
to as bimatrix games, and players 1 and 2 as the row and column player, respectively.
We will assume that players can choose their actions randomly and denote the set
of possible strategies of the two players by X and Y, respectively, i.e.,
X = {x ∈ Rm : x ≥ 0, Σ_{i=1}^m xi = 1} and Y = {y ∈ Rn : y ≥ 0, Σ_{i=1}^n yi = 1}.
A pure strategy is a strategy that
chooses some action with probability one, and we make no distinction between pure
chooses some action with probability one, and we make no distinction between pure
strategies and the corresponding actions. A profile (x, y) ∈ X × Y of strategies induces
a lottery over outcomes, and we write p(x, y) = xT Py for the expected payoff of the
row player in this lottery.
Consider for example the well-known prisoner’s dilemma, involving two suspects
accused of a crime who are being interrogated separately. If both remain silent, they
walk free after spending a few weeks in pretrial detention. If one of them testifies against
the other and the other remains silent, the former is released immediately while the
latter is sentenced to ten years in jail. If both suspects testify, each of them receives a
five-year sentence. A representation of this situation as a two-player normal-form game
is shown in Figure 9.1.
It is easy to see what the players in this game should do, because action T yields a
strictly larger payoff than action S for every action of the respective other player. More
generally, for two strategies x, x′ ∈ X of the row player, x is said to (strictly) dominate
x′ if for every strategy y ∈ Y of the column player, p(x, y) > p(x′ , y). Dominance
for the column player is defined analogously. Strategy profile (T, T ) in the prisoner’s
dilemma is what is called a dominant strategy equilibrium, a profile of strategies that
dominate every other strategy of the respective player. The source of the dilemma is
that the outcome resulting from (T, T ) is strictly worse for both players than the outcome
resulting from (S, S). More generally, an outcome that is weakly preferred to another
outcome by all players, and strictly preferred by at least one player is said to Pareto


S T

S (2, 2) (0, 3)
T (3, 0) (1, 1)

Figure 9.1: Representation of the prisoner’s dilemma as a normal-form game. The


matrices P and Q are displayed as a single matrix with entries (pij , qij ), and players 1
and 2 respectively choose a row and a column of this matrix. Action S corresponds to
remaining silent, action T to testifying.

C D

C (2, 2) (1, 3)
D (3, 1) (0, 0)

Figure 9.2: The game of chicken, where players can chicken out or dare

dominate that outcome. An outcome that is Pareto dominated is clearly undesirable.


In the absence of dominant strategies, it is less obvious how players should proceed.
Consider for example the game of chicken illustrated in Figure 9.2. This game has its
origins in a situation where two cars drive towards each other on a collision course.
Unless one of the drivers yields, both may die in a crash. If one of them yields while
the other goes straight, however, the former will be called a “chicken,” or coward. It is
easily verified that this game does not have any dominant strategies.
The most cautious choice in a situation like this would be to ignore that the other
player is self-interested and choose a strategy that maximizes the payoff in the worst
case, taken over all of the other player’s strategies. A strategy of this type is known
as a maximin strategy, and the payoff thus obtained as the player’s security level. It
is easy to see that it suffices to maximize the minimum payoff over all pure strategies
of the other player, i.e., to choose x such that min_{j∈{1,...,n}} Σ_{i=1}^m xi pij is as large as
possible. Maximization of this minimum can be achieved by maximizing a lower bound
that holds for all j = 1, . . . , n, so a maximin strategy and the security level for the row
player can be found as a solution of the following linear program with variables v ∈ R
and x ∈ Rm :

    maximize   v
    subject to Σ_{i=1}^m xi pij ≥ v   for j = 1, . . . , n          (9.1)
               Σ_{i=1}^m xi = 1
               x ≥ 0.
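As a sketch (assuming SciPy), the LP (9.1) can be solved directly for the game of chicken; the decision variables are (v, x1 , x2 ), and each constraint Σi xi pij ≥ v becomes a row of A_ub:

```python
# Maximin strategy and security level for the row player in chicken.
from scipy.optimize import linprog

P = [[2.0, 1.0],   # row player's payoffs: C = chicken out, D = dare
     [3.0, 0.0]]

res = linprog(c=[-1.0, 0.0, 0.0],                    # maximize v
              A_ub=[[1.0, -P[0][0], -P[1][0]],       # v <= x1 p11 + x2 p21
                    [1.0, -P[0][1], -P[1][1]]],      # v <= x1 p12 + x2 p22
              b_ub=[0.0, 0.0],
              A_eq=[[0.0, 1.0, 1.0]], b_eq=[1.0],    # x1 + x2 = 1
              bounds=[(None, None), (0, None), (0, None)],
              method="highs")
v, x = res.x[0], res.x[1:]
print(v, x)   # security level 1, maximin strategy (1, 0): chicken out
```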

The unique maximin strategy in the game of chicken is to yield, for a security level
of 1. This also illustrates that a maximin strategy need not be optimal: assuming that
the row player yields, the optimal action for the column player is in fact to go straight.
Formally, strategy x ∈ X of the row player is a best response to strategy y ∈ Y of the
column player if for all x′ ∈ X, p(x, y) ≥ p(x′ , y). The concept of a best response for
the column player is defined analogously. A pair of strategies (x, y) ∈ X × Y such that x
is a best response to y and y is a best response to x is called an equilibrium. Equilibria
are also known as Nash equilibria, because their universal existence was shown by John
Nash.

Theorem 9.1 (Nash, 1951). Every bimatrix game has an equilibrium.

It is easily verified that both (C, D) and (D, C) are equilibria of the game of chicken,
and there is one more equilibrium, in which both players randomize uniformly between
their two actions. The proof of Theorem 9.1 is beyond the scope of this course, but
we show the result for the special case when the players have diametrically opposed
interests.

9.2 The Minimax Theorem


A two-player game is called a zero-sum game if qij = −pij for all i = 1, . . . , m and
j = 1, . . . , n. A game of this type is sometimes called a matrix game, because it can be
represented just by the matrix P containing the payoffs of the row player. Assuming
invariance of utilities under positive affine transformations, results for zero-sum games
in fact apply to the larger class of constant-sum games, in which the payoffs of the two
players always sum up to the same constant. For games with more than two players,
these properties are far less interesting, as one can always add an extra player who
“absorbs” the payoffs of the others.
It turns out that in zero-sum games, maximin strategies are optimal.
Theorem 9.2 (von Neumann, 1928). Let P ∈ Rm×n , X = {x ∈ Rm : x ≥ 0, Σ_{i=1}^m xi = 1},
and Y = {y ∈ Rn : y ≥ 0, Σ_{i=1}^n yi = 1}. Then,

    max_{x∈X} min_{y∈Y} p(x, y) = min_{y∈Y} max_{x∈X} p(x, y).

Proof. Again consider the linear program (9.1), and recall that the optimum value
of this linear program is equal to max_{x∈X} min_{y∈Y} p(x, y). By adding a slack variable
z ∈ Rn with z ≥ 0 we obtain the Lagrangian

    L(v, x, z, w, y) = v + Σ_{j=1}^n yj ( Σ_{i=1}^m xi pij − zj − v ) − w ( Σ_{i=1}^m xi − 1 )

                     = ( 1 − Σ_{j=1}^n yj ) v + Σ_{i=1}^m ( Σ_{j=1}^n pij yj − w ) xi − Σ_{j=1}^n yj zj + w,
where w ∈ R and y ∈ Rn . The Lagrangian has a finite maximum for v ∈ R and x ∈ Rm
with x ≥ 0 if and only if Σ_{j=1}^n yj = 1, Σ_{j=1}^n pij yj ≤ w for i = 1, . . . , m, and y ≥ 0. In
the dual of (9.1) we therefore want to

    minimize   w
    subject to Σ_{j=1}^n pij yj ≤ w   for i = 1, . . . , m
               Σ_{j=1}^n yj = 1
               y ≥ 0.
It is easy to see that the optimum value of the dual is miny∈Y maxx∈X p(x, y), and the
theorem follows from strong duality.

The number maxx∈X miny∈Y p(x, y) = miny∈Y maxx∈X p(x, y) is also called the
value of the matrix game with payoff matrix P.
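The equality of the two sides can be illustrated numerically (our own sketch, assuming SciPy): the row value is the optimum of (9.1), and min_y max_x is obtained here by solving the row player's LP in the negated, transposed game.

```python
# Value of a matrix game via the LP (9.1) and the negation/transposition trick.
from scipy.optimize import linprog

def row_value(P):
    """Security level max_x min_y p(x, y), via the LP (9.1)."""
    m, n = len(P), len(P[0])
    res = linprog(c=[-1.0] + [0.0] * m,
                  A_ub=[[1.0] + [-P[i][j] for i in range(m)] for j in range(n)],
                  b_ub=[0.0] * n,
                  A_eq=[[0.0] + [1.0] * m], b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * m,
                  method="highs")
    return res.x[0]

def col_value(P):
    """min_y max_x p(x, y): the row value of the negated, transposed game."""
    return -row_value([[-P[i][j] for i in range(len(P))]
                       for j in range(len(P[0]))])

P = [[1.0, -1.0],   # matching pennies
     [-1.0, 1.0]]
print(row_value(P), col_value(P))   # both equal the value of the game, 0
```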
It is now easy to show that every matrix game has an equilibrium, and that the
above result in fact characterizes the set of equilibria of such games.

Theorem 9.3. A pair of strategies (x, y) ∈ X × Y is an equilibrium of the matrix
game with payoff matrix P if and only if

    min_{y′∈Y} p(x, y′ ) = max_{x′∈X} min_{y′∈Y} p(x′ , y′ )   and
    max_{x′∈X} p(x′ , y) = min_{y′∈Y} max_{x′∈X} p(x′ , y′ ).
10 The Minimum-Cost Flow Problem
The remaining lectures will be concerned with optimization problems on networks, in
particular with flow problems.

10.1 Networks
A directed graph, or network, G = (V, E) consists of a set V of vertices and a set
E ⊆ V × V of edges. When the relation E is symmetric, G is called an undirected
graph, and we can write edges as unordered pairs {u, v} ∈ E for u, v ∈ V. The degree
of vertex u ∈ V in graph G is the number |{v ∈ V : (u, v) ∈ E or (v, u) ∈ E}| of other
vertices connected to it by an edge. A walk from u ∈ V to w ∈ V is a sequence of
vertices v1 , . . . , vk ∈ V such that v1 = u, vk = w, and (vi , vi+1 ) ∈ E for i = 1, . . . , k − 1.
In a directed graph, we can also consider an undirected walk where (vi , vi+1 ) ∈ E or
(vi+1 , vi ) ∈ E for i = 1, . . . , k − 1. A walk is a path if v1 , . . . , vk are pairwise distinct,
and a cycle if v1 , . . . , vk−1 are pairwise distinct and vk = v1 . A graph that does not
contain any cycles is called acyclic. A graph is called connected if for every pair of
vertices u, v ∈ V there is an undirected path from u to v. A tree is a graph that is
connected and acyclic. A graph G 0 = (V 0 , E 0 ) is a subgraph of graph G = (V, E) if
V 0 ⊆ V and E 0 ⊆ E. In the special case where G 0 is a tree and V 0 = V, it is called a
spanning tree of G.

10.2 Minimum-Cost Flows


Consider a network G = (V, E) with |V| = n, and let b ∈ Rn . Here, bi denotes the
amount of flow that enters or leaves the network at vertex i ∈ V. If bi > 0, we say
that i is a source supplying bi units of flow. If bi < 0, we say that i is a sink with a
demand of |bi | units of flow. Further let C, M, M ∈ Rn×n , where cij denotes the cost
associated with one unit of flow on edge (i, j) ∈ E, and mij and mij respectively denote
lower and upper bounds on the flow across this edge. The minimum-cost flow problem
then asks for flows xij that conserve the flow at each vertex, respect the upper and
lower bounds, and minimize the overall cost. Formally, x ∈ Rn×n is a minimum-cost
flow of G if it is an optimal solution of the following optimization problem:
X
minimize cij xij
(i,j)∈E
X X
subject to bi + xji = xij for all i ∈ V,
j:(j,i)∈E j:(i,j)∈E

mij 6 xij 6 mij for all (i, j) ∈ E.


The minimum-cost flow problem is a linear programming problem, with constraints of
the form Ax = b where

    aik =   1    if the kth edge starts at vertex i,
           −1    if the kth edge ends at vertex i,
            0    otherwise.

Note that Σ_{i∈V} bi = 0 is required for feasibility, and that a problem satisfying this
condition can be transformed into an equivalent problem where bi = 0 for all i by
introducing an additional vertex, and new edges from each sink to the new vertex and
from the new vertex to each of the sources with upper and lower bounds equal to the
flow that should enter the sources and leave the sinks. The latter problem is known
as a circulation problem, because flow does not enter or leave the network but merely
circulates. We can further assume without loss of generality that the network G is
connected. Otherwise the problem can be decomposed into several smaller problems
that can be solved independently.
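The transformation to a circulation problem described above is mechanical; a possible implementation (function name and data layout are illustrative) is:

```python
def to_circulation(n, edges, lo, hi, b):
    """Route all supply and demand through one extra vertex v = n, so that
    flow merely circulates: each source i receives exactly b_i units from v,
    and each sink i sends exactly |b_i| units to v."""
    v = n
    edges2, lo2, hi2 = list(edges), list(lo), list(hi)
    for i, bi in enumerate(b):
        if bi > 0:                         # source: edge v -> i with tight bounds
            edges2.append((v, i)); lo2.append(bi); hi2.append(bi)
        elif bi < 0:                       # sink: edge i -> v with tight bounds
            edges2.append((i, v)); lo2.append(-bi); hi2.append(-bi)
    return n + 1, edges2, lo2, hi2

n2, edges2, lo2, hi2 = to_circulation(3, [(0, 1), (1, 2)], [0, 0], [5, 5], [2, 0, -2])
print(edges2)   # [(0, 1), (1, 2), (3, 0), (2, 3)]
```

Setting the lower and upper bound of each new edge to the same value forces exactly the required amount of flow through it.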
An important special case is that of uncapacitated flow problems, where mij = 0
and m̄ij = ∞ for all (i, j) ∈ E. Clearly, an uncapacitated flow problem is either
unbounded or has an equivalent problem with finite capacities.

10.3 Sufficient Conditions for Optimality


The Lagrangian of the minimum-cost circulation problem is

    L(x, λ) = Σ_{(i,j)∈E} cij xij − Σ_{i∈V} λi ( Σ_{j:(i,j)∈E} xij − Σ_{j:(j,i)∈E} xji )

            = Σ_{(i,j)∈E} (cij − λi + λj) xij.

If the Lagrangian is minimized subject to the regional constraints mij ≤ xij ≤ m̄ij for
(i, j) ∈ E, Theorem 2.1 yields a set of conditions that are sufficient for optimality. It
will be instructive to prove this result directly.

Theorem 10.1. Consider a feasible flow x ∈ R^{n×n} for a circulation problem, and
let λ ∈ R^n be such that

    cij − λi + λj > 0   implies   xij = mij,
    cij − λi + λj < 0   implies   xij = m̄ij, and
    mij < xij < m̄ij    implies   cij − λi + λj = 0.

Then x is optimal.

Proof. For (i, j) ∈ E, let c̄ij = cij − λi + λj. Then, for every feasible flow x′,

    Σ_{(i,j)∈E} cij x′ij = Σ_{(i,j)∈E} cij x′ij − Σ_{i∈V} λi ( Σ_{j:(i,j)∈E} x′ij − Σ_{j:(j,i)∈E} x′ji )

                        = Σ_{(i,j)∈E} c̄ij x′ij

                        ≥ Σ_{(i,j)∈E: c̄ij<0} c̄ij m̄ij + Σ_{(i,j)∈E: c̄ij>0} c̄ij mij

                        = Σ_{(i,j)∈E} c̄ij xij = Σ_{(i,j)∈E} cij xij.

Here the first equality uses flow conservation of x′, and the last line follows because the
conditions on x force xij = m̄ij when c̄ij < 0, xij = mij when c̄ij > 0, and c̄ij = 0 on all
remaining edges.

The Lagrange multiplier λi is also referred to as a node number, or as a potential


associated with vertex i ∈ V. Since only the difference between pairs of Lagrange
multipliers appears in the optimality conditions, we can set λ1 = 0 without loss of
generality.
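The conditions of Theorem 10.1 are easy to check mechanically for a candidate flow x and candidate node numbers λ. A minimal sketch (the function name is illustrative) follows; the third condition need not be tested separately, since it is the contrapositive of the first two taken together:

```python
def satisfies_theorem_10_1(edges, cost, lo, hi, x, lam, tol=1e-9):
    """Check the sufficient optimality conditions for a feasible circulation x."""
    for k, (i, j) in enumerate(edges):
        cbar = cost[k] - lam[i] + lam[j]            # reduced cost of edge (i, j)
        if cbar > tol and abs(x[k] - lo[k]) > tol:
            return False                            # should sit at the lower bound
        if cbar < -tol and abs(x[k] - hi[k]) > tol:
            return False                            # should sit at the upper bound
    return True

# a cycle with positive costs: the zero circulation is optimal, the full one is not
edges = [(0, 1), (1, 2), (2, 0)]
print(satisfies_theorem_10_1(edges, [1, 1, 1], [0, 0, 0], [2, 2, 2], [0, 0, 0], [0, 0, 0]))  # True
print(satisfies_theorem_10_1(edges, [1, 1, 1], [0, 0, 0], [2, 2, 2], [2, 2, 2], [0, 0, 0]))  # False
```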

10.4 The Transportation Problem


An important special case of the minimum-cost flow problem is the transportation
problem, where we are given a set of suppliers i = 1, . . . , n producing si units of a good
and a set of consumers j = 1, . . . , m with demands dj such that Σ_{i=1}^n si = Σ_{j=1}^m dj.
The cost of transporting one unit of the good from supplier i to consumer j is cij ,
and the goal is to match supply and demand while minimizing overall transportation
cost. This can be formulated as an uncapacitated minimum-cost flow problem on a
bipartite network, i.e., a network G = (S ] C, E) with S = {1, . . . , n}, C = {1, . . . , m},
and E ⊆ S × C. As far as optimal solutions are concerned, edges not contained in E are
equivalent to edges with a very large cost. We can thus restrict our attention to the
case where E = S × C, known as the Hitchcock transportation problem:

    minimize    Σ_{i=1}^n Σ_{j=1}^m cij xij

    subject to  Σ_{j=1}^m xij = si   for i = 1, . . . , n,

                Σ_{i=1}^n xij = dj   for j = 1, . . . , m,

                x ≥ 0.
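Being a linear program, a Hitchcock instance can again be solved with any LP solver. A sketch with SciPy's `linprog` on a small made-up 2 × 2 instance (one of the n + m equality constraints is redundant, which the HiGHS backend tolerates):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([[1, 2], [3, 1]], dtype=float)   # illustrative costs
s = np.array([3, 2], dtype=float)             # supplies
d = np.array([2, 3], dtype=float)             # demands; sum(s) == sum(d)
n, m = c.shape

A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0          # row sums equal s_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0                   # column sums equal d_j
b_eq = np.concatenate([s, d])

res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print(res.fun)   # optimal cost 6.0, attained by x = [[2, 1], [0, 2]]
```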

It turns out that the transportation problem already captures the full expressiveness
of the minimum-cost flow problem.

[Figure 10.1: Representation of flow conservation constraints by an instance of the
transportation problem. For each edge (i, j) ∈ E there is a supplier with supply m̄ij,
connected to consumer i with cost 0 and to consumer j with cost cij; consumer i has
demand Σ_{k:(i,k)∈E} m̄ik − bi.]

Theorem 10.2. Every minimum-cost flow problem with finite capacities or non-
negative costs has an equivalent transportation problem.

Proof. Consider a minimum-cost flow problem for a network (V, E) and assume without
loss of generality that mij = 0 for all (i, j) ∈ E. If this is not the case, we can instead
consider the problem obtained by setting mij to zero, m̄ij to m̄ij − mij, and replacing
bi by bi − mij and bj by bj + mij. A solution with flow xij for the new problem
then corresponds to a solution with flow xij + mij for the original problem. We can
further assume that all capacities are finite: if some edge has infinite capacity but costs
are non-negative, then setting the capacity of this edge to a large enough number, for
example Σ_{i∈V} |bi|, does not affect the optimal solution of the problem.
We now construct an instance of the transportation problem as follows. For every
vertex i ∈ V, we add a consumer with demand Σ_{k:(i,k)∈E} m̄ik − bi. For every edge
(i, j) ∈ E, we add a supplier with supply m̄ij, an edge to vertex i with cost c_{ij,i} = 0,
and an edge to vertex j with cost c_{ij,j} = cij. The situation is shown in Figure 10.1.
We now claim that there exists a direct correspondence between feasible flows of the
two problems, and that these flows have the same costs. To see this, let the flows on
edges (ij, i) and (ij, j) be m̄ij − xij and xij, respectively. The total flow into vertex i then
is Σ_{k:(i,k)∈E} (m̄ik − xik) + Σ_{k:(k,i)∈E} xki, which must be equal to Σ_{k:(i,k)∈E} m̄ik − bi.
This is the case if and only if bi + Σ_{k:(k,i)∈E} xki − Σ_{k:(i,k)∈E} xik = 0, which is the flow
conservation constraint for vertex i in the original problem.
11 The Transportation Algorithm
The particular structure of basic feasible solutions in the case of the transportation
problem gives rise to a special interpretation of the simplex method. This special form
is sometimes called the transportation algorithm.

11.1 Optimality Conditions


The Lagrangian of the transportation problem can be written as

    L(x, λ, µ) = Σ_{i=1}^n Σ_{j=1}^m cij xij + Σ_{i=1}^n λi ( si − Σ_{j=1}^m xij ) − Σ_{j=1}^m µj ( dj − Σ_{i=1}^n xij )

               = Σ_{i=1}^n Σ_{j=1}^m (cij − λi + µj) xij + Σ_{i=1}^n λi si − Σ_{j=1}^m µj dj,

where λ ∈ R^n and µ ∈ R^m are Lagrange multipliers for the suppliers and consumers,
respectively. Subject to xij ≥ 0, the Lagrangian has a finite minimum if and only if

    cij − λi + µj ≥ 0   for i = 1, . . . , n and j = 1, . . . , m,

and at the optimum,

    (cij − λi + µj) xij = 0   for i = 1, . . . , n and j = 1, . . . , m.


Together with feasibility of x, these dual feasibility and complementary slackness con-
ditions are necessary and sufficient for optimality of x.
Note that the sign of the Lagrange multipliers can be chosen arbitrarily, and that
this choice determines the form of the optimality conditions. The above choice is
consistent with viewing demands as negative supplies.
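These dual feasibility and complementary slackness conditions can be verified mechanically. A small sketch (the function name is illustrative; feasibility of x itself is assumed, as in the statement above):

```python
def transport_optimal(c, x, lam, mu, tol=1e-9):
    """Check c_ij - λ_i + µ_j >= 0 and (c_ij - λ_i + µ_j) x_ij = 0 for all i, j."""
    n, m = len(c), len(c[0])
    for i in range(n):
        for j in range(m):
            r = c[i][j] - lam[i] + mu[j]          # reduced cost
            if r < -tol:
                return False                      # dual feasibility violated
            if abs(r) > tol and x[i][j] > tol:
                return False                      # complementary slackness violated
    return True

c = [[1, 2], [3, 1]]
print(transport_optimal(c, [[2, 1], [0, 2]], [0, -1], [-1, -2]))   # True
print(transport_optimal(c, [[0, 3], [2, 0]], [0, -1], [-1, -2]))   # False
```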

11.2 The Simplex Method for the Transportation Problem


In solving instances of the transportation problem with the simplex method, a tableau
of the following form will be useful, where each cell holds the flow xij above its cost cij:

              µ1   ···   µm
    λ1     x11 c11  ···  x1m c1m    s1
     ⋮        ⋮      ⋱      ⋮        ⋮
    λn     xn1 cn1  ···  xnm cnm    sn
              d1   ···   dm


[Figure 11.1: Initial basic feasible solution of an instance of the transportation problem
(left) and a cycle along which the overall cost can be decreased (right). The initial flows
are x11 = 6, x12 = 2, x22 = 3, x23 = 7, x33 = 1, x34 = 8; pushing θ units of flow around
the cycle replaces x11 by 6 − θ, x12 by 2 + θ, x21 by θ, and x22 by 3 − θ.]

Consider for example the Hitchcock transportation problem with three suppliers
and four consumers given by the following tableau of costs cij, with supplies si on the
right and demands dj at the bottom:

        5   3   4   6  |  8
        2   7   4   1  | 10
        5   6   2   4  |  9
        ----------------
        6   5   8   8

Finding an initial BFS


An initial BFS can be found by iteratively considering pairs (i, j) of supplier i and
consumer j, increasing xij until either the supply si or the demand dj is satisfied, and
moving to the next supplier in the former case or to the next consumer in the latter.
Since Σ_i si = Σ_j dj, this process is guaranteed to find a feasible solution. If at some
intermediate step both supply and demand are satisfied at the same time, the resulting
solution is degenerate. In general, degeneracies occur when a subset of the consumers
can be satisfied exactly by a subset of the suppliers. In the example, we can start by
setting x11 = min{s1 , d1 } = 6, moving to consumer 2 and setting x12 = 2, moving to
supplier 2 and setting x22 = 3, and so forth. The resulting flows are shown on the left
of Figure 11.1.
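The construction just described is easy to implement. A sketch (assuming numpy; the function name is an illustrative choice) that reproduces the flows of the example:

```python
import numpy as np

def north_west(s, d):
    """Initial BFS: sweep suppliers and consumers in order, shipping as much
    as supply and demand allow at each cell."""
    s, d = list(map(float, s)), list(map(float, d))
    n, m = len(s), len(d)
    x = np.zeros((n, m))
    T = []                        # cells visited: edges of the spanning tree
    i = j = 0
    while i < n and j < m:
        T.append((i, j))
        q = min(s[i], d[j])
        x[i, j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0 and i < n - 1:
            i += 1                # supply exhausted: move to the next supplier
        else:
            j += 1                # demand satisfied: move to the next consumer
    return x, T

x, T = north_west([8, 10, 9], [6, 5, 8, 8])
print(x)   # the staircase of flows 6, 2 / 3, 7 / 1, 8
```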
Note that the initial BFS can be associated with a spanning tree (V, T ) of the flow
network where T is the set of edges visited by the above procedure. It then holds that
xij = 0 when (i, j) ∉ T, and complementary slackness dictates that λi − µj = cij when
(i, j) ∈ T. By setting λ1 = 0, we obtain a system of n + m − 1 linear equalities with
n + m − 1 variables: each equality corresponds to an edge in T , each variable to a
vertex in (S \ {1}) ] C. This system of equalities has a unique solution, allowing us to
compute the values of the dual variables. We will see momentarily that every BFS can
be associated with a spanning tree in this way. To verify dual feasibility, it will finally

be convenient to write down λi − µj for (i, j) ∉ T, and we do so in the upper right
corner of the respective cells. For our example, we obtain the following tableau:

              µ1 = −5   µ2 = −3   µ3 = 0    µ4 = −2
    λ1 = 0       6         2       [0]        [2]       8
    λ2 = 4      [9]        3        7         [6]      10
    λ3 = 2      [7]       [5]       1          8        9
                 6         5        8          8

Here plain entries are the flows xij for (i, j) ∈ T, bracketed entries are the values
λi − µj for (i, j) ∉ T, and the costs cij are those of the cost tableau above.

Pivoting
If cij ≥ λi − µj for all (i, j) ∉ T, the current flow is optimal. Assume on the other hand
that dual feasibility is violated for some edge (i, j) ∉ T, and observe that this edge and
the edges in T together form a unique cycle. In the absence of degeneracies the regional
constraints for edges in T are not tight, so we can push flow around this cycle in order
to increase xij and decrease the value of the Lagrangian. Due to the special structure
of the network, this will alternately increase and decrease the flow for edges along the
cycle until xi′j′ becomes zero for some (i′, j′) ∈ T. We thus obtain a new BFS, and a
new spanning tree in which (i′, j′) has been replaced by (i, j).
In our example dual feasibility is violated, for example, for i = 2 and j = 1. Edge
(2, 1) forms a unique cycle with the spanning tree T , and we would like to increase x21
by pushing flow along this cycle. In particular, increasing x21 by θ will increase x12
and decrease x11 and x22 by the same amount. The situation is shown on the right of
Figure 11.1. If we increase x21 by the maximum amount of θ = 3 and re-compute the
values of the dual variables λ and µ, we obtain the following tableau:

              µ1 = −5   µ2 = −3   µ3 = −7   µ4 = −9
    λ1 = 0       3         5       [7]        [9]       8
    λ2 = −3      3        [0]       7         [6]      10
    λ3 = −5     [0]      [−2]       1          8        9
                 6         5        8          8
Now, c24 < λ2 − µ4, and we can increase x24 by 7 to obtain the following tableau,
which satisfies cij ≥ λi − µj for all (i, j) ∉ T and therefore yields an optimal solution:

              µ1 = −5   µ2 = −3   µ3 = −2   µ4 = −4
    λ1 = 0       3         5       [2]        [4]       8
    λ2 = −3      3        [0]      [−1]        7       10
    λ3 = 0      [5]       [3]       8          1        9
                 6         5        8          8

Let us summarize what we have done:


1. Find an initial BFS, and let T be the edges of the corresponding spanning tree.
2. Choose λ and µ such that λ1 = 0 and cij − λi + µj = 0 for all (i, j) ∈ T .
3. If cij − λi + µj ≥ 0 for all (i, j) ∈ E, the solution is optimal; stop.
4. Otherwise pick (i, j) ∈ E such that cij − λi + µj < 0, and push flow along the
unique cycle in (V, T ∪ {(i, j)}) until xi′j′ = 0 for some edge (i′, j′) in the cycle.
Set T to (T \ {(i′, j′)}) ∪ {(i, j)} and go to Step 2.
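Steps 1 to 4 can be turned into a compact implementation. The sketch below (assuming numpy; all helper names are illustrative) starts from the initial basic feasible solution of the example and recovers the optimal cost found above; it assumes nondegeneracy and does not guard against stalling:

```python
import numpy as np

def duals(c, T):
    """Solve λi − µj = cij on the tree edges T, with λ1 = 0."""
    n, m = c.shape
    lam, mu = [None] * n, [None] * m
    lam[0] = 0.0
    changed = True
    while changed:
        changed = False
        for i, j in T:
            if lam[i] is not None and mu[j] is None:
                mu[j] = lam[i] - c[i, j]; changed = True
            elif mu[j] is not None and lam[i] is None:
                lam[i] = mu[j] + c[i, j]; changed = True
    return lam, mu

def tree_path(T, i, j, n):
    """Path from supplier i to consumer j in the bipartite tree T
    (consumers are encoded as node indices n, n+1, ...)."""
    adj = {}
    for a, b in T:
        adj.setdefault(a, []).append(n + b)
        adj.setdefault(n + b, []).append(a)
    stack, seen = [(i, [i])], {i}
    while stack:
        u, path = stack.pop()
        if u == n + j:
            return path
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append((v, path + [v]))

def transportation_simplex(c, x, T, tol=1e-9):
    """Steps 2 to 4 of the summary above, starting from BFS x with tree T."""
    n, m = c.shape
    while True:
        lam, mu = duals(c, T)
        enter = next(((i, j) for i in range(n) for j in range(m)
                      if (i, j) not in T and c[i, j] - lam[i] + mu[j] < -tol), None)
        if enter is None:
            return x                             # dual feasible: optimal
        i, j = enter
        path = tree_path(T, i, j, n)
        cells = [(path[k], path[k + 1] - n) if path[k] < n
                 else (path[k + 1], path[k] - n) for k in range(len(path) - 1)]
        minus, plus = cells[0::2], cells[1::2]   # signs alternate along the cycle
        theta = min(x[a, b] for a, b in minus)
        leave = min(minus, key=lambda ab: x[ab])
        x[i, j] += theta
        for a, b in plus:
            x[a, b] += theta
        for a, b in minus:
            x[a, b] -= theta
        T.remove(leave)
        T.append((i, j))

# the example of Section 11.2, with its initial BFS and spanning tree
c = np.array([[5, 3, 4, 6], [2, 7, 4, 1], [5, 6, 2, 4]], dtype=float)
x0 = np.array([[6, 2, 0, 0], [0, 3, 7, 0], [0, 0, 1, 8]], dtype=float)
T0 = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
x = transportation_simplex(c, x0, T0)
print((c * x).sum())   # 63.0
```

The pivot order here scans cells row by row, so the intermediate bases may differ from those of the worked tableaux, but the optimal cost agrees.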
12 The Maximum Flow Problem

Consider a flow network (V, E) with a single source 1, a single sink n, and finite capac-
ities mij = Cij for all (i, j) ∈ E. We will also assume for convenience that mij = 0 for
all (i, j) ∈ E. The maximum flow problem then asks for the maximum amount of flow
that can be sent from vertex 1 to vertex n, i.e., the goal is to

    maximize    δ

                                                        ⎧  δ    if i = 1,
    subject to  Σ_{j:(i,j)∈E} xij − Σ_{j:(j,i)∈E} xji = ⎨ −δ    if i = n,        (12.1)
                                                        ⎩  0    otherwise,

                0 ≤ xij ≤ Cij   for all (i, j) ∈ E.

This problem is in fact a special case of the minimum-cost flow problem. To see
this, set cij = 0 for all (i, j) ∈ E, and add an edge (n, 1) with infinite capacity and
cost cn1 = −1. Since the new edge (n, 1) has infinite capacity, any feasible flow of
the original network is also feasible for the new network. Cost is clearly minimized by
maximizing the flow across the edge (n, 1), which by the flow conservation constraints
for vertices 1 and n maximizes flow through the original network.

12.1 The Max-Flow Min-Cut Theorem

Consider a flow network G = (V, E) with capacities Cij for (i, j) ∈ E. A cut of G is a
partition of V into two sets, and the capacity of a cut is defined as the sum of capacities
of all edges across the partition. Formally, for S ⊆ V, the capacity of the cut (S, V \ S)
is given by

    C(S) = Σ_{(i,j)∈E∩(S×(V\S))} Cij.
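For a capacity matrix, C(S) is a one-liner; the helper name below is an illustrative choice:

```python
def cut_capacity(C, S):
    """Capacity of the cut (S, V \\ S): total capacity of edges leaving S."""
    n = len(C)
    return sum(C[i][j] for i in S for j in range(n) if j not in S)

C = [[0, 3, 2], [0, 0, 1], [0, 0, 0]]
print(cut_capacity(C, {0}))      # 5
print(cut_capacity(C, {0, 1}))   # 3
```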

Assume that x is a feasible flow vector that sends δ units of flow from vertex 1 to
vertex n. It is easy to see that δ is bounded from above by the capacity of any cut S
with 1 ∈ S and n ∈ V \ S. Indeed, for X, Y ⊆ V, let

    fx(X, Y) = Σ_{(i,j)∈E∩(X×Y)} xij.


Then, for any S ⊆ V with 1 ∈ S and n ∈ V \ S,

    δ = Σ_{i∈S} ( Σ_{j:(i,j)∈E} xij − Σ_{j:(j,i)∈E} xji )

      = fx(S, V) − fx(V, S)                                        (12.2)

      = fx(S, S) + fx(S, V \ S) − fx(V \ S, S) − fx(S, S)

      = fx(S, V \ S) − fx(V \ S, S)

      ≤ fx(S, V \ S) ≤ C(S).

The following result states that this upper bound is in fact tight, i.e., that there exists
a flow of size equal to the minimum capacity of a cut that separates vertex 1 from
vertex n.

Theorem 12.1 (Max-flow min-cut theorem). Let δ be the optimal solution of (12.1)
for a network (V, E) with capacities Cij for all (i, j) ∈ E. Then,

δ = min { C(S) : S ⊆ V, 1 ∈ S, n ∈ V \ S } .

Proof. It remains to be shown that there exists a cut that separates vertex 1 from vertex
n and has capacity equal to δ. Consider a feasible flow vector x. A path v0, v1, . . . , vk is
called an augmenting path for x if x_{v_{i−1},v_i} < C_{v_{i−1},v_i} or x_{v_i,v_{i−1}} > 0 for every i = 1, . . . , k.
If there exists an augmenting path from vertex 1 to vertex n, then we can push flow
along the path, by increasing the flow on every forward edge and decreasing the flow
on every backward edge along the path by the same amount, such that all constraints
remain satisfied and the amount of flow from 1 to n increases.
Now assume that x is optimal, and let

S = {1} ∪ { i ∈ V : there exists an augmenting path for x from 1 to i }.

By optimality of x, n ∈ V \ S. Moreover,

δ = fx (S, V \ S) − fx (V \ S, S) = fx (S, V \ S) = C(S).

The first equality holds by (12.2). The second equality holds because xij = 0 for
every (i, j) ∈ E ∩ ((V \ S) × S). The third equality holds because xij = Cij for every
(i, j) ∈ E ∩ (S × (V \ S)).

12.2 The Ford-Fulkerson Algorithm


The Ford-Fulkerson algorithm attempts to find a maximum flow by repeatedly push-
ing flow along an augmenting path, until such a path can no longer be found:
1. Start with a feasible flow vector x.
2. If there is no augmenting path for x from 1 to n, stop.

[Figure 12.1: An instance of the maximum flow problem. Vertices s, a, b, c, d, t; edge
capacities (s, a) = 5, (s, c) = 5, (a, b) = 1, (a, d) = 4, (c, d) = 2, (b, t) = 1, (d, t) = 5.]

3. Otherwise pick some augmenting path from 1 to n, and push a maximum amount
of flow along this path without violating any constraints. Then go to Step 2.
Consider for example the flow network in Figure 12.1. Pushing one unit of flow along
the path s, a, b, t, four units along the path s, a, d, t, and one more unit along the path
s, c, d, t yields a maximum flow, and the fact that this flow is optimal is witnessed by
the cut ({s, a, b, c, d}, {t}), which has capacity 6.
If all capacities are integral and if we start from an integral flow vector, e.g., the
flow vector x such that xij = 0 for all (i, j) ∈ E, then the Ford-Fulkerson algorithm
maintains integrality and increases the overall amount of flow by at least one unit in
each iteration. The algorithm is therefore guaranteed to find a maximum flow after
a finite number of iterations. Clearly, the latter also holds when all capacities are
rational.
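One concrete way to pick the path in Step 3 is to always use a shortest augmenting path, found by breadth-first search; this variant is known as the Edmonds-Karp algorithm. A sketch on the instance of Figure 12.1 (the vertex numbering is an illustrative choice):

```python
from collections import deque

def max_flow(cap, s, t):
    """Ford-Fulkerson with shortest augmenting paths (Edmonds-Karp)."""
    n = len(cap)
    r = [row[:] for row in cap]          # residual capacities
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual network
        parent = [None] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] is None:
            u = q.popleft()
            for v in range(n):
                if parent[v] is None and r[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] is None:
            return total                 # no augmenting path left
        # bottleneck along the path, then push that much flow
        theta, v = float("inf"), t
        while v != s:
            theta = min(theta, r[parent[v]][v]); v = parent[v]
        v = t
        while v != s:
            r[parent[v]][v] -= theta
            r[v][parent[v]] += theta     # allow the flow to be cancelled later
            v = parent[v]
        total += theta

# Figure 12.1 with s=0, a=1, b=2, c=3, d=4, t=5
C = [[0, 5, 0, 5, 0, 0],
     [0, 0, 1, 0, 4, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 2, 0],
     [0, 0, 0, 0, 0, 5],
     [0, 0, 0, 0, 0, 0]]
print(max_flow(C, 0, 5))   # 6
```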

12.3 The Bipartite Matching Problem


A matching of a graph (V, E) is a set of edges that do not share any vertices, i.e., a set
M ⊆ E such that for all distinct (s, t), (u, v) ∈ M, {s, t} ∩ {u, v} = ∅. Matching M is called
perfect if it covers every vertex, i.e., if |M| = |V|/2. A graph is k-regular if every vertex has
degree k. Using flows it is easy to show that every k-regular bipartite graph, for k ≥ 1,
has a perfect matching. For this, consider a k-regular bipartite graph (L ] R, E), orient
all edges from L to R, and add two new vertices s and t and new edges (s, i) and (j, t)
for every i ∈ L and j ∈ R. Finally set the capacity of every new edge to 1, and that of
every original edge to infinity. We can now send |L| units of flow from s to t by setting
the flow to 1 for every new edge and to 1/k for every original edge. The Ford-Fulkerson
algorithm is therefore guaranteed to find an integral solution with at least the same
value, and it is easy to see that such a solution corresponds to a perfect matching.
This result is a special case of a well-known characterization of the bipartite graphs
that have a perfect matching. It should not come as a surprise that this characterization
can be obtained from the max-flow min-cut theorem as well.
Theorem 12.2 (Hall’s Theorem). A bipartite graph G = (L ] R, E) with |L| = |R| has
a perfect matching if and only if |N(X)| ≥ |X| for every X ⊆ L, where N(X) = {j ∈
R : (i, j) ∈ E for some i ∈ X}.
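For small graphs, Hall's condition can be checked directly by enumerating all subsets X ⊆ L; the function below (name illustrative) returns a violating set if one exists:

```python
from itertools import combinations

def hall_violation(L, R, E):
    """Return some X ⊆ L with |N(X)| < |X|, or None if Hall's condition holds."""
    for r in range(1, len(L) + 1):
        for X in combinations(L, r):
            N = {j for (i, j) in E if i in X}   # neighbourhood of X
            if len(N) < len(X):
                return set(X)
    return None

print(hall_violation([0, 1], [0, 1], {(0, 0), (1, 0)}))   # {0, 1}
print(hall_violation([0, 1], [0, 1], {(0, 0), (1, 1)}))   # None
```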
